In today’s world, advanced computing technologies are in high demand in the workplace. Creating powerful algorithms that can help humanity is part of the research that uOttawa Computer Science Professor Diana Inkpen has carried out in collaboration with her students.
Professor Inkpen is developing the next generation of intelligent applications for computers and mobile devices. This knowledge can make a big difference during an election campaign, help people learn a new language more easily, or even help determine what actions to take in search and rescue operations.
Professor Diana Inkpen discusses data mining techniques.
How would you describe text mining techniques? What could be a relevant example?
Text mining is the automatic extraction of useful information from large amounts of text, at a scale that humans cannot reach. We use learning-based methods: we show the program examples of the information to be extracted and let it figure out which features of the text are associated with which kind of information. We construct the learning model from the training examples; then, when new text comes in, the model is applied to extract new information. This allows constant monitoring for a company or for someone interested in particular information. A good example would be a smartphone company selling Android-based phones that wants to know about its customers’ preferences. Do they prefer other models from the competition? What are their preferences in terms of colors, features, speed? What are they saying on social media? We automatically classify these opinions about the product as positive, negative, or no opinion. In this way, the program provides knowledge accumulated from large numbers of reviews and allows the company to draw conclusions about which actions and product improvements could be effective.
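To illustrate the train-then-apply workflow described above, here is a minimal sketch in Python. It is not Professor Inkpen's system: it assumes a simple multinomial Naive Bayes classifier, and the function names (`tokenize`, `train`, `classify`) and the six training reviews are hypothetical, invented only for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Build a Naive Bayes model from (text, label) training examples."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def classify(model, text):
    """Return the most probable label for a new, unseen text."""
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(n_docs / total_docs)
        for token in tokens:
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training examples (made up for this sketch).
TRAINING = [
    ("I love the camera and the battery lasts all day", "positive"),
    ("great screen, fast and reliable", "positive"),
    ("the battery dies quickly and the phone is slow", "negative"),
    ("terrible screen, it cracked after a week", "negative"),
    ("the phone ships in black or blue", "no opinion"),
    ("it has a 6 inch display and two cameras", "no opinion"),
]

model = train(TRAINING)
print(classify(model, "love the fast camera"))
```

With a model trained on a handful of reviews, the predictions are only as good as the word overlap between the new text and the examples; a real monitoring system would train on far more data.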
Can you elaborate on how algorithms learn to read data?
Mainly, we implement systems that retrieve information from text, video, music, or medical texts. For algorithms to read text data, we have to process the text to extract features from it. Therefore, it is essential to know which parts of a text are more relevant and which structures are more important (for example, verb-noun pairs). The more thorough the data understanding and preparation phase, the better the algorithms learn. Then, we add all the processed texts as training data for our algorithms. There are different types of algorithms that learn by reading this large number of processed training examples. The trained models can then be used to extract information from new texts.
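The feature extraction step described above can be sketched as follows. This is an assumption-laden simplification: identifying true verb-noun pairs requires a part-of-speech tagger, so this sketch substitutes adjacent word pairs (bigrams) as a rough stand-in. The `STOPWORDS` list and the `extract_features` function are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}


def extract_features(text):
    """Turn raw text into a bag of single words plus adjacent word pairs."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    features = Counter(tokens)
    # Adjacent pairs approximate structures such as verb-noun pairs.
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


features = extract_features("The phone charges the battery quickly")
print(sorted(features))
```

Each processed text becomes a set of counted features, and it is these counts, rather than the raw characters, that a learning algorithm would receive as training data.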
What are your other fields of interest?
Most of my work is in the field of text mining, but I also work on information retrieval using various types of multimedia data (video, audio, speech), not only text. I also work with uOttawa’s Official Languages and Bilingualism Institute (OLBI) on a terminology extraction project, for which I am developing tools that help language learners understand how to better use terms and expressions in a new language.
What would be the unique character of your work?
I try to find new learning algorithms based on different approaches to learning. It is very difficult for a program to understand the meaning of texts. This is why we develop methods that simulate intelligence to some degree, in order to extract useful pieces of information from texts. I am also focusing on algorithms that can run efficiently on large amounts of data (big data).
In the future, how would you like your work to impact the world?
In the future, I would like to develop the next generation of intelligent applications for smartphones and computers, applications that can process a very large amount of information and help in decision making. I think we are already there. Many computer applications are already making our lives easier. I would also like to transfer this knowledge to my students and help them find the jobs they want, to encourage them to stay in the field and to continue to work for the benefit of our society.