A new way of searching
I am very happy to share some results of the European Commission's LivingKnowledge (LK) project, in which DRTC-ISI is a partner. The results have already been put to use by Yahoo! for new search techniques in 'Future Predictor', by the University of Pavia, by SORA, Austria, for media content analysis, and by the University of Trento for faceted knowledge maps used in building ENTITYPEDIA.
A new way of searching
Helped by the Indian Statistical Institute, European scientists have come up with a revolutionary new search technology.
Inspired by a system for categorising books proposed by an Indian librarian more than 50 years ago, a team of EU-funded researchers has developed a new kind of Internet search that takes into account factors such as opinion, bias, context, time and location. The new technology, which could soon be in use commercially, can display trends in public opinion about a topic, company or person over time, and it can even be used to predict the future.
“If you search for ‘climate’ on Google or another search engine, what you will get is basically a list of results featuring that word: there’s no categorisation, no specific order, no context. Current search engines do not take into account factors such as when the information was published, if there is a bias inherent in the content and structure, who published it and when,” says Fausto Giunchiglia, a professor of computer science at the University of Trento in Italy.
But can search technology be enabled to identify and embrace diversity? Can a search engine tell you, for example, how public opinion about climate change has shifted over the last decade? Or how hot the weather will be a century from now, by aggregating current and past estimates from different sources?
Now it seems it can, thanks to a pioneering combination of modern science and a decades-old classification method, brought together by European researchers in the LivingKnowledge project.
The team, co-ordinated by Giunchiglia, adopted a multidisciplinary approach to developing new search technology, drawing on fields as diverse as computer science, social science, semiotics and library science. In fact, the father of library science, Sirkali Ramamrita Ranganathan, an Indian librarian, served as a source of inspiration. In the 1920s and 1930s, Ranganathan developed the first major analytico-synthetic, or faceted, classification system.
Using this approach, objects (books, in the case of Ranganathan; web and database content, in the case of the LivingKnowledge team) are assigned multiple characteristics and attributes (facets), enabling the classification to be ordered in multiple ways, rather than in a single, predetermined, taxonomic order. Using the system, an article about the effects on agriculture of climate change written in Norway in 1990 might be classified as “Geography; Climate; Climate change; Agriculture; Research; Norway; 1990.”
In order to understand the classification system better and implement it in search engine technology, the researchers turned to the Indian Statistical Institute, a project partner, which uses faceted classification on a daily basis. “Using their knowledge, we were able to turn Ranganathan’s pseudo-algorithm into a computer algorithm. The computer scientists were able to use it to mine data from the web, extract its meaning and context, assign facets to it, and use these to structure the information based on the dimensions of diversity,” Giunchiglia says.
Researchers at the University of Pavia in Italy, another partner, drew on their expertise in extracting meaning from web content (not just from text and multimedia content, but also from the way the information is structured) in order to infer bias and opinions, adding another facet to the data.
The technology was implemented in a testbed, now available as open source software, and used for trials based around two intriguing application scenarios.
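As an illustration of the faceted classification described above, here is a minimal Python sketch. It is not the LivingKnowledge implementation: the names Item, group_by_facet and select_items are hypothetical, the facet labels (discipline, subject, aspect, form, place, year) are an assumed reading of the Norway example, and the second and third entries are invented purely for illustration. It only shows how the same items can be ordered or filtered along any facet instead of a single fixed taxonomy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """A classified object: a title plus a set of facet -> value pairs."""
    title: str
    facets: dict[str, str]


def group_by_facet(items, facet):
    """Order the collection along one facet: value -> list of titles."""
    groups = defaultdict(list)
    for item in items:
        value = item.facets.get(facet)
        if value is not None:
            groups[value].append(item.title)
    return dict(groups)


def select_items(items, **wanted):
    """Return titles of items whose facets match every requested value."""
    return [
        item.title
        for item in items
        if all(item.facets.get(name) == value for name, value in wanted.items())
    ]


# The first entry follows the article's Norway example; the facet names are
# assumptions. The other two entries are invented for illustration only.
ITEMS = [
    Item("Climate change and agriculture", {
        "discipline": "Geography", "subject": "Climate change",
        "aspect": "Agriculture", "form": "Research",
        "place": "Norway", "year": "1990",
    }),
    Item("Hypothetical opinion piece", {
        "discipline": "Geography", "subject": "Climate change",
        "aspect": "Public opinion", "form": "News",
        "place": "Italy", "year": "2005",
    }),
    Item("Hypothetical price forecast", {
        "discipline": "Economics", "subject": "Oil prices",
        "aspect": "Forecast", "form": "Research",
        "place": "Norway", "year": "2010",
    }),
]

if __name__ == "__main__":
    # The same collection viewed by place, then by form.
    print(group_by_facet(ITEMS, "place"))
    print(group_by_facet(ITEMS, "form"))
    # Combine facets to narrow a search.
    print(select_items(ITEMS, subject="Climate change", form="Research"))
```

Because every item carries all of its facets, no single ordering is privileged: grouping by "year" or "place" is the same operation as grouping by "subject".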
The Austrian social research institute SORA used the system to identify social trends and monitor public opinion. Used for media content analysis, the system could help a company understand the impact of a new advertising campaign, showing how it has affected brand recognition over time and which social groups have been most receptive. Alternatively, a government might use the system to gauge public opinion about a new policy, or a politician could use it to respond in the most publicly acceptable way to a rival candidate’s claims.
With Barcelona Media, a non-profit research foundation supported by Yahoo!, and the Netherlands-based Internet Memory Foundation, the scientists looked not only at current and past trends but also extrapolated them, drawing on forecasts extracted from existing data to try to predict the future. Their Future Predictor application can run searches based on questions such as “What will oil prices be in 2050?” or “How much will global temperatures rise over the next 100 years?” and find relevant information and forecasts on today’s web. For example, a search for the year 2034 turns up “space travel” as the most relevant topic indexed in today’s news.
“More immediately, this application can detect trends even before these become apparent,” Giunchiglia explains.
He points out that Google fundamentally changed the world by providing everyone with access to much of the world’s information. Currently, only humans can understand the meaning of all that data, and information overload has become a common problem. As we move into the age of big data, the meaning of the vast quantity of information needs to be understandable not just to humans but also to machines. The LivingKnowledge approach addresses that problem.
“The future will be all about big data: we can’t say whether it will be good or bad, but it will certainly be different,” says Giunchiglia.
Armed with the project’s Future Predictor, Giunchiglia is well equipped to make that prediction.
With Regards, Amit Kumar Shaw, DRTC, Indian Statistical Institute.