A new way of searching

12 Sep 2012

I am very happy to share with you some results of the European
Commission's Living Knowledge (LK) project, in which DRTC-ISI was a
partner. The results have already been implemented by Yahoo! for new
search techniques in 'Future Predictor', by the University of Pavia, by
SORA, Austria, for media content analysis, and by the University of
Trento for faceted knowledge maps for building ENTITYPEDIA.

A new way of searching

Helped by the Indian Statistical Institute, European scientists have
come up with a revolutionary new search technology.

Inspired by a system for categorising books proposed by an Indian
librarian more than 50 years ago, a team of EU-funded researchers has
developed a new kind of Internet search that takes into account
factors such as opinion, bias, context, time and location. The new
technology, which could soon be in use commercially, can display
trends in public opinion about a topic, company or person over time,
and it can even be used to predict the future.

“If you search for ‘climate’ on Google or another search engine, what
you will get is basically a list of results featuring that word:
there’s no categorisation, no specific order, no context. Current
search engines do not take into account factors such as when the
information was published, if there is a bias inherent in the content
and structure, who published it and when,” says Fausto Giunchiglia, a
professor of computer science at the University of Trento in Italy.

But can search technology be enabled to identify and embrace
diversity? Can a search engine tell you, for example, how public
opinion about climate change has turned over the last decade? Or how
hot the weather will be a century from now, by aggregating current and
past estimates from different sources?

Now it seems it can, thanks to a pioneering combination of modern
science and a decades-old classification method, brought together by
European researchers in the LivingKnowledge project. The team,
co-ordinated by Giunchiglia, adopted a multidisciplinary approach to
developing new search technology, drawing on fields as diverse as
computer science, social science, semiotics and library science.

In fact, the father of library science, Sirkali Ramamrita Ranganathan,
an Indian librarian, served as a source of inspiration. In the 1920s
and 1930s, Ranganathan developed the first major analytico-synthetic,
or faceted, classification system. Using this approach, objects
(books, in the case of Ranganathan; web and database content, in the
case of the LivingKnowledge team) are assigned multiple
characteristics and attributes (facets), enabling the classification
to be ordered in multiple ways, rather than in a single,
predetermined, taxonomic order. Using the system, an article about the
effects on agriculture of climate change written in Norway in 1990
might be classified as “Geography; Climate; Climate change;
Agriculture; Research; Norway; 1990.”
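As a rough illustration of the faceted idea described above, the
following Python sketch stores a few items with facet values and
retrieves or groups them by any facet. It is not the LivingKnowledge
implementation: the facet names (discipline, topic, sector, form,
place, year), the sample records other than the Norway example, and
the helper functions are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample records. Only the first one mirrors the example in
# the article; the facet names themselves are assumed for illustration.
documents = [
    {"title": "Effects of climate change on agriculture",
     "facets": {"discipline": "Geography", "topic": "Climate change",
                "sector": "Agriculture", "form": "Research",
                "place": "Norway", "year": 1990}},
    {"title": "Climate change and fisheries",
     "facets": {"discipline": "Geography", "topic": "Climate change",
                "sector": "Fisheries", "form": "Research",
                "place": "Norway", "year": 2005}},
    {"title": "Soil erosion on farmland",
     "facets": {"discipline": "Geography", "topic": "Soil erosion",
                "sector": "Agriculture", "form": "Research",
                "place": "Italy", "year": 1990}},
]


def build_facet_index(docs):
    """Map facet name -> facet value -> set of document positions."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(docs):
        for facet, value in doc["facets"].items():
            index[facet][value].add(doc_id)
    return index


def search(index, **criteria):
    """Return positions of documents matching every facet=value pair."""
    result = None
    for facet, value in criteria.items():
        ids = set(index.get(facet, {}).get(value, set()))
        result = ids if result is None else result & ids
    return sorted(result or [])


def group_by(docs, facet):
    """Order the same collection by any chosen facet."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["facets"].get(facet)].append(doc["title"])
    return dict(groups)


facet_index = build_facet_index(documents)
norway_docs = search(facet_index, place="Norway")
climate_1990 = search(facet_index, topic="Climate change", year=1990)
by_year = group_by(documents, "year")
```

Because every item carries several independent facets, the same
collection can be sliced by place, by year or by topic without fixing
a single hierarchy in advance.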

In order to understand the classification system better and implement
it in search engine technology, the researchers turned to the Indian
Statistical Institute, a project partner, which uses faceted
classification on a daily basis.

“Using their knowledge, we were able to turn Ranganathan’s
pseudo-algorithm into a computer algorithm. The computer scientists
were able to use it to mine data from the web, extract its meaning and
context, assign facets to it, and use these to structure the
information based on the dimensions of diversity,” Giunchiglia says.

Researchers at the University of Pavia in Italy, another partner, drew
on their expertise in extracting meaning from web content (not just
from text and multimedia content, but also from the way the
information is structured) in order to infer bias and opinions,
adding another facet to the data.

The technology was implemented in a testbed, now available as open
source software, and used for trials based around two intriguing
application scenarios.

The Austrian social research institute SORA used the system to
identify social trends and monitor public opinion. Used for media
content analysis, the system could help a company understand the
impact of a new advertising campaign, show how it has affected brand
recognition over time and which social groups have been most
receptive. Alternatively, a government might use the system to gauge
public opinion about a new policy, or a politician could use it to
respond in the most publicly acceptable way to a rival candidate’s
claims.

With Barcelona Media, a non-profit research foundation supported by
Yahoo!, and the Netherlands-based Internet Memory Foundation, the
scientists not only looked at current and past trends but also
extrapolated them and drew on forecasts extracted from existing data
to try to predict the future. Their Future Predictor application is
able to make searches based on questions such as “What will oil prices
be in 2050?” or “How much will global temperatures rise over the next
100 years?” and find relevant information and forecasts from today’s
web. For example, a search for the year 2034 turns up “space travel”
as the most relevant topic indexed in today’s news.
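One simple way to picture searching by a future year is to collect
sentences from existing text that mention a year later than the
present. The Python sketch below does only that; it is an assumption
made for illustration and says nothing about how the actual Future
Predictor works. The sample sentences are invented, and the function
name and the year pattern (2000 to 2199) are hypothetical choices.

```python
import re

# Assumed pattern: four-digit years from 2000 to 2199.
YEAR_PATTERN = re.compile(r"\b(2[01]\d{2})\b")


def future_mentions(sentences, current_year):
    """Map each year after current_year to the sentences mentioning it."""
    by_year = {}
    for sentence in sentences:
        # A set avoids listing a sentence twice when it repeats a year.
        years = {int(m) for m in YEAR_PATTERN.findall(sentence)}
        for year in sorted(years):
            if year > current_year:
                by_year.setdefault(year, []).append(sentence)
    return by_year


# Invented sample sentences, for illustration only.
sample_sentences = [
    "Analysts expect oil prices to double by 2050.",
    "The agency plans commercial space travel by 2034.",
    "The report was published in 2010.",
    "Space travel tickets may go on sale in 2034.",
]

mentions = future_mentions(sample_sentences, current_year=2012)
space_travel_2034 = mentions.get(2034, [])
```

A query for a given year would then return the collected statements
for that year, which could be ranked or summarised by topic.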

“More immediately, this application can detect trends even before
these become apparent,” Giunchiglia explains.

He points out that Google fundamentally changed the world by providing
everyone with access to much of the world’s information. Currently,
only humans can understand the meaning of all that data, so much so
that information overload is a common problem. As we move into the age
of big data, the meaning of the vast quantity of information needs to
be understandable not just to humans but also to machines. The
LivingKnowledge approach addresses that problem.

“The future will be all about big data: we can’t say whether it will
be good or bad, but it will certainly be different,” says Giunchiglia.

Armed with the project’s Future Predictor, Giunchiglia is well
equipped to make that prediction.

With Regards,
Amit Kumar Shaw,
DRTC, Indian Statistical Institute.
