Graziela Medeiros

June 30, 2009

Common Information Retrieval Myths

Filed under: Information Retrieval — grazielamedeiros @ 13:58

This text on the ‘myths’ of information retrieval is interesting, given that the concept behind the term is often confused. The text highlights the importance of fields such as Library Science and Information Science (myth 2). Vannevar Bush, an important author in those fields, is also cited (myth 5).

The titles of the ‘myths’ refer to everything that information retrieval is NOT. The text itself is in English, and it is direct and easy to understand.

1. Information retrieval is the same as Information Extraction

“Information Extraction is not Information Retrieval: Information Extraction differs from traditional techniques in that it does not recover from a collection a subset of documents which are hopefully relevant to a query, based on key-word searching (perhaps augmented by a thesaurus).

Instead, the goal is to extract from the documents (which may be in a variety of languages) salient facts about prespecified types of events, entities or relationships. These facts are then usually entered automatically into a database, which may then be used to analyse the data for trends, to give a natural language summary, or simply to serve for on-line access.” (GATE)
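The contrast in the quote can be made concrete with a small Python sketch. Everything here is an illustrative assumption, not part of the original article or of GATE: the three toy documents, the function names `retrieve` and `extract`, and the single hand-written "acquisition" pattern. Retrieval hands back a subset of documents; extraction hands back structured facts that could be loaded into a database.

```python
import re

# Hypothetical toy collection; not taken from the original article.
DOCS = {
    "d1": "Acme Corp acquired Widget Ltd in 2007.",
    "d2": "Globex acquired Initech in 2009.",
    "d3": "Widget Ltd opened a new office in Lisbon.",
}


def retrieve(query, docs):
    """Information retrieval: ids of documents sharing a keyword with the query."""
    terms = set(query.lower().split())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if terms & set(re.findall(r"\w+", text.lower()))
    )


# One prespecified event type ("acquisition"), described by a hand-written pattern.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w*(?: [A-Z]\w*)*) acquired "
    r"(?P<target>[A-Z]\w*(?: [A-Z]\w*)*) in (?P<year>\d{4})"
)


def extract(docs):
    """Information extraction: structured facts, one dict per matched event."""
    facts = []
    for doc_id in sorted(docs):
        for match in ACQUISITION.finditer(docs[doc_id]):
            facts.append({"doc": doc_id, **match.groupdict()})
    return facts


print(retrieve("widget", DOCS))  # ['d1', 'd3']
for fact in extract(DOCS):
    print(fact)
```

The retrieval call returns whole documents that mention the keyword, including d3, which says nothing about an acquisition. The extraction call ignores d3 and returns two records with buyer, target and year fields.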

More on that here.

2. Information retrieval is a computer science discipline

No, not quite.
IR is interdisciplinary because of the many different problems which arise within it.
First off, our data is usually in text format, so we need linguistics and cognitive psychology.

Then the data is stored somehow and is either structured or unstructured, so we need information architecture, information science and library science to help with that.

The text and the query are analysed and rendered into a numeric format that a machine can understand, so statistics comes into play as well.
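As a rough illustration of what "rendered into a numeric format" can mean, here is a minimal Python sketch that turns documents and a query into TF-IDF weighted vectors and compares them with cosine similarity. This is one common weighting scheme, chosen here as an assumption; the article does not say which method it has in mind. The sample documents and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents, for illustration only.
DOCS = [
    "library catalogs index books",
    "search engines rank web pages",
    "statistics for ranking documents",
]


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_idf(docs):
    """Inverse document frequency, smoothed so every weight is positive."""
    n = len(docs)
    df = Counter()
    for text in docs:
        df.update(set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    """Sparse vector: term -> term frequency * idf (unknown terms are dropped)."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Document indices, best match first; ties keep the original order."""
    idf = build_idf(docs)
    q = vectorize(query, idf)
    scores = [(cosine(q, vectorize(text, idf)), i) for i, text in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


print(rank("search engines", DOCS))  # [1, 0, 2]
```

Note that the third document scores zero for the query "rank" because "ranking" is a different token; handling such variants is where the linguistics mentioned above comes in.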

We borrow ideas from Physics too and of course many mathematical concepts come into play.

Computer science as a whole is a mosaic of different disciplines.

3. Information retrieval is just for search engines

Search engines are a common example of an information retrieval system, but online library catalogs (OPAC), commercial databases like Web of Science (and many search engines), and even the entire World Wide Web are all information retrieval systems.

4. Information retrieval’s biggest challenge is ranking documents

“Search is an unsolved problem. We have a good 90 to 95% of the solution, but there is a lot to go in the remaining 10%.” (Marissa Mayer)

She is quite right: we still have a deluge of work to do in this area. We have invented the wheel and we have hooked four of them onto a box. We don’t have a Ferrari Enzo yet.

Some of the biggest remaining challenges involve relevance and feedback, information extraction, multimedia retrieval, effective retrieval, routing and filtering, interfaces and browsing, “Magic”, indexing and retrieval, distributed IR and integrated solutions.

The “Magic” issue (coined by Bruce Croft) concerns vocabulary mismatch: a query and a relevant document often describe the same concept with different words.
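A tiny Python sketch shows the mismatch and one classic workaround, thesaurus-based query expansion. The thesaurus entries, the sample document and the function names are hypothetical, and this is only one of several ways the problem is approached.

```python
import re

# Hypothetical hand-made thesaurus; a real system would use a curated or learned resource.
THESAURUS = {"car": {"automobile", "vehicle"}}


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def matches(query, doc, thesaurus=None):
    """True if the (optionally expanded) query shares at least one term with the document."""
    query_terms = tokens(query)
    if thesaurus:
        for term in list(query_terms):
            query_terms |= thesaurus.get(term, set())
    return bool(query_terms & tokens(doc))


doc = "Automobile repair manual"
print(matches("car", doc))             # False: exact matching misses the document
print(matches("car", doc, THESAURUS))  # True: the expanded query finds it
```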

There is a list of Grand challenges for IR which is published and presented every year. This is the latest document. (PDF)

5. Google pioneered information retrieval

Google did arguably make the most commercially successful information retrieval system, but they were not the first to launch into IR.

In fact no search engine was.

In 1945 Vannevar Bush’s “As We May Think” appeared in Atlantic Monthly, and in this article he described an information retrieval system. In the 1960s Gerard Salton created the SMART (System for the Mechanical Analysis and Retrieval of Text) Information Retrieval System at Cornell University. One of the first papers was Melvin Earl (Bill) Maron and J. L. Kuhns’ “On relevance, probabilistic indexing, and information retrieval” in Journal of the ACM in 1960. In 1963 the Weinberg report “Science, Government and Information” gave a full explanation of the issues concerning the “crisis of scientific information”: basically, we couldn’t manage the huge corpus that we had gathered throughout the centuries.

From the 1960s onward, Karen Spärck Jones relentlessly researched computational linguistics and its application to IR at Cambridge. J. W. Sammon pioneered the vector model in 1968, and in the 1970s NLM’s AIM-TWX and MEDLINE were the first ever online IR systems. Around the same time, Theodor Nelson started introducing hypertext.

Source: Written by Marie-Claire Jenkins and published on the Search Engine People site.

June 23, 2009

‘All for Good’ search interface

Filed under: Information Sources,Information Retrieval — grazielamedeiros @ 12:23

A group of engineers, designers and systems managers from Google and other companies has started working on All for Good, a new service to help you find volunteer activities in your community and share those events with your friends.

All for Good provides a single search interface for finding opportunities across volunteering sites like United Way, VolunteerMatch, HandsOn Network and Reach Out and Read. It allows searching by predefined categories (Education, Health, Nature, Hunger, Website, Seniors, Animals) or by terms of your own choosing.

The site is still in its testing phase.

Source: Official Google Blog.


June 2, 2009

Wave: a new collaboration tool from Google

Filed under: Information Retrieval — grazielamedeiros @ 11:10

Google Wave can make you more productive even when you’re having fun.
Take a sneak peek.

Learn how to put waves in your site and build wave extensions with the Google Wave APIs.

Google Wave uses an open protocol, so anyone can build their own wave system. 


Slides from an indexing class taught to undergraduate Library Science students at UFSC. They cover the LISA and LILACS databases on the Portal CAPES, show how the databases are accessed, from the print format to the online one, and present their information retrieval features.

Created by: Graziela Medeiros. Advisor: Lígia Café

May 31, 2009

An Introduction to MultiAgent Systems

Filed under: Information Sources,Information Retrieval — grazielamedeiros @ 13:55

A book on “intelligent agents” available online. All chapters are online, including slide decks (PPTs) for the chapters. Language: English.

Title: An Introduction to MultiAgent Systems, Second Edition
Author: Michael Wooldridge

Description: Multiagent systems are a new paradigm for understanding and building distributed systems, where it is assumed that the computational components are autonomous: able to control their own behaviour in the furtherance of their own goals. The first edition of An Introduction to Multiagent Systems was the first contemporary textbook in the area, and became the standard undergraduate reference work for the field. This second edition has been extended with substantial new material on recent developments in the field, and has been revised and updated throughout. It provides a comprehensive, coherent, and readable introduction to the theory and practice of multiagent systems, while presenting a wealth of discussion topics and pointers into more advanced issues for those wanting to dig deeper.


Table of Contents:

Part I Setting the Scene

Chapter 1 Introduction

Part II Intelligent Autonomous Agents

Chapter 2 Intelligent Agents
Chapter 3 Deductive Reasoning Agents
Chapter 4 Practical Reasoning Agents
Chapter 5 Reactive and Hybrid Agents

Part III Communication and Cooperation

Chapter 6 Understanding Each Other
Chapter 7 Communicating
Chapter 8 Working Together
Chapter 9 Methodologies
Chapter 10 Applications

Part IV Multiagent Decision Making

Chapter 11 Multiagent Interactions
Chapter 12 Making Group Decisions
Chapter 13 Forming Coalitions
Chapter 14 Allocating Scarce Resources
Chapter 15 Bargaining
Chapter 16 Arguing
Chapter 17 Logical Foundations
