Multilingual and Contextual Information Retrieval
Project leader Jacques Savoy
Abstract This research proposal pursues three main objectives. First, we want to design, implement and evaluate information retrieval (IR) systems that work with various East European languages (non-English monolingual IR). More specifically, in this part we design and evaluate linguistic tools for new and less widely spoken languages, such as Hungarian, Polish, Czech and Turkish. We also translate short queries from one language into another (most likely English, the lingua franca) before accessing information written in the various other languages.

Second, we undertake a more elaborate investigation of contextual IR systems used to retrieve information in a specific domain (e.g., biomedicine, law, enterprise, weblogs), instead of evaluating IR systems on newspaper test collections. In this part of our project we investigate the most appropriate response to user information needs, which range from “classical” document searches to newer requests such as known-item searches (“where is the last e-mail sent to Paul?”), the pros and cons of a given argument, or searches for an expert in a given domain based on e-mails or other enterprise intranet document repositories. Specific user requirements could also be taken into account by identifying document length (varying from a short bibliographic notice to a long novel), the level of information needed (whole document, paragraph, single sentence or short summary), and the degree of editorial control (from newspaper articles to e-mails or weblogs). In this second part we also investigate and evaluate the impact of orthographic and vocabulary variations as well as the influence of extra-document information (e.g., document contexts, temporal information, links between documents within web or legal corpora).

Third, we integrate the above two research objectives into a common task in order to perform searches in a multilingual collection, starting with relatively well-edited web pages (e.g., information published by European governments, as found in the EuroGOV corpus), or even less structured and less “polished” web pages (e.g., weblogs written in at least three different languages) or enterprise e-mails.
Keywords Information retrieval (IR), multilingual IR (MLIR), contextual retrieval, cross-lingual IR (CLIR), web search, dedicated IR, digital library
Type of project Fundamental research project
Research area Computer science
Method of financing SNSF - Project funding (Div. I-III)
Status Completed
Start of project 1-1-2007
End of project 31-3-2010
Overall budget 298'777.00
Contact Jacques Savoy