Information Retrieval Models for Distributed Collections
Project leaders | Jacques Savoy, Kilian Stoffel |
Team member | Yves Rasolofo |
Abstract |
We have investigated two major issues in Distributed Information
Retrieval (DIR), namely: collection selection and search results
merging. While most published works on these two issues are based
on pre-stored metadata, the approaches described in this paper
involve extracting the required information at the time the query
is processed. To predict the relevance of collections to a
given query, we analyse a limited number of full documents (e.g.,
the top five documents) retrieved from each collection and then
consider term proximity within them. Our merging technique, by
contrast, is simple: as input it requires only the document scores
and the lengths of the result lists. Our experiments evaluate the retrieval
effectiveness of these approaches and compare them with centralised
indexing and various other DIR techniques (e.g., CORI [2][3][23]).
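The collection-selection idea described above can be sketched as follows: fetch the top few documents from each collection, then score each collection by how closely the query terms co-occur in those documents. This is a minimal sketch under stated assumptions. The scoring rule (each pair of distinct query terms whose nearest occurrences are at most `window` tokens apart contributes 1/distance), the whitespace tokenisation, and the function names are hypothetical and are not the project's actual formula.

```python
from itertools import combinations


def proximity_score(tokens, query_terms, window=5):
    """Score one document by how closely distinct query terms co-occur.

    Hypothetical rule: each pair of distinct query terms whose nearest
    occurrences are at most `window` tokens apart adds 1/distance.
    """
    positions = {}
    for idx, tok in enumerate(tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(idx)
    score = 0.0
    for a, b in combinations(sorted(positions), 2):
        # Distinct terms never share a position, so the distance is >= 1.
        d = min(abs(i - j) for i in positions[a] for j in positions[b])
        if d <= window:
            score += 1.0 / d
    return score


def rank_collections(top_docs_by_collection, query, top_k=5, window=5):
    """Rank collections by the summed proximity score of their top_k documents.

    top_docs_by_collection: dict mapping a (hypothetical) collection name to
    the list of full-text documents it returned, best first.
    """
    query_terms = set(query.lower().split())
    scores = {
        name: sum(
            proximity_score(doc.lower().split(), query_terms, window)
            for doc in docs[:top_k]
        )
        for name, docs in top_docs_by_collection.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


collections = {
    "news": ["distributed retrieval merging results",
             "retrieval of distributed documents"],
    "web": ["cooking recipes for pasta",
            "retrieval is hard and distributed computing is harder"],
}
print(rank_collections(collections, "distributed retrieval"))
```

Here "news" scores 1.0 + 0.5 = 1.5 and "web" scores 0.25, so "news" is ranked first. Only the top-k documents per collection are inspected, which keeps the cost of downloading and analysing full texts at query time bounded.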
We conducted our experiments using two testbeds: one containing news articles extracted from four different sources (2 GB) and another containing 10 GB of Web pages. Our evaluations demonstrate that the retrieval effectiveness of our simple approaches is worth considering. |
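A merging step that, as the abstract states, uses only document scores and result-list lengths could look like the sketch below. The weighting formula is an assumption for illustration, loosely in the spirit of CORI-style collection weighting, and is not taken from this project description. It gives each collection a score s_i = log(1 + l_i * k / L), where l_i is the length of its result list and L the total across collections, and then scales document scores by s_i divided by the mean collection score. The constant `k` and all names are hypothetical. The sketch also assumes the raw document scores are roughly comparable across collections.

```python
import math


def merge_results(result_lists, k=600.0):
    """Merge per-collection ranked lists using only scores and list lengths.

    result_lists: dict mapping a (hypothetical) collection name to a list of
    (doc_id, score) pairs. Returns (doc_id, merged_score, collection) tuples,
    best first. The weighting scheme and constant k are assumptions.
    """
    total = sum(len(docs) for docs in result_lists.values())
    if total == 0:
        return []
    # Longer result lists yield a larger collection score.
    coll_score = {
        name: math.log(1 + len(docs) * k / total)
        for name, docs in result_lists.items()
    }
    mean = sum(coll_score.values()) / len(coll_score)
    merged = []
    for name, docs in result_lists.items():
        # Equivalent to 1 + (s_i - mean) / mean.
        weight = coll_score[name] / mean
        for doc_id, score in docs:
            merged.append((doc_id, score * weight, name))
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


lists = {
    "A": [("a1", 0.9), ("a2", 0.8), ("a3", 0.7)],
    "B": [("b1", 0.95)],
}
print([doc_id for doc_id, _, _ in merge_results(lists)])
```

Sorting by raw score alone would put b1 first. With this weighting, collection A's longer list raises its documents to roughly 1.10 times their raw score and lowers B's to about 0.90 times, giving the order a1, a2, b1, a3.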
Keywords |
Information retrieval, machine learning, distributed IR, digital libraries, uncertain reasoning |
Type of project | Fundamental research project |
Research area | Information retrieval |
Method of financing | FNS (Swiss National Science Foundation) |
Status | Completed |
Start of project | 1-4-2000 |
End of project | 28-2-2003 |
Overall budget | 120389 |
Contact | Paul Cotofrei |