Statistical inference in retrieval effectiveness evaluation
Abstract |
Evaluation methodology, and particularly its associated statistical
tests, plays a central role in the information retrieval
domain, which maintains a strong empirical tradition. In an effort
to evaluate the retrieval effectiveness of a search algorithm, this
paper focuses on the average precision over a set of fixed recall
values. After reviewing traditional evaluation methodology through
the use of examples, this study suggests applying another
statistical inference methodology, the bootstrap, which requires no
particular assumption about the distribution of the
observations. Moreover, this scheme may be used to assess the
accuracy of virtually any statistic, to build approximate
confidence intervals, and to verify whether a statistically
significant difference exists between two retrieval schemes, even
when dealing with a relatively small sample size. This study also
suggests selecting the sample median rather than the sample mean in
evaluating retrieval effectiveness; the justification for this
choice is based on the nature of information retrieval data.
(C) 1997 Elsevier Science Ltd. |
Keywords |
INFORMATION-RETRIEVAL, RELEVANCE, ALGORITHM |
Citation | J. Savoy, "Statistical inference in retrieval effectiveness evaluation," Information Processing & Management, vol. 33, pp. 495-512, 1997. |
Type | Journal article (French) |
Publication date | 1997 |
Journal name | Information Processing & Management |
Volume | 33 |
Issue | 4 |
Pages | 495-512 |
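
The bootstrap idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's own procedure: it assumes a simple percentile bootstrap, and the function name `bootstrap_ci`, its parameters, and the per-query average precision values are all hypothetical, chosen only for illustration. The sketch resamples the paired per-query differences between two retrieval schemes with replacement and builds an approximate confidence interval for their median.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Approximate (1 - alpha) percentile bootstrap interval for stat(values).

    Hypothetical helper for illustration; no distributional assumption is
    made about the observations.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Recompute the statistic on resamples drawn with replacement.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up average precision per query for two schemes (not data from the paper).
scheme_a = [0.21, 0.35, 0.10, 0.48, 0.27, 0.05, 0.62, 0.33, 0.19, 0.41]
scheme_b = [0.25, 0.39, 0.12, 0.47, 0.36, 0.09, 0.70, 0.31, 0.28, 0.45]

# Paired differences: one value per query.
diffs = [b - a for a, b in zip(scheme_a, scheme_b)]

low, high = bootstrap_ci(diffs)
print(f"median difference: {statistics.median(diffs):.3f}")
print(f"approximate 95% interval: [{low:.3f}, {high:.3f}]")
```

Under these assumptions, an interval that excludes zero would suggest a difference between the two schemes. Passing `stat=statistics.mean` gives the corresponding interval for the sample mean, which allows the two statistics to be compared on the same data.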