Balanced k-Nearest Neighbor Imputation
Abstract To overcome the problem of item nonresponse, random imputation methods are often used because they tend to preserve the distribution of the imputed variable. Among the random imputation methods, the random hot-deck has the useful property of imputing only observed values. A new random hot-deck imputation method is proposed. The key innovation of this method is that the selection of donors is viewed as a sampling problem and uses calibration and balanced sampling. This approach makes it possible to select donors such that if the auxiliary variables were imputed, their estimated totals would not change. As a consequence, very accurate and stable estimates of totals can be obtained. Moreover, donors are selected in neighborhoods of recipients. In this way, the missing value of a recipient is replaced with an observed value of a similar unit. This second approach can greatly improve the quality of the estimates. Finally, these two approaches imply underlying models, and the method is resistant to model misspecification.
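As a rough illustration of the neighborhood idea only, the sketch below imputes each missing value with the observed value of a donor drawn at random among the k nearest respondents. It is not the authors' method: the calibration and balanced-sampling steps that keep the totals of the auxiliary variables unchanged are left out, and the function name `knn_random_hot_deck`, its parameters, and the toy data are assumptions made for this example.

```python
import numpy as np


def knn_random_hot_deck(x, y, k=3, seed=0):
    """Simplified random hot-deck: each missing y (NaN) receives the observed
    y of a donor drawn uniformly among the k nearest respondents in x.
    Illustration only; no calibration or balancing constraint is applied."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single auxiliary variable as one column

    missing = np.isnan(y)
    donors = np.flatnonzero(~missing)  # respondents (potential donors)
    if donors.size == 0:
        raise ValueError("no observed values available as donors")
    k = min(k, donors.size)

    y_imp = y.copy()
    donor_of = {}
    for i in np.flatnonzero(missing):  # recipients
        dist = np.linalg.norm(x[donors] - x[i], axis=1)
        neighbours = donors[np.argsort(dist, kind="stable")[:k]]
        j = int(rng.choice(neighbours))  # random donor in the neighborhood
        y_imp[i] = y[j]  # the imputed value is always an observed value
        donor_of[int(i)] = j
    return y_imp, donor_of


# Toy data (hypothetical): two clusters in x, one nonrespondent in each.
x = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]
y = [10.0, np.nan, 12.0, 50.0, np.nan, 52.0]
y_imp, donor_of = knn_random_hot_deck(x, y, k=2)
print(y_imp, donor_of)
```

With k=2, each recipient in the toy data can only receive a value from its own cluster, which is the "similar unit" property the abstract describes. Which of the two neighbours is drawn depends on the seed.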
Keywords missing data, nonresponse, sampling, balanced sampling, calibration, nearest neighbors
Citation Hasler, C., & Tillé, Y. (2016). Balanced k-Nearest Neighbor Imputation. Statistics, 105, 11-23.
Type Journal article (English)
Publication date 22-5-2016
Journal name Statistics
Volume 105
Pages 11-23
URL https://www.researchgate.net/publication/271710133_Balanc...
Related project Convention Université de Neuchâtel/Office fédéral de la s...