Imputation of income variables in a survey context and estimation of variance for indicators of poverty and social exclusion
Author(s)
Publisher
Neuchâtel
Publication date
2014
Abstract
This PhD thesis develops an imputation method for income variables that allows the distribution of such data to be analyzed directly, in particular for estimating complex statistics such as indicators of poverty and social exclusion, together with their precision.
In an introductory chapter, we present the Swiss Survey on Income and Living Conditions (SILC), which we used extensively to illustrate our research.
In a first article, accepted for publication and co-written with Dr. Lionel Qualité, we give an overview of the production methods at the Swiss Federal Statistical Office (SFSO). Samples are selected with coordination so as to spread the survey burden across the population. We present the main steps in computing extrapolation weights adapted to different cases and needs. The SFSO relies on international recommendations for data editing and imputation and contributes to their development. The precision of estimators is evaluated systematically, taking into account the different treatments and methods involved in their construction.
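The abstract does not say which calibration distance the SFSO uses for its extrapolation weights. As a minimal sketch, assuming the linear (chi-square distance) method, calibrated weights take the form w_k = d_k (1 + z_k'λ), where λ is chosen so that the weighted totals of the auxiliary variables x_k match known population totals. With z_k = x_k this is ordinary linear calibration; passing separate instruments z_k gives the linear case of generalized calibration. The function names and the toy data are hypothetical.

```python
# Linear calibration sketch (assumed chi-square distance):
# w_k = d_k * (1 + z_k' lambda), with lambda chosen so that
# sum_k w_k x_k equals the known population totals. Names are hypothetical.


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def linear_calibration(d, X, totals, Z=None):
    """Return calibrated weights.

    d      -- design weights d_k
    X      -- auxiliary vectors x_k (one list per unit)
    totals -- known population totals of the auxiliary variables
    Z      -- optional instrument vectors z_k, same length as x_k
              (generalized calibration); defaults to X
    """
    if Z is None:
        Z = X
    p = len(totals)
    # A[i][j] = sum_k d_k * x_ki * z_kj
    A = [[sum(dk * xk[i] * zk[j] for dk, xk, zk in zip(d, X, Z)) for j in range(p)]
         for i in range(p)]
    # Design-weighted estimates of the auxiliary totals
    est = [sum(dk * xk[i] for dk, xk in zip(d, X)) for i in range(p)]
    lam = solve(A, [t - e for t, e in zip(totals, est)])
    return [dk * (1.0 + sum(zk[j] * lam[j] for j in range(p))) for dk, zk in zip(d, Z)]


# Toy example: intercept and age as auxiliaries (hypothetical data).
d = [10.0, 10.0, 10.0, 10.0]
X = [[1, 20], [1, 35], [1, 50], [1, 65]]
w = linear_calibration(d, X, totals=[45.0, 1900.0])
print([round(wk, 3) for wk in w])
```

By construction the calibrated weights reproduce the population size (45) and the age total (1900) exactly, up to rounding.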
In a second, published article, co-written with Prof. Yves Tillé, we use the generalized linearization technique based on the concept of influence function, as Osier (2009) did, to estimate the variance of complex statistics such as the Laeken indicators. Simulations show that using Gaussian kernel estimation for the income density function yields strongly biased variance estimates. We propose two alternative density estimation methods that significantly reduce the observed bias.
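To make the linearization step concrete, the sketch below assumes a weighted α-quantile Q̂ and the usual linearized variable z_k = (α − 1{y_k ≤ Q̂}) / (N̂ f̂(Q̂)), where f̂ is a Gaussian kernel density estimate with a rule-of-thumb bandwidth. This is the kind of density estimator whose bias the article reports; the two alternative estimators proposed in the thesis are not reproduced. The variance formula assumes simple random sampling without replacement, and all names are hypothetical.

```python
import math


def weighted_quantile(y, w, alpha):
    """Smallest observed y such that the weighted CDF reaches alpha."""
    pairs = sorted(zip(y, w))
    total = sum(w)
    cum = 0.0
    for yk, wk in pairs:
        cum += wk
        if cum / total >= alpha:
            return yk
    return pairs[-1][0]


def gaussian_kde_at(t, y, w):
    """Weighted Gaussian kernel density estimate at t.

    Bandwidth: rule of thumb 1.06 * sd * n^(-1/5) (an assumption).
    """
    n_hat = sum(w)
    mean = sum(wk * yk for yk, wk in zip(y, w)) / n_hat
    sd = math.sqrt(sum(wk * (yk - mean) ** 2 for yk, wk in zip(y, w)) / n_hat)
    h = 1.06 * sd * len(y) ** (-0.2)
    s = sum(wk * math.exp(-0.5 * ((t - yk) / h) ** 2) for yk, wk in zip(y, w))
    return s / (n_hat * h * math.sqrt(2.0 * math.pi))


def quantile_linearized(y, w, alpha):
    """Return the alpha-quantile and its linearized variable z_k."""
    q = weighted_quantile(y, w, alpha)
    f_q = gaussian_kde_at(q, y, w)
    n_hat = sum(w)
    z = [(alpha - (1.0 if yk <= q else 0.0)) / (n_hat * f_q) for yk in y]
    return q, z


def srswor_variance_of_total(z, N):
    """Variance estimate of sum_k (N/n) z_k under SRS without replacement."""
    n = len(z)
    zbar = sum(z) / n
    s2 = sum((zk - zbar) ** 2 for zk in z) / (n - 1)
    return N ** 2 * (1.0 - n / N) * s2 / n


incomes = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical sample
weights = [10.0] * len(incomes)         # N = 90, n = 9
q, z = quantile_linearized(incomes, weights, 0.5)
print(q, srswor_variance_of_total(z, N=90))
```

Because f̂(Q̂) sits in the denominator of every z_k, any bias in the density estimate carries directly into the variance estimate, which is why the choice of density estimator matters.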
In a working paper, we revisit the idea of Deville and Särndal (1994), which consists of constructing an unbiased estimator of the variance of a total under regression imputation, based solely on the information available (i.e., the selected sample and the subset of respondents). Whereas these authors dealt with the total of an ordinary variable of interest, we carry out a similar development for the case where the total is that of the linearized variable of a quantile. Simulations on real survey data show that regression imputation can have a substantial impact on the bias and variance estimates of social inequality indicators. This leads to a method that accounts for the variance due to imputation in addition to the variance due to the sampling design in the case of quantiles.
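The Deville and Särndal (1994) estimator itself is not reproduced here. The sketch below only illustrates the setting it addresses, assuming deterministic simple linear regression imputation: nonrespondents receive fitted values from a regression of y on an auxiliary x among respondents, and a naive variance formula then treats the completed data as fully observed. Because imputed values lie exactly on the fitted line, that naive formula ignores the variance due to imputation. The function names and data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_impute(x, y):
    """Replace missing y (None) by fitted values estimated on respondents."""
    resp = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_simple_regression([r[0] for r in resp], [r[1] for r in resp])
    return [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]


def naive_variance_of_mean(y_completed, N):
    """SRSWOR variance of the mean, treating imputed values as observed.

    This ignores the imputation variance that a corrected estimator,
    such as the one developed in the working paper, accounts for.
    """
    n = len(y_completed)
    m = sum(y_completed) / n
    s2 = sum((v - m) ** 2 for v in y_completed) / (n - 1)
    return (1.0 - n / N) * s2 / n


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, None, None]   # two nonrespondents (hypothetical)
completed = regression_impute(x, y)
print(completed, naive_variance_of_mean(completed, N=100))
```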
In a submitted article, we present our new imputation method for income variables. Empirical studies have shown that the generalized beta distribution of the second kind (GB2) fits income data very well. We present a parametric imputation method relying on weights obtained by generalized calibration. A GB2 distribution is fitted to the income distribution in order to determine whether these weights can compensate even for nonignorable nonresponse affecting the variable of interest. Success depends heavily on the choice of auxiliary and instrumental variables used for calibration, which we discuss. We validate our imputation system on SILC data and compare it with imputations produced by the IVEware software. Particular care was taken to estimate variances by linearization, taking every step of the procedure into account.
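A GB2(a, b, p, q) variable has density f(y) = a y^(ap−1) / (b^(ap) B(p, q) [1 + (y/b)^a]^(p+q)) for y > 0, and if X follows a Beta(p, q) distribution then b (X/(1 − X))^(1/a) follows a GB2(a, b, p, q). The sketch below uses these two facts to evaluate the density and to impute missing incomes by random draws, assuming the parameters have already been estimated (the calibration-weighted fitting step is not shown). It is one simple form of parametric imputation, not necessarily the exact scheme of the thesis; the function names and parameter values are hypothetical.

```python
import math
import random


def gb2_pdf(y, a, b, p, q):
    """Density of the GB2(a, b, p, q) distribution (zero for y <= 0)."""
    if y <= 0:
        return 0.0
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_f = (math.log(a) + (a * p - 1.0) * math.log(y) - a * p * math.log(b)
             - log_beta - (p + q) * math.log1p((y / b) ** a))
    return math.exp(log_f)


def gb2_draw(a, b, p, q, rng):
    """One GB2 draw via X ~ Beta(p, q) and Y = b * (X / (1 - X))**(1/a)."""
    while True:
        x = rng.betavariate(p, q)
        if 0.0 < x < 1.0:  # guard against the degenerate endpoints
            return b * (x / (1.0 - x)) ** (1.0 / a)


def impute_gb2(y, a, b, p, q, seed=0):
    """Fill missing incomes (None) with draws from a fitted GB2.

    a, b, p, q are assumed to have been estimated beforehand.
    """
    rng = random.Random(seed)
    return [yi if yi is not None else gb2_draw(a, b, p, q, rng) for yi in y]


# Hypothetical parameters and incomes, for illustration only.
incomes = [41000.0, None, 57500.0, None]
print(impute_gb2(incomes, a=3.5, b=50000.0, p=0.9, q=1.2, seed=42))
```

Drawing from the distribution, rather than plugging in a single predicted value, keeps the spread of the imputed incomes, which matters for distribution-based indicators such as quantiles.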
The last part of this PhD thesis presents additional material that could not be included in the other chapters. Namely, we give further insight into the GB2 distribution, study the possibility of using Durbin-Wu-Hausman tests in the framework of generalized calibration, and present a way of forming imputation classes for an income variable.
Notes
Doctoral thesis, Neuchâtel, Faculté des Sciences Economiques
Publication type
Resource Types::text::thesis::doctoral thesis