Showing items 1 - 10 of 10
Publication
Open access

An Efficient Approach for Statistical Matching of Survey Data Through Calibration, Optimal Transport and Balanced Sampling

2022, Jauslin, Raphaël, Tillé, Yves

Statistical matching aims to integrate two statistical sources. These sources can be two samples or a sample and the entire population. If two samples have been selected from the same population and information has been collected on different variables of interest, then it is interesting to match the two surveys to analyse, for example, contingency tables or correlations. In this paper, we propose an efficient method for matching two samples that may each contain a weighting scheme. The method matches the records of the two sources. Several variants are proposed in order to create a directly usable file integrating data from both information sources.

Publication
Metadata only

Balanced k-Nearest Neighbor Imputation

2016-5-22, Hasler, Caren, Tillé, Yves

In order to overcome the problem of item nonresponse, random imputation methods are often used because they tend to preserve the distribution of the imputed variable. Among the random imputation methods, the random hot-deck has the interesting property of imputing observed values. A new random hot-deck imputation method is proposed. The key innovation of this method is that the selection of donors is viewed as a sampling problem and uses calibration and balanced sampling. This approach makes it possible to select donors such that if the auxiliary variables were imputed, their estimated totals would not change. As a consequence, very accurate and stable estimates of totals can be obtained. Moreover, donors are selected in neighborhoods of recipients. In this way, the missing value of a recipient is replaced with an observed value of a similar unit. This second approach can greatly improve the quality of the estimates. Finally, these two approaches imply underlying models, and the method is resistant to model misspecification.

Publication
Open access

Measuring inequality in finite population sampling

2012, Langel, Matti, Tillé, Yves

This document focuses on the estimation of inequality measures for complex survey data. The proposed methodology takes into account both the complexity of these generally non-linear functions of interest and the complexity of the sampling strategy. The first chapter is dedicated to the presentation and definition of the main concepts used in both inequality and survey sampling theory. In the second chapter, a variety of inequality indices are compared in an empirical study on a real set of income data. Research is then directed towards three specific inequality measures: the Quintile share ratio (QSR), the Gini index and Zenga’s new inequality index. The third chapter shows that the variance of the QSR can be estimated by means of the linearization approach without applying kernel smoothing, and that a simple transformation improves the coverage rate of the confidence interval. The two following chapters discuss the work of Corrado Gini from an unusual angle. For instance, both balanced sampling (of which he is a pioneer) and variance estimation for the inequality measure that bears his name are discussed from a historical perspective. Zenga’s new inequality index is presented in the last chapter and a variance estimator is proposed.

Publication
Open access

Optimal allocation in balanced sampling

Tillé, Yves, Favre, Anne-Catherine

The development of new sampling methods allows the selection of large balanced samples. In this paper we propose a method for computing optimal inclusion probabilities for balanced samples. Next, we show that the optimal Neyman allocation is a particular case of this method.
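For readers unfamiliar with the special case mentioned in this abstract, the classical Neyman allocation for stratified sampling assigns to stratum h a sample size proportional to N_h * S_h, where N_h is the stratum size and S_h the stratum standard deviation of the variable of interest. The sketch below is only an illustration of that textbook rule, not the method of the paper; the function name and the example figures are assumptions made for this illustration.

```python
from typing import List, Sequence


def neyman_allocation(
    n: float, stratum_sizes: Sequence[float], stratum_sds: Sequence[float]
) -> List[float]:
    """Return unrounded Neyman sample sizes n_h = n * N_h * S_h / sum_j(N_j * S_j)."""
    if len(stratum_sizes) != len(stratum_sds):
        raise ValueError("stratum_sizes and stratum_sds must have the same length")
    # Each stratum's share is driven by its size times its standard deviation.
    products = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    if total <= 0:
        raise ValueError("at least one stratum must have positive size and dispersion")
    return [n * p / total for p in products]


# Hypothetical example: two strata, the smaller one being twice as dispersed.
allocation = neyman_allocation(100, [1000, 500], [2.0, 4.0])
```

In this made-up example both strata have the same product N_h * S_h, so the total sample size is split equally between them. In practice the resulting sizes still have to be rounded and capped at the stratum sizes.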

Publication
Open access

A General Result For Selecting Balanced Unequal Probability Samples From a Stream

2019-8-1, Tillé, Yves

Probability sampling methods were developed in the framework of survey statistics. Recently, sampling methods have attracted renewed interest as a way to reduce the size of large data sets. A particular application is sampling from a data stream. The stream is assumed to be so large that the data cannot be stored. When a new unit appears, the decision to keep it or not must be made immediately, without examining all the units that have already appeared in the stream. In this paper, we examine the existing methods for sampling with unequal probabilities from a stream. Next, we propose a general result about sampling in several phases from a balanced sample, which enables us to propose several new solutions for sampling and multi-phase sampling from a stream. Several new applications of this general result are developed.
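As a point of reference for the stream setting described in this abstract, the simplest unequal-probability design that decides on each unit as it arrives is Poisson sampling: unit k is kept independently with its inclusion probability pi_k. The sketch below shows only that baseline design. It is not the general result of the paper, it gives a random sample size, and it does not balance the sample. The function name and the example stream are assumptions made for this illustration.

```python
import random
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def poisson_stream_sample(
    stream: Iterable[Tuple[T, float]], rng: random.Random
) -> Iterator[Tuple[T, float]]:
    """Yield (unit, weight) for each kept unit, with weight = 1 / pi.

    Each element of the stream is a pair (unit, pi). The decision for a unit
    uses only its own inclusion probability, never the earlier units.
    """
    for unit, pi in stream:
        if not 0.0 <= pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in [0, 1]")
        # rng.random() is in [0, 1): pi = 1 always keeps, pi = 0 never keeps.
        if rng.random() < pi:
            yield unit, 1.0 / pi


# Hypothetical stream with degenerate probabilities, so the outcome is fixed.
kept = list(
    poisson_stream_sample([("a", 1.0), ("b", 0.0), ("c", 1.0)], random.Random(0))
)
```

Because the function is a generator, the stream is never stored: each unit is either yielded with its sampling weight or discarded for good.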

Publication
Open access

New methods to handle nonresponse in surveys

2015, Hasler, Caren, Tillé, Yves

This document focuses on nonresponse in sample surveys. Mainly, methods to handle nonresponse in complex surveys are proposed. The first chapter of this document introduces concepts and notation of survey sampling and nonresponse. The second chapter proposes an algorithm for stratified balanced sampling for populations with large numbers of strata. The third chapter of this document presents a hot-deck imputation method which combines balanced sampling and a nonparametric approach. This method uses the algorithm presented in the second chapter. The next chapter presents a nonparametric method of imputation for item nonresponse in surveys based on additive regression models. Finally, the fifth chapter proposes three reweighting procedures for handling nonignorable nonresponse in surveys, provided that the values of the variable of interest are obtained from a mixture distribution.

Publication
Open access

Corrado Gini, a pioneer in balanced sampling and inequality theory

2011-3-14, Langel, Matti, Tillé, Yves

This paper attempts to link two of Corrado Gini’s contributions to statistics: the famous inequality measure that bears his name and his work in the early days of balanced sampling. Some important notions in the history of sampling, such as representativeness, randomness, and purposive selection, are clarified before balanced sampling is introduced. The Gini index is described, as well as its estimation and variance estimation in the sampling framework. Finally, theoretical grounds and some simulations on real data show how well-used auxiliary information and balanced sampling can enhance the accuracy of the estimation of the Gini index.
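To make the estimation of the Gini index in the sampling framework concrete, here is a minimal sketch of one common weighted form of the estimator: with units sorted by increasing value, weights w_k, cumulative weights N_k, estimated population size N and estimated total Y, it computes 2 * sum(w_k * N_k * y_k) / (N * Y) - (1 + sum(w_k^2 * y_k) / (N * Y)). This is offered as an illustration under that assumed formula, not as the exact estimator used in the paper; the function name is hypothetical.

```python
from typing import Sequence


def weighted_gini(values: Sequence[float], weights: Sequence[float]) -> float:
    """Estimate the Gini index from sample values and their sampling weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Sort units by increasing value of the variable of interest.
    pairs = sorted(zip(values, weights))
    n_hat = sum(w for _, w in pairs)      # estimated population size
    y_hat = sum(w * y for y, w in pairs)  # estimated population total
    if n_hat <= 0 or y_hat <= 0:
        raise ValueError("estimated size and total must be positive")
    cumulative_weight = 0.0
    rank_term = 0.0
    square_term = 0.0
    for y, w in pairs:
        cumulative_weight += w            # N_k: weight accumulated up to unit k
        rank_term += w * cumulative_weight * y
        square_term += w * w * y
    scale = n_hat * y_hat
    return 2.0 * rank_term / scale - (1.0 + square_term / scale)
```

With all weights equal to 1 this reduces to the usual sample formula 2 * sum(k * y_k) / (n * sum(y)) - (n + 1) / n, so a perfectly equal distribution gives 0 and a distribution where one unit out of n holds everything gives (n - 1) / n.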

Publication
Open access

A Simple and Efficient Way of Rounding Calibration Weights

2019, Tillé, Yves

Sartore et al. (2019) have proposed a method to round calibration weights to integer values. Their method is based on a discrete coordinate descent algorithm. We propose a much simpler method based on balanced sampling that achieves the same aim. This method provides random, unbiased and balanced rounded weights.
Keywords: balanced sampling, cube method, calibration, weights.
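The idea of random, unbiased rounding can be illustrated with a deliberately simplified sketch: each weight is rounded up with probability equal to its fractional part and rounded down otherwise, so that its expectation equals the original weight. This sketch rounds each weight independently and therefore does not reproduce the balancing property obtained in the paper through balanced sampling (cube method); it is a swapped-in simpler technique, and the function name is an assumption made for this illustration.

```python
import math
import random
from typing import List, Sequence


def randomized_round(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Round each weight to an adjacent integer so that E[rounded] = weight."""
    rounded = []
    for weight in weights:
        base = math.floor(weight)
        fraction = weight - base
        # Round up with probability equal to the fractional part.
        # Integer weights have fraction 0 and are never changed.
        rounded.append(base + (1 if rng.random() < fraction else 0))
    return rounded


# Hypothetical calibration weights; the result depends on the random draws.
integer_weights = randomized_round([1.3, 2.7, 4.5, 6.0], random.Random(2019))
```

Independent rounding keeps each weight unbiased, but the rounded weights no longer reproduce the calibration totals exactly, which is the gap that a balanced selection of the units to round up is meant to close.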

Publication
Metadata only

Doubly balanced spatial sampling with spreading and restitution of auxiliary totals

2013-3, Grafström, A., Tillé, Yves

A new spatial sampling method is proposed in order to achieve a double property of balancing. The sample is spatially balanced, or well spread, so as to avoid selecting neighbouring units. Moreover, the method also makes it possible to satisfy balancing equations on auxiliary variables available on all the sampling units, because the Horvitz–Thompson estimator is almost equal to the population totals for these variables. The method works with any definition of distance in a multidimensional space and supports the use of unequal inclusion probabilities. The algorithm is simple and fast. Examples show that the method succeeds in using more information than the local pivotal method, the cube method and the Generalized Random Tessellation Stratified sampling method, and thus performs better. An estimator of the variance for this sampling design is proposed in order to lead to an inference that takes the effect of the sampling design into account.
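The balancing equations mentioned in this abstract require that the Horvitz–Thompson estimator of each auxiliary total, that is the sum over the sample of x_k / pi_k, matches the known population total. The sketch below only checks that property for a given sample; it does not implement the spatial sampling method itself, and the function name and the toy population are assumptions made for this illustration.

```python
from typing import Sequence


def horvitz_thompson_total(
    sample_values: Sequence[float], inclusion_probs: Sequence[float]
) -> float:
    """Return the Horvitz-Thompson estimate sum(x_k / pi_k) over the sampled units."""
    if len(sample_values) != len(inclusion_probs):
        raise ValueError("one inclusion probability is needed per sampled unit")
    if any(pi <= 0.0 or pi > 1.0 for pi in inclusion_probs):
        raise ValueError("inclusion probabilities of sampled units must lie in (0, 1]")
    return sum(x / pi for x, pi in zip(sample_values, inclusion_probs))


# Toy population of four units with auxiliary values 1, 2, 3, 4 (total 10),
# each having inclusion probability 0.5.
population_total = 10.0
balanced_estimate = horvitz_thompson_total([1.0, 4.0], [0.5, 0.5])    # units 1 and 4
unbalanced_estimate = horvitz_thompson_total([1.0, 2.0], [0.5, 0.5])  # units 1 and 2
```

In this toy case the sample made of units 1 and 4 reproduces the population total exactly and is therefore balanced on the auxiliary variable, whereas the sample made of units 1 and 2 is not.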

Publication
Open access

Doubly balanced spatial sampling with spreading and restitution of auxiliary totals

Grafström, Anton, Tillé, Yves

A new spatial sampling method is proposed in order to achieve a double property of balancing. The sample is spatially balanced, or well spread, so as to avoid selecting neighbouring units. Moreover, the method also makes it possible to satisfy balancing equations on auxiliary variables available on all the sampling units, because the Horvitz–Thompson estimator is almost equal to the population totals for these variables. The method works with any definition of distance in a multidimensional space and supports the use of unequal inclusion probabilities. The algorithm is simple and fast. Examples show that the method succeeds in using more information than the local pivotal method, the cube method and the Generalized Random Tessellation Stratified sampling method, and thus performs better. An estimator of the variance for this sampling design is proposed in order to lead to an inference that takes the effect of the sampling design into account.