Tillé, Yves
Name
Tillé, Yves
Position
Full Professor
Email
yves.tille@unine.ch
Search results
Showing items 1 - 7 of 7
- Publication, open access: Handling auxiliary variables in survey sampling and nonresponse (2019)
This manuscript is dedicated to the use of auxiliary information in survey sampling and nonresponse. We are interested in integrating auxiliary variables into sampling methods and into the treatment of nonresponse in order to improve the efficiency and precision of surveys.
We also address the precision of estimators, because variances quickly become difficult to compute when the estimation methods are sophisticated. The thesis is organized as follows. The first chapter introduces some concepts of survey sampling and nonresponse. In the second chapter, we develop a sampling design for a forest inventory that must satisfy several requirements: the sample needs to optimize the work of the ground teams while ensuring that every type of tree is selected. To meet these objectives, stratified balanced sampling is used in a two-stage sample. In the third chapter, we discuss the calculation of the variance when two independent samples intersect. The variance and its estimator can be decomposed conditionally on one sample or conditionally on the other. In specific situations, such as nonresponse, this yields convenient simplifications. The fourth chapter presents a linearization method for estimating the variance in the presence of nonresponse. In the fifth chapter, an imputation method for Swiss cheese nonresponse is developed. This imputation method uses stratified balanced sampling.
- Publication, open access: Fast Balanced Sampling for Highly Stratified Population (2014-6)
Balanced sampling is a very efficient sampling design when the variable of interest is correlated with the auxiliary variables on which the sample is balanced. Chauvet (2009) proposed a procedure to select balanced samples in a stratified population. Unfortunately, Chauvet's procedure can be slow when the number of strata is very large. In this paper, we propose a new algorithm to select balanced samples in a stratified population. This new procedure is both faster and more accurate than Chauvet's. Balanced sampling can then be applied to a highly stratified population when only a few units are selected in each stratum. This algorithm turns out to be valuable for many applications. For instance, it can improve the quality of the estimates produced by multistage surveys for which only one or two primary sampling units are selected in each stratum. Moreover, this algorithm may be used to treat nonresponse.
- Publication, open access: Evaluation and development of strategies for sample coordination and statistical inference in finite population sampling (2009)
This Ph.D. thesis concentrates on two important subjects in survey sampling theory. One is the problem of the foundation for statistical inference in finite population sampling, and the other is the problem of coordination of samples over time. The thesis is based on four articles. Three of them have already been published in international journals and the fourth has been submitted for publication.
First, we show that model-based and design-based inferences can be reconciled if we search for an optimal strategy rather than just an optimal estimator, a strategy being a pair composed of a sampling design and an estimator. If we accept the idea that balanced samples are randomly selected, e.g. by the cube method, then we show that, under the linear model, an optimal strategy consists of the Horvitz-Thompson estimator combined with a balanced sampling design whose inclusion probabilities are proportional to the standard deviations of the errors of the model. Moreover, if the heteroscedasticity of the model is "fully explainable" by the auxiliary variables, then the best linear unbiased estimator and the Horvitz-Thompson estimator are equal. We construct a single estimator for both the design and model variance. The inference can thus be valid under the sampling design and under the model. Finally, we show that this strategy is robust and efficient when the model is misspecified. Coordination of probabilistic samples is a challenging theoretical problem faced by statistical institutes. One of their aims is to maximize or minimize the overlap between several samples drawn successively in a population that changes over time. To do so, a dependence between the samples must be introduced. Several methods for coordinating stratified samples have already been developed. Using simulations, we compare their optimality and quality of coordination. We present new methods based on Permanent Random Numbers (PRNs) and microstrata, which have the advantage of allowing us to choose between positive and negative coordination with each of the previous samples. Simulations are run to test the validity of each of them. Another aim of sampling coordination is to obtain good estimates for each wave while spreading the response burden across the entire population. We review the existing solutions.
We compute their corresponding longitudinal designs and discuss their properties. We note that there is an antagonism between a good rotation and control over the cross-sectional sampling design. In order to reach a compromise between the quality of coordination and the freedom of choice of the cross-sectional design, we propose an algorithm that uses a new method of longitudinal sampling.
- Publication, open access: Fast balanced sampling for highly stratified population
Balanced sampling is a very efficient sampling design when the variable of interest is correlated with the auxiliary variables on which the sample is balanced. A procedure to select balanced samples in a stratified population has previously been proposed. Unfortunately, this procedure becomes very slow as the number of strata increases, and it even fails to select samples for some large numbers of strata. A new algorithm to select balanced samples in a stratified population is proposed. This new procedure is much faster than the existing one when the number of strata is large. Furthermore, it makes it possible to select samples for some large numbers of strata, which was impossible with the existing method. Balanced sampling can then be applied to a highly stratified population when only a few units are selected in each stratum. Finally, this algorithm turns out to be valuable for many applications, such as the handling of nonresponse.
- Publication, open access: Corrado Gini, a pioneer in balanced sampling and inequality theory
This paper attempts to make the link between two of Corrado Gini’s contributions to statistics: the famous inequality measure that bears his name and his work in the early days of balanced sampling. Some important notions of the history of sampling such as representativeness, randomness, and purposive selection are clarified before balanced sampling is introduced. The Gini index is described, as well as its estimation and variance estimation in the sampling framework. Finally, theoretical grounds and some simulations on real data show how well-used auxiliary information and balanced sampling can enhance the accuracy of the estimation of the Gini index.
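As an illustration of estimating the Gini index in a sampling framework, the following is a minimal Python sketch of one common weighted plug-in form of the estimator. It is not taken from the paper: the function name `weighted_gini` is hypothetical, and the weights are assumed to be survey weights such as inverse inclusion probabilities.

```python
def weighted_gini(y, w):
    """Plug-in estimate of the Gini index from sampled values y and
    survey weights w (assumption: w[k] is roughly 1 / inclusion probability)."""
    if len(y) != len(w) or len(y) == 0:
        raise ValueError("y and w must be non-empty and of equal length")

    # Sort the sampled units by increasing value of the variable of interest.
    pairs = sorted(zip(y, w))

    n_hat = sum(wk for _, wk in pairs)        # estimated population size
    y_hat = sum(wk * yk for yk, wk in pairs)  # estimated population total
    if y_hat <= 0:
        raise ValueError("the estimated total must be positive")

    cum_w = 0.0   # running sum of weights up to and including unit k
    cross = 0.0   # sum of w_k * (cumulative weight) * y_k
    square = 0.0  # sum of w_k^2 * y_k
    for yk, wk in pairs:
        cum_w += wk
        cross += wk * cum_w * yk
        square += wk * wk * yk

    return (2.0 * cross - square) / (n_hat * y_hat) - 1.0


if __name__ == "__main__":
    print(weighted_gini([5, 5, 5, 5], [1, 1, 1, 1]))  # perfect equality
    print(weighted_gini([0, 0, 0, 1], [1, 1, 1, 1]))  # one unit holds everything
```

With unit weights this reduces to the usual sample Gini index, so the weighted version can be checked against an unweighted computation.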
- Publication, open access: Variance approximation under balanced sampling
Deville, Jean-Claude
A balanced sampling design has the interesting property that the Horvitz–Thompson estimators of the totals of a set of balancing variables are equal to the totals we want to estimate. The variances of the Horvitz–Thompson estimators of the variables of interest are therefore reduced according to their correlations with the balancing variables. Since it is hard to derive an analytic expression for the joint inclusion probabilities, we derive a general approximation of the variance based on a residual technique. This approximation is useful even in the particular case of unequal probability sampling with fixed sample size. Finally, a set of numerical studies with an original methodology validates this approximation.
- Publication, open access: Optimal sampling and estimation strategies under the linear model
In some cases model-based and model-assisted inferences can lead to very different estimators. These two paradigms are not so different if we search for an optimal strategy rather than just an optimal estimator, a strategy being a pair composed of a sampling design and an estimator. We show that, under a linear model, the optimal model-assisted strategy consists of the Horvitz–Thompson estimator combined with a balanced sampling design whose inclusion probabilities are proportional to the standard deviations of the errors of the model. If the heteroscedasticity of the model is ‘fully explainable’ by the auxiliary variables, then this strategy is also optimal in a model-based sense. Moreover, under balanced sampling and with inclusion probabilities that are proportional to the standard deviations of the errors of the model, the best linear unbiased estimator and the Horvitz–Thompson estimator are equal. Finally, it is possible to construct a single estimator for both the design and model variance. The inference can thus be valid under the sampling design and under the model.
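The balancing property mentioned in the entries above can be made concrete with a small Python sketch. It only computes the Horvitz–Thompson estimator and checks whether a given sample is balanced on one auxiliary variable; it does not select balanced samples (that is the job of algorithms such as the cube method). The function names and the toy population are hypothetical and chosen for illustration.

```python
def horvitz_thompson_total(values, inclusion_probs, sample):
    """Horvitz-Thompson estimate of a population total.

    values[k] and inclusion_probs[k] refer to population unit k;
    sample is the list of indices of the selected units."""
    return sum(values[k] / inclusion_probs[k] for k in sample)


def is_balanced(aux, inclusion_probs, sample, tol=1e-9):
    """Return True if the sample reproduces the known population total
    of the auxiliary variable aux (the balancing equation)."""
    true_total = sum(aux)
    estimate = horvitz_thompson_total(aux, inclusion_probs, sample)
    return abs(estimate - true_total) <= tol


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]   # auxiliary variable, known for every unit
    pi = [0.5, 0.5, 0.5, 0.5]  # equal inclusion probabilities (assumption)
    print(is_balanced(x, pi, [0, 3]))  # units 0 and 3 give (1 + 4) / 0.5
    print(is_balanced(x, pi, [0, 1]))  # units 0 and 1 give (1 + 2) / 0.5
```

In this toy population, a sample is balanced on `x` exactly when the weighted sample total of `x` matches the population total of 10.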