Graf, Monique
Search results
SGB R-package Simplicial Generalized Beta Regression
2019-5-13, Graf, Monique
Main properties and regression procedures using a generalization of the Dirichlet distribution called Simplicial Generalized Beta distribution. It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation. Graf, M. (2017, ISBN: 978-84-947240-0-8). See also the vignette enclosed in the package.
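The construction described above can be sketched in a few lines. This is a minimal illustration, not the SGB package's API: the function names `closure`, `rdirichlet` and `rsgb` are hypothetical, and the generalized Gamma variable is assumed to be `b_i * G_i ** (1 / a)` with `G_i` a standard Gamma variable of shape `p_i`, where `a` is the overall shape, `b` the scale composition and `p` the Dirichlet exponents.

```python
import random


def closure(x):
    """Divide a positive vector by its sum so that the parts add up to 1."""
    total = sum(x)
    return [v / total for v in x]


def rdirichlet(p, rng):
    """Dirichlet draw: independent Gamma(p_i, 1) variables divided by their sum."""
    return closure([rng.gammavariate(pi, 1.0) for pi in p])


def rsgb(a, b, p, rng):
    """SGB-type draw (assumed parametrization): closure of generalized Gammas.

    a: overall shape, b: vector of scales, p: Dirichlet exponents.
    """
    y = [bi * rng.gammavariate(pi, 1.0) ** (1.0 / a) for bi, pi in zip(b, p)]
    return closure(y)


rng = random.Random(1)
u = rsgb(a=2.0, b=[0.2, 0.3, 0.5], p=[1.5, 2.0, 2.5], rng=rng)
print(u, sum(u))  # a composition: positive parts summing to 1
```

Under this parametrization, setting `a = 1` with equal scales gives back the Dirichlet construction, which is a convenient sanity check.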
An interpretation of the pseudo-likelihood
2018-10-26, Graf, Monique
Consider a super-population statistical model in which a variable of interest, known on a population of size $N$, is regarded as a set of $N$ independent random realizations of the model. The population-level log-likelihood can then be written as a sum. If only a sample is available, drawn under a sampling design with unequal probabilities, the log-pseudo-likelihood is the Horvitz-Thompson estimator of the population log-likelihood. In general, the weights are multiplied by a normalization factor so that they sum to the sample size. In the single-level case, this does not change the values of the estimated parameters. The problem of choosing the normalization factors in cluster designs has been treated extensively in the literature, without leading to clear guidelines. We propose to compute these factors so that the pseudo-likelihood is a likelihood in the proper sense.
SGB R-package Simplicial Generalized Beta Regression
2018, Graf, Monique
Package SGB contains a generalization of the Dirichlet distribution, called the Simplicial Generalized Beta (SGB). It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation.
Discretizing a compound distribution with application to categorical modelling
2017-2-17, Graf, Monique, Nedyalkova, Desislava
Many probability distributions can be represented as compound distributions. Consider some parameter vector as random. The compound distribution is the expected distribution of the variable of interest given the random parameters. Our idea is to define a partition of the domain of definition of the random parameters, so that we can represent the expected density of the variable of interest as a finite mixture of conditional densities. We then model the mixture probabilities of the conditional densities using information on population categories, thus modifying the original overall model. We thus obtain specific models for sub-populations that stem from the overall model. The distribution of a sub-population of interest is thus completely specified in terms of mixing probabilities. All characteristics of interest can be derived from this distribution and the comparison between sub-populations easily proceeds from the comparison of the mixing probabilities. A real example based on EU-SILC data is given. Then the methodology is investigated through simulation.
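The partition idea can be illustrated on a deliberately simple compound distribution, which is a stand-in and not the model used in the paper. Assume Y given theta is exponential with rate theta, and theta follows a Gamma distribution with integer shape `Q` and rate 1, so that Y is Lomax distributed. The cut points `CUTS` and the sub-population probabilities below are hypothetical.

```python
import math

Q = 2                              # integer Gamma shape of the random parameter (assumption)
CUTS = [0.0, 1.0, 2.5, math.inf]   # hypothetical partition of the parameter domain


def erlang_cdf(x, k):
    """CDF of a Gamma(shape=k, rate=1) variable for integer k."""
    if x == math.inf:
        return 1.0
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(k))


def mixing_probs():
    """Probability that the random parameter falls in each interval."""
    return [erlang_cdf(u, Q) - erlang_cdf(l, Q) for l, u in zip(CUTS, CUTS[1:])]


def component_density(y, k):
    """Density of Y given that the parameter lies in the k-th interval."""
    l, u = CUTS[k], CUTS[k + 1]
    s = 1.0 + y
    num = Q * s ** -(Q + 1) * (erlang_cdf(s * u, Q + 1) - erlang_cdf(s * l, Q + 1))
    return num / mixing_probs()[k]


def mixture_density(y, probs):
    """Finite mixture of the conditional densities with given mixing probabilities."""
    return sum(pk * component_density(y, k) for k, pk in enumerate(probs))


def lomax_density(y):
    """Compound (marginal) density of Y in this toy model."""
    return Q * (1.0 + y) ** -(Q + 1)


# With the original mixing probabilities the mixture is the compound density.
print(mixture_density(0.7, mixing_probs()), lomax_density(0.7))
# A sub-population is described by its own (hypothetical) mixing probabilities.
print(mixture_density(0.7, [0.6, 0.3, 0.1]))
```

Only the mixing probabilities change between sub-populations; the conditional densities are shared, so comparing sub-populations amounts to comparing probability vectors.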
A generalized mixed model for skewed distributions applied to small area estimation
2019, Graf, Monique
Models with random (or mixed) effects are commonly used for panel data, in microarrays, small area estimation and many other applications. When the variable of interest is continuous, normality is commonly assumed, either in the original scale or after some transformation. However, the normal distribution is not always well suited for modeling data on certain variables, such as those found in Econometrics, which often show skewness even at the log scale. Finding the correct transformation to achieve normality is not straightforward since the true distribution is not known in practice. As an alternative, we propose to consider a much more flexible distribution called generalized beta of the second kind (GB2). The GB2 distribution contains four parameters with two of them controlling the shape of each tail, which makes it very flexible to accommodate different forms of skewness. Based on a multivariate extension of the GB2 distribution, we propose a new model with random effects designed for skewed response variables that extends the usual log-normal nested error model. Under this new model, we find empirical best predictors of linear and nonlinear characteristics, including poverty indicators, in small areas. Simulation studies illustrate the good properties, in terms of bias and efficiency, of the estimators based on the proposed multivariate GB2 model. Results from an application to poverty mapping in Spanish provinces also indicate efficiency gains with respect to the conventional log-normal nested error model used for poverty mapping.
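For readers unfamiliar with the GB2, a univariate draw can be sketched from its standard representation as a scaled power of the ratio of two independent Gamma variables. The parameter names `a`, `b`, `p`, `q` follow the usual convention and are not taken from the abstract; the function name `rgb2` and the parameter values are hypothetical. This covers only the univariate distribution, not the multivariate mixed model proposed in the paper.

```python
import random
import statistics


def rgb2(a, b, p, q, rng):
    """One GB2(a, b, p, q) draw as b * (G1 / G2) ** (1 / a).

    G1 ~ Gamma(p, 1) and G2 ~ Gamma(q, 1) are independent;
    p and q are the two parameters that shape the tails.
    """
    g1 = rng.gammavariate(p, 1.0)
    g2 = rng.gammavariate(q, 1.0)
    return b * (g1 / g2) ** (1.0 / a)


rng = random.Random(42)
sample = [rgb2(a=3.0, b=1000.0, p=2.0, q=2.0, rng=rng) for _ in range(20000)]

# With p == q the ratio G1 / G2 has median 1, so the median of the draws
# should be close to the scale b.
print(statistics.median(sample))
```

Choosing unequal `p` and `q` makes the two tails differ, which is the kind of skewness that may persist on the log scale.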
A generalized mixed model for skewed distributions applied to small area estimation
2018-6-22, Graf, Monique, Marin, Juan Miguel, Molina, Isabel
Models with random (or mixed) effects are commonly used for panel data, in microarrays, small area estimation and many other applications. When the variable of interest is continuous, normality is commonly assumed, either in the original scale or after some transformation. However, the normal distribution is not always well suited for modeling data on certain variables, such as those found in Econometrics, which often show skewness even at the log scale. Finding the correct transformation to achieve normality is not straightforward since the true distribution is not known in practice. As an alternative, we propose to consider a much more flexible distribution called generalized beta of the second kind (GB2). The GB2 distribution contains four parameters with two of them controlling the shape of each tail, which makes it very flexible to accommodate different forms of skewness. Based on a multivariate extension of the GB2 distribution, we propose a new model with random effects designed for skewed response variables that extends the usual log-normal nested error model. Under this new model, we find empirical best predictors of linear and nonlinear characteristics, including poverty indicators, in small areas. Simulation studies illustrate the good properties, in terms of bias and efficiency, of the estimators based on the proposed multivariate GB2 model. Results from an application to poverty mapping in Spanish provinces also indicate efficiency gains with respect to the conventional log-normal nested error model used for poverty mapping.
A distribution on the simplex of the Generalized Beta type
2018, Graf, Monique
Consider a random vector with positive components following a compound distribution where the compounding parameter multiplies fixed scale parameters. The closed random vector is the vector divided by the sum of its components. We make explicit the conditions under which the distribution of the closed random vector does not depend on the mixing distribution. When the original vector has independent generalized Gamma components, it is shown that the unrelatedness of the distribution of the closed random vector to the compounding distribution depends on the parameters of the generalized Gamma. This fact is exemplified with the multivariate Generalized Beta distribution of the second kind (MGB2) in which the compounding parameter follows an inverse Gamma distribution. We call the most general distribution of the closed random vector, for which the compounding parameter has no influence, the simplicial Generalized Beta (SGB). Some properties and moments of the SGB are derived. Conditional moments given a sub-composition give a way to impute missing parts when only a sub-composition is known. Maximum likelihood estimators of the parameters are obtained. The method is applied to several examples.
Regression for Compositions based on a Generalization of the Dirichlet Distribution
2019, Graf, Monique
Consider a positive random vector following a compound distribution where the compounding parameter multiplies non-random scale parameters. The associated composition is the vector divided by the sum of its components. The conditions under which the composition depends on the distribution of the compounding parameter are given. When the original vector follows a compound distribution based on independent Generalized Gamma components, the Simplicial Generalized Beta (SGB) is the most general distribution of the composition that is invariant with respect to the distribution of the compounding parameter. Some properties and moments of the SGB are derived. Conditional moments given a sub-composition give a way to impute missing parts when only a sub-composition is known. Distributional checks are made possible through the marginal distributions of functions of the parts that should be Beta distributed. A multiple SGB regression procedure is set up and applied to data from the United Kingdom Time Use survey.
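One way such a distributional check might look is sketched below, under an assumed parametrization in which the composition is the closure of `b_i * G_i ** (1 / a)` with `G_i` a standard Gamma variable of shape `p_i`. In that case the closure of `(u_i / b_i) ** a` is Dirichlet with exponents `p`, so each transformed part is Beta distributed. The names `rsgb` and `z_transform` are hypothetical, and the abstract does not say that this is the exact check used in the paper.

```python
import random


def rsgb(a, b, p, rng):
    """Closure of independent generalized Gammas b_i * G_i ** (1 / a) (assumed form)."""
    y = [bi * rng.gammavariate(pi, 1.0) ** (1.0 / a) for bi, pi in zip(b, p)]
    total = sum(y)
    return [v / total for v in y]


def z_transform(u, a, b):
    """Closure of (u_i / b_i) ** a; Dirichlet(p) under the assumed construction."""
    z = [(ui / bi) ** a for ui, bi in zip(u, b)]
    total = sum(z)
    return [v / total for v in z]


a, b, p = 2.0, [0.2, 0.3, 0.5], [1.5, 2.0, 2.5]
rng = random.Random(7)
n = 20000

# The first transformed part should be Beta(p[0], sum(p) - p[0]),
# whose mean is p[0] / sum(p).
mean_z1 = sum(z_transform(rsgb(a, b, p, rng), a, b)[0] for _ in range(n)) / n
print(mean_z1, p[0] / sum(p))
```

In practice `a`, `b` and `p` would be replaced by fitted values, and the transformed parts compared with the corresponding Beta distributions, for instance through quantile plots.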
A distribution on the simplex of the Generalized Beta type
2018-5-18, Graf, Monique
Consider a random vector with positive components following a compound distribution where the mixing parameter multiplies fixed scale parameters. The closed random vector - or composition - is the vector divided by the sum of its components. We make explicit the conditions under which the distribution of the closed random vector does not depend on the mixing distribution. When the original vector has independent generalized Gamma components, it is shown that the invariance of the distribution of the closed random vector with respect to the mixing distribution depends on the parameters of the generalized Gamma components. This fact is exemplified with the multivariate Generalized Beta distribution of the second kind (MGB2) in which the mixing parameter follows an inverse Gamma distribution. We call the most general distribution of the closed random vector, for which the mixing parameter has no influence, the simplicial Generalized Beta (SGB). Some properties and moments of the SGB are derived. Conditional moments given a sub-composition give a way to impute missing parts when only a sub-composition is known. Maximum likelihood estimators of the parameters are obtained. The method is applied to several examples.
Weighted distributions
2018, Graf, Monique
In a super-population statistical model, a variable of interest, defined on a finite population of size N, is considered as a set of N independent realizations of the model. The log-likelihood at the population level is then written as a sum. If only a sample is observed, drawn according to a design with unequal inclusion probabilities, the log-pseudo-likelihood is the Horvitz-Thompson estimate of the population log-likelihood. In general, the extrapolation weights are multiplied by a normalization factor, in such a way that the normalized weights sum to the sample size. In a single-level design, the values of the estimated model parameters are unchanged by the scaling of the weights, but this is generally not the case for multi-level models. The problem of the choice of the normalization factors in cluster sampling has been addressed extensively in the literature, but no clear recommendations have been issued. It is proposed here to compute the factors in such a way that the pseudo-likelihood becomes a proper likelihood. The super-population model can be written equivalently for the variable of interest or for a transformation of this variable. It is shown that the pseudo-likelihood is not invariant under transformation of the variable of interest.
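The single-level statement can be checked on a toy example. The sketch below assumes an exponential super-population model with rate `theta`, for which the weighted maximizer has a closed form. The observations and inclusion probabilities are hypothetical, and the function names are illustrative only. It does not implement the normalization proposed in the paper for cluster designs.

```python
import math


def log_pseudo_likelihood(theta, y, w):
    """Weighted sum of log-densities log(theta) - theta * y_i of an exponential model."""
    return sum(wi * (math.log(theta) - theta * yi) for yi, wi in zip(y, w))


def pseudo_mle(y, w):
    """Maximizer of the log-pseudo-likelihood: sum of weights over weighted total."""
    return sum(w) / sum(wi * yi for yi, wi in zip(y, w))


y = [0.8, 2.1, 0.3, 1.7]             # hypothetical sample values
incl_prob = [0.1, 0.4, 0.2, 0.5]     # hypothetical inclusion probabilities
w = [1.0 / pi for pi in incl_prob]   # Horvitz-Thompson (extrapolation) weights

# Normalize the weights so that they sum to the sample size.
n = len(y)
factor = n / sum(w)
w_norm = [factor * wi for wi in w]

print(sum(w_norm))                              # equals n up to rounding
print(pseudo_mle(y, w), pseudo_mle(y, w_norm))  # same estimate
```

Multiplying all weights by one constant rescales the whole log-pseudo-likelihood, so its maximizer does not move. With several levels, each level has its own factor and this argument no longer applies.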