Showing items 1 - 10 of 38
  • Publication
    Open access
    The simplicial generalized beta distribution. R-package and applications
    (2019-6-8)
    A generalization of the Dirichlet and the scaled Dirichlet distributions is given by the simplicial generalized Beta, SGB (Graf, 2017). For the Dirichlet and the scaled Dirichlet distributions, the shape parameters are modeled with auxiliary variables (Maier, 2015, R-package DirichletReg, and Monti et al., 2011, respectively). In the ordinary logistic normal regression, on the other hand, it is the scale composition that is made dependent on auxiliary variables. Modeling scales seems easier to interpret than modeling shapes. Thus, in the SGB regression:
      - The scale compositions are modeled in the same way as for the logistic normal regression, i.e. each auxiliary variable generates D parameters, where D is the number of parts.
      - The D Dirichlet shape parameters, one for each part in the compositions, are estimated as well.
      - An additional overall shape parameter is introduced in the SGB; it proves to have important properties in relation to non-essential zeros.
      - Use of survey weights is an option.
      - Imputation of missing parts is possible.
    An application to the United Kingdom Time Use Survey (Gershuny and Sullivan, 2017) shows the power of the method. The R-package SGB (Graf, 2019) makes the method accessible to users. See the package vignette for more information and examples.
  • Publication
    Metadata only
    SGB R-package Simplicial Generalized Beta Regression
    (Institut de statistique Université de Neuchâtel, 2019-5-13)
    Main properties and regression procedures using a generalization of the Dirichlet distribution called Simplicial Generalized Beta distribution. It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation. Graf, M. (2017, ISBN: 978-84-947240-0-8). See also the vignette enclosed in the package.
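The construction described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the SGB package code: the names `closure`, `sample_dirichlet`, `sample_sgb` and the parameter names `a` (overall shape), `b` (scale composition) and `p` (Dirichlet shapes) are assumptions made for the example, and the generalized Gamma variables are obtained here by raising Gamma variables to the power `1/a` and multiplying by the scales.

```python
import numpy as np

def closure(x):
    """Divide each row by its sum so that the parts add up to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def sample_dirichlet(p, n, rng):
    """Dirichlet sample: independent Gamma(p_i) variables divided by their sum."""
    p = np.asarray(p, dtype=float)
    g = rng.gamma(shape=p, scale=1.0, size=(n, p.size))
    return closure(g)

def sample_sgb(a, b, p, n, rng):
    """Same construction with generalized Gamma variables (assumed form).

    Each part is b_i * G_i**(1/a) with G_i ~ Gamma(p_i); the vector is
    then closed. With a = 1 and equal scales this reduces to the Dirichlet.
    """
    b = np.asarray(b, dtype=float)
    p = np.asarray(p, dtype=float)
    g = rng.gamma(shape=p, scale=1.0, size=(n, p.size))
    return closure(b * g ** (1.0 / a))

# Illustrative parameter values only.
shapes = np.array([1.0, 2.0, 3.0])
scales = np.array([0.2, 0.3, 0.5])   # the scale vector is itself a composition
z = sample_sgb(2.0, scales, shapes, 1000, np.random.default_rng(0))
```

Under this assumed form, `closure((z / scales) ** 2.0)` maps the sample back to a Dirichlet sample with shapes `shapes`, which gives a simple way to check a simulation.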
  • Publication
    Metadata only
    A generalized mixed model for skewed distributions applied to small area estimation
    Models with random (or mixed) effects are commonly used for panel data, in microarrays, small area estimation and many other applications. When the variable of interest is continuous, normality is commonly assumed, either in the original scale or after some transformation. However, the normal distribution is not always well suited for modeling data on certain variables, such as those found in Econometrics, which often show skewness even at the log scale. Finding the correct transformation to achieve normality is not straightforward since the true distribution is not known in practice. As an alternative, we propose to consider a much more flexible distribution called generalized beta of the second kind (GB2). The GB2 distribution contains four parameters with two of them controlling the shape of each tail, which makes it very flexible to accommodate different forms of skewness. Based on a multivariate extension of the GB2 distribution, we propose a new model with random effects designed for skewed response variables that extends the usual log-normal nested error model. Under this new model, we find empirical best predictors of linear and nonlinear characteristics, including poverty indicators, in small areas. Simulation studies illustrate the good properties, in terms of bias and efficiency, of the estimators based on the proposed multivariate GB2 model. Results from an application to poverty mapping in Spanish provinces also indicate efficiency gains with respect to the conventional log-normal nested error model used for poverty mapping.
  • Publication
    Metadata only
    Regression for Compositions based on a Generalization of the Dirichlet Distribution
    (Université de Neuchâtel Institut de statistique, 2019)
    Consider a positive random vector following a compound distribution where the compounding parameter multiplies non-random scale parameters. The associated composition is the vector divided by the sum of its components. The conditions under which the composition depends on the distribution of the compounding parameter are given. When the original vector follows a compound distribution based on independent Generalized Gamma components, the Simplicial Generalized Beta (SGB) is the most general distribution of the composition that is invariant with respect to the distribution of the compounding parameter. Some properties and moments of the SGB are derived. Conditional moments given a sub-composition give a way to impute missing parts when knowing a sub-composition only. Distributional checks are made possible through the marginal distributions of functions of the parts that should be Beta distributed. A multiple SGB regression procedure is set up and applied to data from the United Kingdom Time Use survey.
  • Publication
    Metadata only
    Une interprétation de la pseudo-vraisemblance
    (2018-10-26)
    Consider a super-population statistical model in which a variable of interest, known on a population of size $N$, is regarded as a set of $N$ independent random realizations of the model. The log-likelihood at the population level can then be written as a sum. If only a sample is available, drawn according to a sampling design with unequal probabilities, the log-pseudo-likelihood is the Horvitz-Thompson estimator of the population log-likelihood. In general, the weights are multiplied by a normalization factor so that they sum to the sample size. In the single-level case, this does not change the values of the estimated parameters. The problem of choosing the normalization factors in cluster designs has been treated extensively in the literature, without leading to clear guidelines. We propose to compute these factors in such a way that the pseudo-likelihood is a likelihood in the proper sense.
  • Publication
    Metadata only
    A generalized mixed model for skewed distributions applied to small area estimation
    (2018-6-22); Marin, Juan Miguel; Molina, Isabel
    Models with random (or mixed) effects are commonly used for panel data, in microarrays, small area estimation and many other applications. When the variable of interest is continuous, normality is commonly assumed, either in the original scale or after some transformation. However, the normal distribution is not always well suited for modeling data on certain variables, such as those found in Econometrics, which often show skewness even at the log scale. Finding the correct transformation to achieve normality is not straightforward since the true distribution is not known in practice. As an alternative, we propose to consider a much more flexible distribution called generalized beta of the second kind (GB2). The GB2 distribution contains four parameters with two of them controlling the shape of each tail, which makes it very flexible to accommodate different forms of skewness. Based on a multivariate extension of the GB2 distribution, we propose a new model with random effects designed for skewed response variables that extends the usual log-normal nested error model. Under this new model, we find empirical best predictors of linear and nonlinear characteristics, including poverty indicators, in small areas. Simulation studies illustrate the good properties, in terms of bias and efficiency, of the estimators based on the proposed multivariate GB2 model. Results from an application to poverty mapping in Spanish provinces also indicate efficiency gains with respect to the conventional log-normal nested error model used for poverty mapping.
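As a rough illustration of the four-parameter GB2 distribution mentioned in this abstract, the sketch below uses the standard parametrization with density f(x) = a x^(ap-1) / (b^(ap) B(p, q) (1 + (x/b)^a)^(p+q)) for x > 0. The function names and parameter values are assumptions made for the example; the multivariate extension and the small area predictors of the paper are not reproduced here.

```python
import math
import numpy as np

def log_beta(u, v):
    """Logarithm of the Beta function B(u, v)."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

def gb2_pdf(x, a, b, p, q):
    """GB2 density: a, p, q are shape parameters, b is the scale."""
    x = np.asarray(x, dtype=float)
    log_f = (math.log(a) + (a * p - 1.0) * np.log(x) - a * p * math.log(b)
             - log_beta(p, q) - (p + q) * np.log1p((x / b) ** a))
    return np.exp(log_f)

def gb2_mean(a, b, p, q):
    """Mean b * B(p + 1/a, q - 1/a) / B(p, q); requires a * q > 1."""
    return b * math.exp(log_beta(p + 1.0 / a, q - 1.0 / a) - log_beta(p, q))

def sample_gb2(a, b, p, q, n, rng):
    """Draw GB2 variables as b * (G1 / G2)**(1/a), G1 ~ Gamma(p), G2 ~ Gamma(q)."""
    g1 = rng.gamma(shape=p, size=n)
    g2 = rng.gamma(shape=q, size=n)
    return b * (g1 / g2) ** (1.0 / a)

# Illustrative values: p governs the left tail, q the right tail.
a, b, p, q = 3.0, 2.0, 1.5, 2.0
x = sample_gb2(a, b, p, q, 200_000, np.random.default_rng(42))
```

The ratio of two independent Gamma variables used in `sample_gb2` is one way to view the GB2 as a compound distribution: a generalized Gamma variable whose scale is mixed over an inverse Gamma distribution.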
  • Publication
    Metadata only
    A distribution on the simplex of the Generalized Beta type
    (2018-5-18)
    Consider a random vector with positive components following a compound distribution where the mixing parameter multiplies fixed scale parameters. The closed random vector, or composition, is the vector divided by the sum of its components. We make explicit the conditions under which the distribution of the closed random vector does not depend on the mixing distribution. When the original vector has independent generalized Gamma components, it is shown that the invariance of the distribution of the closed random vector with respect to the mixing distribution depends on the parameters of the generalized Gamma components. This fact is exemplified with the multivariate Generalized Beta distribution of the second kind (MGB2) in which the mixing parameter follows an inverse Gamma distribution. We call the most general distribution of the closed random vector, for which the mixing parameter has no influence, the simplicial Generalized Beta (SGB). Some properties and moments of the SGB are derived. Conditional moments given a sub-composition give a way to impute missing parts when knowing a sub-composition only. Maximum likelihood estimators of the parameters are obtained. The method is applied to several examples.
  • Publication
    Metadata only
    SGB R-package Simplicial Generalized Beta Regression
    (Université de Neuchâtel Institut de statistique, 2018)
    Package SGB contains a generalization of the Dirichlet distribution, called the Simplicial Generalized Beta (SGB). It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation.
  • Publication
    Metadata only
    A distribution on the simplex of the Generalized Beta type
    Consider a random vector with positive components following a compound distribution where the compounding parameter multiplies fixed scale parameters. The closed random vector is the vector divided by the sum of its components. We make explicit the conditions under which the distribution of the closed random vector does not depend on the compounding distribution. When the original vector has independent generalized Gamma components, it is shown that whether the distribution of the closed random vector is unaffected by the compounding distribution depends on the parameters of the generalized Gamma. This fact is exemplified with the multivariate Generalized Beta distribution of the second kind (MGB2) in which the compounding parameter follows an inverse Gamma distribution. We call the most general distribution of the closed random vector, for which the compounding parameter has no influence, the simplicial Generalized Beta (SGB). Some properties and moments of the SGB are derived. Conditional moments given a sub-composition give a way to impute missing parts when knowing a sub-composition only. Maximum likelihood estimators of the parameters are obtained. The method is applied to several examples.
  • Publication
    Metadata only
    Weighted distributions
    (Université de Neuchâtel Institut de statistique, 2018)
    In a super-population statistical model, a variable of interest, defined on a finite population of size N, is considered as a set of N independent realizations of the model. The log-likelihood at the population level is then written as a sum. If only a sample is observed, drawn according to a design with unequal inclusion probabilities, the log-pseudo-likelihood is the Horvitz-Thompson estimate of the population log-likelihood. In general, the extrapolation weights are multiplied by a normalization factor, in such a way that the normalized weights sum to the sample size. In a single-level design, the values of the estimated model parameters are unchanged by the scaling of weights, but this is in general not the case for multi-level models. The problem of the choice of the normalization factors in cluster sampling has been widely addressed in the literature, but no clear recommendations have been issued. It is proposed here to compute the factors in such a way that the pseudo-likelihood becomes a proper likelihood. The super-population model can be written equivalently for the variable of interest or for a transformation of this variable. It is shown that the pseudo-likelihood is not invariant under transformation of the variable of interest.
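The single-level statement in this abstract can be illustrated with a small Python sketch. It assumes a normal super-population model, chosen only because the weighted maximum has a closed form; the function names, the simulated data and the inclusion probabilities `incl_prob` are hypothetical and do not come from the paper.

```python
import numpy as np

def log_pseudo_likelihood(y, w, mu, sigma2):
    """Weighted sum of log-densities: Horvitz-Thompson estimate of the
    population log-likelihood when w_i = 1 / inclusion probability."""
    log_f = -0.5 * np.log(2.0 * np.pi * sigma2) - (y - mu) ** 2 / (2.0 * sigma2)
    return np.sum(w * log_f)

def normal_pseudo_mle(y, w):
    """Closed-form maximizer of the normal log-pseudo-likelihood."""
    mu = np.sum(w * y) / np.sum(w)
    sigma2 = np.sum(w * (y - mu) ** 2) / np.sum(w)
    return mu, sigma2

rng = np.random.default_rng(1)
y = rng.normal(10.0, 2.0, size=50)             # simulated sample values
incl_prob = rng.uniform(0.05, 0.5, size=50)    # hypothetical inclusion probabilities
w = 1.0 / incl_prob                            # extrapolation weights
w_norm = w * len(y) / w.sum()                  # normalized to sum to the sample size

est_raw = normal_pseudo_mle(y, w)
est_norm = normal_pseudo_mle(y, w_norm)        # same estimates as est_raw
```

Normalizing the weights multiplies the log-pseudo-likelihood by the constant `len(y) / w.sum()`, so the maximizer is unchanged in this single-level setting. The sketch does not cover the multi-level case, where the abstract notes that the scaling does matter.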