Modelling unobserved heterogeneity in distribution - Finite mixtures of the Johnson family of distributions [Working Paper 14-17 - 31/08/2017]


This paper proposes a new model to account for unobserved heterogeneity in empirical modelling. The model extends the well-known Finite Mixture (or Latent Class) Model by using the Johnson family of distributions for the component densities. Due to the great variety of distributional shapes that can be assumed by the Johnson family, the method does not impose the usual a priori assumptions regarding the type of densities that are mixed.

Authors

Peter Willemé (A)

A : Author, C : Contributor

Publication type

Working Papers

The Working Paper presents a study or analysis conducted by the Federal Planning Bureau on its own initiative.

With the growing availability of micro data in many fields of applied research, finite mixture models (FMMs) are becoming increasingly popular as a tool to model unobserved heterogeneity between subjects. FMMs (also known as Latent Class Models, LCMs) are based on the assumption that the observations in a sample derive from an (unknown) number of heterogeneous subgroups or classes, and allow for the estimation of the parameters by subgroup. They have been used in economics to analyze health care utilization and expenditures, labour supply, productivity, and market segmentation, among other topics. The models are also used extensively in applied research areas such as biology, psychology and biostatistics. The unobserved heterogeneity modelled with FMMs usually pertains to the mean of the distribution, although the variance has also been modelled (sometimes implicitly, as in the case of the gamma distribution). The current practice in applied economic research amounts to choosing a distributional form (normal, lognormal, gamma, Poisson, etc.) for the components, usually based on a priori considerations regarding the support and the shape of the population distribution.
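
To make the standard setup concrete, here is a minimal sketch of a two-class finite mixture with normal components. The function names and parameter values are illustrative assumptions, not taken from the paper. The sketch evaluates the mixture density and, via Bayes' rule, the posterior probability that an observation belongs to each latent class.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, means, sds):
    """Finite mixture density: sum over classes of weight_k * f_k(y)."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))

def class_posteriors(y, weights, means, sds):
    """Posterior class membership probabilities for one observation."""
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class example with class shares 0.7 and 0.3.
weights, means, sds = [0.7, 0.3], [0.0, 4.0], [1.0, 1.5]
post = class_posteriors(3.0, weights, means, sds)
```

In this example an observation at 3.0 is assigned mostly to the second class, even though that class has the smaller share, because its component density is much higher there.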

A drawback of this approach is that it places a priori restrictions on the nature of the unobserved heterogeneity in at least two ways. First, the choice of the distribution is in general somewhat arbitrary and as a rule not tested against a more general (unrestricted) alternative. Second, while the ‘true’ number of latent classes is in principle unknown, it is also routinely assumed that the mixed distributions are of one kind. That is, the mixed components only differ from each other in terms of the parameters of the chosen distribution but not in terms of the probability density functions themselves.

This paper addresses these problems by lifting some of these implicit assumptions. This is achieved by postulating a flexible form for the component distributions. Several such flexible forms were proposed and studied long ago, such as the Pearson and Johnson families. Both families share the property that they can assume a wide variety of shapes depending on the values of their four parameters. In fact, most commonly used distributions are special cases of both families. The paper outlines an algorithm that can be used to estimate the parameters of a mixture of Johnson distributions and provides a proof of principle that the method is feasible and a potential improvement over current latent class modelling practice.
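
As an illustration of what mixing Johnson components can look like, the sketch below codes the textbook densities of two members of the Johnson family, the unbounded S_U and the bounded S_B, in the usual four-parameter form (gamma, delta, xi, lambda), and combines them in a single mixture. It is not the estimation algorithm outlined in the paper; the function names and parameter values are assumptions chosen for illustration.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def johnson_su_pdf(x, gamma, delta, xi, lam):
    """Johnson S_U density (unbounded support)."""
    u = (x - xi) / lam
    z = gamma + delta * math.asinh(u)
    return delta / (lam * SQRT_2PI * math.sqrt(1.0 + u * u)) * math.exp(-0.5 * z * z)

def johnson_sb_pdf(x, gamma, delta, xi, lam):
    """Johnson S_B density (support bounded to xi < x < xi + lam)."""
    u = (x - xi) / lam
    if u <= 0.0 or u >= 1.0:
        return 0.0
    z = gamma + delta * math.log(u / (1.0 - u))
    return delta / (lam * SQRT_2PI * u * (1.0 - u)) * math.exp(-0.5 * z * z)

def johnson_mixture_pdf(x, weights, components):
    """Mixture density whose components may be of different Johnson types."""
    return sum(w * pdf(x, *params) for w, (pdf, params) in zip(weights, components))

# Hypothetical mixture: one bounded (S_B) and one unbounded (S_U) component.
components = [
    (johnson_sb_pdf, (0.5, 1.2, 0.0, 10.0)),
    (johnson_su_pdf, (-1.0, 2.0, 12.0, 3.0)),
]
weights = [0.6, 0.4]
density_at_5 = johnson_mixture_pdf(5.0, weights, components)
```

Because each component carries its own density function as well as its own parameters, the classes in this sketch can differ in kind and not only in parameter values.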

The method has been tested using data generated from different distributions, chosen to cover a wide range of combinations of skewness and kurtosis. The first results are encouraging. The method generally converges in about the same number of iterations as standard models that mix normal or gamma distributions. More importantly, when the data are generated from mixed distributions that differ substantially from the standard assumptions (identical component densities and ‘regular’ skewness and kurtosis values), the mixture of Johnson distributions generally fits the data better than the standard models.
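
A test of this kind could be set up along the following lines. This is only a sketch under stated assumptions: the paper's actual simulation design is not reproduced here, and the component parameters, sample size and seed below are hypothetical. The code draws from a two-class mixture of Johnson S_U components, using the fact that an S_U variate is a transformed standard normal, and then computes the sample skewness and excess kurtosis.

```python
import math
import random

def draw_johnson_su(rng, gamma, delta, xi, lam):
    """Draw from Johnson S_U by transforming a standard normal variate."""
    z = rng.gauss(0.0, 1.0)
    return xi + lam * math.sinh((z - gamma) / delta)

def draw_mixture(rng, n, weights, params):
    """Draw n values: pick a latent class, then sample from its component."""
    classes = rng.choices(range(len(weights)), weights=weights, k=n)
    return [draw_johnson_su(rng, *params[k]) for k in classes]

def skewness_and_kurtosis(sample):
    """Sample skewness and excess kurtosis from central moments."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(2017)  # fixed seed so the sketch is reproducible
# Hypothetical (gamma, delta, xi, lam) values for the two classes.
params = [(-1.0, 2.0, 0.0, 1.0), (0.0, 3.0, 8.0, 2.0)]
sample = draw_mixture(rng, 20000, [0.6, 0.4], params)
skew, excess_kurt = skewness_and_kurtosis(sample)
```

Varying gamma and delta per class moves the generated data across different skewness and kurtosis combinations, which is the dimension along which competing mixture models would then be compared.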

The method has not yet been tested for a mixture of regression models, which will be an obvious next step in turning it into a practical research tool.