poLCA
Polytomous Variable Latent Class Analysis
 
Drew A. Linzer, Department of Political Science, Emory University
Jeffrey Lewis, Department of Political Science, University of California, Los Angeles

OVERVIEW

poLCA is a software package for the estimation of latent class models and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment.

Latent class analysis (also known as latent structure analysis) can be used to identify clusters of similar "types" of individuals or observations in multivariate categorical data, to estimate the characteristics of these latent groups, and to return the probability that each observation belongs to each group. These models are also helpful for investigating sources of confounding and nonindependence among a set of categorical variables, and for density estimation in cross-classification tables. Typical applications include the analysis of opinion surveys, rater agreement, lifestyle and consumer choice, and other social and behavioral phenomena.

The basic latent class model is a finite mixture model in which each component distribution is a multi-way cross-classification table with all variables mutually independent. The model stratifies the observed data by an unobserved (latent) categorical variable: within each latent class the observed variables are assumed independent, so any association among them is attributed to differences between the classes rather than to direct, possibly spurious, relationships between the variables themselves. The latent class regression model additionally lets the researcher estimate the effects of covariates (or "concomitant" variables) on the probability of latent class membership.
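One standard way to write the two models makes the local-independence assumption explicit. The notation is introduced here for illustration: Y_ijk equals 1 if observation i gives response k on manifest variable j (and 0 otherwise), π_jrk is the probability of that response within latent class r (with the π_jrk summing to 1 over k), and p_r is the share of the population in class r (with the p_r summing to 1). The first line is the mixture likelihood of one observation, the second is the posterior probability that observation i belongs to class r, and the third sketches the latent class regression model, in which the class shares become multinomial-logit functions of covariates X_i with class 1 as the reference.

```latex
P(Y_i \mid \pi, p) = \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} (\pi_{jrk})^{Y_{ijk}}

\hat{P}(r \mid Y_i) = \frac{\hat{p}_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} (\hat{\pi}_{jrk})^{Y_{ijk}}}
                           {\sum_{q=1}^{R} \hat{p}_q \prod_{j=1}^{J} \prod_{k=1}^{K_j} (\hat{\pi}_{jqk})^{Y_{ijk}}}

p_{ri} = \frac{\exp(X_i \beta_r)}{1 + \sum_{q=2}^{R} \exp(X_i \beta_q)}, \quad r = 2, \dots, R,
\qquad p_{1i} = \frac{1}{1 + \sum_{q=2}^{R} \exp(X_i \beta_q)}
```

In the regression model, p_r in the first line is replaced by p_ri, so each observation has its own prior probabilities of class membership.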

poLCA uses expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the parameters of the latent class and latent class regression models.
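The expectation-maximization (EM) part of the estimation can be sketched in base R for the basic model without covariates. This is a minimal illustration under stated assumptions, not poLCA's own code: the function name lca_em is hypothetical, the input is assumed to be a complete integer matrix with each column coded 1..K_j, and the Newton-Raphson step used for the regression model is not shown. The E-step computes each observation's posterior class probabilities from the current parameters; the M-step re-estimates the class shares and the class-conditional response probabilities as posterior-weighted proportions.

```r
# Minimal EM sketch for the basic latent class model (no covariates).
# Hypothetical helper, not poLCA's implementation.
# y: complete n x J matrix, column j coded as integers 1..K_j
# R: number of latent classes
lca_em <- function(y, R, maxiter = 1000, tol = 1e-10, seed = 1) {
  y <- as.matrix(y)
  set.seed(seed)                       # random starting values
  n <- nrow(y); J <- ncol(y)
  K <- apply(y, 2, max)                # number of outcomes per variable
  p <- rep(1 / R, R)                   # class shares p_r
  probs <- lapply(K, function(k) {     # R x K_j matrices of pi_jrk
    m <- matrix(runif(R * k), R, k)
    m / rowSums(m)                     # each row sums to 1
  })
  llik_old <- -Inf
  for (iter in seq_len(maxiter)) {
    # E-step: joint probability of each observation's responses and each class
    dens <- matrix(rep(p, each = n), n, R)
    for (j in seq_len(J)) {
      dens <- dens * t(probs[[j]][, y[, j], drop = FALSE])
    }
    lik <- rowSums(dens)               # marginal likelihood of each observation
    post <- dens / lik                 # posterior class-membership probabilities
    llik <- sum(log(lik))
    # M-step: posterior-weighted proportions
    p <- colMeans(post)
    for (j in seq_len(J)) {
      for (k in seq_len(K[j])) {
        probs[[j]][, k] <- colSums(post[y[, j] == k, , drop = FALSE])
      }
      probs[[j]] <- probs[[j]] / rowSums(probs[[j]])
    }
    if (llik - llik_old < tol) break   # log-likelihood has stopped increasing
    llik_old <- llik
  }
  # posterior and llik refer to the parameters in place at the final E-step
  list(P = p, probs = probs, posterior = post, llik = llik, iterations = iter)
}

# Usage on simulated data: two classes, four dichotomous items
set.seed(42)
n <- 500
cls <- sample(1:2, n, replace = TRUE, prob = c(0.6, 0.4))
y <- sapply(1:4, function(j) ifelse(runif(n) < ifelse(cls == 1, 0.9, 0.2), 1L, 2L))
fit <- lca_em(y, R = 2)
fit$P          # estimated class shares
fit$llik       # maximized log-likelihood
```

Because EM can stop at a local maximum, the result depends on the random starting values; re-running from several seeds and keeping the solution with the highest log-likelihood is the usual remedy.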

DOCUMENTATION

Download the user's manual (PDF). The package also includes help pages, available from within R once it is installed (for example, ?poLCA).

poLCA is distributed through the Comprehensive R Archive Network (CRAN) and is listed in the CRAN Task Views for Cluster Analysis & Finite Mixture Models, Multivariate Statistics, and Psychometric Models and Methods.

ACCESS

To install the package through the R graphical interface, select Packages > Install package(s)... and choose a nearby CRAN mirror. Then select poLCA from the list of packages and click OK. Alternatively, type install.packages("poLCA") at the R prompt.

To download the package for manual installation, select:

Package source: poLCA_1.3.1.tar.gz
Mac OS X binary: poLCA_1.3.1.tgz
Windows binary: poLCA_1.3.1.zip
Once the installation is complete, type library(poLCA) in R to load the package into memory.
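After loading, a first model can be fitted to one of the example data sets distributed with the package. The sketch below assumes the bundled values data set (four dichotomous items A to D, coded 1 and 2) and the nclass, nrep, and verbose arguments of poLCA() as described in the package's help pages; check ?poLCA for the version you have installed.

```r
library(poLCA)

# Example data bundled with poLCA: four dichotomous items A-D, coded 1/2
data(values)

# Manifest variables go on the left of the formula; "~ 1" means no covariates
f <- cbind(A, B, C, D) ~ 1

# Two-class model, estimated from 10 sets of random starting values;
# the solution with the highest log-likelihood is returned
set.seed(1)
lc <- poLCA(f, data = values, nclass = 2, nrep = 10, verbose = FALSE)

lc$P                 # estimated class population shares
lc$probs             # class-conditional response probabilities, one matrix per item
head(lc$posterior)   # posterior class-membership probabilities for each observation
table(lc$predclass)  # modal class assignments
```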

CITATION INFORMATION

Users of poLCA are requested to cite the software package as:
Linzer, Drew A. and Jeffrey Lewis. 2011. "poLCA: Polytomous Variable Latent Class Analysis." R package version 1.3.1. http://userwww.service.emory.edu/~dlinzer/poLCA.
and
Linzer, Drew A. and Jeffrey Lewis. 2011. "poLCA: an R Package for Polytomous Variable Latent Class Analysis." Journal of Statistical Software. 42(10): 1-29. http://www.jstatsoft.org/v42/i10

poLCA is provided free of charge under version 2, or any later version, of the GNU General Public License (GPL).

CONTACT

Please direct all inquiries, comments, and reports of bugs to dlinzer@emory.edu.

VERSION HISTORY

1.3.1: Updated to reflect publication in the Journal of Statistical Software. (May 22, 2011)

1.3: Adds supplementary functions to aid interpretation of model results after estimation: calculation of posterior probabilities, predicted cell percentages, and the entropy of the fitted model, as well as plotting of estimated model parameters. (April 4, 2011)

1.2: New functionality to output, in tabular form, the cell frequencies predicted by the latent class model. Stricter error checking of input data, to ensure that manifest variables are coded as integers from 1 to the number of possible outcomes of each variable. (May 11, 2010)

1.1: Adds user control over the ordering of latent classes and the printing of model results. New functionality to estimate the latent class model automatically multiple times, to locate the global maximum likelihood solution. (November 1, 2007)

1.0: Provides standard errors for all model parameters and the covariance matrix of the regression coefficients. Also allows users to specify starting values for the estimation algorithm, to aid convergence and give more control over model output. (April 4, 2007)

0.9: First public release. (June 1, 2006)
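Two of the features listed in the version history can be illustrated in base R without poLCA itself: the integer coding of manifest variables that version 1.2 checks for, and the predicted cell percentages and entropy added in version 1.3. The helper names to_manifest, fitted_cells, and entropy are hypothetical, the parameter values are made up for illustration, and entropy is assumed here to mean the Shannon entropy (natural log) of the fitted cell probabilities; consult the package's help pages for how poLCA's own functions define and report these quantities.

```r
# Hypothetical helpers, not part of poLCA.

# (1) Recode categorical columns to integers 1..K_j, the coding poLCA expects.
#     factor() sorts the distinct values, so the first sorted level becomes 1;
#     missing values stay NA.
to_manifest <- function(df) {
  as.data.frame(lapply(df, function(x) as.integer(factor(x))))
}

raw <- data.frame(
  q1 = c("agree", "disagree", "agree", "neutral"),
  q2 = c(0, 1, 1, 0)
)
coded <- to_manifest(raw)
coded$q1   # 1 2 1 3  (agree = 1, disagree = 2, neutral = 3)
coded$q2   # 1 2 2 1  (0 = 1, 1 = 2)

# (2) Fitted cell probabilities of the full cross-classification table implied
#     by class shares P and class-conditional probabilities probs (R x K_j each),
#     and the Shannon entropy (natural log) of that fitted distribution.
fitted_cells <- function(P, probs) {
  cells <- expand.grid(lapply(probs, function(m) seq_len(ncol(m))))
  dens <- matrix(P, nrow(cells), length(P), byrow = TRUE)
  for (j in seq_along(probs)) {
    dens <- dens * t(probs[[j]][, cells[[j]], drop = FALSE])
  }
  cbind(cells, prob = rowSums(dens))
}
entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))

P <- c(0.6, 0.4)                          # illustrative values only
probs <- list(A = rbind(c(0.9, 0.1), c(0.2, 0.8)),
              B = rbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6)))
tab <- fitted_cells(P, probs)
tab$pct <- 100 * tab$prob                 # predicted cell percentages
entropy(tab$prob)
```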