Compositional Data Analysis

Posted: 10 Mar 2021

Michael Greenacre

Universitat Pompeu Fabra - Faculty of Economic and Business Sciences

Date Written: March 2021

Abstract

Compositional data are nonnegative data carrying relative, rather than absolute, information. These are often data with a constant-sum constraint on the sample values, for example, proportions or percentages summing to 1 or 100%, respectively. Ratios between components of a composition are important since they are unaffected by the particular set of components chosen. Logarithms of ratios (logratios) are the fundamental transformation in the ratio approach to compositional data analysis. All data thus need to be strictly positive, so zero values present a major problem. Components that group together based on domain knowledge can be amalgamated (i.e., summed) to create new components, and this can alleviate the problem of data zeros. Once compositional data are transformed to logratios, regular univariate and multivariate statistical analysis can be performed, such as dimension reduction and clustering, as well as modeling. Alternative methodologies that come close to the ideals of the logratio approach are also considered, especially those that avoid the problem of data zeros, which is particularly acute in large bioinformatic data sets.
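The ideas in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (close, clr, amalgamate) and the numbers are hypothetical, and the centered logratio is used only as one common choice of logratio transformation, since the abstract does not single out a particular one. The sketch shows that a ratio between two parts is the same in a subcomposition as in the full composition, that a zero part blocks the logratio transformation, and that amalgamating parts can remove the zero.

```python
import numpy as np

def close(x):
    """Rescale each row to sum to 1 (the constant-sum constraint)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered logratio: log of each part relative to the row's geometric mean."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("logratios require strictly positive parts")
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def amalgamate(x, groups):
    """Sum the columns listed in each group to create new components."""
    x = np.asarray(x, dtype=float)
    return np.stack([x[..., list(g)].sum(axis=-1) for g in groups], axis=-1)

# Hypothetical 4-part composition (made-up values); the last part is zero.
comp = close([[0.50, 0.30, 0.20, 0.00]])

# The ratio of parts 0 and 1 is unchanged when only those two parts
# are kept and re-closed as a subcomposition.
sub = close(comp[:, :2])
ratio_full = comp[0, 0] / comp[0, 1]
ratio_sub = sub[0, 0] / sub[0, 1]

# Amalgamating parts 2 and 3 removes the zero, so logratios can be taken.
merged = amalgamate(comp, [(0,), (1,), (2, 3)])
z = clr(merged)
```

Whether two parts may be amalgamated is a domain-knowledge decision, as the abstract notes; the grouping (2, 3) here is arbitrary and serves only to show the mechanics.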

Suggested Citation

Greenacre, Michael John, Compositional Data Analysis (March 2021). Annual Review of Statistics and Its Application, Vol. 8, Issue 1, pp. 271-299, 2021, Available at SSRN: https://ssrn.com/abstract=3800686 or http://dx.doi.org/10.1146/annurev-statistics-042720-124436

Michael John Greenacre (Contact Author)

Universitat Pompeu Fabra - Faculty of Economic and Business Sciences

Ramon Trias Fargas 25-27
Barcelona, 08005
Spain
34 93 542 25 51 (Phone)
34 93 542 17 46 (Fax)
