Measures of Fit in Multiple Correspondence Analysis of Crisp and Fuzzy Coded Data

13 Pages. Posted: 19 Mar 2008

Zerrin Asan

affiliation not provided to SSRN

Michael Greenacre

Universitat Pompeu Fabra - Faculty of Economic and Business Sciences

Date Written: March 2008

Abstract

When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding, where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix; the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many of the categories with the highest degree of membership are correctly predicted. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix.
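The two coding schemes, and the defuzzification step, can be sketched in Python with NumPy. This is a minimal illustration under assumptions the abstract does not state: crisp categories are cut at the tertiles, fuzzy coding uses triangular membership functions with hinges at the minimum, median and maximum, and the helper names `crisp_code`, `fuzzy_code` and `defuzzify` are hypothetical. With triangular memberships each fuzzy row sums to 1, and the membership-weighted average of the hinges returns the original value.

```python
import numpy as np

def crisp_code(x, cuts):
    """Crisp (indicator) coding: one 0/1 column per category.
    `cuts` holds the interior cut-points (assumed here: the tertiles)."""
    cat = np.searchsorted(cuts, x, side="right")  # category index 0 .. len(cuts)
    return np.eye(len(cuts) + 1)[cat]

def fuzzy_code(x, hinges):
    """Fuzzy coding into three categories with triangular membership
    functions peaking at the hinges (assumed: minimum, median, maximum)."""
    h1, h2, h3 = hinges
    Z = np.zeros((len(x), 3))
    low = x <= h2                 # values between h1 and h2
    Z[low, 0] = (h2 - x[low]) / (h2 - h1)
    Z[low, 1] = 1.0 - Z[low, 0]
    high = ~low                   # values between h2 and h3
    Z[high, 2] = (x[high] - h2) / (h3 - h2)
    Z[high, 1] = 1.0 - Z[high, 2]
    return Z                      # each row sums to 1

def defuzzify(Z, hinges):
    """Back-transform fuzzy codes to the original scale:
    the membership-weighted average of the hinge points."""
    return Z @ np.asarray(hinges)

# Hypothetical example: one synthetic continuous variable
rng = np.random.default_rng(0)
x = rng.normal(size=200)
hinges = (x.min(), np.median(x), x.max())
Zc = crisp_code(x, np.quantile(x, [1/3, 2/3]))
Zf = fuzzy_code(x, hinges)
print(Zc[:3])
print(Zf[:3].round(3))
print(np.abs(defuzzify(Zf, hinges) - x).max())  # reconstruction error, numerically zero
```

Crisp coding keeps only the category, so the original value cannot be recovered from it; fuzzy coding with these triangular functions is invertible, which is what makes a defuzzified, percentage-style measure of fit possible.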
In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measures of fit are further discussed for the case where the data set consists of a mixture of discrete and continuous variables.
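To see how the inertia-based quantities behave under the two codings, the following sketch codes synthetic correlated variables (not the meteorological data of the paper) crisply and fuzzily into three categories. It reports the total inertia of the coded matrix, the percentage of inertia on the first two axes, and the average inertia of the off-diagonal tables of the Burt matrix, which is the part that joint correspondence analysis sets out to explain. The coding rules and all function names are assumptions for illustration, and the sketch does not fit a joint correspondence analysis itself.

```python
import numpy as np

def crisp3(x):
    """Crisp coding into three categories cut at the tertiles (assumed)."""
    cat = np.searchsorted(np.quantile(x, [1/3, 2/3]), x, side="right")
    return np.eye(3)[cat]

def fuzzy3(x):
    """Triangular fuzzy coding with hinges at min, median, max (assumed)."""
    h1, h2, h3 = x.min(), np.median(x), x.max()
    Z = np.zeros((len(x), 3))
    lo = x <= h2
    Z[lo, 0] = (h2 - x[lo]) / (h2 - h1)
    Z[lo, 1] = 1.0 - Z[lo, 0]
    hi = ~lo
    Z[hi, 2] = (x[hi] - h2) / (h3 - h2)
    Z[hi, 1] = 1.0 - Z[hi, 2]
    return Z

def std_residuals(N):
    """Standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j) of a nonnegative table."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    return (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

def inertia(N):
    """Total inertia of a table: sum of squared standardized residuals."""
    return (std_residuals(N) ** 2).sum()

def principal_inertias(Z):
    """Principal inertias (squared singular values) of the CA of a coded matrix."""
    return np.linalg.svd(std_residuals(Z), compute_uv=False) ** 2

def offdiag_inertia(blocks):
    """Average inertia of the off-diagonal cross-tables Z_q' Z_s (q != s) of the
    Burt matrix: the part that joint correspondence analysis sets out to fit."""
    Q = len(blocks)
    return np.mean([inertia(blocks[q].T @ blocks[s])
                    for q in range(Q) for s in range(Q) if q != s])

# Synthetic data: four continuous variables sharing a common latent factor
rng = np.random.default_rng(1)
n, Q = 300, 4
X = rng.normal(size=(n, 1)) + rng.normal(size=(n, Q))

coded = {"crisp": [crisp3(X[:, q]) for q in range(Q)],
         "fuzzy": [fuzzy3(X[:, q]) for q in range(Q)]}
for name, blocks in coded.items():
    lam = principal_inertias(np.hstack(blocks))
    print(f"{name}: total inertia {lam.sum():.3f}, "
          f"first two axes {100 * lam[:2].sum() / lam.sum():.1f}%, "
          f"off-diagonal Burt inertia {offdiag_inertia(blocks):.3f}")
```

For crisp coding the total inertia of the indicator matrix is fixed at (J - Q)/Q, where J is the total number of categories and Q the number of variables (here (12 - 4)/4 = 2), whatever the data; this is one reason the percentages on the leading axes look so low. Fuzzy coding gives a smaller total, because memberships strictly between 0 and 1 contribute less than 0/1 indicators.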

Keywords: Data coding, defuzzification, fuzzy coding, indicator matrix, joint correspondence analysis, measure of fit, multiple correspondence analysis, Burt matrix

JEL Classification: C19, C88

Suggested Citation

Asan, Zerrin and Greenacre, Michael John, Measures of Fit in Multiple Correspondence Analysis of Crisp and Fuzzy Coded Data (March 2008). Available at SSRN: https://ssrn.com/abstract=1107815 or http://dx.doi.org/10.2139/ssrn.1107815

Zerrin Asan

affiliation not provided to SSRN

Michael John Greenacre (Contact Author)

Universitat Pompeu Fabra - Faculty of Economic and Business Sciences

Ramon Trias Fargas 25-27
Barcelona, 08005
Spain
34 93 542 25 51 (Phone)
34 93 542 17 46 (Fax)