Master Thesis Lab

Statistics & Methods Centre - Exploratory analysis

Exploratory multivariate interdependence techniques aim to uncover relationships between variables and/or subjects without explicitly assuming specific distributions for the variables. The idea is to describe patterns in the data without making (very) strong assumptions about the variables. In a sense, these techniques treat samples as populations: generalisability should come from replication rather than from assumptions about sampling.
The full references for the books mentioned in this section can be found on the Courses and textbooks page.


Principal component analysis: Numerical variables
Principal component (or components) analysis is (1) a multivariate technique for investigating the correlations among a (large) number of variables, and (2) a multivariate technique for identifying linear combinations of a set of variables that explain as much of the variance in the data as possible. There are several other definitions and uses.
In textbooks and computer programs, principal component analysis is generally (and sometimes confusingly) discussed together and intertwined with exploratory factor analysis. The references below also explain the differences between principal component analysis and factor analysis.
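To make definition (2) concrete, here is a minimal Python/NumPy sketch (not SPSS's implementation) of principal component analysis on the correlation matrix. The components are the eigenvectors of the correlation matrix sorted by eigenvalue, and each eigenvalue is the variance of the corresponding component. The function name `pca` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np

def pca(X, n_components=2):
    """Hypothetical helper: PCA on the correlation matrix (standardised variables)."""
    X = np.asarray(X, dtype=float)
    # Standardise each variable to mean 0 and variance 1 (constant columns are not allowed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)             # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]                # weights of the linear combinations
    scores = Z @ W                               # component scores for each subject
    loadings = W * np.sqrt(eigvals[:n_components])  # correlations variable <-> component
    explained = eigvals / eigvals.sum()          # proportion of total variance per component
    return scores, loadings, explained

# Simulated example: four noisy indicators of one underlying dimension
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent + 0.5 * rng.normal(size=(200, 4))
scores, loadings, explained = pca(X)
print(explained.round(2))
```

With data like these, the first component should account for most of the variance. The loadings correspond to the correlations between variables and components shown in a component matrix; the sign of each column is arbitrary.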
Basic reading
Field, Chapter 15, section 15.3
Hair et al., Chapter 3: Factor analysis
Meyers et al., Chapter 12A: Principal components and Factor analysis, and Chapter 12B: Principal components and factor analysis using SPSS
Advanced reading
Jolliffe, I.T. (2002). Principal component analysis (2nd edition). New York: Springer.
Software
SPSS => Analyze => Data reduction => Factor (under Extraction, check that the default method is indeed Principal components)
Annotated output for SPSS - UCLA-ATS site
Annotated output for SPSS - Andy Field site
Reporting Principal component analysis in publications
Reporting - examples in journal articles



Principal component analysis: Categorical, ordinal, and numerical variables
Categorical (or nonlinear) principal component analysis is a multivariate technique that extends standard principal component analysis to variables of any measurement level (nominal, ordinal, and numerical) and can also handle nonlinear relationships between numerical variables. The SPSS procedure CATPCA is very useful for making biplots, also when all variables are numerical.
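A biplot shows objects (subjects) as points and variables as vectors in the same plot. The Python sketch below computes such coordinates for the numerical-only case, using ordinary PCA via a singular value decomposition; it does not implement CATPCA's optimal scaling of categorical variables. When all variables are treated as numerical, nonlinear PCA reduces to this standard case. The function `biplot_coordinates` and the chosen normalisation (objects in standard coordinates, variables as loadings) are illustrative assumptions.

```python
import numpy as np

def biplot_coordinates(X, n_dims=2):
    """Hypothetical helper: object points and variable vectors for a PCA biplot."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised numerical variables
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # Z = U @ diag(s) @ Vt
    # Objects: standardised component scores (variance 1 per dimension)
    objects = U[:, :n_dims] * np.sqrt(n - 1)
    # Variables: loadings, i.e. correlations between each variable and each dimension
    variables = Vt[:n_dims].T * s[:n_dims] / np.sqrt(n - 1)
    # objects @ variables.T is the least-squares best rank-n_dims approximation of Z
    return objects, variables

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 4))   # 30 objects, 4 correlated variables
objects, variables = biplot_coordinates(X)
```

In the plot, the projection of an object point onto a variable vector approximates that object's standardised score on the variable.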
Basic reading
No standard English-language textbook exists that treats this subject.
De Heus, P., Van der Leeden, R., & Gazendam, B. (1995). Toegepaste data-analyse (in Dutch), Chapter xx. Utrecht: Lemma.
Advanced reading
Linting, M. et al. (2007). Nonlinear principal components analysis: Introduction and applications. Psychological Methods, 12, 336-358.
Meulman, J.J., Van der Kooij, A.J., & Heiser, W.J. (2004). Principal component analysis with nonlinear optimal scaling transformations for ordinal and nominal data. In Kaplan, D. (Ed.), Handbook of quantitative methodology for the social sciences (pp. 49-70). London: Sage.
Meulman, J.J., Heiser, W.J., & SPSS Inc. (2004). SPSS Categories 13.0, Chapter 3. Chicago, IL: SPSS.
Software
SPSS => Analyze => Data reduction => Optimal scaling => Some variable(s) not multiple nominal
Annotated output for SPSS in the SPSS Manual: Meulman, Heiser, & SPSS Inc.
Reporting Categorical principal component analysis in publications
Reporting - examples in journal articles



Correspondence analysis
Correspondence analysis is a technique applied to a contingency table (cross-table). Its aim is to represent the row categories and the column categories as well as possible in one plot, so that the interaction (association) between rows and columns can be examined.
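Correspondence analysis can be computed from a singular value decomposition of the standardised residuals of the table, that is, the deviations of the observed proportions from those expected if rows and columns were independent. The Python sketch below uses principal coordinates for both rows and columns; SPSS offers several normalisation options, so its coordinates may be scaled differently. The function name and the counts in the example table are hypothetical.

```python
import numpy as np

def correspondence_analysis(N, n_dims=2):
    """Hypothetical helper: simple correspondence analysis of a contingency table N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                     # correspondence matrix (cell proportions)
    r = P.sum(axis=1)                   # row masses
    c = P.sum(axis=0)                   # column masses
    # Standardised residuals from the independence model r_i * c_j
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: rows and columns plotted in the same space
    rows = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt[:n_dims].T * sv[:n_dims]) / np.sqrt(c)[:, None]
    inertia = sv ** 2                   # principal inertias; their sum equals chi-square / n
    return rows, cols, inertia

table = np.array([[20, 10, 5],
                  [10, 20, 10],
                  [5, 10, 20],
                  [15, 5, 10]])         # hypothetical 4 x 3 table of counts
rows, cols, inertia = correspondence_analysis(table)
print(inertia[:2] / inertia.sum())      # share of total inertia per dimension
```

Because the total inertia equals the Pearson chi-square statistic divided by the sample size, the share of inertia per dimension indicates how much of the row-column association the plot displays.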
Basic reading
Hair et al., Chapter 9 (p. 663-674, 691-697).
Advanced reading
Greenacre, M.J. (1985). Theory and applications of correspondence analysis. Academic Press.
Greenacre, M.J. (2007). Correspondence analysis in practice (2nd edition). Chapman and Hall.
Software
SPSS => Analyze => Data reduction => Correspondence analysis
Annotated output for SPSS in Meulman, Heiser, & SPSS Inc. (2004). SPSS Categories 14.0 (Chapter 5: Correspondence analysis). Chicago: SPSS.
Reporting Correspondence analysis in publications
Reporting - examples in journal articles



Multidimensional scaling - MDS
Multidimensional scaling refers to a collection of techniques that aim to portray the (dis)similarities between objects (however defined) in a graph, such that the distances in the graph correspond as closely as possible to the observed (dis)similarities. Different MDS techniques may use somewhat different definitions from the one given here.
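The basic idea of fitting distances to dissimilarities can be illustrated with classical (Torgerson) scaling, which has a closed-form eigenvalue solution. This is a different technique from SPSS's PROXSCAL, which iteratively minimises a stress criterion; classical scaling is shown here only because it is short and is often used as a starting configuration for iterative MDS. The function name and the example points are hypothetical.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Hypothetical helper: classical (Torgerson) MDS of a symmetric dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims] # largest eigenvalues first
    # Negative eigenvalues (non-Euclidean dissimilarities) are clipped to zero
    L = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * L               # coordinates of the objects in the graph

# Four objects whose dissimilarities are exact Euclidean distances (corners of a rectangle)
pts = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
Y = classical_mds(D)
```

When the dissimilarities are exact Euclidean distances, as here, the recovered configuration reproduces them exactly, up to rotation and reflection; with real (dis)similarity data the fit is only approximate.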
Basic reading
Hair et al., Chapter 9 (p. 629-662, 679-690).
Advanced reading
Borg, I. & Groenen, P.J.F. Modern multidimensional scaling: Theory and applications. New York: Springer.
Software
SPSS => Analyze => Scaling => Multidimensional scaling (ProxScal)
Annotated output for SPSS in Meulman, Heiser, & SPSS Inc. (2004). SPSS Categories 14.0 (Chapter 7: Multidimensional scaling (ProxScal)). Chicago: SPSS.
Reporting Multidimensional scaling in publications
Reporting - examples in journal articles



Cluster analysis
Cluster analysis comprises a group of methods for grouping (clustering) objects or subjects such that members of the same group share many of the measured characteristics. Clustering is generally done on objects rather than on variables, although clustering variables is also possible.
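As an illustration of one of the methods listed under Software, here is a minimal Python sketch of k-means clustering (Lloyd's algorithm). Objects are repeatedly assigned to the nearest cluster centre, and each centre is moved to the mean of its objects, until the centres stop changing. This is not SPSS's K-means procedure; the maximin initialisation (a random first object, then repeatedly the object farthest from the centres chosen so far) and the function names are assumptions for illustration.

```python
import numpy as np

def nearest_centre(X, centres):
    """Index of the closest centre (squared Euclidean distance) for each object."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: k-means clustering of the rows (objects) of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Maximin initialisation: one random object, then the farthest remaining objects
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d2.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = nearest_centre(X, centres)
        # Move each centre to the mean of its objects (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return nearest_centre(X, centres), centres

# Two well-separated simulated groups of 50 objects measured on 2 variables
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)), rng.normal(10, 0.5, size=(50, 2))])
labels, centres = kmeans(X, k=2)
```

K-means only finds a local optimum that depends on the starting centres, so in practice it is common to try several starts or to compare the result with a hierarchical solution.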
Basic reading
Hair et al., Chapter 8: Cluster analysis.
Advanced reading
Everitt, B.S., Landau, S., & Leese, M. (2001). Cluster analysis (4th edition). London: Hodder Arnold.
Software
SPSS => Analyze => Classify => TwoStep clustering
SPSS => Analyze => Classify => K-means clustering
SPSS => Analyze => Classify => Hierarchical clustering
Annotated output for SPSS
Reporting Cluster analysis in publications
Reporting - examples in journal articles
