Exercise 9.2: “Correspondence Analysis on color association tables: Here is an example of data collected by looking at the number of Google hits resulting from queries of pairs of words. The numbers in Table 9.4 [not reproduced] are to be multiplied by 1000. For instance, the combination of the words “quiet” and “blue” returned 2,150,000 hits. Perform a correspondence analysis of these data. What do you notice when you look at the two-dimensional biplot?”
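As a rough illustration of what "perform a correspondence analysis" involves, here is a minimal sketch in Python with NumPy. It computes a simple correspondence analysis through the singular value decomposition of the standardized residuals of a contingency table. Because Table 9.4 is not reproduced here, `toy_counts` and its labels are made-up placeholders, not the real hit counts, and the function name `correspondence_analysis` is my own choice rather than anything from the exercise.

```python
import numpy as np


def correspondence_analysis(counts):
    """Simple correspondence analysis of a two-way table of counts.

    Returns row and column principal coordinates plus the principal
    inertias (squared singular values).
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                 # correspondence matrix (proportions)
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    expected = np.outer(r, c)       # proportions expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    inertia = sv ** 2
    return row_coords, col_coords, inertia


# Hypothetical placeholder table: rows are words, columns are colors.
# These numbers are invented for illustration only.
toy_rows = ["word_a", "word_b", "word_c"]
toy_cols = ["color_w", "color_x", "color_y", "color_z"]
toy_counts = [
    [120, 30, 45, 60],
    [25, 140, 35, 20],
    [40, 50, 110, 75],
]

row_coords, col_coords, inertia = correspondence_analysis(toy_counts)

# Share of the total inertia captured by each dimension.
explained = inertia / inertia.sum()
print("share of inertia per dimension:", np.round(explained, 3))
print("row coordinates (first two dimensions):")
print(np.round(row_coords[:, :2], 3))
print("column coordinates (first two dimensions):")
print(np.round(col_coords[:, :2], 3))
```

For the two-dimensional biplot the exercise asks about, one would plot the first two columns of `row_coords` and `col_coords` on the same axes and label the points with the row and column names. Points that land close together, or in the same direction from the origin, indicate words and colors that co-occur more often than independence would predict.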
Chapter 9 covers multivariate methods for heterogeneous data. It extends ideas introduced in Chapter 7, such as dimension reduction, to more complex, heterogeneous data.
The vocabulary words for Chapter 9 are:
multidimensional scaling (MDS): a linear dimension reduction method applied in cases where distances between observations are available
clusters: in the context of data analysis, data points that group together
robust: in the context of a statistical method, a ‘sturdy’ estimator that is not heavily influenced by outliers
outlier: a single data point with large distances to other data points, thus potentially dominating and skewing the analysis
breakdown point: a measure of the robustness of an estimator; larger values indicate more robust estimators
non-metric multidimensional scaling (NMDS): a robust ordination method which attempts to embed data points in a new space while maintaining their respective order to one another
metadata: information, data, or descriptions that characterize other data
batch effects: hidden factors that affect the data but are not documented; e.