Chapter 7

Exercise solution for Chapter 7

Exercise 7.4 from Modern Statistics for Modern Biology

Let’s revisit the Hiiragi data and compare the weighted and unweighted approaches.

7.4a Make a correlation circle for the unweighted Hiiragi data xwt. Which genes have the best projections on the first principal plane (best approximation)?

7.4b Make a biplot showing the labels of the extreme gene-variables that explain most of the variance in the first plane. Add the sample-points.
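The computations behind 7.4a and 7.4b can be sketched in Python with NumPy. This is a minimal illustration, not the book's solution: it assumes an unweighted PCA of a standardized matrix with samples as rows and genes as columns, the function name `pca_correlation_circle` is hypothetical, and the random matrix and gene names are placeholders rather than the Hiiragi `xwt` data.

```python
import numpy as np


def pca_correlation_circle(X, n_components=2):
    """Unweighted PCA of a standardized samples-by-genes matrix X.

    Returns (loadings, cos2, scores):
      loadings[j, k]: correlation of gene j with principal component k,
                      i.e. its coordinates on the correlation circle
      cos2[j]:        squared length of gene j's arrow on the first
                      n_components axes (quality of representation)
      scores[i, k]:   coordinates of sample i on component k
    Assumes no gene has zero variance.
    """
    n = X.shape[0]
    # Center each gene (column) and scale it to unit standard deviation
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Z = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = n_components
    # Sample scores on the first k principal components
    scores = U[:, :k] * s[:k]
    # For standardized data, corr(gene j, PC k) = s[k] * Vt[k, j] / sqrt(n - 1)
    loadings = Vt[:k].T * s[:k] / np.sqrt(n - 1)
    # Sum of squared correlations over the retained axes (between 0 and 1)
    cos2 = (loadings ** 2).sum(axis=1)
    return loadings, cos2, scores


# Hypothetical placeholder data: 20 samples x 6 genes (NOT the Hiiragi xwt data)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
genes = [f"gene{j}" for j in range(X.shape[1])]

loadings, cos2, scores = pca_correlation_circle(X)

# Genes best approximated by the first principal plane: largest cos2 first
for j in np.argsort(cos2)[::-1][:3]:
    print(genes[j], np.round(loadings[j], 2), round(float(cos2[j]), 2))
```

A gene's point on the correlation circle is its pair of correlations with the first two components, and `cos2` (the squared distance from the origin) measures how well the first plane approximates that gene. The genes with the largest `cos2` are therefore natural candidates to label in the biplot, with `scores` supplying the sample points. The sign of each axis from an SVD is arbitrary, so the plot may appear mirrored relative to other software, and biplot conventions differ in how they rescale scores and loadings; this sketch leaves both unscaled and omits the plotting itself.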

Vocabulary for Chapter 7

Chapter 7 covers multivariate analysis, with a focus on principal component analysis and dimension reduction in general.

principal component analysis (PCA): an unsupervised ordination method used to reduce the dimensionality of data by constructing new variables (principal components) whose scores successively capture the maximum possible variation in the data
matrix: a two-dimensional arrangement of rows and columns used to store data
mass spectrometry: a measurement procedure based on the mass-to-charge ratio of ions, often used to measure metabolite abundance
correlation coefficient: a single summary value measuring how two variables co-vary (for Pearson's correlation, a value between -1 and 1)
centering: subtracting each variable's mean from its values so that the new mean is 0
scaling / standardizing: dividing each variable's values by its standard deviation so that the new standard deviation is 1
data simplification: a broadly applicable term referring to the process of summarizing or reducing the dimensions of multivariate data
dimension reduction: summarizing data to reduce the number of variables for downstream analyses
principal scores: the coordinates of each subject on a given principal component, computed as a weighted combination of the subject's (centered, and often standardized) values of the original variables, with the weights given by that component
unsupervised learning: a machine learning method used to find patterns in the data without a priori variable ranking or labeling
status: in the context of variables in a statistical learning algorithm, a ranking or labeling of variables (e.