Chapter 12

Exercise Solution for Chapter 12

Exercise 12.2 from Modern Statistics for Modern Biologists

Use glmnet for a prediction of a continuous variable, i.e., for regression. Use the prostate cancer data from Chapter 3 of (Hastie, Tibshirani, and Friedman 2008). The data are available in the CRAN package ElemStatLearn. Explore the effects of using Ridge versus Lasso penalty.

Here are the packages that need to be installed.

library(dplyr)
library(ggplot2)
library(glmnet)    # fit penalized generalized linear models
library(GGally)    # used for the ggpairs function
library(superheat) # used to show correlation between variables

Data for the exercise

The ElemStatLearn package isn’t on CRAN anymore.
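Since the prostate data may not be directly installable, the Ridge versus Lasso comparison can be sketched on simulated data instead. This is a minimal illustration, not the exercise solution: the simulated predictors, the sample size, and the choice of `lambda.1se` are all assumptions made here. In glmnet, `alpha = 0` selects the Ridge penalty and `alpha = 1` selects the Lasso penalty.

```r
library(glmnet)

set.seed(1)

# Simulated stand-in for the prostate data (an assumption of this sketch):
# 100 observations, 8 predictors, only the first two affect the response.
n <- 100
p <- 8
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

# Cross-validated fits: alpha = 0 is Ridge, alpha = 1 is Lasso
ridge_cv <- cv.glmnet(x, y, alpha = 0)
lasso_cv <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the largest lambda within one standard error of the minimum
ridge_coef <- as.matrix(coef(ridge_cv, s = "lambda.1se"))
lasso_coef <- as.matrix(coef(lasso_cv, s = "lambda.1se"))

# Side-by-side comparison (first row is the intercept)
comparison <- cbind(ridge = ridge_coef[, 1], lasso = lasso_coef[, 1])
print(round(comparison, 3))
```

Ridge shrinks all coefficients toward zero but generally keeps every predictor in the model, whereas Lasso can set some coefficients exactly to zero, so the Lasso column will usually contain zeros for some of the noise predictors.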

Vocabulary for Chapter 12

Chapter 12 covers supervised learning and the statistics of predicting categorical variables. Also discussed are the issues of overfitting and generalizability and how to “train” statistical models. The vocabulary words for Chapter 12 are:

predictors: characteristics measured for an observation that may be useful in predicting the target variable
overfitting: the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably
generalization: how well the concepts learned by a machine learning model apply to specific examples not seen by the model when it was learning
statistical learning: a framework for machine learning drawing from the fields of statistics and functional analysis
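The relationship between overfitting and generalization can be made concrete with a small sketch in base R. Everything here is an assumption for illustration (the sine-shaped signal, the noise level, the 30/30 train/test split, and the polynomial degrees); it is not taken from the chapter. Models of increasing flexibility are fit to the training half, and their error is then measured on both halves.

```r
set.seed(42)

# Simulated data (an assumption of this sketch): a smooth signal plus noise
n <- 60
x <- runif(n, -1, 1)
y <- sin(pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# First 30 observations for training, the remaining 30 held out for testing
train <- seq_len(n) <= 30

# Mean squared error of a fitted model on a data frame
mse <- function(fit, d) mean((d$y - predict(fit, newdata = d))^2)

# Fit polynomials of increasing degree on the training set only
degrees <- c(1, 3, 15)
errors <- t(sapply(degrees, function(deg) {
  fit <- lm(y ~ poly(x, deg), data = dat[train, ])
  c(degree = deg,
    train  = mse(fit, dat[train, ]),
    test   = mse(fit, dat[!train, ]))
}))
print(errors)
```

The training error can only go down as the degree increases, because each model contains the previous one as a special case. The test error typically stops improving and then gets worse for the most flexible model, which is the signature of overfitting: a close fit to the training data that does not generalize.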