Vocabulary for Chapter 12
Chapter 12 covers supervised learning and the statistics of predicting categorical variables. Also discussed are the issues of overfitting and generalizability and how to “train” statistical models.
The vocabulary words for Chapter 12 are:
predictors | characteristics measured for an observation that may be useful in predicting the target variable |
overfitting | the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably |
generalization | refers to how well the concepts learned by a machine learning model apply to new examples not seen by the model during training |
statistical learning | framework for machine learning drawing from the fields of statistics and functional analysis. Deals with the problem of finding a predictive function based on data |
objective response | in the context of supervised learning, an objectively measurable response (outcome) variable that the model is trained to predict |
kernel methods | class of algorithms for pattern analysis, whose best known member is the support vector machine (SVM). These use kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space |
regression | statistical method that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables) |
classification | the task of assigning observations in a dataset to one of a set of known classes on the basis of their measured characteristics |
linear discriminant analysis (LDA) | a common technique used both for supervised classification and as a pre-processing dimension-reduction step; it finds a linear combination of features that helps separate the classes |
misclassification rate (MCR) | in statistical learning, the fraction of times the prediction is wrong, specifically in the context of classification models |
leave-one-out cross-validation | k-fold cross-validation taken to its logical extreme, with k equal to n, the number of data points in the set |
k-fold cross-validation | a technique where observations are repeatedly split into a training set of size around n(k-1)/k and a test set of size around n/k. Mainly used to estimate how accurately a predictive model will perform in practice (see the sketch after this list) |
curse of dimensionality | refers to the fact that high-dimensional spaces are very hard, if not impossible, to sample thoroughly, because data in any particular region becomes very sparse as dimensions increase |
confusion table | in machine learning, and specifically in statistical classification, a table layout that allows visualization of the performance of an algorithm (typically a supervised learning one) by counting the number of observations truly within each class versus the number predicted by the model to be in each class (see the sketch after this list) |
sensitivity | true positive rate or recall; measures the proportion of actual positives that are correctly identified as such |
specificity | true negative rate; measures the proportion of actual negatives that are correctly identified as negative |
receiver operating characteristic (ROC)/precision-recall curve | a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied (see the sketch after this list) |
Jaccard index | a statistic used in quantifying the similarity between sample sets, formally defined as the size of the intersection of two sets divided by the size of their union (see the sketch after this list) |
mean-squared error (MSE) | the average of the squared differences between the estimated and true values (see the decomposition after this list) |
risk function/cost function/objective function | the function that you optimize during the training of a predictive model (e.g., the maximum likelihood function for a classic regression model) |
bias | a measure of how different the average of all the different estimates is from the truth |
variance | how much an individual estimate might scatter from the average value |
penalization | a tool to actively control and exploit the bias-variance tradeoff |
regularization | a method used to ensure stable estimates by helping to prevent overfitting of the model to the training data |
logistic regression | a statistical model that in its basic form uses a logistic function to model a binary dependent variable. A binary logistic model has a dependent variable with two possible values (e.g., healthy/sick), represented by indicator variables (0, 1) (see the sketch after this list) |
penalty function | a term added to the objective function that consists of a penalty parameter multiplied by a measure of violation of the constraints |
ridge regression | a method of regression in which the cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients. Doing this shrinks coefficients and helps reduce model complexity and multicollinearity |
lasso | Least Absolute Shrinkage and Selection Operator; a regression modeling method used to reduce overfitting and select useful features of the data for predicting the outcome |
elastic net | in the fitting of linear or logistic regression models, a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods (see the sketch after this list) |
ExperimentHub | in the context of Bioconductor, provides a central location where curated data from experiments, publications, or training courses can be accessed |
kingdom | the second highest taxonomic rank, just below domain |
phylum | a level of classification of taxonomic rank below kingdom and above class |
species | the basic unit of classification and a taxonomic rank of an organism, as well as unit of biodiversity |
diagnostic plots | statistical plots that help to visualize how well a model fits the data (e.g., Normal Q-Q, Residuals vs Fitted) |
tuning parameters | parameters that control the strength of the penalty term in certain types of regression algorithms (e.g., ridge and lasso regression), and hence the amount of shrinkage (where parameter estimates are shrunk towards a central point, like the mean) when fitting the model |
p-value hacking | manipulating the data until a statistic that yields the desired result is found |
workflow | in the context of a computational analysis, the chaining of software tools together in a series of steps that operate on data |
scale invariance | a feature of objects or laws that do not change when scales such as length or energy are multiplied by a common factor, and thus represent universality |
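The sketches referenced above follow here. They are minimal, hypothetical illustrations written in Python with made-up data (not code from the chapter itself), intended only to make the definitions concrete. First, a confusion table together with the sensitivity, specificity, and misclassification rate computed from it:

```python
import numpy as np

# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 1])

# Confusion table: observations truly in each class versus the predicted class.
tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
mcr = (fp + fn) / len(y_true)  # misclassification rate

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, MCR={mcr:.2f}")
```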
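A sketch of k-fold and leave-one-out cross-validation, assuming scikit-learn is available; the synthetic dataset and the choice of classifier are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic dataset: n = 100 observations, 5 predictors.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression()

# k-fold CV: each split trains on ~n(k-1)/k observations and tests on ~n/k.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
print("5-fold accuracy:", cross_val_score(model, X, y, cv=kfold).mean())

# Leave-one-out CV: the k = n extreme, with a single test point per split.
print("LOOCV accuracy:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
```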
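An ROC sketch: sweeping the discrimination threshold over hypothetical classifier scores yields one (false positive rate, true positive rate) point per threshold:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and classifier scores (higher = more confidently positive).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one (FPR, TPR) pair per threshold
print("AUC:", roc_auc_score(y_true, scores))      # area under the ROC curve
```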
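The Jaccard index needs only set operations; the gene names here are made up:

```python
# Jaccard index: size of the intersection divided by the size of the union.
a = {"gene1", "gene2", "gene3", "gene4"}
b = {"gene3", "gene4", "gene5"}

jaccard = len(a & b) / len(a | b)
print(jaccard)  # 2 shared out of 5 distinct elements -> 0.4
```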
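The standard identity tying together the mean-squared error, bias, and variance entries, written in LaTeX (for a non-random target; with observation noise an irreducible error term is added):

```latex
% MSE of an estimator \hat{\theta} of a quantity \theta:
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\left[(\hat{\theta} - \theta)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{\theta}] - \theta\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{\theta} - \mathbb{E}[\hat{\theta}]\right)^2\right]}_{\text{variance}}
```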
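A logistic regression sketch on a made-up binary outcome; the fitted model passes a linear predictor through the logistic function to give the probability of each class:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: one predictor, binary outcome (0 = healthy, 1 = sick).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
# predict_proba applies the logistic function 1 / (1 + exp(-(b0 + b1 * x))).
print(model.predict_proba([[3.5]]))  # [P(healthy), P(sick)] at x = 3.5
```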
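Finally, a sketch of the three penalized regression methods; alpha is the tuning parameter controlling the strength of the penalty (the amount of shrinkage), and its value here is arbitrary rather than tuned:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Synthetic regression data with many predictors.
X, y = make_regression(n_samples=50, n_features=20, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: square of the coefficient magnitudes
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can shrink coefficients to exactly zero
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # combines the L1 and L2 penalties

# The L1 penalty performs feature selection by zeroing out some coefficients.
print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])
```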
Sources consulted or cited
Some of the definitions above are based in part or in whole on definitions given in the following sources:
Holmes and Huber, 2019. Modern Statistics for Modern Biology. Cambridge University Press, Cambridge, United Kingdom.
https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/
https://www.cs.cmu.edu/~schneide/tut5/node42.html