Vocabulary for Chapter 4

Chapter 4 covers how to generate both finite and infinite mixture models from various distributions. It introduces a number of terms relating to these models. The vocabulary words for Chapter 4 are:

finite mixture in the context of statistics, a distribution of interest that is a combination of a few different probability distributions
infinite mixture in the context of statistics, a distribution of interest that is a combination of many probability distributions (as many as, or more than, the number of observations)
mixture model a model for a combination of two or more different probability distributions
probability density function a function giving the relative likelihood that a continuous random variable takes a given value. The integral of this function over the sample space equals 1.
bimodal distribution a distribution with two modes
expectation-maximization (EM) algorithm an algorithm that allows for parameter estimation in probabilistic models with incomplete data
data augmentation adding variables that are not measured (latent variables) to the data
latent variables variables not measured in the data
bivariate distribution the joint distribution of two random variables
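mixture fraction in the context of mixture models, the proportion of the mixture that comes from a given component distribution; more generally, a fraction used to describe the inhomogeneity in the mixture composition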
identifiability an issue where there can be several explanations for the same observed values; occurs when there are too many degrees of freedom in parameters
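marginal likelihood the likelihood of the observed data alone, obtained by summing (or integrating) the joint density of the observed and latent variables over all possible values of the latent variables
expectation function in the EM algorithm, a function of the parameters that averages the log-likelihood over all possible values of the group that each observation belongs to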
maximization step a step to optimize the parameters of a model
soft averaging the process in which observations are not assigned to a single group; instead, they contribute to multiple groups, with their probabilities of membership used as weights
model averaging the process of using several models and combining them together into a weighted model
zero-inflated data data that contains a large number of zero counts
ChIP-Seq data sequencing data that identifies DNA binding sites for proteins
chromosome a DNA molecule that contains the genetic material of an organism
binding site in the context of molecular biology, a specific region to which a macromolecule binds
deoxyribonucleotide monophosphate a unit of DNA consisting of a deoxyribose sugar, a base, and a single phosphate group
gene expression measurement the measurement of a functional gene product (i.e., protein or RNA)
microarray a laboratory tool used to detect gene expression
promoter in the context of genetics, a region of DNA that initiates transcription of a gene
point mass a finite probability concentrated at a single point of a probability distribution, at which the distribution function has a discontinuity
sampling distribution the probability distribution of a statistic calculated from a random sample, taken over all possible samples of the same size
empirical cumulative distribution function (ECDF) a step distribution function based on empirical data measurements
density in the context of probability distributions, the derivative of the distribution function
bootstrap an approximation of the true sampling distribution; created by drawing new samples from the empirical distribution of the original sample
non-parametric method a statistical method that does not make assumptions about population distribution or sample size
nonparametric bootstrap an approximation of the true sampling distribution that does not rely on a specific distributional assumption or a particular parametric model
Laplace distribution the distribution of the difference between two independent variates with identical exponential distributions
gamma distribution a distribution that is positively valued and continuous with two parameters: shape and scale
negative binomial distribution / gamma-Poisson distribution the probability distribution of the number of failures before the kth success in a sequence of Bernoulli trials
dispersion the amount by which a set of observations deviate from their mean
variance-stabilizing transformations transformations designed to give approximate independence between mean and variance
heteroscedasticity the property that the variance of the data is different in different regions of the data
delta method a procedure that uses a Taylor series expansion of a function of one or more random variables to approximate the expected value and variance of that function
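
Several of the terms above (finite mixture, latent variables, mixture fraction, the expectation and maximization steps, and soft averaging) can be illustrated together. The following is a minimal sketch in Python rather than code from the book; the function names, the simulated parameter values, and the crude starting values are all assumptions chosen for illustration. It simulates data from a two-component normal mixture, discards the latent group labels, and then tries to recover the parameters with a basic EM loop.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_mixture(n, lam, means, sds, rng):
    """Draw n values from a two-component normal mixture.

    The latent variable (which component produced each value) is drawn
    first and then thrown away, as it would be unobserved in real data.
    """
    data = []
    for _ in range(n):
        k = 0 if rng.random() < lam else 1
        data.append(rng.gauss(means[k], sds[k]))
    return data


def em_two_normals(data, n_iter=200):
    """Fit a two-component normal mixture with a basic EM loop."""
    # Crude starting values (an assumption of this sketch).
    lam = 0.5
    means = [min(data), max(data)]
    sds = [1.0, 1.0]
    for _ in range(n_iter):
        # E step: probability that each observation belongs to component 0.
        w = []
        for x in data:
            p0 = lam * normal_pdf(x, means[0], sds[0])
            p1 = (1.0 - lam) * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            w.append(p0 / total if total > 0.0 else 0.5)
        # M step: update parameters using the memberships as weights
        # (soft averaging: every observation contributes to both groups).
        weights = [w, [1.0 - wi for wi in w]]
        for k in (0, 1):
            total = sum(weights[k])
            means[k] = sum(wk * x for wk, x in zip(weights[k], data)) / total
            var = sum(wk * (x - means[k]) ** 2
                      for wk, x in zip(weights[k], data)) / total
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
        lam = sum(w) / len(data)  # updated mixture fraction
    return lam, means, sds


rng = random.Random(42)
data = simulate_mixture(2000, lam=0.3, means=(0.0, 5.0), sds=(1.0, 1.0), rng=rng)
lam_hat, means_hat, sds_hat = em_two_normals(data)
print(lam_hat, means_hat, sds_hat)
```

Because the two simulated components are well separated, the estimates should land close to the values used in the simulation. With overlapping components or poor starting values, EM can converge to a different local optimum, which relates to the identifiability issue defined above.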

Sources consulted or cited

Some of the definitions above are based in part or whole on listed definitions in the following sources.

  • Holmes and Huber, 2019. Modern Statistics for Modern Biology. Cambridge University Press, Cambridge, United Kingdom.
  • Everitt and Skrondal, 2010. The Cambridge Dictionary of Statistics (Fourth Edition). Cambridge University Press, Cambridge, United Kingdom.
  • Zero-Inflated Poisson Regression. Institute for Digital Research and Education Statistical Consulting. https://stats.idre.ucla.edu/r/dae/zip/.
  • Berrar, 2019. Introduction to Non-parametric Bootstrap. Research Gate. https://www.researchgate.net/
  • Do and Batzoglou, 2008. What is the expectation maximization algorithm? Nature Biotechnology.
  • Wikipedia: The Free Encyclopedia. https://en.wikipedia.org/wiki/Main_Page
  • Google Oxford American Dictionary. https://www.google.com
  • d’Auzay, et al., 2019. Statistics of progress variable and mixture fraction gradients in an open turbulent jet spray flame. Fuel.
  • Brownlee, 2019. A Gentle Introduction to Expectation-Maximization (EM Algorithm). Machine Learning Mastery. https://www.machinelearningmastery.com
  • Non-parametric Methods. R tutorial. https://www.r-tutor.com
  • Precise analysis of DNA–protein binding sequences. Illumina. https://www.illumina.com
  • Microarray. Nature. https://www.nature.com

Amy Fox
Graduate Student in Microbiology, Immunology, and Pathology

She's interested in using computational tools to answer biological questions.