Vocabulary for Chapter 1

Last updated on Jan 13, 2020

Chapter 1 covers generative modeling for discrete data. It introduces a number of terms covering probability and statistical modeling, as well as a few biological terms. The vocabulary words for Chapter 1 are:


probability model	A mathematical description of the possible outcomes of an experiment and the probability of each of those outcomes.
vector	In programming, a one-dimensional array of data, all with the same data type.
discrete event	In statistics, an event that can take a finite or countable number of values (e.g., number of deaths in a community by day).
categorical variable	A variable that can belong to one of a finite set of levels.
levels	In the context of a categorical variable, the set of values to which the variable can be assigned.
factor	In the context of statistical programming, a data type that can take one of a limited number of possible values (e.g., sex, nationality).
exchangeable	A property of a vector of random variables: their joint distribution does not depend on the order in which the variables appear in the vector.
sufficient statistic	A (summary) statistic that contains all the information about the model parameters that is in the original, uncondensed form of the data.
Bernoulli distribution	A probability distribution describing a random variable that can take on two possible outcomes (e.g., win / loss).
parameter	A numerical value that describes a population.
complementary	A description of two events that are mutually exclusive and whose probabilities sum to one (i.e., exactly one of the two events is guaranteed to happen).
binomial random variable	A variable whose values occur according to a binomial probability distribution.
probability mass distribution	A function giving the probability that a discrete random variable is equal to a given value.
Poisson distribution	A probability distribution for count data that has support on the non-negative integers. This distribution is also used to approximate a binomial distribution when the probability of success is small and the number of trials is large.
epitope / antigenic determinant	Site on a macromolecular antigen to which an antibody binds. This is the part of an antigen that is recognized by the immune system.
Enzyme-linked immunosorbent assay (ELISA)	An assay that is used to detect specific epitopes at different positions along a protein.
conditional on	Given; that is, assuming that a specified event has occurred or that a variable takes a specified value.
cumulative distribution function	A function giving the probability that a random variable is less than or equal to any specified value.
extreme value analysis	Analysis focused on the behavior of the very large or the very small outcomes of a random distribution, allowing an exploration of the probability of rare events.
rare event	Something that occurs with a very low probability.
rank statistic	A data vector sorted least to greatest.
Monte Carlo method	A method that uses computer simulation from a generative model to determine probabilities of events.
probability or generative modeling	A method of modeling where all the parameters are known and the mathematical theory allows us to work by deduction.
deduction	A top-down method of reasoning, starting from a theory or principle rather than from data.
statistical modeling	A method of modeling where the distribution of the data or its parameters are not known and must be inferred from observed data.
fit	In the context of statistical modeling, estimating the parameters of a model based on observed data.
multinomial	A generalization of the binomial distribution to cases where each trial has a finite set of more than two possible outcomes (e.g., a roll of a die).
power / true positive rate	The probability of detecting something if it is there.
null hypothesis	Often, a hypothesis of “no association” that is used as a counterpart to a more interesting alternative hypothesis in hypothesis testing.
matrix	In programming, a two-dimensional array of data, all with the same data type.
expected value	The average (mean) value of a random variable.
variability / spread / dispersion	In statistics, the amount by which the observations in a set deviate from their mean.
statistic	A numerical quantity computed from a sample and known constants only (i.e., it involves no unknown parameters).
null distribution	The probability distribution of a test statistic when the null hypothesis is true.
alternative	In the context of a generating process and hypothesis testing, the generating process that is considered in comparison to the generating process under the null hypothesis.
chi-squared distribution	A distribution on the non-negative real numbers that is often used in assessing goodness-of-fit (e.g. models fit to contingency tables).
p-value	The probability of seeing the observed data or something more extreme under the generative model associated with the null hypothesis.
probability density function	A function giving the relative likelihood that a continuous random variable is equal to a given value. Its integral over the sample space equals 1.
default	In the context of arguments to an R function, the value that is used if no custom value is specified.
C. elegans genome nucleotide frequency	How often adenine, cytosine, guanine, and thymine occur in the DNA of a roundworm often used in scientific research.
Bioconductor	Open-source software that provides contributed programs for bioinformatic data analysis.
codon	A three-nucleotide sequence that specifies the amino acid to be created next (or to start or stop synthesis).
DNA read	An inferred sequence of base pairs for a single DNA fragment, based on sequencing.
nucleotide	In the context of DNA, one of four compounds (adenine, A; cytosine, C; guanine, G; and thymine, T) that make up the basic information unit.
genome	An organism’s complete set of DNA, including all of its genes.
replication cycle	In biology, the process that begins with the infection of a host cell by a virus and ends with the release of mature progeny virus particles.
point mutation	A change, addition, or deletion of a single nucleotide in a gene sequence.
genotype	The genetic make-up of an individual’s cells, including how the individual’s genetic make-up differs from others’.
diploid	Having genetic material in two complete sets of chromosomes, from two parents.
protein	A compound made up of amino acids; one of the four types of macromolecules that make up living organisms.
antibody	A type of protein made by certain white blood cells in response to an antigen.
antigen	A foreign substance in the body to which the immune system reacts.
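
Several of the probability terms above (probability mass distribution, cumulative distribution function, Poisson distribution, Monte Carlo method) can be tied together in a short sketch. The example below is written in Python purely for illustration. The function names and the values of n and p are hypothetical choices and are not taken from the chapter. It compares binomial and Poisson probabilities when the number of trials is large and the probability of success is small, then estimates a tail probability both exactly and by simulation from the generative model.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of the count k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def binomial_cdf(k, n, p):
    """Cumulative distribution function: P(X <= k) for a binomial X."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def monte_carlo_tail(k, n, p, n_sim=2000, seed=1):
    """Monte Carlo estimate of P(X >= k), simulating from the generative model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # One binomial draw is the sum of n Bernoulli(p) draws.
        x = sum(rng.random() < p for _ in range(n))
        if x >= k:
            hits += 1
    return hits / n_sim


# Illustrative values only (an assumption, not from the chapter).
n, p = 1000, 0.005
lam = n * p  # Poisson mean set equal to the binomial expected value

exact_tail = 1 - binomial_cdf(9, n, p)       # P(X >= 10), by deduction
simulated_tail = monte_carlo_tail(10, n, p)  # same probability, by simulation

for k in range(4):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
print("P(X >= 10):", round(exact_tail, 4), "exact;", simulated_tail, "simulated")
```

The exact calculation is an instance of deduction from a model with known parameters, while the simulated value approaches it as the number of simulations grows. The tail probability P(X >= 10) is the kind of quantity a p-value reports when the binomial model plays the role of the null hypothesis.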

Sources consulted or cited

Some of the definitions above are based in part or whole on listed definitions in the following sources.

Holmes and Huber, 2019. Modern Statistics for Modern Biology. Cambridge University Press, Cambridge, United Kingdom.
Everitt and Skrondal, 2010. The Cambridge Dictionary of Statistics (Fourth Edition). Cambridge University Press, Cambridge, United Kingdom.
Bioconductor: Open Source Software for Bioinformatics. https://www.bioconductor.org/
Wikipedia: The Free Encyclopedia. https://en.wikipedia.org/wiki/Main_Page
NIH Genetics Home Reference. https://ghr.nlm.nih.gov/
NCI Dictionary of Cancer Terms. https://www.cancer.gov/publications/dictionaries/cancer-terms

Brooke Anderson

Assistant Professor of Epidemiology

My research interests include environmental epidemiology, particularly on health impacts related to climate-related disasters, and R programming.