Vocabulary for Chapter 2, Part 2
These sections introduced Markov chains and the Bayesian paradigm. Markov chain transitions were used to model dependencies along DNA sequences. The vocabulary terms are:
Markov chain | a sequence of random states in which, given the current state, the next state is conditionally independent of all previous states |
Bayesian paradigm | approaching statistics from the perspective that probability can be viewed as a degree of belief in an event |
Beta distribution | a probability distribution defined on the interval [0, 1] often used to model probabilities in Bayesian statistics |
Exponential distribution | a probability distribution defined on the positive real numbers that can be used to model the time between events in a Poisson point process |
Prior | a probability distribution describing our knowledge of a hypothesis/parameter before incorporating new data |
Posterior | a probability distribution describing our knowledge of a hypothesis/parameter after incorporating new data |
Haplotype | a collection of DNA sequence variants (e.g., alleles) that are spatially close on a chromosome, are usually inherited together, and thus are genetically linked |
Marginal distribution | the distribution of a sub-collection of variables after integrating (or, for discrete variables, summing) out the remaining variables in the collection |
Monte Carlo integration | a technique for numerical integration where the value of an integral is estimated by simulating data |
Quantile-quantile plot (QQ-plot) | a plot comparing the quantiles from one distribution (often a theoretical distribution) to the quantiles of another distribution (often from a sample) |
Maximum a posteriori (MAP) estimate | the mode of the posterior distribution associated with the quantity of interest |
Escherichia coli | facultative anaerobic, rod-shaped, coliform bacterium commonly found in the lower intestine of warm-blooded organisms |
Epigenetics | the study of heritable phenotype changes that do not involve alterations in the DNA sequence |
Log-likelihood ratio | the logarithm of a ratio of two likelihoods: the likelihood of the data under one set of assumptions divided by the likelihood of the same data under another set of assumptions |
Bimodality | the property of a distribution that has two modes |
Mixture | in statistics, a distribution that is a combination of two or more different probability distributions |
Codon | a three-nucleotide sequence that specifies the amino acid to be created next (or to start or stop synthesis) |
Codon bias | the differences in how often each spelling of an amino acid (that is, each of the codons that encode it) occurs in coding DNA |
Genetic code | the set of rules by which a cell translates the codons of a gene into the amino acids of a specific protein |
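
Several of the Bayesian terms above (Beta distribution, prior, posterior, MAP estimate, Monte Carlo integration) fit together in one small calculation. The Python sketch below is illustrative only: the Beta(2, 2) prior, the data (40 successes in 100 trials), and all function names are assumptions made up for this example, not taken from the chapter. It relies on the standard result that a Beta prior combined with binomial data gives a Beta posterior.

```python
import random


def beta_posterior(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta). The closed form holds only when
    both parameters are greater than 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("mode formula requires alpha > 1 and beta > 1")
    return (alpha - 1) / (alpha + beta - 2)


def monte_carlo_mean(alpha, beta, n_draws=100_000, seed=0):
    """Monte Carlo integration: estimate the posterior mean, which is
    the integral of theta * p(theta), by averaging simulated draws."""
    rng = random.Random(seed)
    total = sum(rng.betavariate(alpha, beta) for _ in range(n_draws))
    return total / n_draws


# Hypothetical prior and data, chosen only for illustration.
prior_alpha, prior_beta = 2, 2
post_alpha, post_beta = beta_posterior(prior_alpha, prior_beta,
                                       successes=40, failures=60)

# The MAP estimate is the mode of the posterior distribution.
map_estimate = beta_mode(post_alpha, post_beta)

# Compare the simulated posterior mean with the exact closed form.
mc_mean = monte_carlo_mean(post_alpha, post_beta)
exact_mean = post_alpha / (post_alpha + post_beta)

print(f"posterior: Beta({post_alpha}, {post_beta})")
print(f"MAP estimate: {map_estimate:.4f}")
print(f"posterior mean (Monte Carlo): {mc_mean:.4f}")
print(f"posterior mean (exact): {exact_mean:.4f}")
```

With these assumed numbers the posterior is Beta(42, 62), so the MAP estimate is 41/102, and the Monte Carlo average should land close to the exact posterior mean of 42/104. Increasing `n_draws` tightens the Monte Carlo estimate at the cost of more simulation.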
Sources consulted or cited
Some of the definitions above are based in part or in whole on definitions listed in the following sources:
- Holmes and Huber, 2019. Modern Statistics for Modern Biology. Cambridge University Press, Cambridge, United Kingdom.
- Wikipedia: The Free Encyclopedia. http://en.wikipedia.org/wiki/Main_Page
- NIH Genetics Home Reference. https://ghr.nlm.nih.gov/
- NCBI Genetics Review. https://www.ncbi.nlm.nih.gov