Vocabulary for Chapter 2, Part 2
These sections introduced Markov chains and the Bayesian paradigm. Markov chain transitions were used to model dependencies along DNA sequences. The vocabulary terms are:
| Term | Definition |
|---|---|
| Markov chain | a sequence of random states in which, given the current state, the next state is conditionally independent of all previous states |
| Bayesian paradigm | approaching statistics from the perspective that probability can be viewed as a degree of belief in an event |
| Beta distribution | a probability distribution defined on the interval [0, 1] often used to model probabilities in Bayesian statistics |
| Exponential distribution | a probability distribution defined on the positive real numbers that can be used to model the time between events in a Poisson point process |
| Prior | a probability distribution describing our knowledge of a hypothesis/parameter before incorporating new data |
| Posterior | a probability distribution describing our knowledge of a hypothesis/parameter after incorporating new data |
| Haplotype | a collection of DNA sequence variants (e.g., alleles) that are spatially close on a chromosome, are usually inherited together, and thus are genetically linked |
| Marginal distribution | the distribution of a sub-collection of variables after integrating out the remaining variables in the collection |
| Monte Carlo integration | a technique for numerical integration where the value of an integral is estimated by averaging over randomly simulated draws |
| Quantile-quantile plot (QQ-plot) | a plot comparing the quantiles from one distribution (often a theoretical distribution) to the quantiles of another distribution (often from a sample) |
| Maximum a posteriori (MAP) estimate | the mode of the posterior distribution associated with the quantity of interest |
| Escherichia coli | a facultatively anaerobic, rod-shaped, coliform bacterium commonly found in the lower intestine of warm-blooded organisms |
| Epigenetics | the study of heritable phenotype changes that do not involve alterations in the DNA sequence |
| Log-likelihood ratio | the logarithm of the ratio of the likelihood under one set of assumptions to the likelihood under another set of assumptions |
| Bimodality | the property of a distribution having two modes |
| Mixture | in statistics, a distribution formed by combining two or more different probability distributions |
| Codon | a three-nucleotide sequence that specifies the amino acid to be created next (or to start or stop synthesis) |
| Codon bias | the differences in how often each spelling (synonymous codon) of an amino acid occurs in coding DNA |
| Genetic code | the set of rules by which the cell translates the codons in a gene into the amino acids of a specific protein |
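
Several of the Bayesian terms above (Beta distribution, prior, posterior, MAP estimate, Monte Carlo integration) fit together in one small calculation. The following is a minimal sketch in Python, not code from the chapter. The Beta(2, 2) prior, the counts of 7 successes and 3 failures, and all function names are made up for illustration. It relies on the standard fact that a Beta prior combined with binomial data gives a Beta posterior.

```python
import random


def beta_posterior(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); this formula holds only for alpha > 1 and beta > 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("mode formula requires alpha > 1 and beta > 1")
    return (alpha - 1) / (alpha + beta - 2)


def monte_carlo_mean(alpha, beta, n_draws=100_000, seed=0):
    """Estimate the mean of Beta(alpha, beta), i.e. the integral of
    theta * density(theta) over [0, 1], by averaging simulated draws."""
    rng = random.Random(seed)
    draws = [rng.betavariate(alpha, beta) for _ in range(n_draws)]
    return sum(draws) / n_draws


# Hypothetical example: Beta(2, 2) prior, then 7 successes and 3 failures observed.
prior = (2, 2)
post = beta_posterior(*prior, successes=7, failures=3)  # posterior parameters
map_estimate = beta_mode(*post)                         # mode of the posterior
mc_mean = monte_carlo_mean(*post)                       # simulated posterior mean
exact_mean = post[0] / (post[0] + post[1])              # closed-form posterior mean

print("posterior parameters:", post)
print("MAP estimate:", map_estimate)
print("Monte Carlo mean:", mc_mean, "exact mean:", exact_mean)
```

Here the MAP estimate is the posterior mode, while the Monte Carlo average approximates the posterior mean. The two differ because the posterior is not symmetric.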
Sources consulted or cited
Some of the definitions above are based in part or in whole on definitions listed in the following sources:
- Holmes and Huber, 2019. Modern Statistics for Modern Biology. Cambridge University Press, Cambridge, United Kingdom.
- Wikipedia: The Free Encyclopedia. http://en.wikipedia.org/wiki/Main_Page
- NIH Genetics Home Reference. https://ghr.nlm.nih.gov/
- NCBI Genetics Review. https://www.ncbi.nlm.nih.gov