Vocabulary for Chapter 10

Chapter 10 discusses the use of networks and trees to visualize biological data. It covers the main components of each and how different data sets can be appropriately transformed into specific networks and trees based on what you are trying to present. The vocabulary words for Chapter 10 are:

Graph A structure formed by a set of nodes or vertices and a set of edges between these vertices
Adjacency matrix The matrix representation of edges of a graph with as many rows as nodes in the graph
Network A weighted, directed graph
Sparse In the context of graphs, a term to describe a graph when the number of edges is similar to the number of nodes
Dense In the context of graphs, a term to describe a graph when the number of edges is (approximately) a quadratic function of the nodes
Arrows/directed edges Graph edges that directionally connect nodes
Annotation variables Variables mapped to visual characteristics of a graph, such as edge width to show the strength of a link, or node size and color to show associated covariates
Graph layouts Different ways to plot a graph, either for aesthetic or practical reasons
Binary data Data in which each observation can take only one of two values (e.g., 0 or 1)
Differentially expressed genes Genes whose expression levels differ between experimental groups
Bipartite graph A graph whose nodes fall into two disjoint sets, where each edge connects a node in one set to a node in the other
Overrepresented or enriched In the context of gene expression, a term to describe a gene category that appears more often among the genes of interest (e.g., differentially expressed genes) than expected by chance
Gene Ontology (GO) A resource aimed to unify the vocabulary to describe genes and gene product attributes across all species
Fisher’s exact test / Hypergeometric testing Two-way table testing used to account for the fact that some categories are extremely numerous and others are rarer
Known skeleton graph A previously established graph onto which significance scores such as p-values are projected
Perturbation In the context of a network, an alteration of the function of a biological system, induced by external or internal mechanisms
Hotspots In the context of a graph, areas with high event density
Rooted binary tree A data tree in which each node has at most two children
Cycles In the context of graphs, another word for loops: either self-loops or ones that go through several vertices
Ancestral taxa Correspond to inner nodes and are inferred from the contemporaneous data on the tips
Contemporaneous In the context of data for phylogenetic trees, organisms at terminal nodes that exist at the same time and are related to each other, thus revealing information about their common ancestors
OTUs (Operational Taxonomic Units) Clusters of organisms grouped by DNA similarity of a certain taxonomic marker gene; they form the tips of the tree
Parameter In the context of statistics, a numerical value that describes a population
Gene trees Structures produced when different genes show differences in their evolutionary histories
Markov chain A sequence where given the current state, the next state is conditionally independent of all previous states
Molecular clock hypothesis A term that describes a technique that uses the mutation rate to infer what happened to a species historically
Non-identifiability The inability to distinguish a parameterization of a model based on observed data
Time homogeneity In the context of Markov chains, the state of the mutation rate being constant across history
Generator In the context of Markov chains, the instantaneous change probability matrix describing transitions between steps of the chain
Transition matrix A matrix that contains all probabilities of any state changes for a Markov chain
Parsimony tree A structure created using a nonparametric method that minimizes the number of changes necessary to explain the data
Maximum likelihood tree A structure created using a parametric method that uses efficient optimization algorithms to maximize the likelihood of a tree under the model assumptions
Bayesian posterior distributions for trees A method that uses MCMC to find posterior distributions of the phylogenies
Distance-based methods Semi-parametric methods that resemble hierarchical clustering algorithms but use parametric evolutionary models to compute the distances
Aligning Arranging different sequences of DNA, RNA, or protein together to identify similarities or differences between them
indel (inserted-deleted) event Insertion or deletion of bases in the genome of an organism
Filtering operations In the context of low-quality rRNA reads, the removal of low-quality reads and trimming of remaining reads to a consistent length
Interactive In the context of a plot, enabling direct actions on a graphical plot to change elements and link multiple plots
Spanning tree A tree that connects all the points (vertices) of a graph
Minimum spanning tree (MST) Given distances between vertices, a tree that spans all the points and has the minimum total length
Jitter To slightly move coordinates on a graph to avoid too much overlapping
Undirected network A term describing a graph without arrows between nodes
Associated A term indicating that two variables are related
Friedman-Rafsky tests Tests for two/multiple sample segregation on a minimum spanning tree
Pure edges In the context of graphs, edges whose two nodes have the same level of the factor variable
Microbiome The aggregate of all microbiota that reside on or within an organism's tissues and biofluids, along with the corresponding anatomical sites in which they reside
Exponential random graph models (ERGMs) Models that can be used to predict vertex covariates
Protein interaction networks In the context of graphing, a way to visualize observed protein-protein interactions
Phylogenetic tree A tree used to visualize evolutionary relationships among species
Strain In the context of a virus, a genetic variant or subtype
Taxa Groups of one or more populations of an organism, each making up a single unit, typically classified down to the level of genus and species
Protein A compound made up of amino acids; one of the four types of macromolecules that make up living organisms
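
To make the adjacency matrix, sparse, and dense entries above more concrete, here is a minimal Python sketch. The toy graph, the node names, and the helper functions adjacency_matrix and edge_density are all hypothetical and chosen only for illustration; they do not come from the chapter.

```python
# Hypothetical toy graph: 4 nodes and 3 undirected edges (pairs of node indices).
nodes = ["A", "B", "C", "D"]
edges = [(0, 1), (1, 2), (2, 3)]


def adjacency_matrix(n_nodes, edge_list):
    """Return an n x n list of lists with 1 wherever an undirected edge exists."""
    mat = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edge_list:
        mat[i][j] = 1
        mat[j][i] = 1  # undirected graph, so the matrix is symmetric
    return mat


def edge_density(n_nodes, edge_list):
    """Fraction of the n(n-1)/2 possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) // 2
    return len(edge_list) / possible


adj = adjacency_matrix(len(nodes), edges)
density = edge_density(len(nodes), edges)
```

A graph whose edge count stays close to its node count has a density that shrinks as the graph grows (sparse), while a graph whose edge count grows roughly quadratically keeps a density near a constant (dense).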
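
The hypergeometric testing and enrichment entries above can be illustrated with a small calculation. The sketch below computes a one-sided enrichment p-value directly from binomial coefficients; the function name hypergeom_enrichment_pvalue and the example counts are assumptions made up for illustration, not values from the chapter.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe, in_category, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, of which `in_category` belong to the category."""
    total = comb(universe, selected)
    upper = min(in_category, selected)
    tail = sum(
        comb(in_category, i) * comb(universe - in_category, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in total, 5 in a GO category,
# 5 differentially expressed, 3 of which fall in the category.
p_value = hypergeom_enrichment_pvalue(20, 5, 5, 3)
```

A small p-value suggests the category is overrepresented among the selected genes; the same tail probability is what a one-sided Fisher's exact test reports for the corresponding two-way table.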
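
The generator and transition matrix entries above are linked: under a time-homogeneous model, the generator determines the transition matrix for any elapsed time. As a hedged sketch, the code below uses the Jukes-Cantor model (chosen here as an assumption because it has a simple closed form, with every off-diagonal rate equal to alpha) to build the transition matrix for the four nucleotides. The function name jc_transition_matrix and the parameter values are hypothetical.

```python
from math import exp

BASES = "ACGT"


def jc_transition_matrix(alpha, t):
    """Jukes-Cantor transition probabilities after time t.

    The generator has rate alpha for every change between distinct bases
    and -3 * alpha on the diagonal, which gives the closed form below.
    """
    decay = exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay  # probability the base is unchanged
    diff = 0.25 - 0.25 * decay  # probability of each specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


# Hypothetical rate and time, for illustration only.
P = jc_transition_matrix(alpha=0.1, t=1.0)
row_sums = [sum(row) for row in P]
```

Each row of P sums to 1, at t = 0 the matrix is the identity, and for large t every entry approaches 1/4, the stationary distribution of this model.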
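
The minimum spanning tree and pure edges entries above combine into the statistic behind the Friedman-Rafsky test. The following sketch builds an MST with Prim's algorithm (one of several standard ways to do so) and counts pure edges. The five points, their group labels, and the helper names are hypothetical, and a real test would also compare the count with a permutation distribution, which is omitted here.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a full distance matrix; returns a list of (i, j) edges."""
    n = len(dist)
    in_tree = {0}
    tree_edges = []
    while len(in_tree) < n:
        # Shortest edge leading from the current tree to a vertex outside it.
        _, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        tree_edges.append((i, j))
        in_tree.add(j)
    return tree_edges


def count_pure_edges(tree_edges, labels):
    """Number of tree edges whose two endpoints share the same factor level."""
    return sum(labels[i] == labels[j] for i, j in tree_edges)


# Hypothetical one-dimensional points with two groups.
points = [0, 1, 2, 10, 11]
labels = ["a", "a", "a", "b", "b"]
dist = [[abs(p - q) for q in points] for p in points]

mst = minimum_spanning_tree(dist)
total_length = sum(dist[i][j] for i, j in mst)
pure = count_pure_edges(mst, labels)
```

When the groups are well separated, most MST edges are pure; a count of pure edges much larger than expected under random relabeling indicates that the samples segregate.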

Sources consulted or cited

Some of the definitions above are based in part or whole on listed definitions in the following sources.

  • Holmes and Huber, 2019. Modern Statistics for Modern Biology. Cambridge University Press, Cambridge, United Kingdom.
  • Everitt and Skrondal, 2010. The Cambridge Dictionary of Statistics (Fourth Edition). Cambridge University Press, Cambridge, United Kingdom.
  • Wikipedia: The Free Encyclopedia. http://en.wikipedia.org/wiki/Main_Page

Sarah Cooper
Graduate Student in MIP

Sarah Cooper is a PhD graduate student at Colorado State University. She is researching the immunopathology of M. tuberculosis in animal models.
