This exhibition is part of the international initiative Mathematics of Planet Earth (www.mpe2013). It is organized by the International Centre for Theoretical Sciences (ICTS) and the Center for Applicable Mathematics (CAM) of the Tata Institute of Fundamental Research (TIFR). The exhibition is based on four major themes: Waves, Networks, Optimization and Structures. It will be held at the Visvesvaraya Industrial & Technological Museum in Bangalore, India.
Origin of life and evolution: phylogenetic tree
Module: Constructing a phylogenetic tree
Because DNA (RNA) differs from one lineage to another, the evolutionary history can be reconstructed as a phylogenetic tree, which gives information on past evolutionary processes. The phylogenetic tree is composed of branches (edges) and nodes, where branches connect nodes (a node is a point at which two or more branches diverge). It has internal and external (terminal) branches and nodes; an internal node corresponds to the hypothetical last common ancestor (LCA) of everything arising from it. Terminal nodes correspond to the sequences (DNA/RNA) from which the tree was constructed. The length of a branch corresponds to the sequence difference between the two nodes it connects. The tree is constructed by treating the substitution of nucleotides as a continuous-time Markov process of base replacement at each site of the DNA, and by using the genetic distance between sequences.
Mathematical background: Markov process & autonomous dynamical systems
The evolution of DNA (RNA) can be seen as a continuous-time Markov process, since each site of the DNA (RNA) must hold one of four bases (Guanine, Cytosine, Adenine and Thymine, with Uracil in place of Thymine for RNA). The variation of DNA (RNA) from one lineage to another is therefore associated with the probability of replacement of the base at each site. Consider a DNA sequence of fixed length L (it has L sites) evolving in time by base replacement. Assume that the processes followed by the L sites are Markovian, independent and identically distributed, and that the substitution rates are constant over time. For a given site of the DNA, we denote by P(t) = (p_A(t), p_G(t), p_C(t), p_T(t)) the probabilities of states A, G, C and T at time t.
For each nucleotide, its frequency at time t+dt is equal to its frequency at time t, minus the frequency lost through substitution to other bases, plus the frequency newly created by substitution from other bases in that small time interval. Thus, the change in the probability distribution (p_A(t), p_G(t), p_C(t), p_T(t)) over a small time dt is given by P(t+dt) = P(t) + QP(t)dt, where Q is the matrix of substitution rates between bases. Letting dt tend to zero leads to the autonomous system P'(t) = QP(t), which is a linear ordinary differential equation. Most evolutionary models are based on the above mathematics; they differ only in their assumptions about the rate matrix and in its values.
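The update rule P(t+dt) = P(t) + QP(t)dt can be tried out numerically. The text does not specify a rate matrix, so the sketch below assumes the simplest common choice, the Jukes-Cantor model, in which every base is replaced by any other base at the same rate. The rate value `alpha`, the function names and the starting distribution are illustrative assumptions, not values from the exhibition.

```python
import math

BASES = ("A", "G", "C", "T")  # same order as P(t) in the text


def jukes_cantor_rate_matrix(alpha):
    """Assumed rate matrix Q: rate alpha between distinct bases, -3*alpha on the diagonal."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def euler_step(p, q, dt):
    """One small step of P(t+dt) = P(t) + Q P(t) dt."""
    return [p[i] + dt * sum(q[i][j] * p[j] for j in range(4)) for i in range(4)]


def integrate(p0, q, t_end, steps):
    """Apply the small-step rule repeatedly from time 0 to t_end."""
    dt = t_end / steps
    p = list(p0)
    for _ in range(steps):
        p = euler_step(p, q, dt)
    return p


def jukes_cantor_exact(p0, alpha, t):
    """Closed-form solution of P'(t) = QP(t) for the assumed matrix:
    each probability relaxes towards 1/4 at rate 4*alpha."""
    decay = math.exp(-4.0 * alpha * t)
    return [0.25 + (x - 0.25) * decay for x in p0]


if __name__ == "__main__":
    alpha = 0.1                  # hypothetical substitution rate
    p0 = [1.0, 0.0, 0.0, 0.0]    # the site is known to be A at time 0
    numeric = integrate(p0, jukes_cantor_rate_matrix(alpha), t_end=1.0, steps=10_000)
    exact = jukes_cantor_exact(p0, alpha, t=1.0)
    for base, n, e in zip(BASES, numeric, exact):
        print(f"p_{base}(1.0): step rule = {n:.5f}, closed form = {e:.5f}")
```

With small steps the two columns should agree closely, and the four probabilities keep summing to 1 because each column of the assumed Q sums to zero.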
Thus, once the substitution of nucleotides is taken into account, the amount of sequence divergence provides information about the number of changes that have occurred along the path separating the sequences, and hence about the lengths of the branches of the phylogenetic tree.
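As an illustration of how sequence divergence is turned into a number of changes, the sketch below computes the observed fraction of differing sites between two aligned sequences and then corrects it for repeated substitutions at the same site. The correction formula is the standard one for the Jukes-Cantor model, which is an assumption here since the text does not name a particular model; the function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Estimated substitutions per site, d = -3/4 * ln(1 - 4p/3),
    under the assumed Jukes-Cantor model."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The logarithm is undefined: the sequences look as different as random ones.
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    s1 = "ACGTACGTAC"
    s2 = "ACGTTCGTAA"
    print("observed difference:", p_distance(s1, s2))
    print("corrected distance: ", jukes_cantor_distance(s1, s2))
```

The corrected distance is always at least as large as the observed difference, because some substitutions are hidden when a site changes more than once.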
Phylogenetic tree of influenza virus
A phylogenetic tree constructed from sequences of human influenza virus from an epidemic simulated across a small community. The epidemic was seeded in the community by an individual named -1 in the transmission network (see following image).
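How a tree can be built from genetic distances between sequences can be sketched in a few lines. The exhibition text does not say which tree-building method was used for the influenza tree, so the example below substitutes UPGMA, one of the simplest distance-based methods: it repeatedly merges the two closest clusters and averages their distances to the rest. UPGMA assumes that all lineages evolve at the same rate, which may not hold for real data. The sample labels and the distance matrix are invented for illustration.

```python
def upgma(names, dist):
    """Build a rooted tree from a symmetric distance matrix by average linkage.

    Leaves are names; internal nodes are tuples (left, right, height).
    """
    if not names:
        raise ValueError("at least one sequence name is required")
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster:
        # average of the old distances, weighted by cluster size.
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        # The new internal node sits halfway between the two merged clusters.
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2.0), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


def height(tree):
    """Height of a node above the leaves (leaves have height 0)."""
    return 0.0 if isinstance(tree, str) else tree[2]


def to_newick(tree):
    """Write the tree in Newick notation with branch lengths."""
    if isinstance(tree, str):
        return tree
    left, right, h = tree
    return "({}:{:.3f},{}:{:.3f})".format(
        to_newick(left), h - height(left), to_newick(right), h - height(right))


if __name__ == "__main__":
    # Hypothetical samples and pairwise distances.
    names = ["sample1", "sample2", "sample3"]
    dist = [[0.0, 2.0, 6.0],
            [2.0, 0.0, 6.0],
            [6.0, 6.0, 0.0]]
    print(to_newick(upgma(names, dist)) + ";")
```

In the resulting tree the terminal nodes are the input sequences, each internal node stands for a hypothetical common ancestor, and the branch lengths reflect the distances between the nodes they connect.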
Small transmission network of influenza virus
A transmission network of human influenza virus simulated in a small community. The root sequence was a virus downloaded from the Influenza Virus Resource on the NCBI website.