This exhibition is part of the international initiative Mathematics of Planet Earth (www.mpe2013). It is organised by the International Centre for Theoretical Sciences (ICTS) and the Center for Applicable Mathematics (CAM) of the Tata Institute of Fundamental Research (TIFR). The exhibition covers four broad themes: waves, networks, optimisation and structures. It takes place at the Visvesvaraya Industrial & Technological Museum in Bangalore, India.

# Origin of life and evolution: phylogenetic tree

**Module:** Constructing a phylogenetic tree

Because DNA (RNA) sequences differ from one lineage to another, the evolutionary history of a set of sequences can be reconstructed as a phylogenetic tree, which in turn gives information about past evolutionary processes.

**Mathematical background:** Markov process & autonomous dynamic systems

DNA (RNA) evolution can be seen as a continuous-time Markov process, since each site of the DNA (RNA) must hold one of four bases (guanine, cytosine, adenine and thymine, or uracil in RNA). The variation of DNA (RNA) from one lineage to another is therefore described by the probability of base replacement at each site. Consider a DNA sequence of fixed length *L* (that is, with *L* sites) evolving in time by base replacement. Assume that the *L* sites follow independent, identically distributed Markov processes and that the process is constant over time. For a given site of the DNA, let P(t) = (p_A(t), p_G(t), p_C(t), p_T(t)) denote the probabilities of states A, G, C and T at time *t*.

For each nucleotide, its frequency at time t+dt equals its frequency at time t, minus the frequency lost through substitution by other bases, plus the frequency newly created through substitution from other bases in that small time interval. The change in the probability distribution (p_A(t), p_G(t), p_C(t), p_T(t)) over a small time step is therefore given by P(t+dt) = P(t) + QP(t)dt, where Q is the matrix of substitution rates between bases. Letting dt tend to zero leads to the autonomous system P'(t) = QP(t), which is a linear ordinary differential equation. Most evolutionary models are based on this mathematics and differ only in their assumptions about, and the values of, the rate matrix.
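As an illustration of the system P'(t) = QP(t), the sketch below integrates it with the update rule P(t+dt) = P(t) + QP(t)dt and compares the result with a known closed-form solution. It assumes the Jukes-Cantor model, in which every base is replaced by any other base at the same rate; this model, the rate `alpha`, the end time and all function names are illustrative choices and are not taken from the exhibition material.

```python
import math

BASES = "AGCT"  # same ordering as P(t) = (p_A, p_G, p_C, p_T)

def jc69_rate_matrix(alpha):
    """Rate matrix Q under the (assumed) Jukes-Cantor model.

    Q[i][j] is the rate of change into base i from base j. Every column
    sums to zero, so total probability is conserved by P' = QP.
    """
    return [[-3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]

def euler_step(P, Q, dt):
    """One step of P(t+dt) = P(t) + Q P(t) dt."""
    return [P[i] + dt * sum(Q[i][j] * P[j] for j in range(4))
            for i in range(4)]

def integrate(P0, Q, t_end, steps=10000):
    """Integrate P' = QP from 0 to t_end with fixed-size Euler steps."""
    dt = t_end / steps
    P = list(P0)
    for _ in range(steps):
        P = euler_step(P, Q, dt)
    return P

def jc69_exact(alpha, t):
    """Closed-form Jukes-Cantor solution for a site that starts in a known base.

    Returns (probability of still being that base,
             probability of being one particular other base).
    """
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay

alpha, t_end = 0.1, 2.0          # hypothetical rate and time span
P0 = [1.0, 0.0, 0.0, 0.0]        # the site starts as A with certainty
P_num = integrate(P0, jc69_rate_matrix(alpha), t_end)
p_same, p_diff = jc69_exact(alpha, t_end)

for base, p in zip(BASES, P_num):
    print(f"p_{base}({t_end}) = {p:.5f}")
print(f"closed form: same = {p_same:.5f}, other = {p_diff:.5f}")
```

Other substitution models would only change `jc69_rate_matrix`; the integration step stays the same.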

Thus, once the substitution of nucleotides is taken into account, the amount of sequence divergence provides information about the number of changes that have occurred along the path separating two sequences, and hence about the lengths of the branches of the phylogenetic tree.
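One common way to turn observed sequence divergence into an estimated number of changes is the Jukes-Cantor distance, d = -(3/4) ln(1 - (4/3) p), where p is the fraction of sites that differ. The sketch below assumes that model and two aligned sequences of equal length; the example sequences and function names are hypothetical and only serve as illustration.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jc69_distance(seq_a, seq_b):
    """Estimated substitutions per site under the (assumed) Jukes-Cantor model.

    The correction accounts for repeated substitutions at the same site,
    which the raw fraction of differing sites cannot see.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # sequences are saturated; the distance is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences differing at one site out of ten.
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTAA"
p = p_distance(seq_a, seq_b)
d = jc69_distance(seq_a, seq_b)
print(f"observed difference p = {p:.3f}, corrected distance d = {d:.4f}")
```

The corrected distance is always at least as large as the observed fraction of differences, and the gap widens as the sequences diverge.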

## Phylogenetic tree of influenza virus

This phylogenetic tree was constructed from sequences of human influenza virus from an epidemic simulated across a small community, seeded by an individual named -1 in the transmission network (see following image). The tree is composed of branches (edges) and nodes; branches connect nodes, and a node is a point at which two or more branches diverge. Branches are either internal or external, and nodes are either internal or terminal. An internal node corresponds to the hypothetical last common ancestor (LCA) of everything arising from it. Terminal nodes correspond to the sequences (DNA/RNA) from which the tree was constructed. The length of a branch corresponds to the sequence difference between the two nodes it connects. The tree is constructed by treating the substitution of nucleotides as a continuous-time Markov process of base replacement at each site of the DNA, together with the genetic distance between sequences.
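The vocabulary used in this caption (internal and terminal nodes, branch lengths, last common ancestor) can be made concrete with a small data structure. The sketch below is a minimal illustration with a made-up three-sequence tree; the node names and branch lengths are hypothetical and are not taken from the influenza tree shown here.

```python
class Node:
    """A tree node; branch_length is the length of the branch to its parent."""

    def __init__(self, name, parent=None, branch_length=0.0):
        self.name = name
        self.parent = parent
        self.branch_length = branch_length
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_terminal(self):
        """Terminal nodes have no descendants; they carry observed sequences."""
        return not self.children

def ancestors(node):
    """Return the node itself followed by its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path

def last_common_ancestor(a, b):
    """Return the first node on b's path to the root that is also above a."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")

def distance_to_ancestor(node, ancestor):
    """Sum the branch lengths from node up to one of its ancestors."""
    total = 0.0
    while node is not ancestor:
        total += node.branch_length
        node = node.parent
    return total

def path_length(a, b):
    """Total branch length along the path joining two nodes through their LCA."""
    lca = last_common_ancestor(a, b)
    return distance_to_ancestor(a, lca) + distance_to_ancestor(b, lca)

# Hypothetical toy tree: two internal nodes and three terminal sequences.
root = Node("root")
anc1 = Node("anc1", root, 0.02)
seq1 = Node("seq1", anc1, 0.01)
seq2 = Node("seq2", anc1, 0.03)
seq3 = Node("seq3", root, 0.05)

print(last_common_ancestor(seq1, seq2).name, path_length(seq1, seq2))
print(last_common_ancestor(seq1, seq3).name, path_length(seq1, seq3))
```

In this representation the path length between two terminal nodes plays the role of the sequence difference accumulated along the branches that separate them.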

## Small transmission network of influenza virus

A transmission network of human influenza virus simulated in a small community. The root sequence was a virus sequence downloaded from the Influenza Virus Resource on the NCBI website.