Exercise 3 - Model comparison

Inferring phylogenies using maximum likelihood

This tutorial guides you through using PhyML and its extension, CodonPhyML, to solve common phylogenetic problems. Some of the following exercises may have more than one solution.


Goal: Observe the effect of substitution models on the final inferred tree topology.

In this exercise you will infer the phylogenetic tree for the same dataset under different substitution models and their variations. The models to compare are GTR, HKY and JC, together with their +Gamma and +I variations (for example GTR+Gamma and JC+Gamma).


Datasets
Dataset file:
First Run

Run PhyML with the substitution model set to GTR, estimating the nucleotide frequencies empirically from the dataset, and executing the tree search optimisation routines.

In addition, set the following options:

1. GTR
  • No extra options
2. GTR+Gamma
  • Gamma distribution with 4 classes
  • Estimating Gamma shape parameter
3. GTR+I
  • Adding Invariable sites option
4. GTR+Gamma+I
  • Gamma distribution with 4 classes
  • Estimating Gamma shape parameter
  • Adding Invariable sites option
Second Run

Run PhyML with the substitution model set to HKY, estimating the transition/transversion ratio, estimating the nucleotide frequencies empirically from the dataset, and executing the tree search optimisation routines.

In addition, set the following options:

1. HKY
  • No extra options
2. HKY+Gamma
  • Gamma distribution with 4 classes
  • Estimating Gamma shape parameter
3. HKY+I
  • Adding Invariable sites option
4. HKY+Gamma+I
  • Gamma distribution with 4 classes
  • Estimating Gamma shape parameter
  • Adding Invariable sites option
Third Run

Run PhyML with the substitution model set to JC, and execute the tree search optimisation routines.

In addition, set the following options:

1. JC
  • No extra options
2. JC+Gamma
  • Gamma distribution with 4 classes
  • Estimating Gamma shape parameter
3. JC+I
  • Adding Invariable sites option
4. JC+Gamma+I
  • Gamma distribution with 4 classes
  • Estimating Gamma shape parameter
  • Adding Invariable sites option
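
The three runs above amount to twelve model settings (three base models, four variants each). As a rough aid, the following Python sketch assembles one PhyML command line per setting. It assumes that the options -i, -d, -m, -f, -t, -c, -a, -v, -s and -o behave as described in the manual of your installed PhyML version, so please check them there before running anything. The file name dataset.phy is a placeholder (the dataset file is not named here), and the NNI search setting is an arbitrary choice made for illustration.

```python
from itertools import product

# Base model settings (assumed PhyML flags; verify against your PhyML manual).
BASE_MODELS = {
    "GTR": ["-m", "GTR", "-f", "e"],               # empirical nucleotide frequencies
    "HKY": ["-m", "HKY85", "-f", "e", "-t", "e"],  # also estimate ts/tv ratio
    "JC": ["-m", "JC69"],                          # equal frequencies, no ts/tv ratio
}

# Rate-heterogeneity variants: number of rate classes, Gamma shape, invariable sites.
VARIANTS = {
    "": ["-c", "1", "-v", "0"],
    "+Gamma": ["-c", "4", "-a", "e", "-v", "0"],
    "+I": ["-c", "1", "-v", "e"],
    "+Gamma+I": ["-c", "4", "-a", "e", "-v", "e"],
}


def phyml_command(alignment, model, variant, search="NNI"):
    """Return the argument list for one PhyML run on a nucleotide alignment."""
    return [
        "phyml", "-i", alignment, "-d", "nt",
        *BASE_MODELS[model],
        *VARIANTS[variant],
        "-s", search,   # tree topology search strategy
        "-o", "tlr",    # optimise topology, branch lengths and rate parameters
    ]


def all_commands(alignment):
    """Map each model name (e.g. 'HKY+Gamma') to its command line."""
    return {
        model + variant: phyml_command(alignment, model, variant)
        for model, variant in product(BASE_MODELS, VARIANTS)
    }


if __name__ == "__main__":
    # 'dataset.phy' is a hypothetical file name used only for illustration.
    for name, cmd in all_commands("dataset.phy").items():
        print(f"{name:12s} {' '.join(cmd)}")
```

The script only prints the commands. PhyML typically names its output files after the input alignment, so it is worth renaming or moving the results of each run before starting the next one.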
Information
You will run PhyML at most 12 times (remember that some combinations were already run in Exercises 1 and 2).

Tasks
  1. Which model is the best according to the AIC? Include HKY+Gamma in the comparison.
  2. Does adding invariable sites (+I) improve the fit?
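
For the comparison, recall that AIC = 2k - 2 lnL, where k is the number of free parameters and lnL is the maximised log-likelihood reported by PhyML; the model with the lowest AIC is preferred. The Python sketch below ranks models this way. The parameter counts are an assumed convention (JC: 0, HKY: 4, GTR: 8, one extra parameter each for +Gamma and +I, plus 2n - 3 branch lengths for an unrooted binary tree with n taxa), and some programs count empirical frequencies differently. The log-likelihoods in the example are made-up placeholders, not results for the exercise dataset.

```python
# Free substitution-model parameters (assumed convention):
# JC: none; HKY: kappa + 3 frequencies; GTR: 5 rates + 3 frequencies.
SUBSTITUTION_PARAMS = {"JC": 0, "HKY": 4, "GTR": 8}


def n_free_parameters(model, n_taxa):
    """Count free parameters for a model name such as 'GTR+Gamma+I'."""
    base, *extras = model.split("+")
    k = SUBSTITUTION_PARAMS[base]
    for extra in extras:
        if extra not in ("Gamma", "I"):
            raise ValueError(f"unknown model component: {extra}")
        k += 1  # Gamma shape or proportion of invariable sites
    # Branch lengths of an unrooted binary tree with n_taxa leaves.
    return k + 2 * n_taxa - 3


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_likelihood


def rank_by_aic(log_likelihoods, n_taxa):
    """Return (model, AIC) pairs sorted from best (lowest AIC) to worst."""
    scores = {
        model: aic(lnl, n_free_parameters(model, n_taxa))
        for model, lnl in log_likelihoods.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder log-likelihoods for illustration only; replace them with
    # the values from your own PhyML runs and set n_taxa to your dataset size.
    example = {"JC": -1000.0, "HKY+Gamma": -950.0, "GTR+Gamma+I": -949.0}
    for model, score in rank_by_aic(example, n_taxa=10):
        print(f"{model:12s} AIC = {score:.1f}")
```

Because every model is fitted to the same dataset, the branch-length term is the same for all of them and does not change the ranking; it is included only so that the AIC values match the usual definition. To answer the second task, compare the AIC of each model with that of its +I counterpart.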

This exercise was prepared by Maria Anisimova