Exercise 6 - Inferring ML phylogenies with codon models

Inferring phylogenies using maximum likelihood

This tutorial guides you through using PhyML and its extension, CodonPhyML, to solve common phylogenetic problems. Some of the following exercises may have more than one valid solution.

Goal: Inferring ML phylogenies with codon models using CodonPhyML

For this task, use CodonPhyML in menu mode to analyse your dataset, which should be protein-coding DNA. The menu interface is very similar to that of PhyML, except that CodonPhyML also includes codon models and some additional amino acid models (e.g., PCA models by Zoller and Schneider; the antibody model AB by Mirsky et al. 2015; models for ordered/disordered proteins by Szalkowski and Anisimova 2011).

Dataset file:

First Step

Choose a protein-coding DNA dataset to work with (codon sequences, e.g., one of the datasets).

The dataset must not include any stop codons, so we have to modify it by deleting the last 3 columns (the last codon) from the MSA. To do so, open primates-nt.phy in AliView, select the last 3 columns, and press Delete on the keyboard. Save the new file in Phylip format under the name primates-nt-nostop.phy.
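If you prefer to script this step, the same trimming can be sketched in Python. This is only an illustration, not part of the original exercise: it assumes the alignment has already been read into a dictionary mapping sequence names to aligned strings, and the helper names (trim_last_codon, find_stop_codons) and the toy sequences are hypothetical. The stop-codon check assumes the standard genetic code.

```python
# Hypothetical helpers for illustration; not part of CodonPhyML or AliView.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code


def trim_last_codon(sequences):
    """Return a copy of the alignment with the final three columns removed."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to the same length")
    (length,) = lengths
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3")
    return {name: seq[:-3] for name, seq in sequences.items()}


def find_stop_codons(sequences):
    """List (sequence name, codon index) for every in-frame stop codon."""
    hits = []
    for name, seq in sequences.items():
        for i in range(0, len(seq) - 2, 3):
            if seq[i:i + 3].upper() in STOP_CODONS:
                hits.append((name, i // 3))
    return hits


# Toy alignment (made-up sequences), each ending in a stop codon.
toy = {"human": "ATGGCCTAA", "chimp": "ATGGCTTAG"}
trimmed = trim_last_codon(toy)
print(trimmed)
print(find_stop_codons(trimmed))
```

Running find_stop_codons on the trimmed alignment is a quick way to confirm that no in-frame stop codon remains before passing the file to CodonPhyML.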

Second Step

Analyze the data using the codon models M0, M0+Gamma and M5; the amino acid models LG, WAG, LG+Gamma and WAG+Gamma; and the DNA model GTR+Gamma. The log-likelihoods for codon models and amino acid models are comparable. For the GTR+Gamma model, a converted log-likelihood score is also available, so the comparison can be made across DNA, AA and codon models.

You will run CodonPhyML at most 8 times.

  1. Based on AIC, which model fits your dataset best?
  2. Are the trees inferred using the best AA and best codon model different?
  3. Are they different to the tree inferred using the DNA model?
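
For the first question, recall that AIC = 2k - 2 lnL, where k is the number of free parameters estimated (including branch lengths) and lnL is the maximised log-likelihood; the model with the lowest AIC fits best. The sketch below shows one way to tabulate the comparison in Python. The log-likelihoods and parameter counts are made-up placeholders, not results from any dataset; replace them with the values reported by your own runs, using the converted log-likelihood for GTR+Gamma.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(results):
    """Sort models by AIC and report each model's difference from the best."""
    scored = [(name, aic(lnl, k)) for name, (lnl, k) in results.items()]
    scored.sort(key=lambda item: item[1])
    best = scored[0][1]
    return [(name, score, score - best) for name, score in scored]


# Placeholder values only: (log-likelihood, number of free parameters).
results = {
    "M0": (-5000.0, 20),
    "M0+Gamma": (-4950.0, 21),
    "LG+Gamma": (-5100.0, 18),
}

ranking = rank_by_aic(results)
for name, score, delta in ranking:
    print(f"{name:10s} AIC={score:10.1f} delta={delta:8.1f}")
```

The first row of the ranking is the best-fitting model under AIC; the delta column shows how far each other model falls behind it.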

This exercise was prepared by Maria Anisimova