Exercise 1 - Simple codon model

We will use the codeml program from the PAML package by Ziheng Yang. Use the command-line mode for the tasks below. First, you need to understand which control file options to use. Next, try to reproduce the same analyses with codeml.
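As a starting point for the control file, here is a minimal Python sketch that writes two codeml control files for a one-ratio (M0) analysis: one that keeps the branch lengths from the tree file fixed and one that estimates them by ML. The file names (`alignment.phy`, `tree.nwk`, `m0_fixed.ctl`, `m0_estimated.ctl` and the matching `.out` files) are hypothetical placeholders, and the option values (for example `CodonFreq = 2` for F3x4) are assumptions that should be checked against the PAML documentation for your version.

```python
from pathlib import Path

# Baseline options for model M0 (one omega for all sites and branches).
# File names are hypothetical placeholders; replace them with your own data.
M0_OPTIONS = {
    "seqfile": "alignment.phy",   # codon alignment (assumed name)
    "treefile": "tree.nwk",       # tree with branch lengths (assumed name)
    "outfile": "m0_estimated.out",
    "noisy": 3,
    "verbose": 1,
    "runmode": 0,      # 0: evaluate the user-supplied tree
    "seqtype": 1,      # 1: codon sequences
    "CodonFreq": 2,    # 2: F3x4 codon frequencies (an assumed choice)
    "model": 0,        # 0: one omega ratio for all branches
    "NSsites": 0,      # 0: one omega ratio for all sites (M0)
    "icode": 0,        # 0: universal genetic code
    "fix_kappa": 0,    # 0: estimate kappa
    "kappa": 2,        # initial value for kappa
    "fix_omega": 0,    # 0: estimate omega
    "omega": 0.5,      # initial value for omega
    "fix_blength": 0,  # 0: ignore tree branch lengths and estimate them
    "cleandata": 0,
}


def render_ctl(options):
    """Format a dict of options as codeml control-file text."""
    width = max(len(key) for key in options)
    lines = [f"{key:>{width}} = {value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


def write_ctl(path, **overrides):
    """Write a control file, overriding selected baseline options."""
    options = {**M0_OPTIONS, **overrides}
    Path(path).write_text(render_ctl(options))
    return options


if __name__ == "__main__":
    # Run 1: branch lengths fixed to those in the tree file (fix_blength = 2).
    write_ctl("m0_fixed.ctl", fix_blength=2, outfile="m0_fixed.out")
    # Run 2: branch lengths estimated by ML (fix_blength = 0).
    write_ctl("m0_estimated.ctl", fix_blength=0, outfile="m0_estimated.out")
```

Each run can then be started from the command line by passing the control file name to the program, for example `codeml m0_fixed.ctl`.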

You will need a dataset of homologous protein-coding DNA sequences (starting with the 1st codon position and ending with the 3rd). We will use data from published articles and will regenerate published results:


The simple codon model with constant ω.

  1. Choose a dataset from the publication above and fit model M0, the simplest codon model, with ω constant over time and across sites. Run model M0 twice: once with branch lengths fixed to those in the tree file, and once with branch lengths estimated by ML.

  2. Compare the optimised log-likelihoods for the two runs. In which run is the log-likelihood higher? Why?

  3. Next, study the output file for the run with estimated branch lengths. Do you observe codon frequency bias?

  4. Study the statistics of nucleotide usage for different codon positions. Which position displays the most bias? Why? What is the ML estimate of the transition-transversion ratio κ? What is the ML estimate of the ω-ratio? How do you interpret these ML estimates?
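To compare the two runs side by side, the log-likelihood and the estimates of κ and ω can be pulled out of the codeml output files with a short script. The sketch below assumes the output contains lines of the form `lnL(ntime: ...  np: ...):  <value>`, `kappa (ts/tv) = <value>` and `omega (dN/dS) = <value>`; the exact layout may differ between PAML versions, so treat the patterns as assumptions and adjust them to your files. The file names `m0_fixed.out` and `m0_estimated.out` are hypothetical.

```python
import re
from pathlib import Path

# A signed decimal number, captured as one group.
_FLOAT = r"(-?\d+(?:\.\d+)?)"


def parse_m0_output(text):
    """Extract np, lnL, kappa and omega from codeml M0 output text.

    The line patterns are assumptions about the output layout;
    any value that is not found is returned as None.
    """
    lnl = re.search(r"lnL\(ntime:\s*\d+\s+np:\s*(\d+)\):\s*" + _FLOAT, text)
    kappa = re.search(r"kappa \(ts/tv\)\s*=\s*" + _FLOAT, text)
    omega = re.search(r"omega \(dN/dS\)\s*=\s*" + _FLOAT, text)
    return {
        "np": int(lnl.group(1)) if lnl else None,      # number of free parameters
        "lnL": float(lnl.group(2)) if lnl else None,   # optimised log-likelihood
        "kappa": float(kappa.group(1)) if kappa else None,
        "omega": float(omega.group(1)) if omega else None,
    }


if __name__ == "__main__":
    # Hypothetical output file names from the two M0 runs.
    for name in ("m0_fixed.out", "m0_estimated.out"):
        path = Path(name)
        if path.exists():
            print(name, parse_m0_output(path.read_text()))
        else:
            print(name, "not found; run codeml first")
```

The `np` value reported alongside the log-likelihood is the number of free parameters in each run, which is worth keeping in mind when answering question 2.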


Please refer to the PAML/codeml documentation.