Exercise 2 - Branch models

We will use the codeml program from PAML by Ziheng Yang. Run it in command-line mode for the tasks below. First, work out which control file options each analysis requires. Then try to reproduce the published analyses with codeml.
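As a starting point for choosing control file options, here is a minimal sketch that writes a codeml control file for a branch model. The option names are standard codeml control-file keys, but the values, the output file name `mlc_free`, and the control file name `codeml_free.ctl` are assumptions chosen for illustration; they are not the settings used in the publication, so check each one against the PAML documentation.

```python
from pathlib import Path


def make_ctl(seqfile, treefile, outfile, model):
    """Return codeml control-file text for a branch model.

    model: 0 = one ratio for all branches, 1 = free ratio (one omega per
    branch), 2 = two or more ratios defined by branch labels in the tree.
    """
    # Values below are illustrative assumptions, not published settings.
    options = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": outfile,
        "noisy": 3,
        "verbose": 1,
        "runmode": 0,     # 0 = evaluate the user-supplied tree
        "seqtype": 1,     # 1 = codon sequences
        "CodonFreq": 2,   # 2 = F3x4 codon frequencies
        "clock": 0,       # no molecular clock
        "model": model,   # branch model, see docstring
        "NSsites": 0,     # no variation in omega among sites
        "icode": 0,       # universal genetic code
        "fix_kappa": 0,   # estimate kappa
        "kappa": 2,       # initial kappa
        "fix_omega": 0,   # estimate omega
        "omega": 0.4,     # initial omega
        "cleandata": 0,   # keep sites with gaps or ambiguity
    }
    return "".join(f"{key:>12} = {value}\n" for key, value in options.items())


# Free-ratio model (model = 1) for the small lysozyme data set.
ctl_text = make_ctl("lysozymeSmall.nuc", "lysozymeSmall.trees", "mlc_free", model=1)
Path("codeml_free.ctl").write_text(ctl_text)
print(ctl_text)
```

The resulting file can then be passed to codeml on the command line, for example `codeml codeml_free.ctl`. Calling `make_ctl` with `model=2` gives the control file for the labelled-branch models, provided the tree file carries the branch labels.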

You will need an alignment of homologous protein-coding DNA sequences that is in frame (starting with the 1st codon position and ending with the 3rd). We will use data from published articles and try to reproduce the published results:

  • Branch models: Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.
    Data 1: lysozymeSmall.nuc
    Tree 1: lysozymeSmall.trees


Branch models.

  1. Use the small lysozyme example to fit the free-ratio model for branches. Are the results consistent with those presented in the publication above?

  2. Label the branch leading to the colobine clade and fit the 2-ratio branch model to your data.

  3. In addition, label the branch leading to the hominoid clade (use a different label) and fit the 3-ratio branch model to your data.

  4. Based on LRTs, which model fits your data best (among the 2-ratio, 3-ratio and free-ratio models)? What are the degrees of freedom for each comparison?

  5. What can you tell about the evolution of your gene from the ML estimates under this best model?
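For questions 2 and 3, branch labels are written directly into the Newick tree: codeml reads `#k` placed after a clade as a label for the branch leading to that clade, and unlabelled branches fall into the background class `#0`. The sketch below uses a toy tree with hypothetical taxon names (it is not the lysozyme tree), and `count_ratio_classes` is a helper introduced here only to check how many omega classes a labelled tree implies, assuming at least one branch is left unlabelled.

```python
import re

# Toy Newick tree with hypothetical taxon names (not the lysozyme tree).
unlabelled = "((A1,A2),((B1,B2),C),(D1,D2));"

# "#1" after the (B1,B2) clade labels the branch leading to that clade.
two_ratio = "((A1,A2),((B1,B2) #1,C),(D1,D2));"

# A second, different label "#2" on the branch leading to the (A1,A2) clade.
three_ratio = "((A1,A2) #2,((B1,B2) #1,C),(D1,D2));"


def count_ratio_classes(newick):
    """Count omega classes: the background class plus each distinct #k label."""
    labels = set(re.findall(r"#(\d+)", newick))
    labels.discard("0")  # "#0" is the background class itself
    return 1 + len(labels)


for name, tree in [("unlabelled", unlabelled),
                   ("two_ratio", two_ratio),
                   ("three_ratio", three_ratio)]:
    print(name, count_ratio_classes(tree))
```

A tree labelled this way is used together with `model = 2` in the control file, so the number of estimated omega ratios follows from the labels rather than from a separate option.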


For details, please refer to the PAML/codeml documentation.
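For question 4, a likelihood ratio test between two nested models compares twice the log-likelihood difference with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The following is a minimal sketch of that calculation. The log-likelihood values and the number of taxa are placeholders, not results from the lysozyme data, and the helper names are introduced here for illustration. It relies on the fact that an unrooted binary tree with n taxa has 2n - 3 branches, so the free-ratio model estimates 2n - 3 omega ratios.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the df = 1 case and add one term per extra 2 df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def lrt(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2*(lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


def n_branches(n_taxa):
    """Number of branches in an unrooted binary tree."""
    return 2 * n_taxa - 3


# Placeholder inputs: replace with the lnL values from your codeml output
# and the number of sequences in your alignment.
n_taxa = 6
n_omega = {"two_ratio": 2, "three_ratio": 3, "free_ratio": n_branches(n_taxa)}
lnl = {"two_ratio": -1000.0, "three_ratio": -999.0, "free_ratio": -995.0}

for null, alt in [("two_ratio", "three_ratio"),
                  ("two_ratio", "free_ratio"),
                  ("three_ratio", "free_ratio")]:
    df = n_omega[alt] - n_omega[null]
    stat, p = lrt(lnl[null], lnl[alt], df)
    print(f"{null} vs {alt}: 2*dlnL = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The chi-square tail is computed in closed form here only to keep the sketch free of external dependencies; a statistics library would serve equally well.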