Solution 3 - Model comparison

Inferring phylogenies using maximum likelihood
Observing the effect of substitution models on the final inferred tree topology.


Goals

In this exercise you are asked to compare different substitution models (and their variations) on the same dataset.


Datasets
Dataset file:

Execution
Model     Log-likelihood   Parameters   AIC
JC        -6379.66383      37           12833.32766
JC_I      -6311.4165       38           12698.833
JC_G      -6304.38084      38           12684.76168
JC_G_I    -6304.35263      39           12686.70526
HKY       -6251.43051      41           12584.86102
HKY_I     -6180.09689      42           12444.19378
HKY_G     -6172.58045      42           12429.1609
HKY_G_I   -6172.52896      43           12431.05792
GTR       -6241.48788      45           12572.97576
GTR_I     -6171.17472      46           12434.34944
GTR_G     -6163.87291      46           12419.74582
GTR_G_I   -6172.52896      47           12439.05792

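The number of parameters is computed in the following way for a rooted
tree:

substitution models: JC = 0 parameters | HKY = 4 parameters | GTR = 8 parameters
invariant sites (+I) = +1 parameter
gamma rates (+G) = +1 parameter

These counts are added to the 37 parameters that every model in the table
shares (the JC row, where the substitution model itself adds none).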
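
The counting rule above can be sketched in Python. This is only an illustration: the function name `count_parameters` and the constant names are made up for this example, and the baseline of 37 is an assumption read off the JC row of the table, where the substitution model contributes no parameters of its own.

```python
# Free parameters contributed by each substitution model (values from the text).
SUBSTITUTION_PARAMS = {"JC": 0, "HKY": 4, "GTR": 8}

# Assumption: the 37 parameters of the plain JC row are shared by every model.
BASELINE_PARAMS = 37


def count_parameters(model: str) -> int:
    """Return the parameter count for a model name such as 'HKY_G_I'."""
    base, *extras = model.split("_")
    count = BASELINE_PARAMS + SUBSTITUTION_PARAMS[base]
    count += extras.count("I")  # invariant sites: +1 parameter
    count += extras.count("G")  # gamma rates: +1 parameter
    return count


for name in ("JC", "HKY_G", "GTR_G_I"):
    print(name, count_parameters(name))
```

The model names follow the underscore convention used in the table, so the same function can be applied to any of its rows.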

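We compute the AIC value as follows:

    AIC = 2k - 2 ln(L)

where k is the number of parameters and ln(L) is the log-likelihood of the model. For example, for JC: 2 * 37 - 2 * (-6379.66383) = 12833.32766.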
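
A minimal Python sketch of this calculation, assuming the standard definition AIC = 2k - 2 ln(L), with k the number of parameters and ln(L) the log-likelihood. The function name `aic` is chosen for this example only; the input values are copied from the JC and GTR_G rows of the table.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Log-likelihood and parameter count taken from the table rows JC and GTR_G.
print(round(aic(-6379.66383, 37), 5))
print(round(aic(-6163.87291, 46), 5))
```

Both results should agree with the AIC column of the table for the corresponding rows.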


Questions

1. Which model is the best (including HKY+Gamma), based on the AIC criterion?

We select the model with the lowest AIC value. In this case, that is GTR + Gamma (GTR_G in the table, AIC = 12419.74582).
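
The selection step can be reproduced with a few lines of Python. This is a sketch, not part of the original exercise: the dictionary simply copies the AIC column of the table, and the names `AIC_VALUES`, `best` and `delta` are chosen for this example.

```python
# AIC values copied from the table in the Execution section.
AIC_VALUES = {
    "JC": 12833.32766, "JC_I": 12698.833,
    "JC_G": 12684.76168, "JC_G_I": 12686.70526,
    "HKY": 12584.86102, "HKY_I": 12444.19378,
    "HKY_G": 12429.1609, "HKY_G_I": 12431.05792,
    "GTR": 12572.97576, "GTR_I": 12434.34944,
    "GTR_G": 12419.74582, "GTR_G_I": 12439.05792,
}

# The preferred model is the one with the lowest AIC.
best = min(AIC_VALUES, key=AIC_VALUES.get)

# Difference between each model's AIC and the best one.
delta = {model: value - AIC_VALUES[best] for model, value in AIC_VALUES.items()}

# Print the models ranked from best to worst.
for model in sorted(AIC_VALUES, key=AIC_VALUES.get):
    print(f"{model:8s} {AIC_VALUES[model]:12.5f} {delta[model]:9.5f}")
```

Ranking all models, rather than only picking the minimum, also makes it easy to see how far HKY+Gamma and the +I variants are from the selected model.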


This exercise can also be solved with jModelTest, which automatically performs the phylogenetic tree reconstruction for every model and every combination of extra parameters (i.e. invariant sites, gamma).


Tasks
  1. Which model is the best (including HKY+Gamma), based on the AIC criterion?
  2. What about adding invariant sites (+I)?