# Relaxed phylogenetics and dating with confidence drummond


AJ Drummond, SYW Ho, MJ Phillips, A Rambaut. Relaxed phylogenetics and dating with confidence.

Keywords: Bayesian phylogenetic inference, molecular clock dating, MCMC, MrBayes, divergence time estimation, rate heterogeneity.


Save the XML file as Heterochronous. Hit Run to start the analysis. While waiting for your results, you can start preparing the XML file for the homochronous data.

### Analysing the results

Load the log file into Tracer to check mixing and the parameter estimates.

Figure: Loading the log file into Tracer.

The first thing you may notice is that most of the parameters have low ESS (effective sample size) values, which Tracer marks in red. This is because our chain did not run long enough.

However, the estimates we obtained with this short chain are very similar to those obtained with a longer chain. Click on clockRate and then click on Trace to examine the trace of the parameter.

Figure: The trace of the clock rate parameter.

Note that even though the parameter has a low ESS, the chain appears to have passed the burn-in phase and seems to be sampling from across the posterior without getting stuck in any local optima.

This is not proof that the run is mixing well, but it gives us a good intuition that the parameter will reach a good ESS value if we run the chain for longer. You should always examine the parameter traces to check convergence; a high ESS value alone is not proof that a run has converged to the true posterior. If you like, you can compare your results with the example results we obtained with identical settings and a longer chain. Do the parameter traces look better?
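To make the ESS idea above concrete, here is a minimal sketch of one common estimator: the chain length divided by the integrated autocorrelation time. It is an illustration only. The truncation rule (stop at the first non-positive autocorrelation) is an assumption of this sketch, and Tracer's own estimator may differ in detail. The two synthetic traces are made up and are not output from the analysis.

```python
import random

def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first non-positive autocorrelation, a simple
    truncation rule chosen for this sketch.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [v - mean for v in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)  # constant trace: no autocorrelation to measure
    rho_sum = 0.0
    for lag in range(1, n):
        # Autocovariance at this lag (zip stops at the shorter sequence).
        cov = sum(a * b for a, b in zip(centered, centered[lag:])) / n
        rho = cov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = random.Random(1)
n_samples = 2000

# Synthetic "well-mixed" trace: independent draws.
independent = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Synthetic "sticky" trace: each sample depends strongly on the previous one.
sticky = [0.0]
for _ in range(n_samples - 1):
    sticky.append(0.95 * sticky[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
print(f"ESS, independent trace: {ess_independent:.0f}")
print(f"ESS, sticky trace:      {ess_sticky:.0f}")
```

Both traces contain the same number of samples, but the strongly autocorrelated one carries far less independent information, which is why a short or poorly mixing chain shows low ESS values.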

Examine the posterior estimates for the becomeUninfectiousRate, samplingProportion and clockRate in Tracer. Do the estimates look realistic? Are they different from the priors we set and if so, how? The estimated posterior distribution for the becomeUninfectiousRate is a lot more specific than the prior we set, which allowed for a much longer infectious period.

The estimates also agree with what we know about Influenza A. In this case there was enough information in the sequencing data to estimate a more specific becoming uninfectious rate.

If we had relied more on our prior knowledge we could have set a tighter prior on the becomeUninfectiousRate parameter, which may have helped the run to converge faster, by preventing it from sampling unrealistic parameter values.

However, if you are unsure about a parameter it is always better to set more diffuse priors.

Figure: Estimated posterior distribution for the becoming uninfectious rate.

The estimated sampling proportion is a lot lower than the mean we set for the prior on the sampling proportion. Our prior estimate of the sampling proportion was therefore much too high. Consequently, the number of cases is also much higher than we initially thought: we assumed far fewer cases when we set the prior, whereas our posterior indicates that the epidemic has on the order of tens of thousands of cases.
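The link between the sampling proportion and the epidemic size is simple arithmetic: the number of sequenced cases divided by the sampling proportion gives the implied number of cases. The sketch below uses made-up numbers (100 sequences, and sampling proportions of 0.1 and 0.005); none of these values come from the tutorial dataset.

```python
def implied_total_cases(num_sequences, sampling_proportion):
    """Back-of-the-envelope epidemic size implied by a sampling proportion."""
    if not 0.0 < sampling_proportion <= 1.0:
        raise ValueError("sampling_proportion must be in (0, 1]")
    return num_sequences / sampling_proportion

# Hypothetical values, for illustration only.
num_sequences = 100
prior_guess = 0.1         # assumed prior mean of the sampling proportion
posterior_guess = 0.005   # assumed, much lower posterior estimate

cases_under_prior = implied_total_cases(num_sequences, prior_guess)
cases_under_posterior = implied_total_cases(num_sequences, posterior_guess)
print(f"implied cases under the prior guess:     {cases_under_prior:.0f}")
print(f"implied cases under the posterior guess: {cases_under_posterior:.0f}")
```

A twentyfold drop in the sampling proportion implies a twentyfold larger epidemic for the same number of sequences, which is the pattern described above.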

Figure: Estimated posterior distribution for the sampling proportion.

A clock rate estimate that is higher than expected is not a cause for concern and is actually a well-documented phenomenon. When viral samples are collected over a short time period, the clock rate is often overestimated.

The exact cause of the bias is not known, but it is suspected that incomplete purifying selection plays a role. What is important to keep in mind is that this does not mean that the virus is mutating or evolving faster than usual. When samples are collected over a longer time period, the estimated clock rate slows down and eventually reaches the long-term substitution rate.

Figure: Estimated posterior distribution for the clock rate.

### H3N2 flu dynamics - Homochronous data

We could also use the homochronous data to investigate the dynamics of the H3N2 spread in California. We use the 29 sequences from April 28 to investigate whether this is possible.

Follow the same procedure as for the heterochronous sampling. Note that for the Birth Death Skyline Contemporary model the sampling proportion is called rho, and refers only to the proportion of infected individuals sampled at the present time. This is to distinguish it from the sampling proportion in the Birth Death Skyline Serial model, which refers to the proportion of individuals sampled through time.

Figure: Specifying the sampling proportion prior for homochronous data.

Save the file as Homochronous.

### Estimating the substitution rate from homochronous data

After the run is finished, load the log file into Tracer and examine the traces of the parameters. Do you think running the analysis for longer will lead to the run mixing well? Most of the parameters again have low ESS values. However, in this case the ESS values are lower than for the heterochronous data, and it is not clear that running the analysis for longer will lead to mixing.

Indeed, while running the analysis for longer increases the ESS values for some parameters, they remain low for others, in particular the origin, TreeHeight (tMRCA) and clockRate. Now, check the clock rate and the tree height parameters.


Do you think that homochronous samples allow for good substitution rate estimation? If yes, how would you know? If not, how can you see that, and where do you think the problem might be?

Can we address this problem in our analysis? Notice the values of the substitution rate estimates. Our estimate of the clock rate is of the expected order of magnitude, but has a very wide credible interval. Notice also that the credible interval of the tree height is very wide. Another way to see that homochronous sampling does not allow for the estimation of the clock rate is to observe the very strong negative correlation between the clock rate and the tree height.

In Tracer, click on the Joint Marginal panel, select TreeHeight and clockRate simultaneously, and uncheck the Sample only box below the graphics.

Figure: Clock rate and tree height correlation in homochronous data.
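The negative correlation can be reproduced with a toy simulation. When all sequences share one sampling date, the data mainly inform the genetic distance, that is, the product of clock rate and tree height, so a taller tree must be paired with a slower rate. The sketch below assumes a fixed, hypothetical root-to-tip distance of 0.002 substitutions per site and made-up tree heights; it does not read the tutorial's log file.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
distance = 0.002  # hypothetical genetic distance (substitutions per site)

# Hypothetical posterior samples: heights vary freely, and the rate is
# whatever makes rate * height match the distance, up to a little noise.
heights = [rng.uniform(0.3, 0.7) for _ in range(2000)]
rates = [distance / h * rng.lognormvariate(0.0, 0.05) for h in heights]

correlation = pearson(heights, rates)
print(f"correlation between tree height and clock rate: {correlation:.2f}")
```

With real output, the same check can be run on the TreeHeight and clockRate columns of the log file after discarding burn-in.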

The correlation between the tree height and the clock rate is obvious. One way to solve this problem is to break this correlation by setting a strong prior on one of the two parameters. We describe how to set a prior on the tree height in the section below.

### Creating Taxon Sets

We will use the results from the heterochronous data to find out what a good estimate for the tree height of these homochronous samples is.

For this aim, we first create an MCC (maximum clade credibility) tree in TreeAnnotator and then check with FigTree what the estimate of the tMRCA (time to the most recent common ancestor) of the samples from April 28 is. Note, however, that we do this for illustration purposes only. In good practice, one should avoid re-using the data or using the results of an analysis to inform any further analyses containing the same data.

Let's pretend therefore that the heterochronous dataset is an independent dataset from the homochronous one. Open the TreeAnnotator and set Burnin percentage to 10, Posterior probability limit to 0.

Leave the other options unchanged.

Figure 24: Creating the MCC tree.


How can we find out what the tMRCA of our homochronous data may be? The best way may be to look at the estimates from the heterochronous data in FigTree.

Figure 25: Displaying median estimates of the node height in the MCC tree.

Tick Node Labels in the left menu, and click the arrow next to it to open the full options.

Notice that since we are using only a subset of all the heterochronous sequences, we are interested in the tMRCA of the samples from April 28, which may not coincide with the tree height of all the heterochronous data.

## Molecular Clock Dating using MrBayes

These samples are spread over all the clades in the tree, and the most recent common ancestor of all of them turns out to be the root of the MCC tree of the heterochronous samples. We therefore want to set the tMRCA prior of the tree formed by the homochronous sequences to be peaked around the median value of the MCC tree height.

This will reveal the Taxon set editor. Change the Taxon set label to allseq. Select the sequences belonging to this clade, i.e. all of the sequences in the dataset.

Figure 27: Specifying the root height prior.

The prior that we are specifying is on the date, not the height, of the tMRCA of all the samples in our dataset. Thus, we need to recalculate the date from the tMRCA height estimate that we obtained above: the median date of the MRCA is the date of the most recent sample minus the median tMRCA height.

Back in the Priors window, check the box labeled monophyletic for allseq.
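The height-to-date conversion mentioned above can be sketched as follows. The sampling date and the tMRCA height used here are placeholders chosen for easy arithmetic, not the values from this analysis, and `decimal_year` is a helper written for this sketch rather than part of BEAST or BEAUti.

```python
from datetime import date

def decimal_year(d):
    """Convert a calendar date to a decimal year, e.g. mid-year -> YYYY.5."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days

def mrca_date(most_recent_sample, tmrca_height_years):
    """Date of the MRCA: date of the most recent sample minus the height."""
    return decimal_year(most_recent_sample) - tmrca_height_years

# Placeholder inputs, for illustration only.
sample_date = date(2000, 7, 2)   # hypothetical date of the most recent sample
median_height = 0.25             # hypothetical median tMRCA height, in years

sample_year = decimal_year(sample_date)
mrca_year = mrca_date(sample_date, median_height)
print(f"sample date as decimal year: {sample_year}")
print(f"median MRCA date:            {mrca_year}")
```

The resulting decimal year is the kind of value that would be used as the location of the prior on the MRCA date.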

Click on the arrow next to allseq. Change the prior distribution on the time of the MRCA of the selected sequences from [none] to Laplace Distribution and set Mu to the median date calculated above.

Figure: Majority-rule consensus trees of extant taxa from (a) total-evidence dating and (b) node dating, under diversified sampling and the IGR model. The numbers at the internal nodes are the posterior probabilities of the corresponding clades.

The majority-rule consensus trees are summarized in hym.

The node ages are also in hym.

The extant taxa trees from total-evidence dating and node dating under diversified sampling and the IGR model are shown in Figure 4. The topologies are the same in general, except for one clade with more uncertainty.

The mean age of Hymenoptera inferred from total-evidence dating is similar to that from node dating, with a relatively narrower HPD interval. The total-evidence dating approach models the fossilization and sampling process explicitly, and incorporates different sources of information from the fossil record while accounting for the uncertainty of fossil placement.

In comparison, the node dating approach discards the fossil morphologies, and uses a secondary interpretation of the fossil record as node calibrations.
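For readers unfamiliar with HPD intervals, the sketch below shows one simple way to compute one from posterior samples: take the shortest window that contains the requested fraction of the sorted samples. This is a generic illustration that assumes a unimodal posterior, not MrBayes's own code, and the "node ages" are randomly generated placeholders.

```python
import math
import random

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]

# Hypothetical, right-skewed posterior sample of a node age.
rng = random.Random(7)
ages = [rng.gammavariate(20.0, 10.0) for _ in range(5000)]

lower, upper = hpd_interval(ages, mass=0.95)
print(f"95% HPD interval: [{lower:.1f}, {upper:.1f}]")
```

A narrower HPD interval for the same node, as reported for total-evidence dating above, means the posterior mass is concentrated in a smaller range of ages.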

Total-evidence dating provides an ideal platform for exploring and further improving the models used for Bayesian molecular clock dating analysis. Comparison with the non-clock tree suggests that the IGR relaxed clock model is more suitable than the autocorrelated TK02 model, and that it is not reasonable to assume a strict clock model. In conclusion, this study provides a brief overview and comparison of total-evidence dating and node dating analyses, and demonstrates the functionality of MrBayes using a dataset of Hymenoptera.

Majority-rule consensus tree of extant taxa from a non-clock analysis under the gamma-Dirichlet prior.

The branch lengths are measured in expected substitutions per site.

### Acknowledgements

I sincerely thank Johan Nylander for valuable discussions and for organizing a MrBayes workshop using this tutorial.