Frequentist estimation of evolutionary history of sequences with substitutions & indels
Editor(s)
Publisher
Neuchâtel : Université de Neuchâtel
Publication date
2024
Number of pages
134
Abstract
The evolutionary history of molecules is mainly estimated by reconstructing ancestral sequences from present-day sequences and phylogenetic information. Biological sequence data result from evolution through mutational events such as character substitutions (point mutations), insertions and deletions (indels). Inference of the evolutionary history of sequences with substitutions and indels can be used in various biomedical applications, from tracking the origin of pandemic viruses to studying the causes of visual impairment.
Indels are among the most important sources of genomic variation and carry sound evolutionary signals; however, well-known ancestral sequence reconstruction (ASR) methods ignore or mistreat them. ASR with indels is a major challenge from both computational and statistical viewpoints. This research proposes a novel solution for inferring ancestral sequences while accounting for the evolutionary indel process.
First, I used an evolutionary model of substitutions and indels for ASR and implemented it in the ARPIP program. ARPIP implements a novel empirical Bayes method that reconstructs ancestral sequences with indels under the Poisson indel process (PIP). PIP is a continuous-time Markov chain (CTMC) model that assumes single-character indels and has important computational advantages. I showed that ARPIP reconstructed biologically reasonable indels.
Second, it is difficult to model multiple-character (or "long") indels, since most evolutionary CTMC models assume site independence. I therefore investigated whether the single-character indel assumption was detrimental to ASR. Analysis of real and simulated data showed that the single-character indel model could be used for ASR. ARPIP preserved the gap length distribution of the multiple sequence alignment, including in regions with long indels. Moreover, the indel variation in six eutherian mammalian orthologous proteins was studied to explore the evolutionary dynamics of insertions and deletions.
Finally, ASR, like other inferences, is affected by uncertainty. To account for it, a posterior probability profile method was devised. In collaboration with an experimental lab studying the properties of ancestral proteins, the approach was applied to reflect the variation in ASR inference for the neural retina leucine zipper transcription factor of selected vertebrates. Moreover, an alternative reconstruction for the ambiguous regions was introduced.
Notes
Jury members:
Prof. Dr. Daniel Croll, University of Neuchâtel, Switzerland (Co-chair)
Prof. Dr. Maria Anisimova, Zürich University of Applied Sciences, Switzerland (Co-chair)
Prof. Dr. Pilar Eugenia Junier, University of Neuchâtel, Switzerland (Internal expert)
Prof. Dr. Ziheng Yang, University College London, UK (External expert)
Identifiers
Publication type
doctoral thesis