From genomes to post-processing of Bayesian inference of phylogeny
KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). (Lars Arvestad) ORCID iD: 0000-0003-0539-3491
2016 (English) Doctoral thesis, with articles (Other academic)
Abstract [en]

Life is extremely complex and amazingly diverse; it has taken billions of years of evolution to attain the level of complexity we observe in nature today, which ranges from single-celled prokaryotes to multicellular human beings. With the availability of molecular sequence data, algorithms for inferring homology and gene families have emerged, and similarity in gene content between two genes has been the major signal used for homology inference. Recently there has been a significant rise in the number of species with fully sequenced genomes, which provides an opportunity to investigate and infer homologs with greater accuracy and in a more informed way. Phylogenetic analysis explains the relationships between member genes of a gene family in a simple, graphical and plausible way using a tree representation. Bayesian phylogenetic inference is a probabilistic method used to infer gene phylogenies and posteriors of other evolutionary parameters. Markov chain Monte Carlo (MCMC), in particular with the Metropolis-Hastings sampling scheme, is the most commonly employed algorithm for determining the evolutionary history of genes. Many software packages are available that process the results of each MCMC run and explore the parameter posterior, but there is a need for interactive software that can analyse both discrete and real-valued parameters, and that has convergence assessment and burnin estimation diagnostics specifically designed for Bayesian phylogenetic inference.

In this thesis, a synteny-aware approach for gene homology inference, called GenFamClust (GFC), is proposed that uses gene content and gene order conservation to infer homology. The feature that distinguishes GFC from earlier homology inference methods is that local synteny is combined with gene similarity to infer homologs, without inferring homologous regions. GFC was validated for accuracy on a simulated dataset. Gene families were computed by applying clustering algorithms to homologs inferred by GFC, and were compared for accuracy, dependence and similarity with gene families inferred by other popular gene family inference methods on a eukaryotic dataset. Gene families in fungi obtained from GFC were evaluated against pillars from the Yeast Gene Order Browser. Genome-wide gene families for some eukaryotic species were computed using this approach.

Another topic addressed in this thesis is the processing of MCMC traces for Bayesian phylogenetic inference. We introduce a new software tool, VMCMC, which simplifies post-processing of MCMC traces. VMCMC can be used both as a GUI-based application and as a convenient command-line tool. VMCMC supports interactive exploration, is suitable for automated pipelines and can handle both real-valued and discrete parameters observed in an MCMC trace. We propose and implement joint burnin estimators that are specifically applicable to Bayesian phylogenetic inference. These methods have been compared for similarity with some other popular convergence diagnostics. We show that Bayesian phylogenetic inference and VMCMC can be applied to infer valuable evolutionary information for a biological case: the evolutionary history of the FERM domain.
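The abstract names the Metropolis-Hastings sampling scheme as the usual way to generate an MCMC trace. As a rough illustration only, the sketch below samples a one-dimensional toy target with a symmetric random-walk proposal. It is not the phylogenetic sampler used in the thesis: the function names, the standard normal target and the step size are all assumptions chosen for the example.

```python
import math
import random


def metropolis_hastings(log_target, start, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings on a single real-valued parameter.

    log_target is the log of an unnormalised target density (hypothetical
    stand-in for a posterior). Returns the trace of sampled states.
    """
    rng = random.Random(seed)
    state = start
    log_p = log_target(state)
    trace = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to
        # the ratio of target densities.
        proposal = state + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # 1 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            state, log_p = proposal, log_p_new
        # The current state is recorded whether or not the move was accepted.
        trace.append(state)
    return trace


def log_standard_normal(x):
    """Toy target: log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


# Start far from the mode so the early part of the trace is a visible transient.
trace = metropolis_hastings(log_standard_normal, start=10.0, n_steps=5000)
```

The early samples of such a trace still reflect the arbitrary starting point, which is why post-processing tools discard a burnin period before summarising the posterior.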

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016, pp. viii, 65
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2016:01
Keywords [en]
Bayesian inference
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-181319
ISBN: 978-91-7595-849-1 (print)
OAI: oai:DiVA.org:kth-181319
DiVA, id: diva2:899026
Public defence
2016-02-25, Fire, Tomtebodavägen 23, 171 65, Solna, 14:00 (English)
Note

QC 20160201

Available from: 2016-02-01 Created: 2016-01-31 Last updated: 2018-01-10 Bibliographically approved
List of papers
1. Quantitative synteny scoring improves homology inference and partitioning of gene families
2013 (English) In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. S12. Article in journal (Refereed), Published
Abstract [en]

Background: Clustering sequences into families has long been an important step in the characterization of genes and proteins. Many algorithms have been developed for this purpose, most of which are based either on direct similarity between gene pairs or on some sort of network structure in which the edge weights of the constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology, and it has not been utilized to its fullest potential.

Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy and the definition of synteny scores are the most valuable contributions of GenFamClust.

Place, publisher, year, edition, pages
BioMed Central, 2013
Keywords
Efficient Algorithm, Eukaryotic Genomes, Protein Families, Orthologs, Identification, Clusters, Alignment, Blast, Link
Identifiers
urn:nbn:se:kth:diva-136429 (URN)
10.1186/1471-2105-14-S15-S12 (DOI)
000328316700012 ()
Conference
11th Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics, Lyon, France, October 17-19, 2013
Funder
Swedish e‐Science Research Center
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20131219

Available from: 2013-12-05 Created: 2013-12-05 Last updated: 2018-01-11 Bibliographically approved
2. GenFamClust: An accurate, synteny-aware and reliable homology inference algorithm
2016 (English) In: BMC Evolutionary Biology, ISSN 1471-2148, Vol. 16. Article in journal (Other academic), Published
Abstract [en]

Background: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.

Results: In this study, we validate GenFamClust by comparing it to well-known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms to homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for the gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes, and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs.

Conclusions: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline for inferring homologs and gene families than other commonly used homology and gene-family inference methods.
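The abstract describes quantifying gene order conservation alongside sequence similarity but does not give the formulas. The sketch below is therefore only one plausible way to express the idea, not GenFamClust's actual scoring: the neighbourhood window `k`, the "best neighbouring pair" synteny score and the linear `weight` are all assumptions made for illustration.

```python
def neighbours(genome, index, k):
    """Genes within k positions of genome[index], excluding the gene itself."""
    lo = max(0, index - k)
    window = genome[lo:index + k + 1]
    return [gene for pos, gene in enumerate(window, start=lo) if pos != index]


def synteny_score(genome_a, i, genome_b, j, similarity, k=2):
    """Hypothetical local synteny score for genes genome_a[i] and genome_b[j].

    Takes the highest similarity found between any neighbour of the first
    gene and any neighbour of the second; missing pairs count as 0.
    """
    best = 0.0
    for gene_a in neighbours(genome_a, i, k):
        for gene_b in neighbours(genome_b, j, k):
            best = max(best, similarity.get((gene_a, gene_b), 0.0))
    return best


def combined_score(sim, syn, weight=0.5):
    """Assumed linear mix of similarity and synteny evidence."""
    return (1.0 - weight) * sim + weight * syn


# Toy data: two genomes as ordered gene lists and pairwise similarities in [0, 1].
genome_a = ["a1", "a2", "a3"]
genome_b = ["b1", "b2", "b3"]
similarity = {("a1", "b1"): 0.9, ("a2", "b2"): 0.4, ("a3", "b3"): 0.8}

syn = synteny_score(genome_a, 1, genome_b, 1, similarity, k=1)
score = combined_score(similarity[("a2", "b2")], syn)
```

In this toy case the pair (a2, b2) has only moderate similarity, but its flanking genes match well, so the combined score is higher than the similarity alone. That is the intuition behind using conserved gene order as additional evidence of homology.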

Place, publisher, year, edition, pages
BioMed Central, 2016
Keywords
Homology inference; Gene synteny; Gene similarity; Gene family; Clustering; Gene order conservation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-180542 (URN)
10.1186/s12862-016-0684-2 (DOI)
000377161400002 ()
27260514 (PubMedID)
2-s2.0-84973324604 (Scopus ID)
Funder
Swedish e‐Science Research Center
Note

QC 20160628

Available from: 2016-01-18 Created: 2016-01-18 Last updated: 2016-08-31 Bibliographically approved
3. VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Motivation: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence assessment, mixing analysis and interactive exploration of both continuous and tree parameters.

Results: We have written a software tool called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines.

Availability: VMCMC is available for Java SE 6+ under the New BSD License. Executable jar files, a tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

Keywords
Markov chain Monte Carlo
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-180543 (URN)
Note

QS 2016

Available from: 2016-01-18 Created: 2016-01-18 Last updated: 2016-02-01 Bibliographically approved
4. Burnin estimation and convergence assessment in Bayesian phylogenetic inference
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Convergence assessment and burnin estimation are central concepts in Markov chain Monte Carlo algorithms. Studies on the effects, statistical properties, and comparisons of different convergence assessment methods have been conducted during the past few decades. However, little work has been done on the effect of convergence diagnostics on the posterior distribution of tree parameters, or on which method researchers should use in Bayesian phylogenetic inference. In this study, we propose and evaluate two novel burnin estimation methods that estimate burnin using all parameters jointly. We also consider some other popular convergence diagnostics, evaluate them in light of parallel chains, and quantify the effect of burnin estimates from various convergence diagnostics on the posterior distribution of trees. We motivate the use of convergence diagnostics to assess convergence and estimate burnin in Bayesian phylogenetic inference, and find that it is better to employ convergence diagnostics than to remove a fixed percentage as burnin. We conclude that the last burnin estimator, which uses effective sample size, appears to estimate burnin better than all other convergence diagnostics.
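The abstract refers to a burnin estimator based on effective sample size (ESS) without describing it. As a hedged illustration of the general idea, the sketch below picks the burnin that maximises the ESS of the remaining samples for a single real-valued parameter. It is not the joint estimator proposed in the manuscript, and the autocorrelation truncation rule, `max_lag`, `step` and `max_fraction` are assumptions of this example.

```python
import random


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (0.0 if xs is constant)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0.0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(xs, max_lag=50):
    """ESS = n / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple rule assumed here for illustration.
    """
    n = len(xs)
    if n == 0:
        return 0.0
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorrelation(xs, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def estimate_burnin(trace, step=10, max_fraction=0.5):
    """Return the candidate burnin whose remaining samples have the largest ESS."""
    best_burnin, best_ess = 0, effective_sample_size(trace)
    for burnin in range(step, int(len(trace) * max_fraction) + 1, step):
        ess = effective_sample_size(trace[burnin:])
        if ess > best_ess:
            best_burnin, best_ess = burnin, ess
    return best_burnin


# Synthetic trace: a decaying transient followed by stationary noise.
rng = random.Random(1)
trace = [50.0 * (1 - i / 100) + rng.gauss(0, 1) for i in range(100)]
trace += [rng.gauss(0, 1) for _ in range(400)]
burnin = estimate_burnin(trace)
```

Discarding too little leaves the transient in the sample and inflates autocorrelation, while discarding too much throws away usable draws. Maximising ESS is one way to balance the two, in contrast to removing a fixed percentage of the trace.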

Keywords
Convergence assessment
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-180544 (URN)
Note

QS 2016

Available from: 2016-01-18 Created: 2016-01-18 Last updated: 2016-02-01 Bibliographically approved
5. Tracing the evolution of FERM domain of Kindlins
2014 (English) In: Molecular Phylogenetics and Evolution, ISSN 1055-7903, E-ISSN 1095-9513, Vol. 80, pp. 193-204. Article in journal (Refereed), Published
Abstract [en]

Kindlin proteins represent a novel family of evolutionarily conserved FERM domain-containing proteins (FDCPs) and are members of the B4.1 superfamily. Kindlins consist of three conserved protein homologs in vertebrates: Kindlin-1, Kindlin-2 and Kindlin-3. All three homologs are associated with focal adhesions and are involved in Integrin activation. The FERM domain of each Kindlin is bipartite and plays a key role in Integrin activation. A single ancestral Kindlin protein can be traced back to the earliest metazoans, e.g., to Parazoa. This protein underwent multiple rounds of duplication in vertebrates, leading to the present Kindlin family. In this study, we trace the phylogenetic and evolutionary history of the Kindlin FERM domain with respect to the FERM domains of other FDCPs. We show that the FERM domain is conserved among Kindlin homologs, but that the degree of conservation is lower than for the FERM domains of other members of the B4.1 superfamily. Furthermore, the insertion of a Pleckstrin Homology-like domain in the Kindlin FERM domain has important evolutionary and functional consequences. Important residues in Kindlins are traced and ranked according to their evolutionary significance. The structural and functional significance of high-ranked residues is highlighted and validated by their known involvement in Kindlin-associated diseases. In light of these findings, we hypothesize that the FERM domain originated from a proto-Talin protein in a unicellular or proto-multicellular organism, and that the advent of multicellularity was accompanied by a burst of FDCPs, which supported the multicellular functions required for complex organisms. This study helps develop a better understanding of the evolutionary history of the FERM domain of FDCPs and of the role of the FERM domain in metazoan evolution.

Place, publisher, year, edition, pages
Elsevier, 2014
Keywords
FERM domain, Kindlins, Protein domain evolution, Evolutionary trace analysis, Focal adhesions, PH domain
Identifiers
urn:nbn:se:kth:diva-156436 (URN)
10.1016/j.ympev.2014.08.008 (DOI)
000343742200020 ()
2-s2.0-84906731105 (Scopus ID)
Note

QC 20141204

Available from: 2014-12-04 Created: 2014-11-28 Last updated: 2017-12-05 Bibliographically approved

Open Access in DiVA

Doctoral Thesis Hashim (6822 kB), 598 downloads
File information
File: FULLTEXT01.pdf
File size: 6822 kB
Checksum (SHA-512):
25f4f3bb0447df5196f964b6c0a329473dce4aa0ef4506703d5f14de0161c4fa5a20fc78db9470192faa949a47f43a4a389d05e56ee06eed5b7033b59a9ce14f
Type: fulltext
Mimetype: application/pdf

Person records

Ali, Raja Hashim
