Computational Problems in Modeling Evolution and Inferring Gene Families.
KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). ORCID iD: 0000-0003-4937-0670
2016 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Over the last few decades, phylogenetics has emerged as a very promising field, providing a comparative framework for explaining the genetic relationships among all living organisms on Earth. These genetic relationships are typically represented by a bifurcating phylogenetic tree, the tree of life. Reconstructing a phylogenetic tree is one of the central tasks in evolutionary biology. Evolutionary processes such as gene duplication, gene loss, speciation, and lateral gene transfer (LGT) make phylogeny reconstruction more difficult. However, rapid developments in sequencing technologies and the availability of genome-scale sequencing data give us the opportunity to understand these evolutionary processes in a more informed manner and, ultimately, to reconstruct gene and species phylogenies more accurately. This thesis aims to provide computational methods for phylogenetic inference and tools for genome-scale comparative evolutionary studies, such as detecting homologous sequences and inferring gene families.

In the first project, we present FastPhylo, a software package containing fast tools for reconstructing distance-based phylogenies. It implements previously published efficient algorithms for estimating a distance matrix from the input sequences and for reconstructing an unrooted Neighbour Joining tree from a given distance matrix. Results on simulated datasets show that FastPhylo can handle hundreds of thousands of sequences with low running time and memory use. Its ease of use, well-defined interfaces, and modular structure allow FastPhylo to be used in very large bioinformatics pipelines.
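
The second step described above, building an unrooted Neighbour Joining tree from a distance matrix, can be illustrated with a minimal sketch. This is the textbook cubic-time algorithm, not the optimized implementation in FastPhylo; the function name `neighbour_joining`, the edge-list output, and the `internalN` node labels are assumptions made for this example.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Textbook Neighbour Joining (illustrative, O(n^3)).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)          # working copy; grows as internal nodes are added
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        new = f"internal{next_id}"  # hypothetical label for the new node
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        edges += [(a, new, len_a), (b, new, len_b)]
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

Each iteration scans all remaining pairs, which is why the naive algorithm becomes slow for large inputs and why faster implementations matter at the scale mentioned in the abstract.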

In the second project, we present a synteny-aware gene homology method called GenFamClust (GFC), which uses gene content and gene order conservation to detect homology. Results on simulated and biological datasets suggest that combining local synteny information with sequence similarity improves the detection of homologs.

In the third project, we introduce a novel phylogeny-based clustering method, PhyloGenClust, which partitions a very large gene family into smaller subfamilies. ROC (receiver operating characteristic) analysis on synthetic datasets shows that PhyloGenClust identifies subfamilies more accurately. PhyloGenClust can be used as a middle-tier clustering method between coarse clustering methods, such as sequence similarity methods, and more sophisticated Bayesian phylogeny methods.

Finally, we introduce a novel probabilistic Bayesian method, based on the DLTRS model, to sample reconciliations of a gene tree inside a species tree. The method uses an MCMC framework to integrate LGTs, gene duplications, gene losses, and sequence evolution under a relaxed molecular clock for substitution rates. The proposed sampling method estimates the posterior distribution of gene trees and provides temporal information about LGT events over the lineages of a species tree. Analysis on simulated datasets reveals that our method performs well in identifying the true temporal estimates of LGT events. We applied our method to genome-wide gene families for mollicutes and cyanobacteria, which gave interesting insight into potential LGT highways.
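
The sampler in the thesis works over reconciliations under the DLTRS model, which is well beyond a short example. To illustrate only the underlying MCMC idea, the toy sketch below uses Metropolis-Hastings to sample a single event rate from its posterior, given a count of events and the total time over which they were observed. The exponential prior, the multiplicative proposal, and all function and parameter names are assumptions made for this illustration and are not taken from the thesis.

```python
import math
import random


def log_posterior(rate, n_events, total_time):
    """Unnormalised log posterior of an event rate.

    Assumes exponentially distributed waiting times and an Exp(1) prior,
    so the exact posterior is Gamma(n_events + 1, total_time + 1).
    """
    return n_events * math.log(rate) - rate * (total_time + 1.0)


def metropolis_hastings(n_events, total_time, steps=20000, burn_in=2000,
                        step_size=0.5, seed=1):
    """Sample the rate with a multiplicative (log-normal) random walk."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_posterior(rate, n_events, total_time)
    samples = []
    for step in range(steps):
        # Propose a new positive rate by scaling the current one.
        proposal = rate * math.exp(step_size * rng.gauss(0.0, 1.0))
        candidate = log_posterior(proposal, n_events, total_time)
        # The proposal is asymmetric on the rate scale; the Hastings
        # correction for a log-normal walk is log(proposal / rate).
        log_accept = candidate - current + math.log(proposal) - math.log(rate)
        if rng.random() < math.exp(min(0.0, log_accept)):
            rate, current = proposal, candidate
        if step >= burn_in:
            samples.append(rate)
    return samples
```

Because the posterior is known in closed form here, the sample mean can be checked against the exact value (n_events + 1) / (total_time + 1), which is a useful sanity test before moving to models where no closed form exists.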

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016, p. 57
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2016:24
Keywords [en]
Evolution, Phylogenetics, Lateral Gene Transfer, Gene Families, Clustering
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-193637
ISBN: 978-91-7729-131-2 (print)
OAI: oai:DiVA.org:kth-193637
DiVA, id: diva2:1033289
Public defence
2016-10-18, Air, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Note

QC 20161010

Available from: 2016-10-10 Created: 2016-10-06 Last updated: 2022-12-07 Bibliographically approved
List of papers
1. fastphylo: Fast tools for phylogenetics
2013 (English). In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, no. 1, p. 334. Article in journal (Refereed). Published
Abstract [en]

Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances. Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency. Conclusions: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
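
The first of the two problems named above, estimating pairwise distances between aligned DNA sequences, can be illustrated with the Jukes-Cantor correction, one of the standard substitution-model distances. This is a minimal sketch; the abstract does not say which models fastphylo implements, and the function names and gap handling below are assumptions made for the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) for DNA.

    Returns infinity when p >= 3/4, where the correction is undefined.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Filling a full matrix requires one such computation per sequence pair, so the cost grows quadratically with the number of sequences, which is where the efficiency concerns raised in the abstract come from.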

Place, publisher, year, edition, pages
BioMed Central, 2013
Keywords
Distance matrices, Distance method, Evolutionary history, Large problems, Memory efficient, Modular architectures, Neighbor joining, Phylogenetic studies
Identifiers
urn:nbn:se:kth:diva-136421 (URN)
10.1186/1471-2105-14-334 (DOI)
000329901900001 ()
24255987 (PubMedID)
2-s2.0-84887664660 (Scopus ID)
Funder
Swedish e-Science Research Center
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20140205

Available from: 2013-12-05 Created: 2013-12-05 Last updated: 2024-03-15 Bibliographically approved
2. Quantitative synteny scoring improves homology inference and partitioning of gene families
2013 (English). In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, p. S12. Article in journal (Refereed). Published
Abstract [en]

Background: Clustering sequences into families has long been an important step in the characterization of genes and proteins. Many algorithms have been developed for this purpose, most of which are based either on direct similarity between gene pairs or on some form of network structure in which the edge weights of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology, and it has not been utilized to its fullest potential. Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced eukaryotic genomes and synthetic data. Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy and the definition of synteny scores are the most valuable contributions of GenFamClust.
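
The abstract does not give the definition of the synteny score, so the sketch below is only a hypothetical illustration of the general idea: two genes gain support for homology when genes in their chromosomal neighbourhoods are themselves similar. The scoring rule (best similarity among flanking gene pairs within a window of `k` positions), the data layout, and all names are assumptions and should not be read as the GenFamClust definition.

```python
def neighbours(genome, index, k):
    """Genes within k positions of genome[index], excluding that gene."""
    lo = max(0, index - k)
    hi = min(len(genome), index + k + 1)
    return [g for i, g in enumerate(genome[lo:hi], start=lo) if i != index]


def synteny_score(genome_a, i, genome_b, j, similarity, k=2):
    """Hypothetical synteny score for genome_a[i] versus genome_b[j].

    genome_a, genome_b: gene identifiers in chromosomal order.
    similarity: dict mapping frozenset({gene, gene}) to a score in [0, 1];
                missing pairs are treated as having similarity 0.
    Returns the highest similarity found among pairs of flanking genes.
    """
    best = 0.0
    for g in neighbours(genome_a, i, k):
        for h in neighbours(genome_b, j, k):
            best = max(best, similarity.get(frozenset((g, h)), 0.0))
    return best
```

A score like this depends only on the surrounding genes, not on the similarity of the two genes being compared, so it carries information that is complementary to direct sequence similarity.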

Place, publisher, year, edition, pages
BioMed Central, 2013
Keywords
Efficient Algorithm, Eukaryotic Genomes, Protein Families, Orthologs, Identification, Clusters, Alignment, Blast, Link
Identifiers
urn:nbn:se:kth:diva-136429 (URN)
10.1186/1471-2105-14-S15-S12 (DOI)
000328316700012 ()
24564516 (PubMedID)
2-s2.0-84901249258 (Scopus ID)
Conference
11th Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics, Lyon, France, October 17-19, 2013
Funder
Swedish e-Science Research Center
Science for Life Laboratory, SciLifeLab
Note

QC 20220126

Available from: 2013-12-05 Created: 2013-12-05 Last updated: 2024-03-15 Bibliographically approved
3. Phylogenetic Partitioning of Gene Families
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Clustering and organizing molecular sequences is one of the central tasks in bioinformatics. It is a common first step in, for example, phylogenomic analysis. For some tasks, a large gene family needs to be partitioned into more manageable subfamilies. In particular, Bayesian phylogenetic analysis can be very expensive. There is a need for an easy and natural means of breaking up a gene family, with moderate computational requirements, to enable careful analysis of subfamilies with computationally expensive tools. We devised and implemented a method that infers gene trees, reconciles them with species trees, and identifies putative orthogroups as subfamilies. To achieve reasonable speed, approximate ML phylogenies are inferred using the FastTree method and combined with a subfamily-centered bootstrapping procedure to ensure robustness. Using the new method, very large clusters of sequences are now easier to manage in pipelines containing computationally expensive steps. The implementation of PhyloGenClust is available at a public repository, https://github.com/malagori/PhyloGenClust, under the GNU General Public License version 3.
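
The idea of cutting a gene tree into putative orthogroups can be sketched with a much simpler stand-in for reconciliation: the species-overlap heuristic, in which a subtree is split whenever some species occurs in it more than once, since that implies a duplication somewhere inside it. This is not the PhyloGenClust procedure, which reconciles gene trees with a species tree and uses bootstrapping; the nested-tuple tree encoding, the `species|gene` leaf labels, and the function names are assumptions made for this example.

```python
def leaves(tree):
    """All leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(leaf):
    """Species part of a leaf label of the assumed form 'species|gene'."""
    return leaf.split("|", 1)[0]


def partition(tree):
    """Split a gene tree into subfamilies with one gene per species.

    A subtree whose leaves all come from distinct species is kept whole
    as a putative orthogroup; otherwise it is split at its root and each
    child is partitioned recursively.
    """
    members = leaves(tree)
    species = [species_of(m) for m in members]
    if len(set(species)) == len(species):
        return [members]
    return [group for child in tree for group in partition(child)]
```

For example, a tree in which an ancient duplication produced A and B copies in human, mouse, and fish is split at the root into one A subfamily and one B subfamily, each small enough to analyse with more expensive tools.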

Keywords
Phylogenetic, Clustering, Gene Families
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-193634 (URN)
Note

QC 20161007

Available from: 2016-10-06 Created: 2016-10-06 Last updated: 2022-12-07 Bibliographically approved
4. Probabilistic inference of lateral gene transfer events
(English). Manuscript (preprint) (Other academic)
Identifiers
urn:nbn:se:kth:diva-162935 (URN)
Funder
Swedish e-Science Research Center
Note

QS 2015

Available from: 2015-03-26 Created: 2015-03-26 Last updated: 2022-10-25 Bibliographically approved

Open Access in DiVA

fulltext (1265 kB), 700 downloads
File information
File: FULLTEXT01.pdf
File size: 1265 kB
Checksum (SHA-512):
3419bcbe0d3c6517de4a2bb5f4675a5d381b7788f43177d2f578050615115fd099fabf09929121f418799aece7f0e3ad6113b7bdeff137d43c97ddb6443f8676
Type: fulltext
Mimetype: application/pdf

Author/editor
Khan, Mehmood Alam