  • 1. Addario-Berry, L
    et al.
    Chor, B
    Hallett, M
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Panconesi, A
    Wareham, T
    Ancestral maximum likelihood of evolutionary trees is hard. 2004. In: Journal of Bioinformatics and Computational Biology, ISSN 0219-7200, E-ISSN 1757-6334, Vol. 2, no. 2, pp. 257-271. Journal article (Peer-reviewed)
    Abstract [en]

    Maximum likelihood (ML) (Felsenstein, 1981) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task - in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such hardness result has been obtained so far for ML. In this work we make a first step in this direction by proving that ancestral maximum likelihood (AML) is NP-complete. The input to this problem is a set of aligned sequences of equal length and the goal is to find a tree and an assignment of ancestral sequences for all of that tree's internal vertices such that the likelihood of generating both the ancestral and contemporary sequences is maximized. Our NP-hardness proof follows that for MP given in (Day, Johnson and Sankoff, 1986) in that we use the same reduction from VERTEX COVER; however, the proof of correctness for this reduction relative to AML is different and substantially more involved.

  • 2. Alkema, W. B. L.
    et al.
    Johansson, O.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Wasserman, W. W.
    MSCAN: identification of functional clusters of transcription factor binding sites. 2004. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 32, pp. W195-W198. Journal article (Peer-reviewed)
    Abstract [en]

    Identification of functional transcription factor binding sites in genomic sequences is notoriously difficult. The critical problem is the low specificity of predictions, which directly reflects the low target specificity of DNA binding proteins. To overcome the noise produced in predictions of individual binding sites, a new generation of algorithms achieves better predictive specificity by focusing on locally dense clusters of binding sites. MSCAN is a leading method for binding site cluster detection that determines the significance of observed sites while correcting for local compositional bias of sequences. The algorithm is highly flexible, applying any set of input binding models to the analysis of a user-specified sequence. From the user's perspective, a key feature of the system is that no reference data sets of regulatory sequences from co-regulated genes are required to train the algorithm. The output from MSCAN consists of an ordered list of sequence segments that contain potential regulatory modules. We have chosen the features in MSCAN such that sequence and matrix retrieval is highly facilitated, resulting in a web server that is intuitive to use. MSCAN is available at http://mscan.cgb.ki.se/cgi-bin/MSCAN.

  • 3.
    Andersson, Samuel A.
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Motif Yggdrasil: Sampling from a tree mixture model. 2006. In: Research In Computational Molecular Biology, Proceedings / [ed] Apostolico, A; Guerra, C; Istrail, S; Pevzner, P; Waterman, M, 2006, Vol. 3909, pp. 458-472. Conference paper (Peer-reviewed)
    Abstract [en]

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes.

  • 4. Andersson, Samuel A.
    et al.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Motif Yggdrasil: Sampling sequence motifs from a tree mixture model. 2007. In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 14, no. 5, pp. 682-697. Journal article (Peer-reviewed)
    Abstract [en]

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that with random trees the MY sampler still has a performance similar to the original version.

  • 5.
    Arvestad, Lars
    et al.
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Berglund, Ann-Charlotte
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Sennblad, Bengt
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Bayesian gene/species tree reconciliation and orthology analysis using MCMC. 2003. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 19, pp. i7-i15. Journal article (Peer-reviewed)
    Abstract [en]

    Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available.

    Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic methods, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet being in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.

  • 6.
    Arvestad, Lars
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Berglund, Ann-Charlotte
    Stockholm Bioinformatics Center, Dept. of Biochemistry, Stockholm University.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Sennblad, Bengt
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution. 2004. In: Proceedings of the Annual International Conference on Computational Molecular Biology, RECOM, 2004, pp. 326-335. Conference paper (Peer-reviewed)
    Abstract [en]

    Gene tree and species tree reconstruction, orthology analysis and reconciliation are important problems in multigenome-based comparative genomics and in biology in general. In the present paper, we advance the frontier of these areas in several respects and provide important computational tools. First, exact algorithms are given for several probabilistic reconciliation problems with respect to the probabilistic gene evolution model previously developed by the authors. Until now, those problems were solved by MCMC estimation algorithms. Second, we extend the gene evolution model to the gene sequence evolution model by including sequence evolution. Third, we develop MCMC algorithms for the gene sequence evolution model that, given gene sequence data, allow: (1) orthology analysis, reconciliation analysis, and gene tree reconstruction, w.r.t. a species tree, that balances a likely/unlikely reconciliation and a likely/unlikely gene tree and (2) species tree reconstruction that balances a likely/unlikely reconciliation and a likely/unlikely gene tree. These MCMC algorithms take advantage of the exact algorithms for the gene evolution model. We have successfully tested our dynamic programming algorithms on real data for a biogeography problem. The MCMC algorithms perform very well both on synthetic and biological data.

  • 7.
    Arvestad, Lars
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Sennblad, Bengt
    The Gene Evolution Model and Computing Its Associated Probabilities. 2009. In: Journal of the ACM, ISSN 0004-5411, E-ISSN 1557-735X, Vol. 56, no. 2. Journal article (Peer-reviewed)
    Abstract [en]

    Phylogeny is both a fundamental tool in biology and a rich source of fascinating modeling and algorithmic problems. Today's wealth of sequenced genomes makes it increasingly important to understand evolutionary events such as duplications, losses, transpositions, inversions, lateral transfers, and domain shuffling. We focus on the gene duplication event, which constitutes a major force in the creation of genes with new function [Ohno 1970; Lynch and Force 2000] and, thereby also, of biodiversity. We introduce the probabilistic gene evolution model, which describes how a gene tree evolves within a given species tree with respect to speciation, gene duplication, and gene loss. The actual relation between gene tree and species tree is captured by a reconciliation, a concept which we generalize for more expressiveness. The model is a canonical generalization of the classical linear birth-death process, obtained by replacing the interval where the process takes place by a tree. For the gene evolution model, we derive efficient algorithms for some associated probability distributions: the probability of a reconciled tree, the probability of a gene tree, the maximum probability reconciliation, the posterior probability of a reconciliation, and sampling reconciliations with respect to the posterior probability. These algorithms provide the basis for several applications, including species tree construction, reconciliation analysis, orthology analysis, biogeography, and host-parasite co-evolution.
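For readers unfamiliar with the classical linear birth-death process that the abstract above generalizes, its textbook transition probabilities on a single interval are sketched below. The notation is an assumption made for illustration and does not come from the abstract: birth (duplication) rate \lambda, death (loss) rate \mu with \lambda \neq \mu, one lineage at time 0, and N(t) lineages at time t. This is the standard single-interval process, not the paper's tree-based model.

```latex
% Assumed notation: \lambda = birth rate, \mu = death rate, \lambda \neq \mu,
% N(0) = 1, and E = e^{(\lambda - \mu) t}.
\[
\alpha(t) = \frac{\mu\,\bigl(e^{(\lambda-\mu)t} - 1\bigr)}{\lambda e^{(\lambda-\mu)t} - \mu},
\qquad
\beta(t) = \frac{\lambda\,\bigl(e^{(\lambda-\mu)t} - 1\bigr)}{\lambda e^{(\lambda-\mu)t} - \mu}
\]
\[
\Pr[N(t) = 0] = \alpha(t),
\qquad
\Pr[N(t) = n] = \bigl(1 - \alpha(t)\bigr)\bigl(1 - \beta(t)\bigr)\,\beta(t)^{\,n-1}
\quad (n \ge 1)
\]
% Sanity check: \sum_{n \ge 1} \Pr[N(t) = n] = 1 - \alpha(t), and
% \mathbb{E}[N(t)] = \frac{1 - \alpha(t)}{1 - \beta(t)} = e^{(\lambda - \mu) t}.
```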

  • 8. Beerenwinkel, N.
    et al.
    Greenman, C. D.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Computational Cancer Biology: An Evolutionary Perspective. 2016. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 12, no. 2, article ID e1004717. Journal article (Peer-reviewed)
  • 9.
    Berglund, Emelie
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Maaskola, Jonas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Schultz, Niklas
    Friedrich, Stefanie
    Marklund, Maja
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Bergenstrahle, Joseph
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Tarish, Firas
    Tanoglidi, Anna
    Vickovic, Sanja
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Larsson, Ludvig
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Salmén, Fredrik
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Ogris, Christoph
    Wallenborg, Karolina
    Lagergren, Jens
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Ståhl, Patrik
    Sonnhammer, Erik
    Helleday, Thomas
    Lundeberg, Joakim
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. 2018. In: Nature Communications, ISSN 2041-1723, E-ISSN 2041-1723, Vol. 9, article ID 2419. Journal article (Peer-reviewed)
    Abstract [en]

    Intra-tumor heterogeneity is one of the biggest challenges in cancer treatment today. Here we investigate tissue-wide gene expression heterogeneity throughout a multifocal prostate cancer using the spatial transcriptomics (ST) technology. Utilizing a novel approach for deconvolution, we analyze the transcriptomes of nearly 6750 tissue regions and extract distinct expression profiles for the different tissue components, such as stroma, normal and PIN glands, immune cells and cancer. We distinguish healthy and diseased areas and thereby provide insight into gene expression changes during the progression of prostate cancer. Compared to pathologist annotations, we delineate the extent of cancer foci more accurately, interestingly without link to histological changes. We identify gene expression gradients in stroma adjacent to tumor regions that allow for re-stratification of the tumor microenvironment. The establishment of these profiles is the first step towards an unbiased view of prostate cancer and can serve as a dictionary for future studies.

  • 10. Bryant, D.
    et al.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Compatibility of unrooted phylogenetic trees is FPT. 2006. In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 351, no. 3, pp. 296-302. Journal article (Peer-reviewed)
    Abstract [en]

    A collection T_1, T_2, ..., T_k of unrooted, leaf labelled (phylogenetic) trees, all with different leaf sets, is said to be compatible if there exists a tree T such that each tree T_i can be obtained from T by deleting leaves and contracting edges. Determining compatibility is NP-hard, and the fastest algorithm to date has worst case complexity of around Ω(n^k) time, n being the number of leaves. Here, we present an O(n f(k)) algorithm, proving that compatibility of unrooted phylogenetic trees is fixed parameter tractable (FPT) with respect to the number k of trees.

  • 11. Daniel, C.
    et al.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Öhman, M.
    RNA editing of non-coding RNA and its role in gene regulation. 2015. In: Biochimie, ISSN 0300-9084, E-ISSN 1638-6183, Vol. 117, pp. 22-27. Journal article (Peer-reviewed)
    Abstract [en]

    It has long been known that repetitive elements, particularly Alu sequences in human, are edited by the adenosine deaminases acting on RNA (ADAR) family. The functional interpretation of these events has been even more difficult than that of editing events in coding sequences, but today there is an emerging understanding of their downstream effects. A surprisingly large fraction of the human transcriptome contains inverted Alu repeats, often forming long double stranded structures in RNA transcripts, typically occurring in introns and UTRs of protein coding genes. Alu repeats are also common in other primates, and similar inverted repeats can frequently be found in non-primates, although the latter are less prone to duplex formation. In human, as many as 700,000 Alu elements have been identified as substrates for RNA editing, of which many are edited at several sites. In fact, recent advancements in transcriptome sequencing techniques and bioinformatics have revealed that the human editome comprises at least a hundred million adenosine to inosine (A-to-I) editing sites in Alu sequences. Although substantial additional efforts are required in order to map the editome, already present knowledge provides an excellent starting point for studying cis-regulation of editing. In this review, we will focus on editing of long stem loop structures in the human transcriptome and how it can affect gene expression.

  • 12. Ekdahl, Y.
    et al.
    Shahrabi Farahani, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Behm, M.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Öhman, M.
    A-to-I editing of microRNAs in the mammalian brain increases during development. 2012. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 22, no. 8, pp. 1477-1487. Journal article (Peer-reviewed)
    Abstract [en]

    Adenosine-to-inosine (A-to-I) RNA editing targets double-stranded RNA stem-loop structures in the mammalian brain. It has previously been shown that miRNAs are substrates for A-to-I editing. For the first time, we show that for several definitions of edited miRNA, the level of editing increases with development, thereby indicating a regulatory role for editing during brain maturation. We use high-throughput RNA sequencing to determine editing levels in mature miRNA, from the mouse transcriptome, and compare these with the levels of editing in pri-miRNA. We show that increased editing during development gradually changes the proportions of the two miR-376a isoforms, which previously have been shown to have different targets. Several other miRNAs that also are edited in the seed sequence show an increased level of editing through development. By comparing editing of pri-miRNA with editing and expression of the corresponding mature miRNA, we also show an editing-induced developmental regulation of miRNA expression. Taken together, our results imply that RNA editing influences the miRNA repertoire during brain maturation.

  • 13.
    Elias, Isaac
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Fast Computation of Distance Estimators. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, pp. 89-. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). Unfortunately, the fastest practical algorithms known for computing the distance matrix, from n sequences of length l, take time proportional to l·n^2. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome wide applications.

    Results: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods.

    Conclusion: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
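To make the l·n^2 bottleneck described in the abstract above concrete, here is a minimal sketch of the naive baseline: every pair of aligned sequences is compared position by position, and the mismatch fraction is corrected with the standard Jukes-Cantor formula. This is not the paper's fast algorithm. The function names and the choice of the Jukes-Cantor correction are assumptions made for illustration, and ambiguity symbols are not handled.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(x != y for x, y in zip(a, b))  # O(l) per pair
    return mismatches / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Naive pairwise distance matrix: n(n-1)/2 pairs, O(l) work each."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]))
    return d


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    for row in distance_matrix(["ACGTACGT", "ACGTACGA", "ACGAACGA"]):
        print(row)
```

The double loop over pairs combined with the per-pair scan is what gives the l·n^2 running time that the abstract identifies as the bottleneck.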

  • 14.
    Elias, Isaac
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Fast neighbor joining. 2005. In: Automata, Languages and Programming, Proceedings / [ed] Caires, L; Italiano, GE; Monteiro, L; Palamidessi, C; Yung, M, 2005, Vol. 3580, pp. 1263-1274. Conference paper (Peer-reviewed)
    Abstract [en]

    Reconstructing the evolutionary history of a set of species is a fundamental problem in biology and methods for solving this problem are gauged based on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n^3) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2) and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
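As background for the cubic running time mentioned in the abstract above, the following is a minimal sketch of the classical Neighbor Joining procedure: the standard Q-criterion version, which rescans all pairs in every round. It is not the FNJ algorithm from the paper. The function name, the edge-list output format, and the `internalN` labels for new nodes are assumptions made for illustration.

```python
def neighbor_joining(names, dist):
    """Classical NJ on a symmetric distance matrix.

    Returns the unrooted tree as a list of (node, neighbour, branch_length)
    edges. New internal nodes are labelled "internal0", "internal1", ...
    (hypothetical labels; taxon names are assumed not to collide with them).
    """
    if len(names) < 2 or len(names) != len(dist):
        raise ValueError("need at least two taxa and a matching matrix")
    # Dict-of-dicts copy so clusters can be added and dropped by label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        row_sum = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Select the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        # This O(n^2) scan, repeated n - 2 times, gives the cubic total cost.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - row_sum[a] - row_sum[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"internal{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (row_sum[a] - row_sum[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c != a and c != b:
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # final edge joining the last two nodes
    return edges


if __name__ == "__main__":
    # Hypothetical additive five-taxon matrix, for illustration only.
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    for edge in neighbor_joining(taxa, matrix):
        print(edge)
```

Ties in the Q-criterion are broken by iteration order here, so on tied inputs other implementations may return a different but equally valid join order.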

  • 15.
    Elias, Isaac
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Fast Neighbor Joining. 2009. In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 410, no. 21-23, pp. 1993-2000. Journal article (Peer-reviewed)
    Abstract [en]

    Reconstructing the evolutionary history of a set of species is a fundamental problem in biology and methods for solving this problem are gauged based on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n^3) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius.

    The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2) and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.

  • 16.
    Ensterö, Mats
    et al.
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Åkerborg, Örjan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lundin, Daniel
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Wang, Bei
    Department of Computer Science, Duke University, Durham, United States.
    Furey, Terrence S.
    Department of Computer Science, Duke University, Durham, United States.
    Öhman, Marie
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Several bioinformatic approaches have previously been used to find novel sites of ADAR mediated A-to-I RNA editing in human. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in Alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA edited substrates suggest, however, that site selective A-to-I editing is particularly important for normal brain development in mammals.

    Results: We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the Alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics for known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing.

    Conclusions: Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.

  • 17. Fernandez-Baca, D.
    et al.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    A polynomial-time algorithm for near-perfect phylogeny. 2003. In: SIAM Journal on Computing (Print), ISSN 0097-5397, E-ISSN 1095-7111, Vol. 32, no. 5, pp. 1115-1127. Journal article (Peer-reviewed)
    Abstract [en]

    A parameterized version of the Steiner tree problem in phylogeny is defined, where the parameter measures the amount by which a phylogeny differs from perfection. This problem is shown to be solvable in polynomial time for any fixed value of the parameter.

  • 18.
    Fritzell, Kajsa
    et al.
    Stockholm Univ, Wenner Gren Inst, Dept Mol Biosci, Svante Arrheniusvag 20C, S-10691 Stockholm, Sweden.
    Xu, Li-Di
    Stockholm Univ, Wenner Gren Inst, Dept Mol Biosci, Svante Arrheniusvag 20C, S-10691 Stockholm, Sweden.
    Lagergren, Jens
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Skolan för elektroteknik och datavetenskap (EECS).
    Ohman, Marie
    Stockholm Univ, Wenner Gren Inst, Dept Mol Biosci, Svante Arrheniusvag 20C, S-10691 Stockholm, Sweden.
    ADARs and editing: The role of A-to-I RNA modification in cancer progression. 2018. In: Seminars in Cell and Developmental Biology, ISSN 1084-9521, E-ISSN 1096-3634, Vol. 79, pp. 123-130. Article, research review (Peer-reviewed)
    Abstract [en]

    Cancer arises when pathways that control cell functions such as proliferation and migration are dysregulated to such an extent that cells start to divide uncontrollably and eventually spread throughout the body, ultimately endangering the survival of an affected individual. It is well established that somatic mutations are important in cancer initiation and progression as well as in creation of tumor diversity. Modifications of the transcriptome are now also emerging as a significant force during the transition from normal cell to malignant tumor. Editing of adenosine (A) to inosine (I) in double-stranded RNA, catalyzed by adenosine deaminases acting on RNA (ADARs), is one dynamic modification that in a combinatorial manner can give rise to a very diverse transcriptome. Since the cell interprets inosine as guanosine (G), editing can result in non-synonymous codon changes in transcripts as well as yield alternative splicing, but also affect targeting and disrupt maturation of microRNA. ADAR editing is essential for survival in mammals but its dysregulation can lead to cancer. ADAR1 is overexpressed in, e.g., lung cancer, liver cancer, esophageal cancer and chronic myelogenous leukemia, which with few exceptions promotes cancer progression. In contrast, ADAR2 is lowly expressed in, e.g., glioblastoma, where the lower levels of ADAR2 editing lead to malignant phenotypes. Altogether, RNA editing by the ADAR enzymes is a powerful regulatory mechanism during tumorigenesis. Depending on the cell type, cancer progression seems to be induced mainly by ADAR1 upregulation or ADAR2 downregulation, although in a few cases ADAR1 is instead downregulated. In this review, we discuss how aberrant editing of specific substrates contributes to malignancy.

  • 19. Frånberg, Mattias
    et al.
    Gertow, Karl
    Hamsten, Anders
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Sennblad, Bengt
    Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests. 2015. In: PLoS Genetics, ISSN 1553-7390, E-ISSN 1553-7404, Vol. 11, no. 9, article id e1005502. Article in journal (Refereed)
    Abstract [en]

    Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology in general has an improved statistical power in comparison to seven other methods, and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PRO-CARDIS coronary artery disease case/control cohort and discover three distinct interactions. While analyses on simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we explore strategies for a priori selecting subsets of variant pairs to test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.

  • 20.
    Frånberg, Mattias
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS). KTH, Centra, Science for Life Laboratory, SciLifeLab. Cardiovascular Medicine Unit, Department of Medicine, Solna, Karolinska Institutet, Stockholm, Sweden ; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden.
    Lagergren, Jens
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Sennblad, Bengt
    Cardiovascular Medicine Unit, Department of Medicine, Solna, Karolinska Institutet, Stockholm, Sweden ; Dept of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
    BESIQ: A tool for discovering gene-gene and gene-environment interactions in genome-wide association studies. Manuscript (preprint) (Other academic)
  • 21. Frånberg, Mattias
    et al.
    Strawbridge, Rona J.
    Hamsten, Anders
    de Faire, Ulf
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Sennblad, Bengt
    Fast and general tests of genetic interaction for genome-wide association studies. 2017. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no. 6, article id e1005556. Article in journal (Refereed)
    Abstract [en]

    A complex disease has, by definition, multiple genetic causes. In theory, these causes could be identified individually, but their identification will likely benefit from informed use of anticipated interactions between causes. In addition, characterizing and understanding interactions must be considered key to revealing the etiology of any complex disease. Large-scale collaborative efforts are now paving the way for comprehensive studies of interaction. As a consequence, there is a need for methods with a computational efficiency sufficient for modern data sets as well as for improvements of statistical accuracy and power. Another issue is that, currently, the relation between different methods for interaction inference is in many cases not transparent, complicating the comparison and interpretation of results between different interaction studies. In this paper we present computationally efficient tests of interaction for the complete family of generalized linear models (GLMs). The tests can be applied for inference of single or multiple interaction parameters, but we show, by simulation, that jointly testing the full set of interaction parameters yields superior power and control of false positive rate. Based on these tests we also describe how to combine results from multiple independent studies of interaction in a meta-analysis. We investigate the impact of several assumptions commonly made when modeling interactions. We also show that, across the important class of models with a full set of interaction parameters, jointly testing the interaction parameters yields identical results. Further, we apply our method to genetic data for cardiovascular disease. This allowed us to identify a putative interaction involved in Lp(a) plasma levels between two 'tag' variants in the LPA locus (p = 2.42 × 10^-9) as well as replicate the interaction (p = 6.97 × 10^-7). Finally, our meta-analysis method is used in a small (N = 16,181) study of interactions in myocardial infarction.

  • 22. Hallett, M.
    et al.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Mikroelektronik och informationsteknik, IMIT.
    Tofigh, A.
    Simultaneous identification of duplications and lateral transfers. 2004. In: Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB, 2004, pp. 347-356. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a combinatorial model that incorporates duplication events as well as lateral gene transfer events (a.k.a. horizontal gene transfer events). To the best of our knowledge, this is the first such model containing both of these events. A so-called dt-scenario is used to explain differences between a gene tree T and a species tree S. The model is biologically as well as mathematically sound. Among other biological considerations, the model respects the partial order of evolution implied by S by demanding that the dt-scenarios are "acyclic". We present fixed parameter tractable algorithms that count the minimum number of duplications and lateral transfers, and more generally can compute the set of pairs (t, d) where d is the minimum number of duplications required by any explanation that requires t lateral transfers. This allows us to also compute a weighted parsimony score. We also show how gene loss events can be incorporated into our model. We also give an NP-completeness proof which suggests that the intractability is due to the demand that the dt-scenarios be acyclic. When this condition is removed, we can show that the problem is computable in polynomial time via dynamic programming. By generating "synthetic" gene and species trees via a birth-death process, we explored the capacity of our algorithms to faithfully reconstruct the actual number of events that have taken place. The results are positive.

  • 23. Hjelm, M.
    et al.
    Hoglund, M.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    New Probabilistic network models and algorithms for oncogenesis. 2006. In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 13, no. 4, pp. 853-865. Article in journal (Refereed)
    Abstract [en]

    Chromosomal aberrations in solid tumors appear in complex patterns. It is important to understand how these patterns develop, the dynamics of the process, the temporal or even causal order between aberrations, and the involved pathways. Here we present network models for chromosomal aberrations and algorithms for training models based on observed data. Our models are generative probabilistic models that can be used to study dynamical aspects of chromosomal evolution in cancer cells. They are well suited for a graphical representation that conveys the pathways found in a dataset. By allowing only pairwise dependencies and partitioning aberrations into modules, in which all aberrations are restricted to have the same dependencies, we reduce the number of parameters so that dataset sizes relevant to cancer applications can be handled. We apply our framework to a dataset of colorectal cancer tumor karyotypes. The obtained model explains the data significantly better than a model where independence between the aberrations is assumed. In fact, the obtained model performs very well with respect to several measures of goodness of fit and is, with respect to repetition of the training, more or less unique.

  • 24.
    Håstad, Johan
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Ivansson, L.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Fitting points on the real line and its application to RH mapping. 2003. In: Journal of Algorithms, ISSN 0196-6774, E-ISSN 1090-2678, Vol. 49, no. 1, pp. 42-62. Article in journal (Refereed)
    Abstract [en]

    A natural problem is that of, given an n x n symmetric matrix D, finding an arrangement of n points on the real line such that the resulting distances agree as well as possible with the distances specified by D. We refer to the variation in which the difference in distance is measured in maximum norm as the MATRIX-TO-LINE problem. The MATRIX-TO-LINE problem has previously been shown to be NP-complete [J.B. Saxe, 17th Allerton Conference in Communication, Control, and Computing, 1979, pp. 480-489]. We show that it can be approximated within 2, but unless P = NP not within 7/5 - δ for any δ > 0. We also show a tight lower bound under a stronger assumption. We show that the MATRIX-TO-LINE problem cannot be approximated within 2 - δ unless 3-colorable graphs can be colored with ⌈4/δ⌉ colors in polynomial time. Currently, the best polynomial time algorithm colors a 3-colorable graph with Õ(n^(3/14)) colors [A. Blum, D. Karger, Inform. Process. Lett. 61 (1), (1997), 49-53]. We apply our MATRIX-TO-LINE algorithm to a problem in computational biology, namely, the Radiation Hybrid (RH) problem, that is, the algorithmic part of a physical mapping method called RH mapping. This gives us the first algorithm with a guaranteed convergence for the general RH problem.
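
The objective in the abstract above can be made concrete with a short sketch. It only evaluates a given arrangement against D in maximum norm; it does not implement the paper's approximation algorithm. The function name `max_norm_error` and the example matrix are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def max_norm_error(points, D):
    """Largest gap, over all pairs i < j, between |x_i - x_j| and D[i][j].

    `points` is a candidate arrangement on the real line and `D` is a
    symmetric n x n matrix of target distances (hypothetical inputs).
    """
    return max(
        (abs(abs(points[i] - points[j]) - D[i][j])
         for i, j in combinations(range(len(points)), 2)),
        default=0,
    )


# Target distances that are realizable exactly on a line.
D = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]

exact = max_norm_error([0, 1, 3], D)  # every pairwise distance matches D
off = max_norm_error([0, 1, 2], D)    # the last point sits one unit too far left
```

An algorithm that approximates the problem within 2 would return an arrangement whose error is at most twice the smallest error achievable for D.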

  • 25. Iglesias, Maria Jesus
    et al.
    Reilly, Sarah-Jayne
    Emanuelsson, Olof
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sennblad, Bengt
    Najafabadi, Mohammad Pirmoradian
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Folkersen, Lasse
    Mälarstig, Anders
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Eriksson, Per
    Hamsten, Anders
    Odeberg, Jacob
    KTH, Skolan för bioteknologi (BIO), Proteomik (stängd 20130101). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Combined Chromatin and Expression Analysis Reveals Specific Regulatory Mechanisms within Cytokine Genes in the Macrophage Early Immune Response. 2012. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 7, no. 2, p. e32306-. Article in journal (Refereed)
    Abstract [en]

    Macrophages play a critical role in innate immunity, and the expression of early response genes orchestrates much of the initial response of the immune system. Macrophages undergo extensive transcriptional reprogramming in response to inflammatory stimuli such as Lipopolysaccharide (LPS). To identify gene transcription regulation patterns involved in early innate immune responses, we used two genome-wide approaches - gene expression profiling and chromatin immunoprecipitation-sequencing (ChIP-seq) analysis. We examined the effect of 2 hrs LPS stimulation on early gene expression and its relation to chromatin remodeling (H3 acetylation; H3Ac) and promoter binding of Sp1 and RNA polymerase II phosphorylated at serine 5 (S5P RNAPII), which is a marker for transcriptional initiation. Our results indicate novel and alternative gene regulatory mechanisms for certain proinflammatory genes. We identified two groups of upregulated inflammatory genes with respect to chromatin modification and promoter features. One group, including highly up-regulated genes such as tumor necrosis factor (TNF), was characterized by H3Ac, high CpG content and lack of TATA boxes. The second group, containing inflammatory mediators (interleukins and CCL chemokines), was up-regulated upon LPS stimulation despite lacking H3Ac in their annotated promoters, which were low in CpG content but did contain TATA boxes. Genome-wide analysis showed that few H3Ac peaks were unique to either +/-LPS condition. However, within these, an unpacking/expansion of already existing H3Ac peaks was observed upon LPS stimulation. In contrast, a significant proportion of S5P RNAPII peaks (approx 40%) was unique to either condition. Furthermore, data indicated a large portion of previously unannotated TSSs, particularly in LPS-stimulated macrophages, where only 28% of unique S5P RNAPII peaks overlap annotated promoters. The regulation of the inflammatory response appears to occur in a very specific manner at the chromatin level for specific genes and this study highlights the level of fine-tuning that occurs in the immune response.

  • 26. Ivansson, L.
    et al.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Algorithms for RH mapping: New ideas and improved analysis. 2004. In: SIAM journal on computing (Print), ISSN 0097-5397, E-ISSN 1095-7111, Vol. 34, no. 1, pp. 89-108. Article in journal (Refereed)
    Abstract [en]

    Radiation hybrid (RH) mapping is a technique for constructing a physical map describing the locations of n markers on a chromosome of an organism. In [J. Comput. Biol., 4 (1997), pp. 517-533], Ben-Dor and Chor presented new algorithms for the RH problem and gave the first performance guarantees for such algorithms. We improve the lower bounds on the number of experiments in a way that is sufficient for two of these algorithms to give a correct ordering of the markers with high probability. Not only are the new bounds tighter, but our analysis also captures to a much higher extent how the bounds depend on the actual arrangement of the markers. Furthermore, we modify the two algorithms to utilize RH mapping data produced with several radiation intensities. We show that the new algorithms are almost insensitive to the problem of using the correct intensity.

  • 27.
    Khan, Mehmood Alam
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Elias, Isaac
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sjölund, Erik
    Stockholms universitet.
    Nylander, Kristina
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Guimera, Roman Valls
    Stockholms universitet.
    Schobesberger, Richard
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. University of Applied Sciences Upper Austria.
    Schmitzberger, Peter
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. University of Applied Sciences Upper Austria.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    fastphylo: Fast tools for phylogenetics. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no. 1, p. 334-. Article in journal (Refereed)
    Abstract [en]

    Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances. Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency. Conclusions: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
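
As an illustration of the first problem named in the abstract above (estimating sequence distances), here is a minimal sketch. The abstract does not say which substitution models fastphylo implements, so the Jukes-Cantor (JC69) correction used here is an assumption chosen for brevity, and the function names are hypothetical rather than part of fastphylo.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jc69_distance(a, b):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p) for DNA.

    Returns infinity when p >= 3/4, where the correction is undefined.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise JC69 distances for a list of sequences."""
    n = len(seqs)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = M[j][i] = jc69_distance(seqs[i], seqs[j])
    return M
```

A matrix like this is the input to the second step the abstract mentions, reconstructing a phylogeny with a distance method such as neighbor joining.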

  • 28.
    Khan, Mehmood Alam
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Mahmudi, Owais
    KTH, Skolan för datavetenskap och kommunikation (CSC). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Ulah, Ikram
    KTH, Skolan för datavetenskap och kommunikation (CSC). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre. Stockholm Univ, Sweden.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Probabilistic inference of lateral gene transfer events. 2016. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, article id 431. Article in journal (Refereed)
    Abstract [en]

    Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge. Results: In this paper, we propose probabilistic methods that sample reconciliations of the gene tree with a dated species tree and compute maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify "highways" of LGT. Conclusions: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.

  • 29.
    Khan, Mehmood
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Mahmudi, Owais
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Ulah, Ikram
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Probabilistic inference of lateral gene transfer events. Manuscript (preprint) (Other academic)
  • 30.
    Lagergren, Jens
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Combining polynomial running time and fast convergence for the disk-covering method. 2002. In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 65, no. 3, pp. 481-493. Article in journal (Refereed)
    Abstract [en]

    This paper treats polynomial-time algorithms for reconstruction of phylogenetic trees. The disc-covering method (DCM) presented by Huson et al. (J. Comput. Biol. 6 (3/4) (1999) 369) is a method that boosts the performance of phylogenetic tree construction algorithms. Actually, they gave two variations of DCM-Buneman. The first variation was guaranteed to recover the true tree with high probability from polynomial-length sequences (i.e. polynomial in the number of given taxa), but it was not proven to run in polynomial time. The second variation was guaranteed to run in polynomial time. However, it is a heuristic in the sense that it was not proven to recover the true tree with high probability from polynomial-length sequences. We present an improved DCM. The difference between our improved DCM and the heuristic variation of the original DCM is marginal. The main contribution of this paper is the analysis of the algorithm. Our analysis shows that the improved DCM combines the desirable properties of the two variations of the original DCM. That is, it runs in polynomial time and it recovers the true tree with high probability from polynomial-length sequences. Moreover, this is true when the improved DCM is applied to the Neighbor-Joining, the Buneman, as well as the Agarwala algorithm. A key observation for the result of Huson et al. was that threshold graphs of additive distance functions are chordal. We prove a chordal graph theorem concerning minimal triangulations of threshold graphs constructed from distance functions which are close to being additive. This theorem is the key observation behind our improved DCM and it may be interesting in its own right.

  • 31.
    Mahmudi, Owais
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sennblad, Bengt
    Arvestad, Lars
    Nowick, Katja
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Gene-Pseudogene Evolution: a Probabilistic Approach. Manuscript (preprint) (Other academic)
  • 32.
    Mahmudi, Owais
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sjöstrand, Joel
    Sennblad, Bengt
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Genome-wide probabilistic reconciliation analysis across vertebrates. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. S10-. Article in journal (Refereed)
    Abstract [en]

    Gene duplication is considered to be a major driving force in evolution that enables the genome of a species to acquire new functions. A reconciliation - a mapping of gene tree vertices to the edges or vertices of a species tree - explains where gene duplications have occurred on the species tree. In this study, we sample reconciliations from a posterior over reconciliations, gene trees, edge lengths and other parameters, given a species tree and gene sequences. We employ a Bayesian analysis tool, based on the probabilistic model DLRS that integrates gene duplication, gene loss and sequence evolution under a relaxed molecular clock for substitution rates, to obtain this posterior. By applying these methods, we perform a genome-wide analysis of a nine species dataset, OPTIC, and conclude that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history. For the given dataset, we observe that approximately 19% of the sampled reconciliations are different from MPR. This is in clear contrast with previous estimates, based on simpler models and less realistic assumptions, according to which 98% of the reconciliations can be expected to be identical to MPR. We also generate heatmaps showing where in the species trees duplications have been most frequent during the evolution of these species.
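
To make the notion of a most parsimonious reconciliation concrete, the sketch below counts duplications with the classical lowest-common-ancestor (LCA) mapping, a standard way to obtain a duplication-minimizing reconciliation. The abstract does not state how MPR was computed in the study, so this is an illustration under assumptions: trees are encoded as nested 2-tuples, gene-tree leaves are labeled directly with species names, and all function names are hypothetical.

```python
def index_species_tree(tree):
    """Record parent and depth for every node of a nested-tuple species tree."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Number of duplications implied by the LCA mapping.

    A gene-tree vertex is counted as a duplication when it maps to the
    same species-tree node as at least one of its children.
    """
    parent, depth = index_species_tree(species_tree)
    duplications = 0

    def place(node):
        nonlocal duplications
        if not isinstance(node, tuple):
            return node  # leaf: a gene labeled by the species it was sampled from
        left, right = (place(child) for child in node)
        here = lca(left, right, parent, depth)
        if here in (left, right):
            duplications += 1
        return here

    place(gene_tree)
    return duplications


species = (("A", "B"), "C")
congruent = (("A", "B"), "C")           # gene tree matching the species tree
conflicting = (("A", "B"), ("A", "C"))  # needs one duplication at the root
```

The probabilistic approach in the abstract instead samples many reconciliations from a posterior; the parsimony count here is only the baseline it is compared against.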

  • 33.
    Muhammad, Sayyed Auwn
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Sennblad, Bengt
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Species tree aware simultaneous reconstruction of gene and domain evolution. Manuscript (preprint) (Other academic)
    Abstract [en]

    Most genes are composed of multiple domains with a common evolutionary history that typically perform a specific function in the resulting protein. As witnessed by many studies of key gene families, it is important to understand how domains have been duplicated, lost, transferred between genes, and rearranged. Similarly to the case of evolutionary events affecting entire genes, these domain events have large consequences for phylogenetic reconstruction and, in addition, they create considerable obstacles for gene sequence alignment algorithms, a prerequisite for phylogenetic reconstruction.

    We introduce the Domain-DLRS model, a hierarchical, generative probabilistic model containing three levels corresponding to species, genes, and domains, respectively. From a dated species tree, a gene tree is generated according to the DL model, which is a birth-death model generalized to occur in a dated tree. Then, from the dated gene tree, a pre-specified number of dated domain trees are generated using the DL model and the molecular clock is relaxed, effectively converting edge times to edge lengths. Finally, for each domain tree and its lengths, domain sequences are generated for the leaves based on a selected model of sequence evolution.

    For this model, we present an MCMC-based inference framework called Domain-DLRS that takes as input a dated species tree together with a multiple sequence alignment for each domain family, and provides as output an estimated posterior distribution over reconciled gene and domain trees. By requiring aligned domains rather than genes, our framework evades the problem of aligning genes that have been exposed to domain duplications, in particular non-tandem domain duplications. We show that Domain-DLRS performs better than MrBayes on synthetic data and that it outperforms MrBayes on biological data. We analyse several zinc-finger genes and show that most domain duplications have been tandem duplications, of which some have involved two or more domains, but non-tandem duplications have also been common, in particular in gene families of complex evolutionary history such as PRDM9.

  • 34.
    Parviainen, Pekka
    et al.
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Farahani, H. S.
    Lagergren, Jens
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Learning bounded tree-width Bayesian networks using integer linear programming. 2014. In: Journal of machine learning research, ISSN 1532-4435, E-ISSN 1533-7928, Vol. 33, pp. 751-759. Article in journal (Refereed)
    Abstract [en]

    In many applications one wants to compute conditional probabilities given a Bayesian network. This inference problem is NP-hard in general but becomes tractable when the network has low tree-width. Since the inference problem is common in many application areas, we provide a practical algorithm for learning bounded tree-width Bayesian networks. We cast this problem as an integer linear program (ILP). The program can be solved by an anytime algorithm which provides upper bounds to assess the quality of the found solutions. A key component of our program is a novel integer linear formulation for bounding tree-width of a graph. Our tests clearly indicate that our approach works in practice, as our implementation was able to find an optimal or nearly optimal network for most of the data sets.

  • 35. Sehat, Bita
    et al.
    Tofigh, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lin, Yingbo
    Trocme, Eric
    Liljedahl, Ulrika
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Larsson, Olle
    SUMOylation Mediates the Nuclear Translocation and Signaling of the IGF-1 Receptor. 2010. In: Science Signaling, ISSN 1945-0877, E-ISSN 1937-9145, Vol. 3, no. 108. Article in journal (Refereed)
    Abstract [en]

    The insulin-like growth factor 1 receptor (IGF-1R) plays crucial roles in developmental and cancer biology. Most of its biological effects have been ascribed to its tyrosine kinase activity, which propagates signaling through the phosphatidylinositol 3-kinase and mitogen-activated protein kinase pathways. Here, we report that IGF-1 promotes the modification of IGF-1R by small ubiquitin-like modifier protein-1 (SUMO-1) and its translocation to the nucleus. Nuclear IGF-1R associated with enhancer-like elements and increased transcription in reporter assays. The SUMOylation sites of IGF-1R were identified as three evolutionarily conserved lysine residues (Lys1025, Lys1100, and Lys1120) in the beta subunit of the receptor. Mutation of these SUMO-1 sites abolished the ability of IGF-1R to translocate to the nucleus and activate transcription, but did not alter its kinase-dependent signaling. Thus, we demonstrate a SUMOylation-mediated mechanism of IGF-1R signaling that has potential implications for gene regulation.

  • 36. Sennblad, Bengt
    et al.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Probabilistic Orthology Analysis. 2009. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 58, no. 4, pp. 411-424. Article in journal (Refereed)
    Abstract [en]

    Orthology analysis aims at identifying orthologous genes and gene products from different organisms and, therefore, is a powerful tool in modern computational and experimental biology. Although reconciliation-based orthology methods are generally considered more accurate than distance-based ones, the traditional parsimony-based implementation of reconciliation-based orthology analysis (most parsimonious reconciliation [MPR]) suffers from a number of shortcomings. For example, 1) it is limited to orthology predictions from the reconciliation that minimizes the number of gene duplication and loss events, 2) it cannot evaluate the support of this reconciliation in relation to the other reconciliations, and 3) it cannot make use of prior knowledge (e.g., about species divergence times) that provides auxiliary information for orthology predictions. We present a probabilistic approach to reconciliation-based orthology analysis that addresses all these issues by estimating orthology probabilities. The method is based on the gene evolution model, an explicit evolutionary model for gene duplication and gene loss inside a species tree, that generalizes the standard birth-death process. We describe the probabilistic approach to orthology analysis using 2 experimental data sets and show that the use of orthology probabilities allows a more informative analysis than MPR and, in particular, that it is less sensitive to taxon sampling problems. We generalize these anecdotal observations and show, using data generated under biologically realistic conditions, that MPR gives false orthology predictions at a substantial frequency. Last, we provide a new orthology prediction method that allows an orthology and paralogy classification with any chosen sensitivity/specificity combination from the spectra of achievable combinations.
    We conclude that probabilistic orthology analysis is a strong and more advanced alternative to traditional orthology analysis and that it provides a framework for sophisticated comparative studies of processes in genome evolution.

  • 37. Sennblad, Bengt
    et al.
    Schreil, Eva
    Berglund Sonnhammer, Ann-Charlotte
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Arvestad, Lars
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    primetv: a viewer for reconciled trees. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8. Article in journal (Refereed)
    Abstract [en]

    Background: Evolutionary processes, such as gene family evolution or parasite-host cospeciation, can often be viewed as a tree evolving inside another tree. Relating two given trees under such a constraint is known as reconciling them. Adequate software tools for generating illustrations of tree reconciliations are instrumental for presenting and communicating results and ideas regarding these phenomena. Available visualization tools have been limited to illustrations of the most parsimonious reconciliation. However, there exists a plethora of biologically relevant non-parsimonious reconciliations. Illustrations of these general reconciliations may not be achieved without manual editing. Results: We have developed a new reconciliation viewer, primetv. It is a simple and compact visualization program that is the first automatic tool for illustrating general tree reconciliations. It reads reconciled trees in an extended Newick format and outputs them as tree-within-tree illustrations in a range of graphic formats. Output attributes, such as colors and layout, can easily be adjusted by the user. To enhance the construction of input to primetv, two helper programs, readReconciliation and reconcile, accompany primetv. Detailed examples of all programs' usage are provided in the text. For the casual user a web-service provides a simple user interface to all programs. Conclusion: With primetv, the first visualization tool for general reconciliations, illustrations of trees-within-trees are easy to produce. Because it clarifies and accentuates an underlying structure in a reconciled tree, e.g., the impact of a species tree on a gene-family phylogeny, it will enhance scientific presentations as well as pedagogic illustrations in an educational setting. primetv is available at http://prime.sbc.su.se/primetv, both as a standalone command-line tool and as a web service. The software is distributed under the GNU General Public License.

  • 38.
    Shahrabi Farahani, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    A structural EM algorithm for learning hidden variable oncogenetic networks. Manuscript (preprint) (Other academic)
  • 39.
    Shahrabi Farahani, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    A Structural EM Algorithm for Learning Oncogenetic Networks by Reducing to MILP. Manuscript (preprint) (Other academic)
    Abstract [en]

    Data that is obtained from assaying cancer tumors is cross-sectional, i.e., it does not contain information about temporal order and patterns of accumulation of mutations in the tumors. Learning progression patterns of cancer is important for understanding the disease. Also, in a realistic model of cancer progression, the issue of experimental errors must be taken into account. Here, the experimental errors are modeled by introducing hidden variables. The well-known structural EM algorithm is used for learning Bayesian networks from incomplete data. The selection of parents in the E-step of this algorithm is usually performed using a greedy heuristic. Unfortunately, the E-step also involves making inference in the present Bayesian network, which is #P-complete. There are efficient algorithms for performing exact inference in bounded tree-width Bayesian networks. In order to use them, we developed an algorithm for learning bounded tree-width Bayesian networks [2]. In the E-step, we obtain a globally optimal solution over dependence structure of bounded tree-width and parameters. That is, we obtain a Global Structural EM algorithm for this problem. Finally, we test our algorithm both on synthetic data and cancer data from renal cell carcinoma and show that it also performs well in practice.

  • 40.
    Shahrabi Farahani, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Learning oncogenetic networks by reducing to MILP. 2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203. Article in journal (Other academic)
  • 41.
    Shahrabi Farahani, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Learning Oncogenetic Networks by Reducing to Mixed Integer Linear Programming. 2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no. 6, p. e65773. Article in journal (Refereed)
    Abstract [en]

    Cancer can be a result of accumulation of different types of genetic mutations such as copy number aberrations. The data from tumors are cross-sectional and do not contain the temporal order of the genetic events. Finding the order in which the genetic events have occurred, as well as the progression pathways, is of vital importance in understanding the disease. In order to model cancer progression, we propose Progression Networks, a special case of Bayesian networks, that are tailored to model disease progression. Progression networks have similarities with Conjunctive Bayesian Networks (CBNs) [1], a variation of Bayesian networks also proposed for modeling disease progression. We also describe a learning algorithm for learning Bayesian networks in general and progression networks in particular. We reduce the hard problem of learning the Bayesian and progression networks to Mixed Integer Linear Programming (MILP). MILP is a Non-deterministic Polynomial-time complete (NP-complete) problem for which very good heuristics exist. We tested our algorithm on synthetic and real cytogenetic data from renal cell carcinoma. We also compared our learned progression networks with the networks proposed in earlier publications. The software is available on the website https://bitbucket.org/farahani/diprog.

  • 42.
    Shahrabi Farahani, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Parviainen, Pekka
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    A Linear Programming Approach for Learning Bounded Treewidth Bayesian Networks. 2013. Manuscript (preprint) (Other academic)
    Abstract [en]

    In many applications, one wants to compute conditional probabilities from a Bayesian network. This inference problem is NP-hard in general but becomes tractable when the network has bounded treewidth. Motivated by the needs of applications, we study learning bounded treewidth Bayesian networks. We formulate this problem as a mixed integer linear program (MILP) which can be solved by an anytime algorithm. 

  • 43.
    Sjöstrand, Joel
    et al.
    Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden.
    Arvestad, Lars
    Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sennblad, Bengt
    Department of Medicine, Karolinska Institutet, Atherosclerosis Research Unit, Stockholm, Sweden.
    GenPhyloData: realistic simulation of gene family evolution. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no. 1, p. 209. Article in journal (Refereed)
    Abstract [en]

    Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock. Result: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data. Conclusion: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.

  • 44.
    Sjöstrand, Joel
    et al.
    Dept. of Numerical Analysis and Computer Science, Stockholm University.
    Sennblad, Bengt
    Karolinska Institutet.
    Arvestad, Lars
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    DLRS: gene tree evolution in light of a species tree. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no. 22, pp. 2994-2995. Article in journal (Refereed)
    Abstract [en]

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters.

  • 45. Sjöstrand, Joel
    et al.
    Tofigh, Ali
    Daubin, Vincent
    Arvestad, Lars
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sennblad, Bengt
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    A Bayesian Method for Analyzing Lateral Gene Transfer. 2014. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 63, no. 3, pp. 409-420. Article in journal (Refereed)
    Abstract [en]

    Lateral gene transfer (LGT), which transfers DNA between two non-vertically related individuals belonging to the same or different species, is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes. The challenge LGT poses to the standard tree-based conception of evolution is also being debated. Studies of LGT have, however, been severely limited by a lack of computational tools. The best currently available LGT algorithms are parsimony-based phylogenetic methods, which require a pre-computed gene tree and cannot choose between sometimes wildly differing most parsimonious solutions. Moreover, in many studies, simple heuristics are applied that can only handle putative orthologs and completely disregard gene duplications (GDs). Consequently, proposed LGT among specific gene families, and the rate of LGT in general, remain debated. We present a Bayesian Markov-chain Monte Carlo-based method that integrates GD, gene loss, LGT, and sequence evolution, and apply the method in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria. Our analyses show that although the LGT rate between distant species is high, the net combined rate of duplication and close-species LGT is on average higher. We also show that the common practice of disregarding reconcilability in gene tree inference overestimates the number of LGT and duplication events. [Bayesian; gene duplication; gene loss; horizontal gene transfer; lateral gene transfer; MCMC; phylogenetics.]

  • 46.
    Svensson, Örjan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Arvestad, Lars
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Genome-wide survey for biologically functional pseudogenes. 2006. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 2, no. 5, pp. 358-369. Article in journal (Refereed)
    Abstract [en]

    According to current estimates there exist about 20,000 pseudogenes in a mammalian genome. The vast majority of these are disabled and nonfunctional copies of protein-coding genes which, therefore, evolve neutrally. Recent findings that a Makorin1 pseudogene, residing on mouse Chromosome 5, is, indeed, in vivo vital and also evolutionarily preserved, encouraged us to conduct a genome-wide survey for other functional pseudogenes in human, mouse, and chimpanzee. We identify to our knowledge the first examples of conserved pseudogenes common to human and mouse, originating from one duplication predating the human-mouse species split and having evolved as pseudogenes since the species split. Functionality is one possible way to explain the apparently contradictory properties of such pseudogene pairs, i.e., high conservation and ancient origin. The hypothesis of functionality is tested by comparing expression evidence and synteny of the candidates with proper test sets. The tests suggest potential biological function. Our candidate set includes a small set of long-lived pseudogenes whose unknown potential function is retained since before the human-mouse species split, and also a larger group of primate-specific ones found from human-chimpanzee searches. Two processed sequences are notable, their conservation since the human-mouse split being as high as most protein-coding genes; one is derived from the protein Ataxin 7-like 3 (ATX7NL3), and one from the Spinocerebellar ataxia type 1 protein (ATX1). Our approach is comparative and can be applied to any pair of species. It is implemented by a semi-automated pipeline based on cross-species BLAST comparisons and maximum-likelihood phylogeny estimations. To separate pseudogenes from protein-coding genes, we use standard methods, utilizing in-frame disablements, as well as a probabilistic filter based on Ka/Ks ratios.

  • 47. Tofigh, A.
    et al.
    Sjölund, E.
    Höglund, M.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    A global structural EM algorithm for a model of cancer progression. 2011. In: Adv. Neural Inf. Process. Syst.: Annu. Conf. Neural Inf. Process. Syst., NIPS, 2011. Conference paper (Refereed)
    Abstract [en]

    Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein's path model of colon cancer was a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising learning algorithms, and applying these to cross-sectional data. Beerenwinkel et al. provided what they termed EM-like algorithms for Oncogenetic Trees (OTs) and mixtures of such. Given the small size of current and future data sets, it is important to minimize the number of parameters of a model. For this reason, we too focus on tree-based models and introduce Hidden-variable Oncogenetic Trees (HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it. The algorithm for single HOTs performs very well on reasonable-sized data sets, while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrow's more cost-efficient technologies.

  • 48.
    Tofigh, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Hallett, Mikael
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Simultaneous Identification of Duplications and Lateral Gene Transfers. 2011. In: IEEE/ACM Transactions on Computational Biology & Bioinformatics, ISSN 1545-5963, E-ISSN 1557-9964, Vol. 8, no. 2, pp. 517-535. Article in journal (Refereed)
    Abstract [en]

    The incongruency between a gene tree and a corresponding species tree can be attributed to evolutionary events such as gene duplication and gene loss. This paper describes a combinatorial model where a so-called DTL-scenario is used to explain the differences between a gene tree and a corresponding species tree taking into account gene duplications, gene losses, and lateral gene transfers (also known as horizontal gene transfers). The reasonable biological constraint that a lateral gene transfer may only occur between contemporary species leads to the notion of acyclic DTL-scenarios. Parsimony methods are introduced by defining appropriate optimization problems. We show that finding most parsimonious acyclic DTL-scenarios is NP-complete. However, by dropping the condition of acyclicity, the problem becomes tractable, and we provide a dynamic programming algorithm as well as a fixed-parameter-tractable algorithm for finding most parsimonious DTL-scenarios.

  • 49.
    Tofigh, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Inferring Duplications and Lateral Gene Transfers: An Algorithm for Parametric Tree Reconciliation. Manuscript (Other academic)
    Abstract [en]

    Prediction of the function of genes and their products is an increasingly important computational problem. The ability to correctly identify the historic relationship of homologous genes is essential for making accurate predictions. In 1970, Fitch made a distinction between paralogous and orthologous genes, its importance lying in the observation that genes are more likely to have similar functions when they have evolved from a common ancestral gene through speciation rather than duplication. Lateral gene transfer (LGT) is yet another important evolutionary event that creates copies of genes, and as our understanding of the importance and prevalence of LGT in evolution is deepening, there is a high demand for methods for detection of LGTs when reconstructing the evolutionary past of genes.

    In this paper, we present highly efficient and practical algorithms for tree reconciliation that simultaneously consider both duplications and LGTs. We allow costs to be associated with duplications and LGTs and develop methods for finding reconciliations of minimal total cost between species trees and gene trees. Moreover, we provide an efficient algorithm for parametric tree reconciliation, a computational problem analogous to parametric sequence alignment. Experimental results on synthetic data indicate that our methods are robust with high specificity and sensitivity.

  • 50.
    Tofigh, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Sjölund, Erik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Höglund, Mattias
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    A global structural EM algorithm for a model of cancer progression. 2011. Manuscript (preprint) (Other academic)
    Abstract [en]

    Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein's path model of colon cancer was clearly a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising training algorithms, and applying these to cross-sectional data. Beerenwinkel et al. provided what they termed EM-like algorithms for Oncogenetic Trees (OTs) and mixtures of such. Given the small size of current and future data sets, it is important to minimize the number of parameters of a model. For this reason, we too focus on tree-based models and introduce Hidden-variable Oncogenetic Trees (HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it. The algorithm for single HOTs performs very well on reasonable-sized data sets, while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrow's more cost-efficient technologies. To facilitate analysis of complex cytogenetic data sets requiring more than one HOT, we devise a decomposition strategy based on Principal Component Analysis and train parameters on a colon cancer data set. The method so obtained is then successfully applied to kidney cancer.
