  • 1.
    Afkham, Heydar Maboudi
    KTH, School of Computer Science and Communication (CSC).
    Qiu, Xuanbin
    KTH, School of Computer Science and Communication (CSC).
    The, Matthew
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics
    2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 4, p. 508-513
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
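
    The two uses of predicted uncertainty named in this abstract (selecting a lower-error subset and adaptive windowing) can be illustrated with a minimal sketch. It is not the authors' implementation: the peptide names, the predicted values, the window width factor `k` and the kept `fraction` are all hypothetical, and the predictions are assumed to come from some model that returns a mean and a standard deviation per peptide.

```python
def retention_windows(predictions, k=2.0):
    """Build one window per peptide: predicted mean +/- k * predicted std."""
    return {
        peptide: (mean - k * std, mean + k * std)
        for peptide, (mean, std) in predictions.items()
    }


def most_confident(predictions, fraction=0.5):
    """Keep the given fraction of peptides with the smallest predicted std."""
    ranked = sorted(predictions, key=lambda p: predictions[p][1])
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical predictions: peptide -> (retention time in minutes, predicted std).
example = {
    "PEPTIDEK": (30.0, 0.5),
    "SAMPLER": (45.0, 2.0),
    "ELVISK": (12.0, 1.0),
    "LIVESK": (60.0, 4.0),
}

windows = retention_windows(example)                 # wider window where std is larger
confident = most_confident(example, fraction=0.5)    # the two lowest-std peptides
```

    Peptides with a small predicted standard deviation get narrow windows, and uncertain ones get wide windows, which is the general idea behind an adaptive windowing strategy.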

  • 2.
    Andersson, Anders
    KTH, School of Biotechnology (BIO).
    Bernander, R.
    Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University.
    Nilsson, Peter
    KTH, School of Biotechnology (BIO).
    Dual-genome primer design for construction of DNA microarrays
    2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no 3, p. 325-332
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Microarray experiments using probes covering a whole transcriptome are expensive to initiate, and a major part of the costs derives from synthesizing gene-specific PCR primers or hybridization probes. The high costs may force researchers to limit their studies to a single organism, although comparing gene expression in different species would yield valuable information. Results: We have developed a method, implemented in the software DualPrime, that reduces the number of primers required to amplify the genes of two different genomes. The software identifies regions of high sequence similarity, and from these regions selects PCR primers shared between the genomes, such that either one or, preferentially, both primers in a given PCR can be used for amplification from both genomes. To assure high microarray probe specificity, the software selects primer pairs that generate products of low sequence similarity to other genes within the same genome. We used the software to design PCR primers for 2182 and 1960 genes from the hyperthermophilic archaea Sulfolobus solfataricus and Sulfolobus acidocaldarius, respectively. Primer pairs were shared among 705 pairs of genes, and single primers were shared among 1184 pairs of genes, resulting in a saving of 31% compared to using only unique primers. We also present an alternative primer design method, in which each gene shares primers with two different genes of the other genome, enabling further savings.
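
    The 31% saving quoted in this abstract can be checked with simple arithmetic. The sketch below assumes, as an interpretation not spelled out in the abstract, that each gene needs two primers when nothing is shared, that a shared primer pair saves two primers, and that a shared single primer saves one. The function name is hypothetical.

```python
def primer_saving(genes_a, genes_b, shared_pairs, shared_singles):
    """Fraction of primers saved relative to using only unique primers."""
    unique_primers = 2 * (genes_a + genes_b)    # two primers per gene (assumption)
    saved = 2 * shared_pairs + shared_singles   # pair saves 2, single saves 1 (assumption)
    return saved / unique_primers


# Figures taken from the abstract.
saving = primer_saving(
    genes_a=2182,         # Sulfolobus solfataricus genes
    genes_b=1960,         # Sulfolobus acidocaldarius genes
    shared_pairs=705,     # gene pairs sharing both primers
    shared_singles=1184,  # gene pairs sharing one primer
)
print(f"{saving:.1%}")
```

    Under these assumptions the result rounds to the 31% reported in the abstract.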

  • 3.
    Anil, Anandashankar
    KTH, School of Biotechnology (BIO). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Spalinskas, Rapolas
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Biotechnology (BIO).
    Åkerborg, Örjan
    KTH, School of Biotechnology (BIO). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sahlén, Pelin
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Biotechnology (BIO).
    HiCapTools: a software suite for probe design and proximity detection for targeted chromosome conformation capture applications
    2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 4, p. 675-677
    Article in journal (Refereed)
    Abstract [en]

    Folding of eukaryotic genomes within nuclear space enables physical and functional contacts between regions that are otherwise kilobases away in sequence space. Targeted chromosome conformation capture methods (T2C, chi-C and HiCap) are capable of informing genomic contacts for a subset of regions targeted by probes. We here present HiCapTools, a software package that can design sequence capture probes for targeted chromosome capture applications and analyse sequencing output to detect proximities involving targeted fragments. Two probes are designed for each feature while avoiding repeat elements and non-unique regions. The data analysis suite processes alignment files to report genomic proximities for each feature at restriction fragment level and is isoform-aware for gene features. Statistical significance of contact frequencies is evaluated using an empirically derived background distribution. Targeted chromosome conformation capture applications are invaluable for locating target genes of disease-associated variants found by genome-wide association studies. Hence, we believe our software suite will prove to be useful for a wider user base within clinical and functional applications.

  • 4.
    Arvestad, Lars
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Berglund, Ann-Charlotte
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Sennblad, Bengt
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Bayesian gene/species tree reconciliation and orthology analysis using MCMC
    2003. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 19, p. i7-i15
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available.

    Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic methods, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet being in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.

  • 5.
    Bernhem, Kristoffer
    KTH, School of Engineering Sciences (SCI), Applied Physics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Brismar, Hjalmar
    KTH, School of Engineering Sciences (SCI), Applied Physics. KTH, Centres, Science for Life Laboratory, SciLifeLab. Karolinska Institutet, Sweden.
    SMLocalizer, a GPU accelerated ImageJ plugin for single molecule localization microscopy
    2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 1, p. 137-
    Article in journal (Refereed)
    Abstract [en]

    SMLocalizer combines the availability of ImageJ with the power of GPU processing for fast and accurate analysis of single molecule localization microscopy data. Analysis of 2D and 3D data in multiple channels is supported.

  • 6. Ewels, P.
    Magnusson, M.
    Lundin, S.
    Käller, Max
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    MultiQC: Summarize analysis results for multiple tools and samples in a single report
    2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 19, p. 3047-3048
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time consuming and error prone; batch effects and outlier samples can easily be missed in the early stages of analysis. Results: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization. Availability and implementation: MultiQC is available with a GNU GPLv3 license on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info.

  • 7.
    Holme, Petter
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Huss, Mikael
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Jeong, H. W.
    Subnetwork hierarchies of biochemical pathways
    2003. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 19, no 4, p. 532-538
    Article in journal (Refereed)
    Abstract [en]

    Motivation: The vastness and complexity of the biochemical networks that have been mapped out by modern genomics calls for decomposition into subnetworks. Such networks can have inherent non-local features that require the global structure to be taken into account in the decomposition procedure. Furthermore, basic questions such as to what extent the network (graph theoretically) can be said to be built by distinct subnetworks are little studied. Results: We present a method to decompose biochemical networks into subnetworks based on the global geometry of the network. This method enables us to analyze the full hierarchical organization of biochemical networks and is applied to 43 organisms from the WIT database. Two types of biochemical networks are considered: metabolic networks and whole-cellular networks (also including for example information processes). Conceptual and quantitative ways of describing the hierarchical ordering are discussed. The general picture of the metabolic networks arising from our study is that of a few core-clusters centred around the most highly connected substances enclosed by other substances in outer shells, and a few other well-defined subnetworks.

  • 8. Järvstråt, Linnea
    Johansson, Mikael
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Gullberg, Urban
    Nilsson, Björn
    Ultranet: efficient solver for the sparse inverse covariance selection problem in gene network modeling
    2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 4, p. 511-512
    Article in journal (Refereed)
    Abstract [en]

    Graphical Gaussian models (GGMs) are a promising approach to identify gene regulatory networks. Such models can be robustly inferred by solving the sparse inverse covariance selection (SICS) problem. With the high dimensionality of genomics data, fast methods capable of solving large instances of SICS are needed. We developed a novel network modeling tool, Ultranet, that solves the SICS problem with significantly improved efficiency. Ultranet combines a range of mathematical and programmatical techniques, exploits the structure of the SICS problem and enables computation of genome-scale GGMs without compromising analytic accuracy.

  • 9. Kaushik, Swati
    Nair, Anu G.
    KTH, School of Computer Science and Communication (CSC). Tata Institute of Fundamental Research, India.
    Mutt, Eshita
    Subramanian, Hari Prasanna
    Sowdhamini, Ramanathan
    Rapid and enhanced remote homology detection by cascading hidden Markov model searches in sequence space
    2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 3, p. 338-344
    Article in journal (Refereed)
    Abstract [en]

    Motivation: In the post-genomic era, automatic annotation of protein sequences using computational homology-based methods is highly desirable. However, often protein sequences diverge to an extent where detection of homology and automatic annotation transfer is not straightforward. Sophisticated approaches to detect such distant relationships are needed. We propose a new approach to identify deep evolutionary relationships of proteins to overcome shortcomings of the available methods. Results: We have developed a method to identify remote homologues more effectively from any protein sequence database by using several cascading events with Hidden Markov Models (C-HMM). We have implemented clustering of hits and profile generation of hit clusters to effectively reduce the computational timings of the cascaded sequence searches. Our C-HMM approach achieved 94, 83 and 40% coverage at family, superfamily and fold levels, respectively, when applied on diverse protein folds. We have compared C-HMM with various remote homology detection methods and discuss the trade-offs between coverage and false positives.

  • 10. Koski, Timo
    A dissimilarity matrix between protein atom classes based on Gaussian mixtures
    2002. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 18, no 9, p. 1257-1263
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Previously, Rantanen et al. (2001; J. Mol. Biol., 313, 197-214) constructed a protein atom-ligand fragment interaction library embodying experimentally solved, high-resolution three-dimensional (3D) structural data from the Protein Data Bank (PDB). The spatial locations of protein atoms that surround ligand fragments were modeled with Gaussian mixture models, the parameters of which were estimated with the expectation-maximization (EM) algorithm. In the validation analysis of this library, there was strong indication that the protein atom classification, 24 classes, was too large and that a reduction in the classes would lead to improved predictions. Results: Here, a dissimilarity (distance) matrix that is suitable for comparison and fusion of 24 pre-defined protein atom classes has been derived. Jeffreys' distances between Gaussian mixture models are used as a basis to estimate dissimilarities between protein atom classes. The dissimilarity data are analyzed both with a hierarchical clustering method and independently by using multidimensional scaling analysis. The results provide additional insight into the relationships between different protein atom classes, giving us guidance on, for example, how to readjust protein atom classification and, thus, they will help us to improve protein-ligand interaction predictions.

  • 11.
    Käll, Lukas
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    An HMM posterior decoder for sequence feature prediction that includes homology information
    2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no Suppl.1, p. i251-i257
    Article in journal (Refereed)
    Abstract [en]

    Motivation: When predicting sequence features like transmembrane topology, signal peptides, coil-coil structures, protein secondary structure or genes, extra support can be gained from homologs. Results: We present here a general hidden Markov model (HMM) decoding algorithm that combines probabilities for sequence features of homologs by considering the average of the posterior label probability of each position in a global sequence alignment. The algorithm is an extension of the previously described 'optimal accuracy' decoder, allowing homology information to be used. It was benchmarked using an HMM for transmembrane topology and signal peptide prediction, Phobius. We found that the performance was substantially increased when incorporating information from homologs.
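
    The averaging step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: per-position posterior label probabilities are assumed to be already computed for each homolog, gaps in the alignment are represented by `None`, and the final labelling here is a plain per-column argmax rather than the grammar-constrained 'optimal accuracy' decoding used in the paper. All names and numbers are hypothetical.

```python
def average_posteriors(aligned_posteriors):
    """Average label posteriors column by column over the aligned sequences.

    aligned_posteriors: one list per sequence; each entry is a dict
    label -> probability, or None where the sequence has a gap.
    """
    n_cols = len(aligned_posteriors[0])
    averaged = []
    for col in range(n_cols):
        present = [seq[col] for seq in aligned_posteriors if seq[col] is not None]
        if not present:
            averaged.append(None)  # column consists of gaps only
            continue
        labels = set().union(*present)
        averaged.append(
            {lab: sum(p.get(lab, 0.0) for p in present) / len(present) for lab in labels}
        )
    return averaged


def label_path(averaged):
    """Pick the most probable label per column (simplified decoding)."""
    return [None if col is None else max(col, key=col.get) for col in averaged]


# Hypothetical posteriors for two aligned homologs: "M" = membrane, "i" = inside.
query = [{"M": 0.8, "i": 0.2}, {"M": 0.4, "i": 0.6}, None]
homolog = [{"M": 0.6, "i": 0.4}, None, {"M": 0.1, "i": 0.9}]

averaged = average_posteriors([query, homolog])
labels = label_path(averaged)
```

    Columns where only one sequence is present fall back to that sequence's posterior, so homologs add support where they align and are ignored where they do not.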

  • 12. Käll, Lukas
    Storey, John D.
    Noble, William Stafford
    Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry
    2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 16, p. i42-i48
    Article in journal (Refereed)
    Abstract [en]

    Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution. Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.

  • 13. Käll, Lukas
    Storey, John D.
    Noble, William Stafford
    QVALITY: non-parametric estimation of q-values and posterior error probabilities
    2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 7, p. 964-966
    Article in journal (Refereed)
    Abstract [en]

    Qvality is a C++ program for estimating two types of standard statistical confidence measures: the q-value, which is an analog of the p-value that incorporates multiple testing correction, and the posterior error probability (PEP, also known as the local false discovery rate), which corresponds to the probability that a given observation is drawn from the null distribution. In computing q-values, qvality employs a standard bootstrap procedure to estimate the prior probability of a score being from the null distribution; for PEP estimation, qvality relies upon non-parametric logistic regression. Relative to other tools for estimating statistical confidence measures, qvality is unique in its ability to estimate both types of scores directly from a null distribution, without requiring the user to calculate p-values.
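
    To make the q-value idea in this abstract concrete, here is a minimal sketch of estimating q-values directly from a set of null scores. It is not qvality's code or algorithm: the prior probability `pi0` is simply passed in rather than estimated by bootstrap, higher scores are assumed to be better, and PEP estimation by non-parametric logistic regression is not shown. The function name is hypothetical.

```python
from bisect import bisect_left


def q_values(target_scores, null_scores, pi0=1.0):
    """Estimate a q-value for each target score from a sample of null scores."""
    null_sorted = sorted(null_scores)
    target_sorted = sorted(target_scores)
    n_null, n_target = len(null_sorted), len(target_sorted)

    def fdr(threshold):
        # Counts of scores at or above the threshold.
        null_ge = n_null - bisect_left(null_sorted, threshold)
        target_ge = n_target - bisect_left(target_sorted, threshold)
        # Expected false positives, scaled to the size of the target set.
        expected_false = pi0 * null_ge * n_target / n_null
        return min(1.0, expected_false / target_ge)

    # q-value = smallest FDR over all thresholds that still accept the score.
    q_by_score = {}
    running_min = 1.0
    for score in target_sorted:  # ascending: most permissive threshold first
        running_min = min(running_min, fdr(score))
        q_by_score[score] = running_min
    return [q_by_score[s] for s in target_scores]


# Hypothetical scores.
qs = q_values([10, 8, 6, 4], [5, 3, 2, 1])
```

    The running minimum makes the q-values monotone in the score, so a better score never receives a worse q-value than a poorer one.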

  • 14.
    Larsson, Per
    Stockholm University.
    Skwark, Marcin
    Stockholms universitet.
    Wallner, Björn
    Linköpings universitet.
    Elofsson, Arne
    Stockholms universitet.
    Improved predictions by Pcons.net using multiple templates
    2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 3, p. 426-427
    Article in journal (Refereed)
    Abstract [en]

    Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single template models used in Pcons.net. In particular for the easier targets with many alternative templates with a high degree of sequence identity, quality is readily improved by a few percent over the highest ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server.

  • 15. Maska, Martin
    Ulman, Vladimir
    Svoboda, David
    Matula, Pavel
    Matula, Petr
    Ederra, Cristina
    Urbiola, Ainhoa
    Espana, Tomas
    Venkatesan, Subramanian
    Balak, Deepak M. W.
    Karas, Pavel
    Bolckova, Tereza
    Streitova, Marketa
    Carthel, Craig
    Coraluppi, Stefano
    Harder, Nathalie
    Rohr, Karl
    Magnusson, Klas E. G.
    KTH, School of Electrical Engineering (EES), Signal Processing. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Jaldén, Joakim
    KTH, School of Electrical Engineering (EES), Signal Processing. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Blau, Helen M.
    Dzyubachyk, Oleh
    Krizek, Pavel
    Hagen, Guy M.
    Pastor-Escuredo, David
    Jimenez-Carretero, Daniel
    Ledesma-Carbayo, Maria J.
    Munoz-Barrutia, Arrate
    Meijering, Erik
    Kozubek, Michal
    Ortiz-de-Solorzano, Carlos
    A benchmark for comparison of cell tracking algorithms
    2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 11, p. 1609-1617
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Automatic tracking of cells in multidimensional time-lapse fluorescence microscopy is an important task in many biomedical applications. A novel framework for objective evaluation of cell tracking algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2013 Cell Tracking Challenge. In this article, we present the logistics, datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. Results: The main contributions of the challenge include the creation of a comprehensive video dataset repository and the definition of objective measures for comparison and ranking of the algorithms. With this benchmark, six algorithms covering a variety of segmentation and tracking paradigms have been compared and ranked based on their performance on both synthetic and real datasets. Given the diversity of the datasets, we do not declare a single winner of the challenge. Instead, we present and discuss the results for each individual dataset separately.

  • 16. Michel, Mirco
    Skwark, Marcin J.
    Hurtado, David Menendez
    Ekeberg, Magnus
    KTH, School of Computer Science and Communication (CSC).
    Elofsson, Arne
    Predicting accurate contacts in thousands of Pfam domain families using PconsC3
    2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 18, p. 2859-2866
    Article in journal (Refereed)
    Abstract [en]

    Motivation: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods. Results: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with even 100 effective sequence members. PconsC3 outperforms direct coupling analysis (DCA) methods significantly, independent of family size, secondary structure content, contact range, or the number of selected contacts. Availability and implementation: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We do estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly. Contact: arne@bioinfo.se Supplementary information: Supplementary data are available at Bioinformatics online.

  • 17.
    Navarro, Jose Fernandez
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sjöstrand, Joel
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Salmén, Fredrik
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lundeberg, Joakim
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Ståhl, Patrik L.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. Karolinska Institutet, Sverige.
    ST Pipeline: an automated pipeline for spatial mapping of unique transcripts
    2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 16, p. 2591-2593
    Article in journal (Refereed)
    Abstract [en]

    Motivation: In recent years we have witnessed an increase in novel RNA-seq based techniques for transcriptomics analysis. Spatial transcriptomics is a novel RNA-seq based technique that allows spatial mapping of transcripts in tissue sections. The spatial resolution adds an extra level of complexity, which requires the development of new tools and algorithms for efficient and accurate data processing. Results: Here we present a pipeline to automatically and efficiently process RNA-seq data obtained from spatial transcriptomics experiments to generate datasets for downstream analysis.

  • 18. Nilsson, Björn
    Johansson, Mikael
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Al-Shahrour, Fatima
    Carpenter, Anne E.
    Ebert, Benjamin L.
    Ultrasome: efficient aberration caller for copy number studies of ultra-high resolution
    2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 8, p. 1078-1079
    Article in journal (Refereed)
    Abstract [en]

    Motivation: Multimillion-probe microarrays allow detection of gains and losses of chromosomal material at unprecedented resolution. However, the data generated by these arrays are several-fold larger than data from earlier platforms, creating a need for efficient analysis tools that scale robustly with data size. 

    Results: We developed a new aberration caller, Ultrasome, that delineates genomic changes-of-interest with dramatically improved efficiency. Ultrasome shows near-linear computational complexity and processes latest generation copy number arrays about 10 000 times faster than standard methods with preserved analytic accuracy.

  • 19. Pons, Carles
    Jimenez-Gonzalez, Daniel
    Gonzalez-Alvarez, Cecilia
    Servat, Harald
    Cabrera-Benitez, Daniel
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Fernandez-Recio, Juan
    Cell-Dock: high-performance protein-protein docking
    2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 18, p. 2394-2396
    Article in journal (Refereed)
    Abstract [en]

    The application of docking to large-scale experiments or the explicit treatment of protein flexibility are part of the new challenges in structural bioinformatics that will require large computer resources and more efficient algorithms. Highly optimized fast Fourier transform (FFT) approaches are broadly used in docking programs but their optimal code implementation leaves hardware acceleration as the only option to significantly reduce the computational cost of these tools. In this work we present Cell-Dock, an FFT-based docking algorithm adapted to the Cell BE processor. We show that Cell-Dock runs faster than FTDock with maximum speedups of above 200x, while achieving results of similar quality.

  • 20.
    Pronk, Sander
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Pall, Szilard
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Schulz, Roland
    Larsson, Per
    Bjelkmar, Pär
    Apostolov, Rossen
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Shirts, Michael R.
    Smith, Jeremy C.
    Kasson, Peter M.
    van der Spoel, David
    Hess, Berk
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 7, p. 845-854. Article in journal (Refereed)
    Abstract [en]

    Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.

  • 21.
    Ray, Arjun
    et al.
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Wallner, Björn
    Center for Biomembrane Research, Stockholm, Sweden.
    Model quality assessment for membrane proteins. 2010. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no 24, p. 3067-3074. Article in journal (Refereed)
    Abstract [en]

    Motivation: Learning-based model quality assessment programs have been quite successful at discriminating between high- and low-quality protein structures. Here, we show that it is possible to improve this performance significantly by restricting the learning space to a specific context, in this case membrane proteins. Since these are among the most important structures from a pharmaceutical point of view, it is particularly interesting to resolve local model quality for regions corresponding, e.g., to binding sites. Results: Our new ProQM method uses a support vector machine with a combination of general and membrane protein-specific features. For the transmembrane region, ProQM clearly outperforms all methods developed for generic proteins, and it does so while maintaining performance for extra-membrane domains; in this region it is only matched by ProQres. The predictor is shown to accurately predict quality both on the global and local level when applied to GPCR models, and clearly outperforms consensus-based scoring. Finally, the combination of ProQM and the Rosetta low-resolution energy function achieves a 7-fold enrichment in selection of near-native structural models, at very limited computational cost.

  • 22.
    Sahlin, Kristoffer
    et al.
    KTH, School of Computer Science and Communication (CSC). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Chikhi, Rayan
    Arvestad, Lars
    Assembly scaffolding with PE-contaminated mate-pair libraries. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 13, p. 1925-1932. Article in journal (Refereed)
    Abstract [en]

    Motivation: Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than what is expected. This contamination has been discussed before, in relation to integrated scaffolders, but solutions rely on the orientation being observable, e.g. by finding the junction adapter sequence in the reads. This is not always possible, making orientation and insert size of a read pair stochastic. To our knowledge, there is neither previous work on modeling PE-contamination, nor a study on the effect PE-contamination has on scaffolding quality. Results: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding and inflated assembly sizes.

  • 23.
    Sahlin, Kristoffer
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Street, Nathaniel
    Lundeberg, Joakim
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Improved gap size estimation for scaffolding algorithms. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 17, p. 2215-2222. Article in journal (Refereed)
    Abstract [en]

    Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance. Results: In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe what different cases of bias we are facing. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences both for scaffolding software, structural variation detection and for library insert-size estimation as is commonly performed by read aligners.

  • 24. Sastry, Anand
    et al.
    Monk, Jonathan
    Tegel, Hanna
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology.
    Uhlén, Mathias
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. Technical University of Denmark - DTU.
    Pålsson, Bernhard O.
    Rockberg, Johan
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology.
    Brunk, Elizabeth
    Machine learning in computational biology to accelerate high-throughput protein expression. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 16, p. 2487-2495. Article in journal (Refereed)
    Abstract [en]

    Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Results: Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation.

  • 25.
    Sjöstrand, Joel
    et al.
    Dept. of Numerical Analysis and Computer Science, Stockholm University.
    Sennblad, Bengt
    Karolinska Institutet.
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    DLRS: gene tree evolution in light of a species tree. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 22, p. 2994-2995. Article in journal (Refereed)
    Abstract [en]

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters.

  • 26. Sköld, Martin
    et al.
    Rydén, Tobias
    Lund University.
    Samuelsson, Viktoria
    Bratt, Charlotte
    Ekblad, Lars
    Olsson, Håkan
    Baldetorp, Bo
    Regression analysis and modelling of data acquisition for SELDI-TOF mass spectrometry. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 11, p. 1401-1409. Article in journal (Refereed)
    Abstract [en]

    Motivation: Pre-processing of SELDI-TOF mass spectrometry data is currently performed on a largely ad hoc basis. This makes comparison of results from independent analyses troublesome and does not provide a framework for distinguishing different sources of variation in data. Results: In this article, we consider the task of pooling a large number of single-shot spectra, a task commonly performed automatically by the instrument software. By viewing the underlying statistical problem as one of heteroscedastic linear regression, we provide a framework for introducing robust methods and for dealing with missing data resulting from a limited span of recordable intensity values provided by the instrument. Our framework provides an interpretation of currently used methods as a maximum-likelihood estimator and allows theoretical derivation of its variance. We observe that this variance depends crucially on the total number of ionic species, which can vary considerably between different pooled spectra. This variation in variance can potentially invalidate the results from naive methods of discrimination/classification and we outline appropriate data transformations. Introducing methods from robust statistics did not improve the standard errors of the pooled samples. Imputing missing values using the EM algorithm, however, had a notable effect on the result; for our data, the pooled height of peaks which were frequently truncated increased by up to 30%.

  • 27. Stjernqvist, Susann
    et al.
    Rydén, Tobias
    Lund University.
    Sköld, Martin
    Staaf, Johan
    Continuous-index hidden Markov modelling of array CGH copy number data. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 8, p. 1006-1014. Article in journal (Refereed)
  • 28.
    Stranneheim, Henrik
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Kaller, Max
    Allander, Tobias
    Andersson, Björn
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lundeberg, Joakim
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Classification of DNA sequences using Bloom filters. 2010. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no 13, p. 1595-1600. Article in journal (Refereed)
    Abstract [en]

    Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to BLAT and SSAHA2 but is at least 21 times faster in classifying sequences.

  • 29.
    Unneberg, Per
    et al.
    KTH, School of Biotechnology (BIO).
    Strömberg, Michael
    KTH, School of Biotechnology (BIO).
    Sterky, Fredrik
    KTH, School of Biotechnology (BIO).
    SNP discovery using advanced algorithms and neural networks. 2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no 10, p. 2528-2530. Article in journal (Refereed)
    Abstract [en]

    Forage is an application which uses two neural networks for detecting single nucleotide polymorphisms (SNPs). Potential SNP candidates are identified in multiple alignments. Each candidate is then represented by a vector of features, which is classified as SNP or monomorphic by the networks. A validated dataset of SNPs was constructed from experimentally verified SNP data and used for network training and method evaluation.

  • 30.
    Wong, Kim
    et al.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Navarro, Jose Fernandez
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology.
    Bergenstrahle, Ludvig
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. Royal Inst Technol KTH, Sch Biotechnol, Div Gene Technol, Sci Life Lab, SE-10691 Solna, Sweden.
    Stahl, Patrik L.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lundeberg, Joakim
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    ST Spot Detector: a web-based application for automatic spot and tissue detection for spatial Transcriptomics image datasets. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 11, p. 1966-1968. Article in journal (Refereed)
    Abstract [en]

    Motivation: Spatial Transcriptomics (ST) is a method which combines high-resolution tissue imaging with high-throughput transcriptome sequencing data. This data must be aligned with the images for correct visualization, a process that involves several manual steps. Results: Here we present ST Spot Detector, a web tool that automates and facilitates this alignment through a user-friendly interface.

  • 31.
    Zhang, Cheng
    East China University of Science and Technology, China; Chalmers University of Technology, Sweden.
    Logical transformation of genome-scale metabolic models for gene level applications and analysis. 2015. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 31, no 14, p. 2324-2331. Article in journal (Refereed)
    Abstract [en]

    Motivation: In recent years, genome-scale metabolic models (GEMs) have played important roles in areas like systems biology and bioinformatics. However, because of the complexity of gene-reaction associations, GEMs often have limitations in gene level analysis and related applications. Hence, the existing methods were mainly focused on applications and analysis of reactions and metabolites. Results: Here, we propose a framework named logic transformation of model (LTM) that is able to simplify the gene-reaction associations and enables integration with other developed methods for gene level applications. We show that the transformed GEMs have increased reaction and metabolite number as well as degree of freedom in flux balance analysis, but the gene-reaction associations and the main features of flux distributions remain constant. In addition, we develop two methods, OptGeneKnock and FastGeneSL, by combining LTM with previously developed reaction-based methods. We show that FastGeneSL outperforms exhaustive search. Finally, we demonstrate the use of the developed methods in two different case studies. We could design fast genetic intervention strategies for targeted overproduction of biochemicals and identify double and triple synthetic lethal gene sets for inhibition of hepatocellular carcinoma tumor growth through the use of OptGeneKnock and FastGeneSL, respectively.
