  • 1. Adhikari, P. R.
    et al.
    Upadhyaya, B. B.
    Meng, Chen
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Hollmén, J.
    Gene selection in time-series gene expression data. 2011. In: 6th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2011, 2011, p. 145-156. Conference paper (Refereed)
    Abstract [en]

    The dimensionality of biological data is often very high. Feature selection can be used to tackle the problem of high dimensionality. However, the majority of the work in feature selection consists of supervised feature selection methods, which require class labels. The problem further escalates when the data are time-series gene expression measurements that measure the effect of external stimuli on a biological system. In this paper we propose an unsupervised method for gene selection from time-series gene expression data founded on statistical significance testing and swap randomization. We perform experiments with a publicly available mouse gene expression dataset and also a human gene expression dataset describing the exposure to asbestos. The results in both datasets show a considerable decrease in the number of genes.

  • 2.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Burnin estimation and convergence assessment in Bayesian phylogenetic inference. Manuscript (preprint) (Other academic)
    Abstract [en]

    Convergence assessment and burnin estimation are central concepts in Markov chain Monte Carlo algorithms. Studies on effects, statistical properties, and comparisons between different convergence assessment methods have been conducted during the past few decades. However, not much work has been done on the effect of convergence diagnostic on posterior distribution of tree parameters and which method should be used by researchers in Bayesian phylogenetics inference. In this study, we propose and evaluate two novel burnin estimation methods that estimate burnin using all parameters jointly. We also consider some other popular convergence diagnostics, evaluate them in light of parallel chains and quantify the effect of burnin estimates from various convergence diagnostics on the posterior distribution of trees. We motivate the use of convergence diagnostics to assess convergence and estimate burnin in Bayesian phylogenetics inference and found out that it is better to employ convergence diagnostics rather than remove a fixed percentage as burnin. We concluded that the last burnin estimator using effective sample size appears to estimate burnin better than all other convergence diagnostics.

  • 3.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Bark, Mikael
    KTH, School of Information and Communication Technology (ICT).
    Miro, Jorge
    KTH, School of Information and Communication Technology (ICT).
    Muhammad, Sayyed Auwn
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Sjöstrand, Joel
    Stockholm University.
    Zubair, Syed Muhammad
    KTH, School of Electrical Engineering (EES), Communication Networks. University of Balochistan, Pakistan.
    Abbas, Raja Manzar
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence, mixing and interactive exploration of both continuous and tree parameters.

    Results: We have written software called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines.

    Availability: VMCMC is available for Java SE 6+ under the New BSD License. Executable jar files, tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

  • 4.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Muhammad, Sayyed Auwn
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    GenFamClust: An accurate, synteny-aware and reliable homology inference algorithm. 2016. In: BMC Evolutionary Biology, ISSN 1471-2148, Vol. 16. Article in journal (Other academic)
    Abstract [en]

    Background: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.

    Results: In this study, we validate GenFamClust by comparing it to well known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms on homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs.

    Conclusions: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.

  • 5.
    Alneberg, Johannes
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Bioinformatic Methods in Metagenomics. 2018. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Microbial organisms are a vital part of our global ecosystem. Yet, our knowledge of them is still lacking. Direct sequencing of microbial communities, i.e. metagenomics, has enabled detailed studies of these microscopic organisms by inspection of their DNA sequences without the need to culture them. Furthermore, the development of modern high-throughput sequencing technologies has made this approach more powerful and cost-effective. Taken together, this has shifted the field of microbiology from previously being centered around microscopy and culturing studies, to largely consisting of computational analyses of DNA sequences. One such computational analysis, which is the main focus of this thesis, aims at reconstruction of the complete DNA sequence of an organism, i.e. its genome, directly from short metagenomic sequences.

    This thesis consists of an introduction to the subject followed by five papers. Paper I describes a large metagenomic data resource spanning the Baltic Sea microbial communities. This dataset is complemented with a web-interface allowing researchers to easily extract and visualize detailed information. Paper II introduces a bioinformatic method which is able to reconstruct genomes from metagenomic data. This method, which is termed CONCOCT, is applied on Baltic Sea metagenomics data in Paper III and Paper V. This enabled the reconstruction of a large number of genomes. Analysis of these genomes in Paper III led to the proposal of, and evidence for, a global brackish microbiome. Paper IV presents a comparison between genomes reconstructed from metagenomes with single-cell sequenced genomes. This further validated the technique presented in Paper II as it was found to produce larger and more complete genomes than single-cell sequencing.

  • 6.
    Alneberg, Johannes
    et al.
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology.
    Bennke, Christin
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Beier, Sara
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Pinhassi, Jarone
    Centre for Ecology and Evolution in Microbial Model Systems, Linnaeus University, Kalmar, Sweden.
    Jürgens, Klaus
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Ekman, Martin
    Department of Ecology, Environment and Plant Sciences, Stockholm University Science for Life Laboratory, Solna, Sweden.
    Ininbergs, Karolina
    Department of Ecology, Environment and Plant Sciences, Stockholm University Science for Life Laboratory, Solna, Sweden.
    Labrenz, Matthias
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Andersson, Anders F.
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology.
    Recovering 2,032 Baltic Sea microbial genomes by optimized metagenomic binning. Manuscript (preprint) (Other academic)
    Abstract [en]

    Aquatic microorganisms are key drivers of global biogeochemical cycles and form the basis of aquatic food webs. However, there is still much left to be learned about these organisms and their interaction within specific environments, such as the Baltic Sea. Crucial information for such an understanding can be found within the genome sequences of organisms within the microbial community.

    In this study, the previous set of Baltic Sea clusters, constructed by Hugerth et al., is greatly expanded using a large set of metagenomic samples, spanning the environmental gradients of the Baltic Sea. In total, 124 samples were individually assembled and binned to obtain 2,032 Metagenome Assembled Genomes (MAGs), clustered into 353 prokaryotic and 14 eukaryotic species-level clusters. The prokaryotic genomes were widely distributed over the prokaryotic tree of life, representing 20 different phyla, while the eukaryotic genomes were mostly limited to the division of Chlorophyta. The large number of reconstructed genomes allowed us to identify key factors determining the quality of the genome reconstructions.

    The Baltic Sea is heavily influenced by human activities, of which we might not see the full implications. The genomes reported within this study will greatly aid further studies in our striving for an understanding of the Baltic Sea microbial ecosystem.

  • 7.
    Alneberg, Johannes
    et al.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Karlsson, Christofer M.G.
    Centre for Ecology and Evolution in Microbial Model Systems, EEMiS, Linnaeus University, Barlastgatan 11, 391 82 Kalmar, Sweden.
    Divne, Anna-Maria
    Department of Cell and Molecular Biology, SciLifeLab, Uppsala University, Uppsala, Sweden.
    Bergin, Claudia
    Department of Cell and Molecular Biology, SciLifeLab, Uppsala University, Uppsala, Sweden.
    Homa, Felix
    Department of Cell and Molecular Biology, SciLifeLab, Uppsala University, Uppsala, Sweden.
    Lindh, Markus V.
    Centre for Ecology and Evolution in Microbial Model Systems, EEMiS, Linnaeus University, Barlastgatan 11, 391 82 Kalmar, Sweden.
    Hugerth, Luisa W.
    Karolinska Institutet, Science for Life Laboratory, Department of Molecular, Tumour and Cell Biology, Centre for Translational Microbiome Research, Solna, Sweden.
    Ettema, Thijs JG
    Department of Cell and Molecular Biology, SciLifeLab, Uppsala University, Uppsala, Sweden.
    Bertilsson, Stefan
    Department of Ecology and Genetics, Limnology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
    Andersson, Anders F.
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology.
    Pinhassi, Jarone
    Centre for Ecology and Evolution in Microbial Model Systems, EEMiS, Linnaeus University, Barlastgatan 11, 391 82 Kalmar, Sweden.
    Genomes from uncultivated prokaryotes: a comparison of metagenome-assembled and single-amplified genomes. Manuscript (preprint) (Other academic)
    Abstract [en]

    Background: Prokaryotes dominate the biosphere and regulate biogeochemical processes essential to all life. Yet, our knowledge about their biology is for the most part limited to the minority that has been successfully cultured. Molecular techniques now allow for obtaining genome sequences of uncultivated prokaryotic taxa, facilitating in-depth analyses that may ultimately improve our understanding of these key organisms.

    Results: We compared results from two culture-independent strategies for recovering bacterial genomes: single-amplified genomes and metagenome-assembled genomes. Single-amplified genomes were obtained from samples collected at an offshore station in the Baltic Sea Proper and compared to previously obtained metagenome-assembled genomes from a time series at the same station. Among 16 single-amplified genomes analyzed, seven were found to match metagenome-assembled genomes, affiliated with a diverse set of taxa. Notably, genome pairs between the two approaches were nearly identical (>98.7% identity) across overlapping regions (30-80% of each genome). Within matching pairs, the single-amplified genomes were consistently smaller and less complete, whereas the genetic functional profiles were maintained. For the metagenome-assembled genomes, only on average 3.6% of the bases were estimated to be missing from the genomes due to wrongly binned contigs; the metagenome assembly was found to cause incompleteness to a higher degree than the binning procedure.

    Conclusions: The strong agreement between the single-amplified and metagenome-assembled genomes emphasizes that both methods generate accurate genome information from uncultivated bacteria. Importantly, this implies that the research questions and the available resources are allowed to determine the selection of genomics approach for microbiome studies.

  • 8.
    Alneberg, Johannes
    et al.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sundh, John
    Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden.
    Bennke, Christin
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Beier, Sara
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Lundin, Daniel
    Centre for Ecology and Evolution in Microbial Model Systems, Linnaeus University, Kalmar, Sweden.
    Hugerth, Luisa
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Stockholm, Sweden.
    Pinhassi, Jarone
    Centre for Ecology and Evolution in Microbial Model Systems, Linnaeus University, Kalmar, Sweden.
    Kisand, Veljo
    University of Tartu, Institute of Technology, Tartu, Estonia.
    Riemann, Lasse
    Section for Marine Biological Section, Department of Biology, University of Copenhagen, Helsingør, Denmark.
    Jürgens, Klaus
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Labrenz, Matthias
    Leibniz Institute for Baltic Sea Research, Warnemünde, Germany.
    Andersson, Anders F.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    BARM and BalticMicrobeDB, a reference metagenome and interface to meta-omic data for the Baltic Sea. Manuscript (preprint) (Other academic)
    Abstract [en]

    The Baltic Sea is one of the world’s largest brackish water bodies and is characterised by pronounced physicochemical gradients where microbes are the main biogeochemical catalysts. Meta-omic methods provide rich information on the composition of, and activities within microbial ecosystems, but are computationally heavy to perform. We here present the BAltic Sea Reference Metagenome (BARM), complete with annotated genes to facilitate further studies with much less computational effort. The assembly is constructed using 2.6 billion metagenomic reads from 81 water samples, spanning both spatial and temporal dimensions, and contains 6.8 million genes that have been annotated for function and taxonomy. The assembly is useful as a reference, facilitating taxonomic and functional annotation of additional samples by simply mapping their reads against the assembly. This capability is demonstrated by the successful mapping and annotation of 24 external samples. In addition, we present a public web interface, BalticMicrobeDB, for interactive exploratory analysis of the dataset.

  • 9.
    Alneberg, Johannes
    et al.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sundh, John
    Stockholm Univ, Sci Life Lab, Dept Biochem & Biophys, S-17165 Solna, Sweden.
    Bennke, Christin
    Leibniz Inst Balt Sea Res Warnemunde, D-18119 Rostock, Germany.
    Beier, Sara
    Leibniz Inst Balt Sea Res Warnemunde, D-18119 Rostock, Germany.
    Lundin, Daniel
    Linnaeus Univ, Ctr Ecol & Evolut Microbial Model Syst, S-39182 Kalmar, Sweden.
    Hugerth, Luisa W.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. Karolinska Inst, Dept Mol Tumor & Cell Biol, Ctr Translat Microbiome Res, Sci Life Lab, S-17165 Solna, Sweden.
    Pinhassi, Jarone
    Linnaeus Univ, Ctr Ecol & Evolut Microbial Model Syst, S-39182 Kalmar, Sweden.
    Kisand, Veljo
    Univ Tartu, Inst Technol, EE-50411 Tartu, Estonia.
    Riemann, Lasse
    Univ Copenhagen, Sect Marine Biol Sect, Dept Biol, DK-3000 Helsingor, Denmark.
    Juergens, Klaus
    Leibniz Inst Balt Sea Res Warnemunde, D-18119 Rostock, Germany.
    Labrenz, Matthias
    Leibniz Inst Balt Sea Res Warnemunde, D-18119 Rostock, Germany.
    Andersson, Anders F.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    BARM and BalticMicrobeDB, a reference metagenome and interface to meta-omic data for the Baltic Sea. 2018. In: Scientific Data, E-ISSN 2052-4463, Vol. 5, article id 180146. Article in journal (Refereed)
    Abstract [en]

    The Baltic Sea is one of the world's largest brackish water bodies and is characterised by pronounced physicochemical gradients where microbes are the main biogeochemical catalysts. Meta-omic methods provide rich information on the composition of, and activities within, microbial ecosystems, but are computationally heavy to perform. We here present the Baltic Sea Reference Metagenome (BARM), complete with annotated genes to facilitate further studies with much less computational effort. The assembly is constructed using 2.6 billion metagenomic reads from 81 water samples, spanning both spatial and temporal dimensions, and contains 6.8 million genes that have been annotated for function and taxonomy. The assembly is useful as a reference, facilitating taxonomic and functional annotation of additional samples by simply mapping their reads against the assembly. This capability is demonstrated by the successful mapping and annotation of 24 external samples. In addition, we present a public web interface, BalticMicrobeDB, for interactive exploratory analysis of the dataset.

  • 10. Anders, Gerd
    et al.
    Mackowiak, Sebastian D
    Jens, Marvin
    Maaskola, Jonas
    Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany.
    Kuntzagk, Andreas
    Rajewsky, Nikolaus
    Landthaler, Markus
    Dieterich, Christoph
    doRiNA: a database of RNA interactions in post-transcriptional regulation. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no Database issue. Article in journal (Refereed)
    Abstract [en]

    In animals, RNA binding proteins (RBPs) and microRNAs (miRNAs) post-transcriptionally regulate the expression of virtually all genes by binding to RNA. Recent advances in experimental and computational methods facilitate transcriptome-wide mapping of these interactions. It is thought that the combinatorial action of RBPs and miRNAs on target mRNAs forms a post-transcriptional regulatory code. We provide a database that supports the quest for deciphering this regulatory code. Within doRiNA, we are systematically curating, storing and integrating binding site data for RBPs and miRNAs. Users are free to take a target (mRNA) or regulator (RBP and/or miRNA) centric view on the data. We have implemented a database framework with short query response times for complex searches (e.g. asking for all targets of a particular combination of regulators). All search results can be browsed, inspected and analyzed in conjunction with a huge selection of other genome-wide data, because our database is directly linked to a local copy of the UCSC genome browser. At the time of writing, doRiNA encompasses RBP data for the human, mouse and worm genomes. For computational miRNA target site predictions, we provide an update of PicTar predictions.

  • 11.
    Andrade, Jorge
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    Andersen, Malin
    KTH, School of Biotechnology (BIO), Gene Technology.
    Sillén, Anna
    Graff, Caroline
    Odeberg, Jacob
    KTH, School of Biotechnology (BIO), Gene Technology.
    The use of grid computing to drive data-intensive genetic research. 2007. In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 15, no 6, p. 694-702. Article in journal (Refereed)
    Abstract [en]

    In genetics, with increasing data sizes and more advanced algorithms for mining complex data, a point is reached where increased computational capacity or alternative solutions become unavoidable. Most contemporary methods for linkage analysis are based on the Lander-Green hidden Markov model (HMM), which scales exponentially with the number of pedigree members. In whole genome linkage analysis, genotype simulations become prohibitively time consuming to perform on single computers. We have developed 'Grid-Allegro', a Grid-aware implementation of the Allegro software, by which several thousands of genotype simulations can be performed in parallel in a short time. With temporary installations of the Allegro executable and datasets on remote nodes at submission, the need for predefined Grid run-time environments is circumvented. We evaluated the performance, efficiency and scalability of this implementation in a genome scan on Swedish multiplex Alzheimer's disease families. We demonstrate that 'Grid-Allegro' allows for the full exploitation of the features available in Allegro for genome-wide linkage. The implementation of existing bioinformatics applications on Grids (Distributed Computing) represents a cost-effective alternative for addressing highly resource-demanding and data-intensive bioinformatics tasks, compared to acquiring and setting up clusters of computational hardware in house (Parallel Computing), a resource not available to most geneticists today.

  • 12.
    Apostolov, Rossen
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Yonezawa, Yasushige
    Standley, Daron M
    Kikugawa, Gota
    Takano, Yu
    Nakamura, Haruki
    Membrane attachment facilitates ligand access to the active site in monoamine oxidase A. 2009. In: Biochemistry, ISSN 0006-2960, E-ISSN 1520-4995, Vol. 48, no 25, p. 5864-5873. Article in journal (Refereed)
    Abstract [en]

    Monoamine oxidase membrane enzymes are responsible for the catalytic breakdown of extra- and intracellular neurotransmitters and are targets for the development of central nervous system drugs. We analyzed the dynamics of rat MAOA by performing multiple independent molecular dynamics simulations of membrane-bound and membrane-free forms to clarify the relationship between the mechanics of the enzyme and its function, with particular emphasis on the significance of membrane attachment. Principal component analysis of the simulation trajectories as well as correlations in the fluctuations of the residues pointed to the existence of three domains that define the global dynamics of the protein. Interdomain anticorrelated movements in the membrane-bound system facilitated the relaxation of interactions between residues surrounding the substrate cavity and induced conformational changes which expanded the active site cavity and opened putative pathways for substrate uptake and product release. Such events were less pronounced in the membrane-free system due to differences in the nature of the dominant modes of motion. The presence of the lipid environment is suggested to assist in decoupling the interdomain motions, consistent with the observed reduction in enzyme activity under membrane-free conditions. Our results are also in accordance with mutational analysis which shows that modifications of interdomain hinge residues decrease the activity of rat MAOA in solution.

  • 13. Ardelius, John
    et al.
    Aurell, Erik
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Krishnamurthy, Supriya
    KTH, School of Information and Communication Technology (ICT).
    Clustering of solutions in hard satisfiability problems. 2007. In: Journal of Statistical Mechanics: Theory and Experiment, ISSN 1742-5468, no 10, p. P10012. Article in journal (Refereed)
    Abstract [en]

    We study numerically the solution space structure of random 3-SAT problems close to the SAT/UNSAT transition. This is done by considering chains of satisfiability problems, where clauses are added sequentially to a problem instance. Using the overlap measure of similarity between different solutions found on the same problem instance, we examine geometrical changes as a function of α. In each chain, the overlap distribution is first smooth, but then develops a tiered structure, indicating that the solutions are found in well separated clusters. On chains of not too large instances, all remaining solutions are eventually observed to be found in only one small cluster before vanishing. This condensation transition point is estimated by finite size scaling to be αc = 4.26 with an apparent critical exponent of about 1.7. The average overlap value is also observed to increase with α up to the transition, indicating a reduction in solutions space size, in accordance with theoretical predictions. The solutions are generated by a local heuristic, ASAT, and compared to those found by the Survey Propagation algorithm up to αc.

  • 14.
    Aspeborg, Henrik
    et al.
    KTH, School of Biotechnology (BIO), Glycoscience.
    Coutinho, Pedro M.
    Wang, Yang
    KTH, School of Biotechnology (BIO), Glycoscience.
    Brumer, Harry
    KTH, School of Biotechnology (BIO), Glycoscience.
    Henrissat, Bernard
    Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). 2012. In: BMC Evolutionary Biology, ISSN 1471-2148, E-ISSN 1471-2148, Vol. 12, no 1, p. 186. Article in journal (Refereed)
    Abstract [en]

    Background: The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on beta-linked oligo- and polysaccharides, and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limits functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5.

    Results: About 80% of the current sequences were assigned into 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically-active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered with biochemical characterization in the future. Furthermore, twenty subfamilies presently have no characterization whatsoever and many others have only limited structural and biochemical data. Mapping of functional knowledge onto the GH5 phylogenetic tree revealed that the sequence space of this historical and industrially important family is far from well dispersed, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins which have lost their catalytic machinery, indicating evolution towards novel functions.

    Conclusion: Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation for glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at http://www.cazy.org/GH5.html.

  • 15.
    Bergman, Julia
    et al.
    Uppsala University.
    Botling, Johan
    Uppsala University.
    Fagerberg, Linn
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology.
    Hallström, Björn M
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology.
    Djureinovic, Dijana
    Uppsala University.
    Pontén, Fredrik
    Uppsala University.
    Uhlén, Mathias
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology.
    The human adrenal gland proteome defined by transcriptomics and antibody-based profiling.2017In: Endocrinology, ISSN 0013-7227, E-ISSN 1945-7170, Vol. 158, no 2, p. 239-251Article in journal (Refereed)
    Abstract [en]

    The adrenal gland is a composite endocrine organ with vital functions that include the synthesis and release of glucocorticoids and catecholamines. To define the molecular landscape that underlies the specific functions of the adrenal gland, we combined a genome-wide transcriptomics approach using messenger RNA sequencing of human tissues with immunohistochemistry-based protein profiling on tissue microarrays. Approximately two-thirds of all putative protein coding genes were expressed in the adrenal gland, and the analysis identified 253 genes with an elevated pattern of expression in the adrenal gland, with only 37 genes showing a markedly greater expression level (more than fivefold) in the adrenal gland compared with 31 other normal human tissue types analyzed. The analyses allowed for an assessment of the relative expression levels for well-known proteins involved in adrenal gland function but also identified previously poorly characterized proteins in the adrenal cortex, such as the FERM (4.1 protein, ezrin, radixin, moesin) domain containing 5 and the nephroblastoma overexpressed (NOV) protein homolog. We have provided a global analysis of the adrenal gland transcriptome and proteome, with a comprehensive list of genes with elevated expression in the adrenal gland and spatial information with examples of protein expression patterns for corresponding proteins. These genes and proteins constitute important starting points for an improved understanding of the normal function and pathophysiology of the adrenal glands.

  • 16.
    Bernsel, Andreas
    et al.
    Stockholm University.
    Viklund, Håkan
    Stockholm University.
    Falk, Jenny
    Stockholm University.
    Lindahl, Erik
    Stockholm University.
    von Heijne, Gunnar
    Stockholm University.
    Elofsson, Arne
    Stockholm University.
    Prediction of membrane-protein topology from first principles2008In: Proceedings of the National Academy of Sciences of the United States of America, ISSN 0027-8424, E-ISSN 1091-6490, Vol. 105, no 20, p. 7177-7181Article in journal (Refereed)
    Abstract [en]

    The current best membrane-protein topology-prediction methods are typically based on sequence statistics and contain hundreds of parameters that are optimized on known topologies of membrane proteins. However, because the insertion of transmembrane helices into the membrane is the outcome of molecular interactions among protein, lipids and water, it should be possible to predict topology by methods based directly on physical data, as proposed >20 years ago by Kyte and Doolittle. Here, we present two simple topology-prediction methods using a recently published experimental scale of position-specific amino acid contributions to the free energy of membrane insertion that perform on a par with the current best statistics-based topology predictors. This result suggests that prediction of membrane-protein topology and structure directly from first principles is an attainable goal, given the recently improved understanding of peptide recognition by the translocon.
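    The idea of predicting transmembrane segments from a physical scale can be illustrated with the classic Kyte-Doolittle hydropathy plot that the abstract refers to. The sketch below is not the method of this paper (which uses an experimental, position-specific free-energy scale that is not reproduced here); it only shows the sliding-window principle. The function names, the 19-residue window and the 1.6 cutoff are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy exceeds the threshold.

    Returns half-open (start, end) index pairs. The window and threshold
    are illustrative assumptions, not parameters from the paper.
    """
    segments = []
    for start, mean in enumerate(hydropathy_profile(seq, window)):
        if mean <= threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "D" * 20 + "L" * 19 + "K" * 20
    print(candidate_tm_segments(toy))
```

    A full topology predictor would additionally decide the in/out orientation of each segment; this sketch stops at locating hydrophobic stretches.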

  • 17.
    Bertaccini, Edward J.
    et al.
    Stanford University.
    Lindahl, Erik
    Stockholm University.
    Sixma, Titia
    Netherlands Cancer Institute.
    Trudell, James R.
    Stanford University.
    Effect of cobratoxin binding on the normal mode vibration within acetylcholine binding protein2008In: Journal of chemical information and modeling, ISSN 1549-9596, Vol. 48, no 4, p. 855-860Article in journal (Refereed)
    Abstract [en]

    Recent crystal structures of the acetylcholine binding protein (AChBP) have revealed surprisingly small structural alterations upon ligand binding. Here we investigate the extent to which ligand binding may affect receptor dynamics. AChBP is a homologue of the extracellular component of ligand-gated ion channels (LGICs). We have previously used an elastic network normal-mode analysis to propose a gating mechanism for the LGICs and to suggest the effects of various ligands on such motions. However, the difficulties with elastic network methods lie in their inability to account for the modest effects of a small ligand or mutation on ion channel motion. Here, we report the successful application of an elastic network normal mode technique to measure the effects of large ligand binding on receptor dynamics. The present calculations demonstrate a clear alteration in the native symmetric motions of a protein due to the presence of large protein cobratoxin ligands. In particular, normal-mode analysis revealed that cobratoxin binding to this protein significantly dampened the axially symmetric motion of the AChBP that may be associated with channel gating in the full nAChR. The results suggest that alterations in receptor dynamics could be a general feature of ligand binding.

  • 18.
    Bertaccini, Edward J
    et al.
    Stanford University.
    Lindahl, Erik
    Stanford University.
    Sixma, Titia
    Netherlands Cancer Institute.
    Trudell, James R
    Stanford University.
    Toxin Binding Serves as an Initial Model for Studying the Effects of Anesthetics on Ion Channels2007Conference paper (Refereed)
    Abstract [en]

    Introduction: We have previously used molecular modeling techniques combined with experimental data to visualize a plausible model of an anesthetic binding site within a LGIC complex. We have also previously shown a computational mechanism by which these ion channels may open and close and postulated how this motion may be affected by the presence of anesthetics.2 The difficulties with these methods, however, lie in their inability to account for the modest effects of a separate anesthetic ligand or small mutation on ion channel motion. Here we show the successful application of an elastic network calculation on a homologue of the extracellular component of LGICs, the acetylcholine binding protein (AChBP), in the presence and absence of large cobratoxin ligands. These calculations demonstrate a clear alteration in the native symmetric motion of a protein due to the presence of multiple ligands, as may occur with anesthetics and muscle relaxants.

    Methods: Coordinates of the AChBP with (1YI5)3 and without (1I9B)4 cobratoxin were obtained from the Research Collaboratory for Structural Bioinformatics (RCSB). Hydrogens were added using DSViewer 5.0 (Accelrys, San Diego, CA). Normal mode analysis was performed using an all-atom elastic network model developed by Lindahl. Root-mean-square deviations (RMSD) of each residue were produced by applying the RMSD analysis utility within the GROMACS software suite to the coordinate trajectory output files. The RMSD data were then imported into Microsoft Excel for plotting and further comparison of protein backbone motions between the two different normal mode trajectories.

    Results: Normal mode analysis reveals that ligand binding to this protein alters its natural harmonic vibration. In this case, the axially symmetric motion of the AChBP, that may be associated with channel gating in the full nAChR, is highly dampened by the presence of bound cobratoxin. A large proportion of the kinetic energy within this mode seems to be absorbed by the cobratoxin, leaving the channel motion significantly decreased.

    Conclusions: This is among the first descriptions of the effect of bound ligand on large scale protein dynamics, especially as it relates to ion channel gating. This analysis was possible using an elastic network approximation due to the large protein nature of the cobratoxin ligand. For nonpeptide drugs such as anesthetics which contain far fewer atoms, using the effects of bound ligand on protein motion as additional criteria for future drug design may require a more robust molecular mechanics treatment of the ligand-receptor complex.

  • 19.
    Bertaccini, Edward J.
    et al.
    Stanford University.
    Trudell, James R.
    Stanford University.
    Lindahl, Erik
    Stockholm University.
    Normal-mode analysis of the glycine alpha1 receptor by three separate methods2007In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 47, no 4, p. 1572-1579Article in journal (Refereed)
    Abstract [en]

    Predicting collective dynamics and structural changes in biological macromolecules is pivotal toward a better understanding of many biological processes. Limitations due to large system sizes and inaccessible time scales have prompted the development of alternative techniques for the calculation of such motions. In this work, we present the results of a normal-mode analysis technique based on molecular mechanics that enables the calculation of accurate force-field based vibrations of extremely large molecules and compare it with two elastic network approximate models. When applied to the glycine alpha1 receptor, all three normal-mode analysis algorithms demonstrate an "iris-like" gating motion. Such gating motions have implications for understanding the effects of anesthetic and other ligand binding sites and for the means of transducing agonist binding into ion channel opening. Unlike the more approximate methods, molecular mechanics based analyses can also reveal approximate vibrational frequencies. Such analyses may someday allow the use of protein dynamics elucidated via normal-mode calculations as additional endpoints for future drug design.

  • 20.
    Bertaccini, Edward J
    et al.
    Stanford University.
    Trudell, James R
    Stanford University.
    Lindahl, Erik
    Stockholm University.
    Understanding Effects of Anesthetics on Ligand-Gated Ion Channels (LGIC) in Lipid Membranes2008Conference paper (Refereed)
    Abstract [en]

    Introduction: We have previously used molecular modeling combined with experimental data to visualize a plausible model of an anesthetic binding site within a LGIC.1 We have also previously shown a computational mechanism by which these LGICs may gate and postulated how this motion may be affected by the presence of anesthetics.2 The initial difficulty with these calculations concerns the 26,000 atoms present in the receptor and the computing capabilities required to perform vibrational analyses on such a large construct. Here we show the successful application of an elastic network calculation on our previously published model of a glycine alpha one receptor (GlyRa1), now suspended in a fully hydrated lipid bilayer. Despite the presence of over 100,000 atoms, these calculations continue to demonstrate a symmetric motion of the ion channel protein that is consistent with the gating motion demonstrated in previous in vacuo work by us and others.

    Methods: Coordinates of the GlyRa1 model were obtained from our previous work. A 100 × 100 Å lipid bilayer matrix was constructed from POPC and then hydrated on both surfaces with water molecules using the VMD 1.86 software package (NCSA, Urbana, Ill.). Discovery Studio 1.7 (Accelrys, San Diego, CA) molecular modeling software was used to insert our GlyRa1 model into the lipid bilayer such that the known interfacial residue GLY 221 was at the POPC-water interface. All waters within 3.8 Å of the protein were removed, as were all lipid molecules within 2 Å of the protein. Hydrogens were added, followed by energy minimization of the entire system to remove energetically unfavorable contacts. The system was subsequently further hydrated within the GROMACS software suite and subjected to further energy equilibration via molecular dynamics simulation with periodic boundary conditions. Subsequent normal mode analysis was performed using an all-atom elastic network model developed by Lindahl, which takes advantage of a sparse matrix implementation for computational efficiency.

    Results: Despite the large size of the system, the introduction of water and lipid did not grossly distort the overall gating motion of the GlyRa1 noted in previous works. Normal mode analysis revealed that the GlyRa1 in a fully hydrated bilayer environment continues to demonstrate an iris-like gating motion as a low frequency, high amplitude natural harmonic vibration. Furthermore, the introduction of periodic boundary conditions allowed simultaneous harmonic vibrations of lipid in sync with the protein gating motion that are compatible with reasonable lipid bilayer perturbations.

    Conclusions: This is among the first descriptions of a normal mode calculation describing large-scale protein dynamics and ion channel gating in the presence of a fully hydrated lipid bilayer complex. This analysis was only possible on such a large system due to the computational efficiencies of the elastic network approximation. This model will hopefully provide a more accurate means of introducing anesthetics and alcohols into protein and lipid bilayer systems and allow us to discern their effects on LGIC gating.

    1. Bertaccini EJ, Shapiro J, Brutlag DL, Trudell JR: J Chem Inf Model 2005; 45: 128-35.
    2. Bertaccini EJ, Trudell JR, Lindahl E: J Chem Inf Model 2007; 47: 1572-9.

  • 21.
    Bertaccini, Edward J.
    et al.
    Stanford University.
    Trudell, James R.
    Stanford University.
    Lindahl, Erik
    Stockholm University.
    Murail, Samuel
    Stockholm University.
    Anesthetic Binding Sites in a GlyRa1 Model Based on Open State Prokaryotic Ion Channel Templates2009In: Proceedings of the 2009 Annual Meeting of the American Society of Anesthesiologists, 2009Conference paper (Refereed)
    Abstract [en]

    Introduction: Ligand-gated ion channels (LGICs) are thought to mediate a significant proportion of anesthetic effects. We built atomic level models of the glycine alpha one receptor (GlyRa1) to examine its interactions with anesthetics. We previously built models of a GlyRa1 based on a prokaryotic pentameric ion channel in the closed state from Erwinia chrysanthemi (ELIC) (1-3). Here, we built a GlyRa1 model based on the open state structures of two new ion channels from the prokaryote Gloeobacter violaceus (GLIC) (4-5). These new templates are relevant since anesthetics are thought to bind to and stabilize the open state of the GlyRa1.

    Methods: The 3D coordinates of two forms of GLIC (3EHZ.pdb and 3EAM.pdb) were obtained from the RCSB database. The sequence of the human GlyRa1 was obtained from the NCBI database. A BLAST sequence search was performed using the GLIC sequences. Among the best scored homologous human sequences were those of the GlyRa1. The template structures and the sequence of GlyRa1 were aligned with Discovery Studio 2.0.1 (Accelrys, San Diego, CA) and the Modeler module was used for assignment of coordinates for aligned amino acids, the construction of possible loops, and the initial refinement of amino acid sidechains.

    Results: The BLAST derived scores suggest a close homology between the LGICs, GLIC and ELIC. Subsequent CLUSTALW alignment of the GLIC and GlyRa1 sequences demonstrates reasonable sequence similarity. The model of the GlyRa1 is a homomer with pentameric symmetry about a central ion pore and shows significant transmembrane alpha helical and extracellular beta sheet content. Unlike our previous model based on the ELIC template, the current model based on the GLIC templates shows a continuously open pore with a partial restriction within the transmembrane region. Three of the residues notable for modulating anesthetic action are on transmembrane segments 1-3 (TM1-3) (ILE 229, SER 267, ALA 288). They now line the intersubunit interface, in contrast to our previous models. However, residues from TM4 that are known to modulate a variety of anesthetic effects on this or homologous LGICs are present but could only indirectly influence an intersubunit anesthetic binding site. Normal mode analyses show an iris-like motion similar to previous results.

    Conclusions: A model of the GlyRa1 was constructed using homology modeling based on the GLIC templates. This model posits an intersubunit site for anesthetic binding that may communicate with the intrasubunit region of each TMD.

  • 22.
    Bertaccini, Edward J
    et al.
    Stanford University.
    Wallner, Björn
    Stockholm University.
    Trudell, James R
    Stanford University.
    Lindahl, Erik
    Stockholm University.
    Modeling anesthetic binding sites within the glycine alpha one receptor based on prokaryotic ion channel templates: the problem with TM42010In: Journal of chemical information and modeling, ISSN 1549-9596, Vol. 50, no 12, p. 2248-2255Article in journal (Refereed)
    Abstract [en]

    Ligand-gated ion channels (LGICs) significantly modulate anesthetic effects. Their exact molecular structure remains unknown. This has led to ambiguity regarding the proper amino acid alignment within their 3D structure and, in turn, the location of any anesthetic binding sites. Current controversies suggest that such a site could be located in either an intra- or intersubunit locale within the transmembrane domain of the protein. Here, we built a model of the glycine alpha one receptor (GlyRa1) based on the open-state structures of two new high-resolution ion channel templates from the prokaryote Gloeobacter violaceus (GLIC). Sequence scoring suggests reasonable homology between GlyRa1 and GLIC. Three of the residues notable for modulating anesthetic action are on transmembrane segments 1-3 (TM1-3): ILE 229, SER 267, and ALA 288. They line an intersubunit interface, in contrast to previous models. However, residues from the fourth transmembrane domain (TM4) that are known to modulate a variety of anesthetic effects are quite distant from this putative anesthetic binding site. While this model can account for a large proportion of the physicochemical data regarding such proteins, it cannot readily account for the alterations in anesthetic effects that are due to mutations within TM4.

  • 23.
    Bertaccini, Edward
    et al.
    Stanford University.
    Trudell, James
    Stanford University.
    Lindahl, Erik
    Stockholm University.
    Successful Calculation of Glycine Receptor Gating Motion Via a Full Molecular Mechanics Force Field2006Conference paper (Refereed)
    Abstract [en]

    Introduction: Analyses of ligand-gated ion channel receptors (LGIC) have demonstrated that possible sites of anesthetic action exist within their transmembrane domains. We have previously used molecular modeling techniques combined with experimental data to visualize a plausible model of an anesthetic binding site within a LGIC complex.1 We have also previously shown an approximate computational mechanism, based on an elastic network model, by which these ion channels may open and close and postulated how this motion may be affected by the presence of anesthetics.2 The difficulties with these approximation methods, however, lie in their inability to account for the modest effects of a separate ligand or a small mutation on ion channel motion. Here we show the successful application of a formal molecular mechanics force field for the normal mode calculation of protein motions.

    Methods: Coordinates of the homomeric GABARa1 pentamer complex composed of both an extracellular ligand binding domain and a transmembrane domain came from our previous work.3 Using this structure as a template, we built a model of the GlyRa1 homomer using the homology modeling tools within the InsightII 2005 software package (Accelrys, San Diego, CA). This model then underwent a series of restrained optimizations within the GROMACS modeling package using the OPLS force field and no distance cutoffs on electrostatic and van der Waals interactions. After final unrestrained optimization, normal mode analysis was performed with a sparse matrix implementation.

    Results: As we previously reported for the approximate elastic network technique,2 analysis of the entire GlyRa1 complex demonstrated a clear iris-like motion of the protein about the central axis of the ion pore as the first (highest amplitude, lowest frequency) normal mode. In this mode, the rotation of the ligand binding domain occurred in the opposite direction to that of the transmembrane domain, producing a “wringing”-like motion of the entire protein complex as it traversed its gating cycle. However, unlike the elastic network calculation of normal modes, which could only report relative frequencies of vibration, the GROMACS-based normal mode analysis allows for the calculation of real vibrational frequencies on the order of 321 GHz or around 3.1 ps per cycle. Likewise, while elastic network calculations could be completed in a few hours, the GROMACS calculations took approximately a week to complete on a Dell Workstation with dual 3 GHz Xeon processors and a 64-bit software implementation.

    Conclusions: Despite these proteins containing upwards of 26,000 atoms, our new methods have made it possible to derive normal modes via the full implementation of a formal force field calculation. Despite their length and markedly increased complexity, these calculations still demonstrate that the harmonic motion of LGIC complexes is consistent with the direction of channel opening and closing. Such calculations should now allow the elucidation of the subtle effects on ion channel motion that are due to anesthetic binding.
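    The two figures quoted in the Results (321 GHz and about 3.1 ps per cycle) are related by period = 1 / frequency. The short sketch below checks that arithmetic and also converts the frequency to a wavenumber, the unit in which normal-mode frequencies are often reported. The function names are illustrative and are not taken from GROMACS.

```python
# Speed of light in cm/s (exact by definition of the metre).
SPEED_OF_LIGHT_CM_PER_S = 29_979_245_800.0


def period_ps(frequency_ghz):
    """Period in picoseconds of an oscillation given in GHz.

    1 / (f GHz) = (1 / f) ns = (1000 / f) ps.
    """
    return 1000.0 / frequency_ghz


def wavenumber_per_cm(frequency_ghz):
    """Wavenumber in cm^-1 corresponding to a frequency in GHz."""
    return frequency_ghz * 1e9 / SPEED_OF_LIGHT_CM_PER_S


if __name__ == "__main__":
    f = 321.0  # GHz, the value quoted in the abstract
    print(f"period     = {period_ps(f):.2f} ps")
    print(f"wavenumber = {wavenumber_per_cm(f):.2f} cm^-1")
```

    For 321 GHz the period comes out at roughly 3.1 ps, in agreement with the abstract, and the corresponding wavenumber is roughly 10.7 cm^-1.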

  • 24.
    Bertone, Paul
    et al.
    Yale University.
    Trifonov, Valery
    Yale University.
    Rozowsky, Joel S
    Yale University.
    Schubert, Falk
    Yale University.
    Emanuelsson, Olof
    Yale University.
    Karro, John
    Yale University.
    Kao, Ming-Yang
    Northwestern University.
    Snyder, Michael
    Yale University.
    Gerstein, Mark
    Yale University.
    Design optimization methods for genomic DNA tiling arrays.2006In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 16, no 2, p. 271-281Article in journal (Refereed)
    Abstract [en]

    A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.
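    The tiling problem described above lends itself to a dynamic-programming formulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it takes a precomputed repeat mask, allows tiles whose length lies in a fixed range, and maximizes the number of non-repetitive bases covered. The objective, the function name and the input representation are all assumptions made for illustration. For a fixed range of tile sizes the running time is linear in the sequence length.

```python
def best_tiling(repeat_mask, min_len, max_len):
    """Choose non-overlapping tiles of non-repetitive sequence.

    repeat_mask[i] is True where base i is repetitive. Tiles are half-open
    (start, end) intervals with min_len <= end - start <= max_len that
    contain no repetitive base. Returns (covered_bases, tiles).
    """
    n = len(repeat_mask)

    # run[i] = length of the non-repetitive run ending just before index i.
    run = [0] * (n + 1)
    for i in range(1, n + 1):
        run[i] = 0 if repeat_mask[i - 1] else run[i - 1] + 1

    # best[i] = maximum coverage achievable using only the first i bases.
    best = [0] * (n + 1)
    # choice[i] = length of the tile ending at i in that optimum (0 = none).
    choice = [0] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]
        for length in range(min_len, min(max_len, run[i]) + 1):
            candidate = best[i - length] + length
            if candidate > best[i]:
                best[i] = candidate
                choice[i] = length

    # Trace back through the choices to recover the tile path.
    tiles = []
    i = n
    while i > 0:
        if choice[i]:
            tiles.append((i - choice[i], i))
            i -= choice[i]
        else:
            i -= 1
    tiles.reverse()
    return best[n], tiles


if __name__ == "__main__":
    # Toy mask: 10 unique bases, 3 repeats, 4 unique, 2 repeats, 7 unique.
    mask = [False] * 10 + [True] * 3 + [False] * 4 + [True] * 2 + [False] * 7
    print(best_tiling(mask, min_len=5, max_len=8))
```

    In the toy mask the 4-base unique stretch is shorter than the minimum tile size and therefore stays uncovered, which mirrors the abstract's point that fragmentation by repeats makes coverage harder as tile sizes grow.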

  • 25.
    Bjelkmar, Pär
    et al.
    Stockholm University.
    Niemelä, Perttu S
    Helsinki University of Technology.
    Vattulainen, Ilpo
    Helsinki University of Technology.
    Lindahl, Erik
    Stockholm University.
    Conformational changes and slow dynamics through microsecond polarized atomistic molecular simulation of an integral Kv1.2 ion channel2009In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 5, no 2, p. e1000289-Article in journal (Refereed)
    Abstract [en]

    Structure and dynamics of voltage-gated ion channels, in particular the motion of the S4 helix, is a highly interesting and hotly debated topic in current membrane protein research. It has critical implications for insertion and stabilization of membrane proteins as well as for finding how transitions occur in membrane proteins, not to mention numerous applications in drug design. Here, we present a full 1 μs atomic-detail molecular dynamics simulation of an integral Kv1.2 ion channel, comprising 120,000 atoms. By applying 0.052 V/nm of hyperpolarization, we observe structural rearrangements, including up to 120 degrees rotation of the S4 segment, changes in hydrogen-bonding patterns, but only low amounts of translation. A smaller rotation (approximately 35 degrees) of the extracellular end of all S4 segments is present also in a reference 0.5 μs simulation without applied field, which indicates that the crystal structure might be slightly different from the natural state of the voltage sensor. The conformation change upon hyperpolarization is closely coupled to an increase in 3₁₀ helix content in S4, starting from the intracellular side. This could support a model for transition from the crystal structure where the hyperpolarization destabilizes S4-lipid hydrogen bonds, which leads to the helix rotating to keep the arginine side chains away from the hydrophobic phase, and the driving force for final relaxation by downward translation is partly entropic, which would explain the slow process. The coordinates of the transmembrane part of the simulated channel actually stay closer to the recently determined higher-resolution Kv1.2 chimera channel than the starting structure for the entire second half of the simulation (0.5-1 μs). Together with lipids binding in matching positions and significant thinning of the membrane also observed in experiments, this provides additional support for the predictive power of microsecond-scale membrane protein simulations.

  • 26. Boekel, Jorrit
    et al.
    Chilton, John M
    Cooke, Ira R
    Horvatovich, Peter L
    Jagtap, Pratik D
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    Lehtiö, Janne
    Lukasse, Pieter
    Moerland, Perry D
    Griffin, Timothy J
    Multi-omic data analysis using Galaxy2015In: Nature Biotechnology, ISSN 1087-0156, E-ISSN 1546-1696, Vol. 33, no 2, p. 137-9Article in journal (Refereed)
  • 27.
    Borgström, Erik
    KTH, School of Biotechnology (BIO), Gene Technology.
    Technologies for Single Cell Genome Analysis2016Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    During the last decade high throughput DNA sequencing of single cells has evolved from an idea to one of the most high profile fields of research. Much of this development has been possible due to the dramatic reduction in costs for massively parallel sequencing. The four papers included in this thesis describe or evaluate technological advancements for high throughput DNA sequencing of single cells and single molecules.

    As the sequencing technologies improve, more samples are analyzed in parallel. In paper 1, an automated procedure for preparation of samples prior to massively parallel sequencing is presented. The method has been applied to several projects and further development by others has enabled even higher sample throughputs. Amplification of single cell genomes is a prerequisite for sequence analysis. Paper 2 evaluates four commercially available kits for whole genome amplification of single cells. The results show that coverage of the genome differs significantly among the protocols and, as expected, this has an impact on the downstream analysis. In paper 3, single cell genotyping by exome sequencing is used to confirm the presence of fat cells derived from donated bone marrow within the recipients’ fat tissue. Close to a hundred single cells were exome sequenced and a subset was validated by whole genome sequencing. In the last paper, a new method for phasing (i.e. determining the physical connection of variant alleles) is presented. The method barcodes amplicons from single molecules in emulsion droplets. The barcodes can then be used to determine which variants were present on the same original DNA molecule. The method is applied to two variable regions in the bacterial 16S gene in a metagenomic sample.

    Thus, two of the papers (1 and 4) present development of new methods for increasing the throughput and information content of data from massively parallel sequencing. Paper 2 evaluates and compares currently available methods and in paper 3, a biological question is answered using some of these tools.

  • 28. Chen, Kevin
    et al.
    Maaskola, Jonas
    Max Delbrück Centrum für Molekulare Medizin, Berlin-Buch, Germany.
    Siegal, Mark L
    Rajewsky, Nikolaus
    Reexamining microRNA site accessibility in Drosophila: a population genomics study.2009In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 4, no 5Article in journal (Refereed)
    Abstract [en]

    Kertesz et al. (Nature Genetics 2008) described PITA, a miRNA target prediction algorithm based on hybridization energy and site accessibility. In this note, we used a population genomics approach to reexamine their data and found that the PITA algorithm had lower specificity than methods based on evolutionary conservation at comparable levels of sensitivity. We also showed that deeply conserved miRNAs tend to have stronger hybridization energies to their targets than do other miRNAs. Although PITA had higher specificity in predicting targets than a naïve seed-match method, this signal was primarily due to the use of a single cutoff score for all miRNAs and to the observed correlation between conservation and hybridization energy. Overall, our results clarify the accuracy of different miRNA target prediction algorithms in Drosophila and the role of site accessibility in miRNA target prediction.

  • 29.
    Contreras, F.-Xabier
    et al.
    Heidelberg University.
    Ernst, Andreas M
    Heidelberg University.
    Haberkant, Per
    Heidelberg University.
    Björkholm, Patrik
    Stockholm University.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Gönen, Başak
    Tischer, Christian
    Heidelberg University.
    Elofsson, Arne
    Stockholm University.
    von Heijne, Gunnar
    Stockholm University.
    Thiele, Christoph
    Heidelberg University.
    Pepperkok, Rainer
    Heidelberg University.
    Wieland, Felix
    Heidelberg University.
    Brügger, Britta
    Heidelberg University.
    Molecular recognition of a single sphingolipid species by a protein's transmembrane domain2012In: Nature, ISSN 0028-0836, E-ISSN 1476-4687, Vol. 481, no 7382, p. 525-529Article in journal (Refereed)
    Abstract [en]

    Functioning and processing of membrane proteins critically depend on the way their transmembrane segments are embedded in the membrane. Sphingolipids are structural components of membranes and can also act as intracellular second messengers. Not much is known of sphingolipids binding to transmembrane domains (TMDs) of proteins within the hydrophobic bilayer, and how this could affect protein function. Here we show a direct and highly specific interaction of exclusively one sphingomyelin species, SM 18, with the TMD of the COPI machinery protein p24 (ref. 2). Strikingly, the interaction depends on both the headgroup and the backbone of the sphingolipid, and on a signature sequence (VXXTLXXIY) within the TMD. Molecular dynamics simulations show a close interaction of SM 18 with the TMD. We suggest a role of SM 18 in regulating the equilibrium between an inactive monomeric and an active oligomeric state of the p24 protein, which in turn regulates COPI-dependent transport. Bioinformatic analyses predict that the signature sequence represents a conserved sphingolipid-binding cavity in a variety of mammalian membrane proteins. Thus, in addition to a function as second messengers, sphingolipids can act as cofactors to regulate the function of transmembrane proteins. Our discovery of an unprecedented specificity of interaction of a TMD with an individual sphingolipid species adds to our understanding of why biological membranes are assembled from such a large variety of different lipids.

  • 30. Daniel, C.
    et al.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Öhman, M.
    RNA editing of non-coding RNA and its role in gene regulation. 2015. In: Biochimie, ISSN 0300-9084, E-ISSN 1638-6183, Vol. 117, p. 22-27. Article in journal (Refereed)
    Abstract [en]

    It has long been known that repetitive elements, particularly Alu sequences in human, are edited by the adenosine deaminases acting on RNA (ADAR) family. The functional interpretation of these events has been even more difficult than that of editing events in coding sequences, but today there is an emerging understanding of their downstream effects. A surprisingly large fraction of the human transcriptome contains inverted Alu repeats, often forming long double stranded structures in RNA transcripts, typically occurring in introns and UTRs of protein coding genes. Alu repeats are also common in other primates, and similar inverted repeats can frequently be found in non-primates, although the latter are less prone to duplex formation. In human, as many as 700,000 Alu elements have been identified as substrates for RNA editing, of which many are edited at several sites. In fact, recent advancements in transcriptome sequencing techniques and bioinformatics have revealed that the human editome comprises at least a hundred million adenosine to inosine (A-to-I) editing sites in Alu sequences. Although substantial additional efforts are required in order to map the editome, already present knowledge provides an excellent starting point for studying cis-regulation of editing. In this review, we will focus on editing of long stem loop structures in the human transcriptome and how it can affect gene expression.

  • 31. Del Giudice, M.
    et al.
    Bo, Stefano
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA.
    Grigolon, S.
    Bosia, C.
    On the role of extrinsic noise in microRNA-mediated bimodal gene expression. 2018. In: PloS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 14, no 4, article id e1006063. Article in journal (Refereed)
    Abstract [en]

    Several studies highlighted the relevance of extrinsic noise in shaping cell decision making and differentiation in molecular networks. Bimodal distributions of gene expression levels provide experimental evidence of phenotypic differentiation, where the modes of the distribution often correspond to different physiological states of the system. We theoretically address the presence of bimodal phenotypes in the context of microRNA (miRNA)-mediated regulation. MiRNAs are small noncoding RNA molecules that downregulate the expression of their target mRNAs. The nature of this interaction is titrative and induces a threshold effect: below a given target transcription rate almost no mRNAs are free and available for translation. We investigate the effect of extrinsic noise on the system by introducing a fluctuating miRNA-transcription rate. We find that the presence of extrinsic noise favours the presence of bimodal target distributions which can be observed for a wider range of parameters compared to the case with intrinsic noise only and for lower miRNA-target interaction strength. Our results suggest that combining threshold-inducing interactions with extrinsic noise provides a simple and robust mechanism for obtaining bimodal populations without requiring fine tuning. Furthermore, we characterise the protein distribution’s dependence on protein half-life.

  • 32.
    Emanuelsson, Olof
    Stockholms universitet.
    Predicting protein subcellular localisation from amino acid sequence information. 2002. In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 3, no 4, p. 361-76. Article in journal (Refereed)
    Abstract [en]

    Predicting the subcellular localisation of proteins is an important part of the elucidation of their functions and interactions. Here, the amino acid sequence motifs that direct proteins to their proper subcellular compartment are surveyed, different methods for localisation prediction are discussed, and some benchmarks for the more commonly used predictors are presented.

  • 33.
    Emanuelsson, Olof
    et al.
    Stockholms Universitet.
    Brunak, Søren
    von Heijne, Gunnar
    Nielsen, Henrik
    Locating proteins in the cell using TargetP, SignalP and related tools. 2007. In: Nature Protocols, ISSN 1754-2189, E-ISSN 1750-2799, Vol. 2, no 4, p. 953-971. Article in journal (Refereed)
    Abstract [en]

    Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.

  • 34.
    Emanuelsson, Olof
    et al.
    Stockholms universitet.
    Elofsson, Arne
    von Heijne, Gunnar
    Cristóbal, Susana
    In silico prediction of the peroxisomal proteome in fungi, plants and animals. 2003. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 330, no 2, p. 443-456. Article in journal (Refereed)
    Abstract [en]

    In an attempt to improve our abilities to predict peroxisomal proteins, we have combined machine-learning techniques for analyzing peroxisomal targeting signals (PTS1) with domain-based cross-species comparisons between eight eukaryotic genomes. Our results indicate that this combined approach has a significantly higher specificity than earlier attempts to predict peroxisomal localization, without a loss in sensitivity. This allowed us to predict 430 peroxisomal proteins that almost completely lack a localization annotation. These proteins can be grouped into 29 families covering most of the known steps in all known peroxisomal pathways. In general, plants have the highest number of predicted peroxisomal proteins, and fungi the smallest number.

  • 35.
    Emanuelsson, Olof
    et al.
    Yale University.
    Nagalakshmi, Ugrappa
    Zheng, Deyou
    Rozowsky, Joel S
    Urban, Alexander E
    Du, Jiang
    Lian, Zheng
    Stolc, Viktor
    Weissman, Sherman
    Snyder, Michael
    Gerstein, Mark B
    Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome. 2007. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 17, no 6, p. 886-897. Article in journal (Refereed)
    Abstract [en]

    Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.

  • 36.
    Emanuelsson, Olof
    et al.
    Stockholms universitet.
    Nielsen, H
    Center for Biological Sequence Analysis, Technical University of Denmark.
    Brunak, S
    Center for Biological Sequence Analysis, Technical University of Denmark.
    von Heijne, G
    Stockholms universitet.
    Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. 2000. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 300, no 4, p. 1005-1016. Article in journal (Refereed)
    Abstract [en]

    A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed. Using N-terminal sequence information only, it discriminates between proteins destined for the mitochondrion, the chloroplast, the secretory pathway, and "other" localizations with a success rate of 85% (plant) or 90% (non-plant) on redundancy-reduced test sets. From a TargetP analysis of the recently sequenced Arabidopsis thaliana chromosomes 2 and 4 and the Ensembl Homo sapiens protein set, we estimate that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%. TargetP also predicts cleavage sites with levels of correctly predicted sites ranging from approximately 40% to 50% (chloroplastic and mitochondrial presequences) to above 70% (secretory signal peptides). TargetP is available as a web-server at http://www.cbs.dtu.dk/services/TargetP/.

  • 37.
    Emanuelsson, Olof
    et al.
    Stockholms universitet.
    Nielsen, H
    Stockholms universitet.
    von Heijne, G
    Stockholms universitet.
    ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. 1999. In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 8, no 5, p. 978-984. Article in journal (Refereed)
    Abstract [en]

    We present a neural network based method (ChloroP) for identifying chloroplast transit peptides and their cleavage sites. Using cross-validation, 88% of the sequences in our homology reduced training set were correctly classified as transit peptides or nontransit peptides. This performance level is well above that of the publicly available chloroplast localization predictor PSORT. Cleavage sites are predicted using a scoring matrix derived by an automatic motif-finding algorithm. Approximately 60% of the known cleavage sites in our sequence collection were predicted to within ±2 residues from the cleavage sites given in SWISS-PROT. An analysis of 715 Arabidopsis thaliana sequences from SWISS-PROT suggests that the ChloroP method should be useful for the identification of putative transit peptides in genome-wide sequence data. The ChloroP predictor is available as a web-server at http://www.cbs.dtu.dk/services/ChloroP/.

  • 38.
    Emanuelsson, Olof
    et al.
    Stockholms universitet.
    von Heijne, G
    Stockholms universitet.
    Prediction of organellar targeting signals. 2001. In: Biochimica et Biophysica Acta. Molecular Cell Research, ISSN 0167-4889, E-ISSN 1879-2596, Vol. 1541, no 1-2, p. 114-119. Article in journal (Refereed)
    Abstract [en]

    The subcellular location of a protein is an important characteristic with functional implications, and hence the problem of predicting subcellular localization from the amino acid sequence has received a fair amount of attention from the bioinformatics community. This review attempts to summarize the present state of the art in the field.

  • 39.
    Emanuelsson, Olof
    et al.
    Stockholms universitet.
    von Heijne, G
    Stockholms universitet.
    Schneider, G
    F. Hoffmann-La Roche Ltd., Pharmaceutical Division, Basel.
    Analysis and prediction of mitochondrial targeting peptides. 2001. In: Methods in Cell Biology, ISSN 0091-679X, Vol. 65, p. 175-187. Article in journal (Refereed)
  • 40.
    Ensterö, Mats
    et al.
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Åkerborg, Örjan
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lundin, Daniel
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Wang, Bei
    Department of Computer Science, Duke University, Durham, United States.
    Furey, Terrence S.
    Department of Computer Science, Duke University, Durham, United States.
    Öhman, Marie
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11. Article in journal (Refereed)
    Abstract [en]

    Background: Several bioinformatic approaches have previously been used to find novel sites of ADAR mediated A-to-I RNA editing in human. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA edited substrates suggest, however, that site selective A-to-I editing is particularly important for normal brain development in mammals. Results: We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics for known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing. Conclusions: Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.

  • 41. Euskirchen, Ghia M
    et al.
    Rozowsky, Joel S
    Wei, Chia-Lin
    Lee, Wah Heng
    Zhang, Zhengdong D
    Hartman, Stephen
    Emanuelsson, Olof
    Yale University.
    Stolc, Viktor
    Weissman, Sherman
    Gerstein, Mark B
    Ruan, Yijun
    Snyder, Michael
    Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. 2007. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 17, no 6, p. 898-909. Article in journal (Refereed)
    Abstract [en]

    Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.

  • 42.
    Fagerberg, Linn
    KTH, School of Biotechnology (BIO), Proteomics.
    Mapping the human proteome using bioinformatic methods. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The fundamental goal of proteomics is to gain an understanding of the expression and function of the proteome on the level of individual proteins, on the level of defined cell types and on the level of the entire organism. In this thesis, the human proteome is explored using membrane protein topology prediction methods to define the human membrane proteome and by global protein expression profiling, which relies on a complex study of the location and expression levels of proteins in tissues and cells.

    A whole-proteome analysis was performed based on the predicted protein-coding genes of humans using a selection of membrane protein topology prediction methods. The study used a majority decision-based method, which estimated that approximately 26% of the human genes encode for a membrane protein. The prediction results are displayed in a visualization tool to facilitate the selection of antigens to be used for antibody generation.

    Global protein expression profiles in a large number of cells and tissues in the human body were analyzed for more than 4000 protein targets, based on data from the antibody-based immunohistochemistry and immunofluorescence methods within the framework of the Human Protein Atlas project. The results revealed few cell-type specific proteins and a high fraction of human proteins expressed in most cells, suggesting that cell and tissue specificity is attained by a fine-tuned regulation of protein levels. The expression profiles were also used to analyze the relationship between 45 cell lines by hierarchical clustering and principal component analysis. The global protein expression patterns overall reflected the tumor origin of the cells, and also allowed for identification of proteins of importance for distinguishing different categories of cell lines, as defined by phenotype of progenitor cell. In addition, the protein distribution in 16 subcellular compartments in three of the human cell lines was mapped. A large fraction of proteins were localized in two or more compartments and, in line with previous results, a majority of proteins were detected in all three cell lines.

    Finally, mass spectrometry-based protein expression levels were compared to RNA-seq-based transcript expression levels in three cell lines. Highly ubiquitous mRNA expression was found and the changes of expression levels between the cell lines showed high correlations between proteins and transcripts. Large general differences in abundance of proteins from various functional classes were observed. A comparison between categories based on expression levels revealed that, in general, genes with varying expression levels between the cell lines or only expressed in one cell line were highly enriched for cell-surface proteins.

    These studies show a path for a systematic analysis to characterize the proteome in human cells, tissues and organs.

  • 43.
    Fasterius, Erik
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science, Systems Biology.
    Analysis of public RNA-sequencing data reveals biological consequences of genetic heterogeneity in cell line populations. 2018. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 8, article id 11226. Article in journal (Refereed)
    Abstract [en]

    Meta-analyses of datasets available in public repositories are used to gather and summarise experiments performed across laboratories, as well as to explore consistency of scientific findings. As data quality and biological equivalency across samples may obscure such analyses and consequently their conclusions, we investigated the comparability of 85 public RNA-seq cell line datasets. Thousands of pairwise comparisons of single nucleotide variants in 139 samples revealed variable genetic heterogeneity of the eight cell line populations analysed as well as variable data quality. The H9 and HCT116 cell lines were found to be remarkably stable across laboratories (with median concordances of 99.2% and 98.5%, respectively), in contrast to the highly variable HeLa cells (89.3%). We show that the genetic heterogeneity encountered greatly affects gene expression between same-cell comparisons, highlighting the importance of interrogating the biological equivalency of samples when comparing experimental datasets. Both the number of differentially expressed genes and the expression levels negatively correlate with the genetic heterogeneity. Finally, we demonstrate how comparing genetically heterogeneous datasets affects gene expression analyses and that high dissimilarity between same-cell datasets alters the expression of more than 300 cancer-related genes, which are often the focus of studies using cell lines.

  • 44.
    Fasterius, Erik
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH).
    Exploring genetic heterogeneity in cancer using high-throughput DNA and RNA sequencing. 2018. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    High-throughput sequencing (HTS) technology has revolutionised the biomedical sciences, where it is used to analyse the genetic makeup and gene expression patterns of both primary patient tissue samples and models cultivated in vitro. This makes it especially useful for research on cancer, a disease that is characterised by its deadliness and genetic heterogeneity. This inherent genetic variation is an important aspect that warrants exploration, and the depth and breadth that HTS possesses makes it well-suited to investigate this facet of cancer.

    The types of analyses that may be accomplished with HTS technologies are many, but they may be divided into two groups: those that analyse the DNA of the sample in question, and those that work on the RNA. While DNA-based methods give information regarding the genetic landscape of the sample, RNA-based analyses yield data regarding gene expression patterns; both of these methods have already been used to investigate the heterogeneity present in cancer. While RNA-based methods are traditionally used exclusively for expression analyses, the data they yield may also be utilised to investigate the genetic variation present in the samples. This type of RNA-based analysis is seldom performed, however, and valuable information is thus ignored.

    The aim of this thesis is the development and application of DNA- and RNA-based HTS methods for analysing genetic heterogeneity within the context of cancer. The present investigation demonstrates that not only may RNA-based sequencing be used to successfully differentiate different in vitro cancer models through their genetic makeup, but that this may also be done for primary patient data. A pipeline for these types of analyses is established and evaluated, showing it to be robust to several technical parameters and to possess a broad scope of analytical possibilities. Genetic variation within cancer models in public databases is evaluated and demonstrated to affect gene expression in several cases. Both inter- and intra-patient genetic heterogeneity are shown using the established pipeline, and cancerous cells are demonstrated to be more heterogeneous than their normal neighbours. Finally, two bioinformatic open source software packages are presented.

    The results presented herein demonstrate that genetic analyses using RNA-based methods represent excellent complements to already existing DNA-based techniques, and further increase the already large scope of how HTS technologies may be utilised.

  • 45.
    Fasterius, Erik
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science, Systems Biology.
    seqCAT: a Bioconductor R-package for variant analysis of high throughput sequencing data. Manuscript (preprint) (Other academic)
  • 46.
    Fasterius, Erik
    et al.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science, Systems Biology.
    Uhlén, Mathias
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science, Systems Biology.
    Al-Khalili Szigyarto, Cristina
    KTH, Superseded Departments (pre-2005), Biotechnology.
    Single cell RNA-seq variant analysis for exploration of inter- and intra-tumour genetic heterogeneity. Manuscript (preprint) (Other academic)
  • 47. Friedländer, Marc R
    et al.
    Chen, Wei
    Adamidi, Catherine
    Maaskola, Jonas
    Max Delbrück Centrum für Molekulare Medizin, Robert-Rössle-Strasse 10, D-13125 Berlin-Buch, Germany.
    Einspanier, Ralf
    Knespel, Signe
    Rajewsky, Nikolaus
    Discovering microRNAs from deep sequencing data using miRDeep. 2008. In: Nature Biotechnology, ISSN 1087-0156, E-ISSN 1546-1696, Vol. 26, no 4. Article in journal (Refereed)
    Abstract [en]

    The capacity of highly parallel sequencing technologies to detect small RNAs at unprecedented depth suggests their value in systematically identifying microRNAs (miRNAs). However, the identification of miRNAs from the large pool of sequenced transcripts from a single deep sequencing run remains a major challenge. Here, we present an algorithm, miRDeep, which uses a probabilistic model of miRNA biogenesis to score compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor. We demonstrate its accuracy and robustness using published Caenorhabditis elegans data and data we generated by deep sequencing human and dog RNAs. miRDeep reports altogether approximately 230 previously unannotated miRNAs, of which four novel C. elegans miRNAs are validated by northern blot analysis.

  • 48.
    Gasser, Christian
    KTH, School of Engineering Sciences (SCI), Solid Mechanics (Dept.).
    Histomechanical modeling of the wall of abdominal aortic aneurysm. 2016. In: Structure-Based Mechanics of Tissues and Organs, Springer, 2016, p. 57-78. Chapter in book (Other academic)
    Abstract [en]

    Vascular diseases are already the leading cause of death in the industrialized countries and many of the associated risk factors are increasing. A multidisciplinary approach including biomechanics is needed to better understand and more effectively treat these diseases. Specifically, constitutive modeling is critical in understanding the biomechanics of the vascular wall and to uncover pathologies like Abdominal Aortic Aneurysms (AAAs), i.e. local dilatations of the infrarenal aorta. Aneurysms are formed through irreversible pathological remodeling of the vascular wall and integrating this biological process in the constitutive description could improve our current understanding of aneurysm disease. It might also increase the predictability of biomechanical simulations towards augmenting clinical decisions. The present chapter develops histomechanical constitutive models for the AAA wall according to Lanir’s pioneering approach. Consequently, macroscopic properties were derived through an integration of distributed fibers, where collagen was regarded as the most important protein of the aneurysmatic Extra Cellular Matrix (ECM). Collagen organization was quantified through Polarized Light Microscopy (PLM) of picrosirius red stained histological slices from tissue samples harvested during elective open AAA repair. This histological information was either directly integrated in the constitutive description or used to qualitatively validate the predicted remodeling of the AAA wall. Specifically, two descriptions for the AAA wall were used, where collagen was regarded either as a purely passive entity of the ECM or as an active entity. The suggested constitutive models were able to successfully capture salient features of the AAA wall, but a rigorous validation against detailed experimental data was beyond the scope of this chapter.

  • 49. Gerstein, Mark B
    et al.
    Bruce, Can
    Rozowsky, Joel S
    Zheng, Deyou
    Du, Jiang
    Korbel, Jan O
    Emanuelsson, Olof
    Yale University.
    Zhang, Zhengdong D
    Weissman, Sherman
    Snyder, Michael
    What is a gene, post-ENCODE?: History and updated definition. 2007. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 17, no 6, p. 669-681. Article in journal (Refereed)
    Abstract [en]

    While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century--from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.

  • 50. Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    A security framework for population-scale genomics analysis. 2015. In: Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, IEEE conference proceedings, 2015, p. 106-114. Conference paper (Refereed)
    Abstract [en]

    Biobanks store genomic material from identifiable individuals. Recently many population-based studies have started sequencing genomic data from biobank samples and cross-linking the genomic data with clinical data, with the goal of discovering new insights into disease and clinical treatments. However, the use of genomic data for research has far-reaching implications for privacy and the relations between individuals and society. In some jurisdictions, primarily in Europe, new laws are being or have been introduced to legislate for the protection of sensitive data relating to individuals, and biobank-specific laws have even been designed to legislate for the handling of genomic data and the clear definition of roles and responsibilities for the owners and processors of genomic data. This paper considers the security questions raised by these developments. We introduce a new threat model that enables the design of cloud-based systems for handling genomic data according to privacy legislation. We also describe the design and implementation of a security framework using our threat model for BiobankCloud, a platform that supports the secure storage and processing of genomic data in cloud computing environments.
