GROMACS is one of the most widely used free, open-source software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types and of preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
Background Low total diversity of the gut microbiota during the first year of life is associated with allergic diseases in infancy, but little is known about how early microbial diversity is related to allergic disease later, at school age. Objective To assess microbial diversity and characterize the dominant bacteria in stool during the first year of life in relation to the prevalence of different allergic diseases at school age, such as asthma, allergic rhinoconjunctivitis (ARC) and eczema. Methods The microbial diversity and composition were analysed with barcoded 16S rDNA 454 pyrosequencing in stool samples at 1 week, 1 month and 12 months of age in 47 infants who were subsequently assessed for allergic disease and skin prick test reactivity at 7 years of age (ClinicalTrials.gov ID NCT01285830). Results Children developing asthma (n=8) had a lower diversity of the total microbiota than non-asthmatic children at 1 week (P=0.04) and 1 month (P=0.003) of age, whereas allergic rhinoconjunctivitis (n=13), eczema (n=12) and positive skin prick reactivity (n=14) at 7 years of age were not associated with gut microbiota diversity. Nor was asthma associated with the microbiota composition later in infancy (at 12 months). Children having IgE-associated eczema in infancy and subsequently developing asthma had lower microbial diversity than those who did not. There were no significant differences, however, in the relative abundance of bacterial phyla and genera between children with and without allergic disease. Conclusion and Clinical Relevance Low total diversity of the gut microbiota during the first month of life was associated with asthma but not ARC in children at 7 years of age. Measures affecting microbial colonization of the infant during the first month of life may impact asthma development in childhood.
BACKGROUND: It is debated whether a low total diversity of the gut microbiota in early childhood is more important than an altered prevalence of particular bacterial species for the increasing incidence of allergic disease. The advent of powerful, cultivation-free molecular methods makes it possible to characterize the total microbiome down to the genus level in large cohorts. OBJECTIVE: We sought to assess microbial diversity and characterize the dominant bacteria in stool during the first year of life in relation to atopic eczema development. METHODS: Microbial diversity and composition were analyzed with barcoded 16S rDNA 454-pyrosequencing in stool samples at 1 week, 1 month, and 12 months of age in 20 infants with IgE-associated eczema and 20 infants without any allergic manifestation until 2 years of age (ClinicalTrials.gov ID NCT01285830). RESULTS: Infants with IgE-associated eczema had a lower diversity of the total microbiota at 1 month (P= .004) and a lower diversity of the bacterial phylum Bacteroidetes and the genus Bacteroides at 1 month (P= .02 and P= .01) and the phylum Proteobacteria at 12 months of age (P= .02). The microbiota was less uniform at 1 month than at 12 months of age, with a high interindividual variability. At 12 months, when the microbiota had stabilized, Proteobacteria, comprising gram-negative organisms, were more abundant in infants without allergic manifestation (Empirical Analysis of Digital Gene Expression in R edgeR test: P= .008, q= 0.02). CONCLUSION: Low intestinal microbial diversity during the first month of life was associated with subsequent atopic eczema.
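As an illustration of how "total diversity" can be quantified from genus-level read counts, the following minimal sketch computes the Shannon diversity index. The abstracts above do not name the index that was used, so the choice of the Shannon index, the function name and the read counts are all assumptions made for illustration only.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero read counts.

    `counts` holds the number of reads assigned to each taxon in one sample.
    (Assumption: the Shannon index stands in for the unspecified diversity measure.)
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("a sample needs at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts for two stool samples with four genera each.
even_sample = [25, 25, 25, 25]    # reads spread evenly across genera
skewed_sample = [97, 1, 1, 1]     # one genus dominates

print("even sample:  ", shannon_diversity(even_sample))
print("skewed sample:", shannon_diversity(skewed_sample))
```

A sample dominated by a single genus scores lower than one in which reads are spread evenly, which is the sense in which a microbiota can have "low diversity" even when the same genera are present.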
Recent advances in understanding the development and spread of cancer have led to the realization of novel single-cell analysis platforms focused on circulating tumor cells (CTCs). A simple, rapid, and inexpensive analytical platform capable of providing genetic information on these rare cells is highly desirable to help clinicians and researchers alike to either support the selection or adjustment of therapy or provide fundamental insights into cell function and cancer progression mechanisms. We report on the genetic profiling of single cancer cells, exploiting a combination of multiplex ligation-dependent probe amplification (MLPA) and electrochemical detection. Cells were isolated using laser capture and lysed, and the mRNA was extracted and transcribed into DNA. Seven markers were amplified by MLPA, which allows for the simultaneous amplification of multiple targets with a single primer pair, using MLPA probes containing unique barcode sequences. Capture probes complementary to each of these barcode sequences were immobilized on a printed circuit board (PCB) manufactured electrode array and exposed to single-stranded MLPA products and subsequently to a single-stranded DNA reporter probe bearing an HRP molecule, followed by substrate addition and fast electrochemical pulse amperometric detection. We present a simple, rapid, flexible, and inexpensive approach for the simultaneous quantification of multiple breast cancer related mRNA markers, with single tumor cell sensitivity.
To develop novel strategies for prevention and treatment of dyslipidemia, it is essential to understand the pathophysiology of dyslipoproteinemia in humans. Lipoprotein metabolism is a complex system in which abnormal concentrations of various lipoprotein particles can result from alterations in their rates of production, conversion, and/or catabolism. Traditional methods that measure plasma lipoprotein concentrations only provide static estimates of lipoprotein metabolism and hence limited mechanistic information. By contrast, the use of tracers labeled with stable isotopes, combined with mathematical modeling, provides us with a powerful tool for probing lipid and lipoprotein kinetics in vivo and furthering our understanding of the pathogenesis of dyslipoproteinemia.
Neuropeptide S (NPS) is a regulatory peptide with potent pharmacological effects. In rodents, NPS is expressed in a few pontine cell clusters. Its receptor (NPSR1) is, however, widely distributed in the brain. The anxiolytic and arousal-promoting effects of NPS make the NPS-NPSR1 system an interesting potential drug target in mood-related disorders. However, so far possible disease-related mechanisms involving NPS have only been studied in rodents. To validate the relevance of these animal studies for, among other things, drug development, we have explored the distribution of NPS-expressing neurons in the human pons using in situ hybridization and stereological methods, and we compared the distribution of NPS mRNA-expressing neurons in the human and rat brain. The calculation revealed a total number of 22,317 ± 2411 NPS mRNA-positive neurons in human, bilaterally. The majority of cells (84%) were located in the parabrachial area in human: in the extension of the medial and lateral parabrachial nuclei, in the Kölliker-Fuse nucleus and around the adjacent lateral lemniscus. In human, in sharp contrast to rodents, only very few NPS-positive cells (5%) were found close to the locus coeruleus. In addition, we identified a smaller cell cluster (11% of all NPS cells) in the pontine central gray matter both in human and rat, which has not been described previously even in rodents. We also examined the distribution of NPSR1 mRNA-expressing neurons in the human pons. These cells were mainly located in the rostral laterodorsal tegmental nucleus, the cuneiform nucleus, the microcellular tegmental nucleus region and in the periaqueductal gray. Our results show that both NPS and NPSR1 in the human pons are preferentially localized in regions of importance for integration of visceral autonomic information and emotional behavior. The reported interspecies differences must, however, be considered when looking for targets for new pharmacotherapeutical interventions.
Neuropeptide S (NPS) is a regulatory peptide expressed by a limited number of neurons in the brainstem. The simultaneous anxiolytic and arousal-promoting effect of NPS suggests an involvement in mood control and vigilance, making the NPS-NPS receptor system an interesting potential drug target. Here we examined, in detail, the distribution of NPS-immunoreactive (IR) fiber arborizations in brain regions of the rat known to be involved in the regulation of sleep and arousal. Such nerve terminals were frequently apposed to GABAergic/galaninergic neurons in the ventro-lateral preoptic area (VLPO) and to tyrosine hydroxylase-IR neurons in all hypothalamic/thalamic dopamine cell groups. Then we applied the single platform-on-water (mainly REM) sleep deprivation method to study the functional role of NPS in the regulation of arousal. Of the three pontine NPS cell clusters, the NPS transcript levels were increased only in the peri-coerulear group in sleep-deprived animals, but not in stress controls. The density of NPS-IR fibers was significantly decreased in the median preoptic nucleus-VLPO region after the sleep deprivation, while radioimmunoassay and mass spectrometry measurements showed a parallel increase of NPS in the anterior hypothalamus. The expression of the NPS receptor was, however, not altered in the VLPO region. The present results suggest a selective activation of one of the three NPS-expressing neuron clusters as well as release of NPS in distinct forebrain regions after sleep deprivation. Taken together, our results emphasize a role of the peri-coerulear cluster in the modulation of arousal, and the importance of the preoptic area for the action of NPS on arousal and sleep.
Alzheimer's disease and other age-related neurodegenerative disorders are associated with deterioration of the noradrenergic locus coeruleus (LC), a probable trigger for mood and memory dysfunction. LC noradrenergic neurons exhibit particularly high levels of somatostatin binding sites. This is noteworthy since cortical and hypothalamic somatostatin content is reduced in neurodegenerative pathologies. Yet a possible role of a somatostatin signal deficit in the maintenance of noradrenergic projections remains unknown. Here, we deployed tissue microarrays, immunohistochemistry, quantitative morphometry and mRNA profiling in a cohort of Alzheimer's and age-matched control brains in combination with genetic models of somatostatin receptor deficiency to establish causality between defunct somatostatin signalling and noradrenergic neurodegeneration. In Alzheimer's disease, we found significantly reduced somatostatin protein expression in the temporal cortex, with aberrant clustering and bulging of tyrosine hydroxylase-immunoreactive afferents. As such, somatostatin receptor 2 (SSTR2) mRNA was highly expressed in the human LC, with its levels significantly decreasing from Braak stages III/IV and onwards, i.e., a process preceding advanced Alzheimer's pathology. The loss of SSTR2 transcripts in the LC neurons appeared selective, since tyrosine hydroxylase, dopamine beta-hydroxylase, galanin or galanin receptor 3 mRNAs remained unchanged. We modeled these pathogenic changes in Sstr2 (-/-) mice and, unlike in Sstr1 (-/-) or Sstr4 (-/-) genotypes, they showed selective, global and progressive degeneration of their central noradrenergic projections. However, neuronal perikarya in the LC were found intact until late adulthood (< 8 months) in Sstr2 (-/-) mice. In contrast, the noradrenergic neurons in the superior cervical ganglion lacked SSTR2 and, as expected, the sympathetic innervation of the head region did not show any signs of degeneration. 
Our results indicate that SSTR2-mediated signaling is integral to the maintenance of central noradrenergic projections at the system level, and that early loss of somatostatin receptor 2 function may be associated with the selective vulnerability of the noradrenergic system in Alzheimer's disease.
Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.
Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
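The two uses of the predicted standard deviation mentioned above (selecting a lower-error subset and adaptive windowing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the window half-width of k standard deviations, the peptide sequences and all numbers are hypothetical, and the per-peptide mean and standard deviation are assumed to have been produced already by some Gaussian process regression model.

```python
def most_confident(predictions, fraction=0.5):
    """Keep the given fraction of peptides with the smallest predicted standard deviation.

    `predictions` is a list of (peptide, predicted_rt, predicted_sd) tuples.
    """
    ranked = sorted(predictions, key=lambda p: p[2])
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

def adaptive_windows(predictions, k=2.0):
    """Return {peptide: (start, end)} with a window of +/- k predicted standard deviations.

    Peptides with uncertain predictions get wide windows, confident ones narrow windows.
    """
    return {pep: (rt - k * sd, rt + k * sd) for pep, rt, sd in predictions}

# Hypothetical predictions: retention time and predicted sd, both in minutes.
predictions = [
    ("LVNELTEFAK", 42.0, 0.5),
    ("AEFVEVTK", 30.0, 1.5),
    ("YLYEIAR", 55.0, 3.0),
    ("HLVDEPQNLIK", 61.0, 0.8),
]

for pep, window in adaptive_windows(most_confident(predictions)).items():
    print(pep, window)
```

The design choice here is that a single multiplier k controls the trade-off: a larger k makes it more likely that the true retention time falls inside the window, at the cost of wider windows overall.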
Sexual dimorphism has been used to describe morphological differences between the sexes, but can be extended to any biologically related process that varies between males and females. The synaptonemal complex (SC) is a tripartite structure that connects homologous chromosomes in meiosis. Here, aided by superresolution microscopy techniques, we show that the SC is subject to sexual dimorphism in mouse germ cells. We have identified a significantly narrower SC in oocytes and have established that this difference does not arise from a different organization of the lateral elements nor from a different isoform of the transverse filament protein SYCP1. Instead, we provide evidence for the existence of a narrower central element and a different integration site for the C-termini of SYCP1 in females. In addition to these female-specific features, we speculate that post-translational modifications affecting the SYCP1 coiled-coil region could render a more compact conformation, thus contributing to the narrower SC observed in females.
During meiosis, cohesin complexes mediate sister chromatid cohesion (SCC), synaptonemal complex (SC) assembly and synapsis. Here, using super-resolution microscopy, we imaged sister chromatid axes in mouse meiocytes that have normal or reduced levels of cohesin complexes, assessing the relationship between localization of cohesin complexes, SCC and SC formation. We show that REC8 foci are separated from each other by a distance smaller than 15% of the total chromosome axis length in wild-type meiocytes. Reduced levels of cohesin complexes result in a local separation of sister chromatid axial elements (LSAEs), as well as illegitimate SC formation at these sites. REC8 but not RAD21 or RAD21L cohesin complexes flank sites of LSAEs, whereas RAD21 and RAD21L appear predominantly along the separated sister-chromatid axes. Based on these observations and a quantitative distribution analysis of REC8 along sister chromatid axes, we propose that the high density of randomly distributed REC8 cohesin complexes promotes SCC and prevents illegitimate SC formation.
In higher eukaryotes many genes encode protein isoforms whose properties and biological roles are often poorly characterized. Here we describe systematic approaches for detection of either distinct isoforms, or separate pools of the same isoform, with differential biological properties. Using information from ion intensities we have estimated protein abundance levels and using rates of change in stable isotope labeling with amino acids in cell culture isotope ratios we measured turnover rates and subcellular distribution for the HeLa cell proteome. Protein isoforms were detected using three data analysis strategies that evaluate differences between stable isotope labeling with amino acids in cell culture isotope ratios for specific groups of peptides within the total set of peptides assigned to a protein. The candidate approach compares stable isotope labeling with amino acids in cell culture isotope ratios for predicted isoform-specific peptides, with ratio values for peptides shared by all the isoforms. The rule of thirds approach compares the mean isotope ratio values for all peptides in each of three equal segments along the linear length of the protein, assessing differences between segment values. The three in a row approach compares mean isotope ratio values for each sequential group of three adjacent peptides, assessing differences with the mean value for all peptides assigned to the protein. Protein isoforms were also detected and their properties evaluated by fractionating cell extracts on one-dimensional SDS-PAGE prior to trypsin digestion and MS analysis and independently evaluating isotope ratio values for the same peptides isolated from different gel slices. The effect of protein phosphorylation on turnover rates was analyzed by comparing mean turnover values calculated for all peptides assigned to a protein, either including, or excluding, values for cognate phosphopeptides.
Collectively, these experimental and analytical approaches provide a framework for expanding the functional annotation of the genome.
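The "rule of thirds" and "three in a row" comparisons described above can be sketched in a few lines. This is a minimal illustration, assuming that each peptide is represented by its start position in the protein and a single isotope ratio; the function names, the example data and the absence of any significance threshold are assumptions, since the abstract does not specify how differences were scored.

```python
from statistics import mean

def rule_of_thirds(peptides, protein_length):
    """Mean isotope ratio in each of three equal segments along the protein.

    `peptides` is a list of (start_position, ratio) with 0-based integer positions.
    Returns three values; a segment without peptides yields None.
    """
    segments = [[], [], []]
    for start, ratio in peptides:
        index = min(2, (start * 3) // protein_length)
        segments[index].append(ratio)
    return [mean(s) if s else None for s in segments]

def three_in_a_row(peptides):
    """Difference between each sliding group of three adjacent peptides and the protein mean."""
    ordered = [ratio for _, ratio in sorted(peptides)]
    overall = mean(ordered)
    return [mean(ordered[i:i + 3]) - overall for i in range(len(ordered) - 2)]

# Hypothetical protein of 300 residues whose C-terminal peptides behave differently.
peptides = [(10, 1.0), (50, 1.0), (120, 1.1), (160, 0.9),
            (220, 2.0), (260, 2.0), (290, 2.0)]

print(rule_of_thirds(peptides, protein_length=300))
print(three_in_a_row(peptides))
```

In this toy example the last segment stands out from the first two, and the sliding groups near the C-terminus deviate most from the protein-wide mean, which is the kind of pattern that would flag a candidate isoform or a separate protein pool.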
Accumulating evidence shows that oxidative stress is involved in a wide variety of human diseases: rheumatoid arthritis, Alzheimer's disease, Parkinson's disease, cancers, etc. Here, we discuss the significance of oxidative conditions in different diseases, with the focus on neurodegenerative diseases including Parkinson's disease, which is mainly caused by oxidative stress. Reactive oxygen and nitrogen species (ROS and RNS, respectively), collectively known as RONS, are produced by cellular enzymes such as myeloperoxidase, NADPH-oxidase (nicotinamide adenine dinucleotide phosphate-oxidase) and nitric oxide synthase (NOS). Natural antioxidant systems are categorized into enzymatic and non-enzymatic antioxidant groups. The former includes a number of enzymes such as catalase and glutathione peroxidase, while the latter contains a number of antioxidants acquired from dietary sources including vitamin C, carotenoids, flavonoids and polyphenols. There are also scavengers used for therapeutic purposes, such as 3,4-dihydroxyphenylalanine (L-DOPA), used routinely in the treatment of Parkinson's disease (not as a free radical scavenger), and 3-methyl-1-phenyl-2-pyrazolin-5-one (Edaravone), which acts as a free radical detoxifier frequently used in acute ischemic stroke. The cell-survival properties of L-DOPA and Edaravone under oxidative stress conditions rely on the alteration of a number of stress proteins such as Annexin A1, Peroxiredoxin-6 and PARK7/DJ-1 (Parkinson disease protein 7, also known as Protein deglycase DJ-1). Although they share targets in reversing the cytotoxic effects of H2O2, they seem to have distinct mechanisms of function. Exposure to L-DOPA may result in a hypoxic condition and further induction of ORP150 (150-kDa oxygen-regulated protein) with its concomitant cytoprotective effects, but Edaravone seems to protect cells via direct induction of Peroxiredoxin-2 and inhibition of apoptosis.
Soil microorganisms living in close contact with minerals play key roles in the biogeochemical cycling of elements, soil formation, and plant nutrition. Yet, the composition of microbial communities inhabiting the mineralosphere (i.e., the soil surrounding minerals) is poorly understood. Here, we explored the composition of soil microbial communities associated with different types of minerals in various soil horizons. To this effect, a field experiment was set up in which mineral specimens of apatite, biotite, and oligoclase were buried in the organic, eluvial, and upper illuvial horizons of a podzol soil. After an incubation period of two years, the soil attached to the mineral surfaces was collected, and microbial communities were analyzed by means of Illumina MiSeq sequencing of the 16S (prokaryotic) and 18S (eukaryotic) ribosomal RNA genes. We found that both composition and diversity of bacterial, archaeal, and fungal communities varied across the different mineral surfaces, and that mineral type had a greater influence on structuring microbial assemblages than soil horizon. Thus, our findings emphasize the importance of mineral surfaces as ecological niches in soils.
Introduction: Inflammation is an important risk-associated component of many diseases and can be diagnosed by molecular imaging of specific molecules. The aim of this study was to evaluate the possibility of targeting adhesion molecules on inflammation-activated endothelial cells and macrophages using an innovative multimodal polyvinyl alcohol-based microbubble (MB) contrast agent developed for diagnostic use in ultrasound, magnetic resonance, and nuclear imaging. Methods: We assessed the binding efficiency of antibody-conjugated multimodal contrast to inflamed murine or human endothelial cells (ECs), and to peritoneal macrophages isolated from rats with peritonitis, utilizing the fluorescence characteristics of the MBs. Single-photon emission tomography (SPECT) was used to illustrate 99mTc-labeled MB targeting and distribution in an experimental in vivo model of inflammation. Results: Flow cytometry and confocal microscopy showed that binding of antibody-targeted MBs to the adhesion molecules ICAM-1, VCAM-1, or E-selectin, expressed on cytokine-stimulated ECs, was up to sixfold higher for human and 12-fold higher for mouse ECs, compared with that of non-targeted MBs. Under flow conditions, both VCAM-1- and E-selectin-targeted MBs adhered more firmly to stimulated human ECs than to untreated cells, while VCAM-1-targeted MBs adhered best to stimulated murine ECs. SPECT imaging showed an approximate doubling of signal intensity from the abdomen of rats with peritonitis, compared with healthy controls, after injection of anti-ICAM-1-MBs. Conclusions: This novel multilayer contrast agent can specifically target adhesion molecules expressed as a result of inflammatory stimuli in vitro, and has potential for use in disease-specific multimodal diagnostics in vivo using antibodies against targets of interest.
We here present a comparative genome, transcriptome and functional network analysis of three human cancer cell lines (A431, U251MG and U2OS), and investigate their relation to protein expression. Gene copy numbers significantly influenced corresponding transcript levels; their effect on protein levels was less pronounced. We focused on genes with altered mRNA and/or protein levels to identify those active in tumor maintenance. We provide comprehensive information for the three genomes and demonstrate the advantage of integrative analysis for identifying tumor-related genes amidst numerous background mutations by relating genomic variation to expression/protein abundance data and use gene networks to reveal implicated pathways.
Recent efforts to sequence the genomes and transcriptomes of several gymnosperm species have revealed an increased complexity in certain gene families in gymnosperms as compared to angiosperms. One example of this is the gymnosperm sister clade to angiosperm TM3-like MADS-box genes, which at least in the conifer lineage has expanded in number of genes. We have previously identified a member of this subclade, the conifer gene DEFICIENS AGAMOUS LIKE 19 (DAL19), as being specifically upregulated in cone-setting shoots. Here, we show through Sanger sequencing of mRNA-derived cDNA and mapping to assembled conifer genomic sequences that DAL19 produces six mature mRNA splice variants in Picea abies. These splice variants use alternate first and last exons, while their four central exons constitute a core region present in all six transcripts. Thus, they are likely to be transcript isoforms. Quantitative Real-Time PCR revealed that two mutually exclusive first DAL19 exons are differentially expressed across meristems that will form either male or female cones, or vegetative shoots. Furthermore, mRNA in situ hybridization revealed that two mutually exclusive last DAL19 exons were expressed in a cell-specific pattern within bud meristems. Based on these findings in DAL19, we developed a sensitive approach to transcript isoform assembly from short-read sequencing of mRNA. We applied this method to 42 putative MADS-box core regions in P. abies, from which we assembled 1084 putative transcripts. We manually curated these transcripts to arrive at 933 assembled transcript isoforms of 38 putative MADS-box genes. 152 of these isoforms, which we assign to 28 putative MADS-box genes, were differentially expressed across eight female, male, and vegetative buds. We further provide evidence of the expression of 16 out of the 38 putative MADS-box genes by mapping PacBio Iso-Seq circular consensus reads derived from pooled sample sequencing to assembled transcripts.
In summary, our analyses reveal the use of mutually exclusive exons of MADS-box gene isoforms during early bud development in P. abies, and we find that the large number of identified MADS-box transcripts in P. abies results not only from expansion of the gene family through gene duplication events but also from the generation of numerous splice variants.
Sequencing of complete nuclear genomes of Neanderthal and Denisovan stimulated studies about their relationship with modern humans demonstrating, in particular, that DNA alleles from both Neanderthal and Denisovan genomes are present in genomes of modern humans. The Papuan genome is a unique object because it contains both Neanderthal and Denisovan alleles. Here, we have shown that the Papuan genomes contain different gene functional groups inherited from each of the ancient people. The Papuan genomes demonstrate a relative prevalence of Neanderthal alleles in genes responsible for the regulation of transcription and neurogenesis. The enrichment of specific functional groups with Denisovan alleles is less pronounced; these groups are responsible for bone and tissue remodeling. This analysis shows that introgression of alleles from Neanderthals and Denisovans to Papuans occurred independently and retention of these alleles may carry specific adaptive advantages.
Background: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. Results: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. Conclusions: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps.
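The idea of replacing the overlap statistic with a count of network links, and of obtaining its null distribution by network randomization, can be sketched as follows. This is a minimal illustration under stated assumptions and not the published NEA algorithm: the abstract does not describe the randomization procedure, so degree-preserving edge swapping is used here as a stand-in, and the function names, the toy network and the z-score summary are all hypothetical.

```python
import random
from statistics import mean, pstdev

def count_links(edges, set_a, set_b):
    """Number of network edges joining a gene in set_a to a gene in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))

def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by repeated edge swaps (an assumed stand-in)."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # the swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def nea_z_score(edges, gene_set, category, n_random=200, seed=0):
    """Observed link count and its z-score against the randomized networks."""
    rng = random.Random(seed)
    observed = count_links(edges, gene_set, category)
    null = [count_links(rewire(edges, 10 * len(edges), rng), gene_set, category)
            for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return observed, z

# Hypothetical toy network: genes A and B are linked to category members X and Y.
edges = [("A", "X"), ("B", "X"), ("A", "Y"), ("B", "Y"),
         ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")]
print(nea_z_score(edges, gene_set={"A", "B"}, category={"X", "Y"}))
```

Note that in this toy example the experimental set and the functional category share no genes, so an overlap-based test would see nothing, whereas the link count can still be compared with its randomized distribution.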
Background: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. Results: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. Conclusions: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.
FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.
Human noroviruses are the leading causative agents of epidemic and sporadic viral gastroenteritis and childhood diarrhoea worldwide. Human histo-blood group antigens (HBGA) serve as receptors for norovirus capsid protein attachment and play a critical role in infection. This makes HBGA-norovirus binding a promising target for drug development. Recently solved crystal structures of norovirus bound to HBGA have provided a structural basis for identifying potential anti-norovirus drugs. Subsequent in silico and in vitro drug screens have identified compounds that block norovirus binding and may thereby serve as structural templates for the design of therapeutic norovirus inhibitors. This review explores norovirus therapeutic options based on the strategy of blocking norovirus-HBGA binding.
Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, several software packages are required to perform, e.g., convergence assessment, mixing analysis and interactive exploration of both continuous and tree parameters. Results: We have written a software package called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines. Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, a tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.
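To make the idea of automatic burn-in estimation concrete, here is a minimal sketch of one simple heuristic: the burn-in ends at the first sample that reaches the level of the chain's tail. This is an illustrative assumption only and is not claimed to be the estimator that VMCMC implements; the function name and the `tail_fraction` parameter are hypothetical.

```python
from statistics import mean


def estimate_burn_in(trace, tail_fraction=0.5):
    """Return the index of the first sample at the level of the chain's tail.

    trace: sequence of sampled values for one continuous parameter
        (for example, the log-likelihood at each recorded iteration).
    tail_fraction: share of the chain, counted from the end, that is
        assumed to have converged (an assumption of this sketch).
    """
    if not trace:
        raise ValueError("trace must not be empty")
    tail_start = int(len(trace) * (1 - tail_fraction))
    tail_mean = mean(trace[tail_start:])
    # The chain may approach its stationary level from below or from above.
    started_below = trace[0] < tail_mean
    for index, value in enumerate(trace):
        reached = value >= tail_mean if started_below else value <= tail_mean
        if reached:
            return index
    return len(trace)


if __name__ == "__main__":
    log_likelihood = [-100, -80, -60, -40, -20, -10, -11, -9, -10, -10]
    burn_in = estimate_burn_in(log_likelihood)
    print("discard the first", burn_in, "samples")
```

A heuristic like this is cheap enough for automated pipelines, but it only looks at one parameter at a time, so it would normally be complemented by visual inspection of the trace.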
Kindlin proteins represent a novel family of evolutionarily conserved FERM domain-containing proteins (FDCPs) and are members of the B4.1 superfamily. Kindlins consist of three conserved protein homologs in vertebrates: Kindlin-1, Kindlin-2 and Kindlin-3. All three homologs are associated with focal adhesions and are involved in Integrin activation. The FERM domain of each Kindlin is bipartite and plays a key role in Integrin activation. A single ancestral Kindlin protein can be traced back to the earliest metazoans, e.g., to Parazoa. This protein underwent multiple rounds of duplication in vertebrates, leading to the present Kindlin family. In this study, we trace the phylogenetic and evolutionary history of the Kindlin FERM domain with respect to the FERM domains of other FDCPs. We show that the FERM domain is conserved among Kindlin homologs, but that it is less conserved than the FERM domains of other members of the B4.1 superfamily. Furthermore, the insertion of a Pleckstrin Homology-like domain in the Kindlin FERM domain has important evolutionary and functional consequences. Important residues in Kindlins are traced and ranked according to their evolutionary significance. The structural and functional significance of high-ranked residues is highlighted and validated by their known involvement in Kindlin-associated diseases. In light of these findings, we hypothesize that the FERM domain originated from a proto-Talin protein in a unicellular or proto-multicellular organism and that the advent of multicellularity was accompanied by a burst of FDCPs, which supported the multicellular functions required for complex organisms. This study helps in developing a better understanding of the evolutionary history of the FERM domain of FDCPs and the role of the FERM domain in metazoan evolution.
Background: Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential. Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data. Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and GenFamClust improves on Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.
Fibroblasts are a main player in the tumor-inhibitory microenvironment. Upon tumor initiation and progression, fibroblasts can lose their tumor-inhibitory capacity and promote tumor growth. The molecular mechanisms that underlie this switch have not been defined completely. Previously, we identified four proteins over-expressed in cancer-associated fibroblasts and linked to Rho GTPase signaling. Here, we show that knocking out the Ras homolog family member A (RhoA) gene in normal fibroblasts decreased their tumor-inhibitory capacity, as judged by neighbor suppression in vitro and accompanied by promotion of tumor growth in vivo. This also induced PC3 cancer cell motility and increased colony size in 2D cultures. RhoA knockout in fibroblasts induced vimentin intermediate filament reorganization, accompanied by reduced contractile force and increased stiffness of cells. There was also loss of wide F-actin stress fibers and large focal adhesions. In addition, we observed a significant loss of α-smooth muscle actin, which indicates a difference between RhoA knockout fibroblasts and classic cancer-associated fibroblasts. In 3D collagen matrix, RhoA knockout reduced fibroblast branching and meshwork formation and resulted in more compactly clustered tumor-cell colonies in coculture with PC3 cells, which might boost tumor stem-like properties. Coculturing RhoA knockout fibroblasts and PC3 cells induced expression of proinflammatory genes in both. Inflammatory mediators may induce tumor cell stemness. Network enrichment analysis of transcriptomic changes, however, revealed that the Rho signaling pathway per se was significantly triggered only after coculturing with tumor cells. Taken together, our findings in vivo and in vitro indicate that Rho signaling governs the inhibitory effects by fibroblasts on tumor-cell growth.
The newly launched Affinity Binder Knockdown Initiative encourages antibody suppliers and users to join this public-private partnership, which uses crowdsourcing to collect characterization data on antibodies. Researchers are asked to share validation data from experiments where gene-editing techniques (such as siRNA or CRISPR) have been used to verify antibody binding. The initiative is launched under the aegis of Antibodypedia, a database designed to allow comparisons and scoring of publicly available antibodies towards human protein targets. What is known about an antibody is the foundation of the scoring and ranking system in Antibodypedia.
Antibodies are crucial for the study of human proteins and have been defined as one of the three pillars in the human chromosome-centric Human Proteome Project (CHPP). In this article the chromosome-centric structure has been used to analyze the availability of antibodies as judged by the presence within the portal Antibodypedia, a database designed to allow comparisons and scoring of publicly available antibodies toward human protein targets. This public database displays antibody data from more than one million antibodies toward human protein targets. A summary of the content in this knowledge resource reveals that there exist more than 10 antibodies to over 70% of all the putative human genes, evenly distributed over the 24 human chromosomes. The analysis also shows that at present, less than 10% of the putative human protein-coding genes (n = 1882) predicted from the genome sequence lack antibodies, suggesting that focused efforts from the antibody-based and mass spectrometry-based proteomic communities should be encouraged to pursue the analysis of these missing proteins. We show that Antibodypedia may be used to track the development of available and validated antibodies to the individual chromosomes, and thus the database is an attractive tool to identify proteins with no or few antibodies yet generated.
Microbial organisms are a vital part of our global ecosystem. Yet, our knowledge of them is still lacking. Direct sequencing of microbial communities, i.e. metagenomics, has enabled detailed studies of these microscopic organisms by inspection of their DNA sequences without the need to culture them. Furthermore, the development of modern high-throughput sequencing technologies has made this approach more powerful and cost-effective. Taken together, this has shifted the field of microbiology from being centered around microscopy and culturing studies to consisting largely of computational analyses of DNA sequences. One such computational analysis, which is the main focus of this thesis, aims at reconstructing the complete DNA sequence of an organism, i.e. its genome, directly from short metagenomic sequences.
This thesis consists of an introduction to the subject followed by five papers. Paper I describes a large metagenomic data resource spanning the Baltic Sea microbial communities. This dataset is complemented with a web interface allowing researchers to easily extract and visualize detailed information. Paper II introduces a bioinformatic method which is able to reconstruct genomes from metagenomic data. This method, which is termed CONCOCT, is applied to Baltic Sea metagenomics data in Paper III and Paper V. This enabled the reconstruction of a large number of genomes. Analysis of these genomes in Paper III led to the proposal of, and evidence for, a global brackish microbiome. Paper IV presents a comparison of genomes reconstructed from metagenomes with single-cell sequenced genomes. This further validated the technique presented in Paper II, as it was found to produce larger and more complete genomes than single-cell sequencing.
Aquatic microorganisms are key drivers of global biogeochemical cycles and form the basis of aquatic food webs. However, there is still much left to be learned about these organisms and their interactions within specific environments, such as the Baltic Sea. Crucial information for such an understanding can be found within the genome sequences of organisms within the microbial community.
In this study, the previous set of Baltic Sea clusters, constructed by Hugert et al., is greatly expanded using a large set of metagenomic samples, spanning the environmental gradients of the Baltic Sea. In total, 124 samples were individually assembled and binned to obtain 2,032 Metagenome Assembled Genomes (MAGs), clustered into 353 prokaryotic and 14 eukaryotic species-level clusters. The prokaryotic genomes were widely distributed over the prokaryotic tree of life, representing 20 different phyla, while the eukaryotic genomes were mostly limited to the division of Chlorophyta. The large number of reconstructed genomes allowed us to identify key factors determining the quality of the genome reconstructions.
The Baltic Sea is heavily influenced by human activities, the full implications of which we might not yet see. The genomes reported in this study will greatly aid further studies as we strive to understand the Baltic Sea microbial ecosystem.
Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples, to automatically cluster contigs into genomes. We demonstrate high recall and precision on artificial as well as real human gut metagenome data sets.
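The combination of sequence composition and coverage across samples can be pictured as building one feature vector per contig, so that contigs from the same genome end up close together. The sketch below is a simplified illustration under stated assumptions: the helper names, the short k-mer length, the pseudocount and the toy contigs are all hypothetical, and the clustering step itself is left out. It is not the CONCOCT implementation.

```python
from itertools import product
from math import dist, log


def kmer_profile(sequence, k=2):
    """Return normalised k-mer frequencies (with a pseudocount of 1)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 1 for kmer in kmers}
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[kmer] / total for kmer in kmers]


def contig_features(sequence, coverages, k=2):
    """Join the composition profile with log-transformed per-sample coverage."""
    return kmer_profile(sequence, k) + [log(c + 1.0) for c in coverages]


if __name__ == "__main__":
    # Toy contigs: the first two share composition and coverage pattern.
    a = contig_features("ATATATATAT", [20.0, 1.0])
    b = contig_features("ATATATTATA", [18.0, 2.0])
    c = contig_features("GCGCGCGCGC", [1.0, 30.0])
    print("a-b distance:", round(dist(a, b), 3))
    print("a-c distance:", round(dist(a, c), 3))
```

Any clustering method that works on such vectors could then group contigs into candidate genomes; using several samples matters because contigs from one genome should rise and fall in coverage together.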
Background: Prokaryotes dominate the biosphere and regulate biogeochemical processes essential to all life. Yet, our knowledge about their biology is for the most part limited to the minority that has been successfully cultured. Molecular techniques now allow for obtaining genome sequences of uncultivated prokaryotic taxa, facilitating in-depth analyses that may ultimately improve our understanding of these key organisms. Results: We compared results from two culture-independent strategies for recovering bacterial genomes: single-amplified genomes and metagenome-assembled genomes. Single-amplified genomes were obtained from samples collected at an offshore station in the Baltic Sea Proper and compared to previously obtained metagenome-assembled genomes from a time series at the same station. Among 16 single-amplified genomes analyzed, seven were found to match metagenome-assembled genomes, affiliated with a diverse set of taxa. Notably, genome pairs between the two approaches were nearly identical (average 99.51% sequence identity; range 98.77-99.84%) across overlapping regions (30-80% of each genome). Within matching pairs, the single-amplified genomes were consistently smaller and less complete, whereas the genetic functional profiles were maintained. For the metagenome-assembled genomes, only on average 3.6% of the bases were estimated to be missing from the genomes due to wrongly binned contigs. Conclusions: The strong agreement between the single-amplified and metagenome-assembled genomes emphasizes that both methods generate accurate genome information from uncultivated bacteria. Importantly, this implies that the research questions and the available resources are allowed to determine the selection of genomics approach for microbiome studies.
Background: Prokaryotes dominate the biosphere and regulate biogeochemical processes essential to all life. Yet, our knowledge about their biology is for the most part limited to the minority that has been successfully cultured. Molecular techniques now allow for obtaining genome sequences of uncultivated prokaryotic taxa, facilitating in-depth analyses that may ultimately improve our understanding of these key organisms.
Results: We compared results from two culture-independent strategies for recovering bacterial genomes: single-amplified genomes and metagenome-assembled genomes. Single-amplified genomes were obtained from samples collected at an offshore station in the Baltic Sea Proper and compared to previously obtained metagenome-assembled genomes from a time series at the same station. Among 16 single-amplified genomes analyzed, seven were found to match metagenome-assembled genomes, affiliated with a diverse set of taxa. Notably, genome pairs between the two approaches were nearly identical (>98.7% identity) across overlapping regions (30-80% of each genome). Within matching pairs, the single-amplified genomes were consistently smaller and less complete, whereas the genetic functional profiles were maintained. For the metagenome-assembled genomes, only on average 3.6% of the bases were estimated to be missing from the genomes due to wrongly binned contigs; the metagenome assembly was found to cause incompleteness to a higher degree than the binning procedure.
Conclusions: The strong agreement between the single-amplified and metagenome-assembled genomes emphasizes that both methods generate accurate genome information from uncultivated bacteria. Importantly, this implies that the research questions and the available resources are allowed to determine the selection of genomics approach for microbiome studies.
The Baltic Sea is one of the world's largest brackish water bodies and is characterised by pronounced physicochemical gradients where microbes are the main biogeochemical catalysts. Meta-omic methods provide rich information on the composition of, and activities within, microbial ecosystems, but are computationally heavy to perform. We here present the Baltic Sea Reference Metagenome (BARM), complete with annotated genes to facilitate further studies with much less computational effort. The assembly is constructed using 2.6 billion metagenomic reads from 81 water samples, spanning both spatial and temporal dimensions, and contains 6.8 million genes that have been annotated for function and taxonomy. The assembly is useful as a reference, facilitating taxonomic and functional annotation of additional samples by simply mapping their reads against the assembly. This capability is demonstrated by the successful mapping and annotation of 24 external samples. In addition, we present a public web interface, BalticMicrobeDB, for interactive exploratory analysis of the dataset.
Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.
Viral potassium channels (Kcv) are homologous to the pore module of complex K+-selective ion channels of cellular organisms. Due to their relative simplicity, they have attracted interest towards understanding the principles of conduction and channel gating. In this work, we construct a homology model of the open state, which we validate by studying the binding of known blockers and by monitoring ion conduction through the channel. Molecular dynamics simulations of this model reveal that the re-orientation of selectivity filter carbonyl groups coincides with the transport of potassium ions, suggesting a possible mechanism for fast gating. In addition, we show that the voltage sensitivity of this mechanism can originate from the relocation of potassium ions inside the selectivity filter. We also explore the interaction of the channel with the surrounding bilayer and observe the binding of lipids in the area between two adjacent subunits. The model is available to the scientific community to further explore the structure/function relationship of Kcv channels.
Background: Species of the crenarchaeon Sulfolobus harbour three replication origins in their single circular chromosome that are synchronously initiated during replication. Results: We demonstrate that global gene expression in two Sulfolobus species is highly biased, such that early replicating genome regions are more highly expressed at all three origins. The bias by far exceeds what would be anticipated by gene dosage effects alone. In addition, early replicating regions are denser in archaeal core genes (enriched in essential functions), display lower intergenic distances, and are devoid of mobile genetic elements. Conclusion: The strong replication-biased structuring of the Sulfolobus chromosome implies that the multiple replication origins serve purposes other than simply shortening the time required for replication. The higher-level chromosomal organisation could be of importance for minimizing the impact of DNA damage, and may also be linked to transcriptional regulation.
Detailed knowledge of protein changes in cerebrospinal fluid (CSF) across healthy and diseased individuals would provide a better understanding of the onset and progression of neurodegenerative disorders. In this study, we selected 20 brain-enriched proteins previously identified in CSF by antibody suspension bead arrays (SBA) as potential biomarkers for Alzheimer's disease (AD) and verified these using an orthogonal approach. We examined the same set of 94 CSF samples from patients affected by AD (including preclinical and prodromal), mild cognitive impairment (MCI), non-AD dementia and healthy individuals, which had previously been analyzed by SBA. Twenty-eight parallel reaction monitoring (PRM) assays were developed and 13 of them could be validated for protein quantification. Antibody profiles were verified by PRM. For seven proteins, the antibody profiles were highly correlated with the PRM results (r > 0.7) and GAP43, VCAM1 and PSAP were identified as potential markers of preclinical AD. In conclusion, we demonstrate the usefulness of targeted mass spectrometry as a tool for the orthogonal verification of antibody profiling data, suggesting that these complementary methods can be successfully applied for comprehensive exploration of CSF protein levels in neurodegenerative disorders.
Background: Reduced membranous expression of the cytoskeleton-associated protein ezrin has previously been demonstrated to correlate with tumour progression and poor prognosis in patients with T1G3 urothelial cell carcinoma of the bladder treated with non-maintenance Bacillus Calmette-Guérin (n = 92), and the associations with adverse clinicopathological factors have been validated in another, unselected, cohort (n = 104). In the present study, we examined the prognostic significance of ezrin expression in urothelial bladder cancer in a total number of 442 tumours from two independent patient cohorts. Methods: Immunohistochemical expression of ezrin was evaluated in tissue microarrays with tumours from one retrospective cohort of bladder cancer (n = 110; cohort I) and one population-based cohort (n = 342; cohort II). Classification and regression tree analysis was applied for selection of the prognostic cutoff. Kaplan-Meier analysis, the log rank test and Cox proportional hazards regression modelling were used to evaluate the impact of ezrin on 5-year overall survival (OS), disease-specific survival (DSS) and progression-free survival (PFS). Results: Ezrin expression could be evaluated in tumours from 100 and 342 cases, respectively. In both cohorts, reduced membranous ezrin expression was significantly associated with more advanced T-stage (p < 0.001), high grade tumours (p < 0.001), female sex (p = 0.040 and p = 0.013), and membranous expression of podocalyxin-like protein (p < 0.001 and p = 0.009). Moreover, reduced ezrin expression was associated with a significantly reduced 5-year OS in both cohorts (HR = 3.09, 95% CI 1.71-5.58 and HR = 2.15, 95% CI 1.51-3.06), and with reduced DSS in cohort II (HR = 2.77, 95% CI 1.78-4.31). This association also remained significant in adjusted analysis in cohort I (HR = 1.99, 95% CI 1.05-3.77) but not in cohort II. In pTa and pT1 tumours in cohort II, there was no significant association between ezrin expression and time to progression.
Conclusions: The results from this study validate previous findings of reduced membranous ezrin expression in urothelial bladder cancer being associated with unfavourable clinicopathological characteristics and an impaired survival. The utility of ezrin as a prognostic biomarker in transurethral resection specimens merits further investigation.
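For readers unfamiliar with the Kaplan-Meier analysis named in the Methods of the ezrin study above, the following minimal sketch computes the product-limit survival estimate from follow-up times and event indicators. The function name and the toy data are hypothetical and unrelated to the study's cohorts; a real analysis would normally use an established statistics package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where an event was observed.

    times: follow-up time for each patient.
    events: True if the event (e.g. death) was observed at that time,
        False if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        observed = 0
        leaving = 0
        # Collect everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == current_time:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: multiply by the share surviving this time.
            survival *= 1.0 - observed / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy follow-up data in years; False marks a censored patient.
    toy_times = [1, 2, 2, 3, 4]
    toy_events = [True, True, False, True, False]
    for time, estimate in kaplan_meier(toy_times, toy_events):
        print(time, round(estimate, 3))  # approximately 0.8, 0.6, 0.3
```

Comparing two such curves, for example reduced versus retained ezrin expression, is what the log rank test then formalises.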
Background: The sequencing of the human genome has opened doors for global gene expression profiling, and the immense amount of data will lay an important ground for future studies of normal and diseased tissues. The Human Protein Atlas project aims to systematically map the human gene and protein expression landscape in a multitude of normal healthy tissues as well as cancers, enabling the characterization of both housekeeping genes and genes that display a tissue-specific expression pattern. This article focuses on identifying and describing genes with an elevated expression in four lymphohematopoietic tissue types (bone marrow, lymph node, spleen and appendix), based on the Human Protein Atlas strategy that combines high-throughput transcriptomics with affinity-based proteomics. Results: An enriched or enhanced expression in one or more of the lymphohematopoietic tissues, compared to other tissue types, was seen for 693 out of 20,050 genes, and the highest levels of expression were found in bone marrow for neutrophilic and erythrocytic genes. A majority of these genes were found to be well-characterized genes with known functions in lymphatic or hematopoietic cells, while others have not been studied previously, as exemplified by C19ORF59. Conclusions: In this paper we present a strategy of combining next generation RNA-sequencing with in situ affinity-based proteomics in order to identify and describe new gene targets for further research on lymphatic or hematopoietic cells and tissues. The results constitute lists of genes with enriched or enhanced expression in the four lymphohematopoietic tissues, exemplified also on the protein level with immunohistochemical images.
The discovery of oestrogen receptor beta (ER beta/ESR2) was a landmark discovery. Its reported expression and homology with breast cancer pharmacological target ER alpha (ESR1) raised hopes for improved endocrine therapies. After 20 years of intense research, this has not materialized. We here perform a rigorous validation of 13 anti-ER beta antibodies, using well-characterized controls and a panel of validation methods. We conclude that only one antibody, the rarely used monoclonal PPZ0506, specifically targets ER beta in immunohistochemistry. Applying this antibody for protein expression profiling in 44 normal and 21 malignant human tissues, we detect ER beta protein in testis, ovary, lymphoid cells, granulosa cell tumours, and a subset of malignant melanoma and thyroid cancers. We do not find evidence of expression in normal or cancerous human breast. This expression pattern aligns well with RNA-seq data, but contradicts a multitude of studies. Our study highlights how inadequately validated antibodies can lead an exciting field astray.
Scientists gathered to discuss the necessity, feasibility, and challenges of generating a quantitative catalog of the components in human cells that is essential for our understanding of human physiology in health and disease and to support future breakthroughs in treating diseases. This report summarizes the discussion that emerged at the Human Quantitative Dynamics Workshop held in Bethesda, MD, USA, in December 2015.