Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs with function and phenotype.
In higher eukaryotes many genes encode protein isoforms whose properties and biological roles are often poorly characterized. Here we describe systematic approaches for detection of either distinct isoforms, or separate pools of the same isoform, with differential biological properties. Using information from ion intensities we have estimated protein abundance levels, and using rates of change in stable isotope labeling with amino acids in cell culture (SILAC) isotope ratios we measured turnover rates and subcellular distribution for the HeLa cell proteome. Protein isoforms were detected using three data analysis strategies that evaluate differences between SILAC isotope ratios for specific groups of peptides within the total set of peptides assigned to a protein. The candidate approach compares SILAC isotope ratios for predicted isoform-specific peptides with ratio values for peptides shared by all the isoforms. The rule of thirds approach compares the mean isotope ratio values for all peptides in each of three equal segments along the linear length of the protein, assessing differences between segment values. The three in a row approach compares mean isotope ratio values for each sequential group of three adjacent peptides, assessing differences with the mean value for all peptides assigned to the protein. Protein isoforms were also detected and their properties evaluated by fractionating cell extracts on one-dimensional SDS-PAGE prior to trypsin digestion and MS analysis and independently evaluating isotope ratio values for the same peptides isolated from different gel slices. The effect of protein phosphorylation on turnover rates was analyzed by comparing mean turnover values calculated for all peptides assigned to a protein, either including or excluding values for cognate phosphopeptides.
Collectively, these experimental and analytical approaches provide a framework for expanding the functional annotation of the genome.
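The rule of thirds and three in a row comparisons described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (a list of `(start_position, isotope_ratio)` pairs with 0-based positions), the function names, and the example values are all assumptions made for illustration, and no statistical test is applied because the abstract does not specify one.

```python
from statistics import mean


def rule_of_thirds(peptides, protein_length):
    """Mean isotope ratio for each of three equal segments of the protein.

    peptides: list of (start_position, isotope_ratio), 0-based positions
    (hypothetical input format). Empty segments yield None.
    """
    segments = [[], [], []]
    for start, ratio in peptides:
        # Map the peptide start onto segment 0, 1 or 2 of the linear sequence.
        idx = min(3 * start // protein_length, 2)
        segments[idx].append(ratio)
    return [mean(s) if s else None for s in segments]


def three_in_a_row(peptides):
    """Difference between each window of three adjacent peptides and the
    mean over all peptides assigned to the protein.

    Returns a list of (window_index, window_mean - overall_mean).
    """
    ordered = sorted(peptides)  # order peptides by start position
    ratios = [ratio for _, ratio in ordered]
    overall = mean(ratios)
    return [
        (i, mean(ratios[i:i + 3]) - overall)
        for i in range(len(ratios) - 2)
    ]


# Hypothetical peptides for a 300-residue protein whose C-terminal
# third behaves differently from the rest.
example = [(5, 1.0), (40, 1.2), (120, 1.1), (150, 0.9),
           (210, 3.0), (250, 3.2), (280, 2.8)]
thirds = rule_of_thirds(example, 300)
windows = three_in_a_row(example)
```

A large gap between one segment mean and the others, or a window that deviates strongly from the overall mean, would flag the protein as a candidate for closer inspection; the cutoff used to call an isoform is left open here.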
We here present a comparative genome, transcriptome and functional network analysis of three human cancer cell lines (A431, U251MG and U2OS), and investigate their relation to protein expression. Gene copy numbers significantly influenced corresponding transcript levels; their effect on protein levels was less pronounced. We focused on genes with altered mRNA and/or protein levels to identify those active in tumor maintenance. We provide comprehensive information for the three genomes and demonstrate the advantage of integrative analysis for identifying tumor-related genes amidst numerous background mutations by relating genomic variation to expression/protein abundance data and using gene networks to reveal implicated pathways.
Fibroblasts are a main player in the tumor-inhibitory microenvironment. Upon tumor initiation and progression, fibroblasts can lose their tumor-inhibitory capacity and promote tumor growth. The molecular mechanisms that underlie this switch have not been defined completely. Previously, we identified four proteins over-expressed in cancer-associated fibroblasts and linked to Rho GTPase signaling. Here, we show that knocking out the Ras homolog family member A (RhoA) gene in normal fibroblasts decreased their tumor-inhibitory capacity, as judged by neighbor suppression in vitro and accompanied by promotion of tumor growth in vivo. This also induced PC3 cancer cell motility and increased colony size in 2D cultures. RhoA knockout in fibroblasts induced vimentin intermediate filament reorganization, accompanied by reduced contractile force and increased stiffness of cells. There was also loss of wide F-actin stress fibers and large focal adhesions. In addition, we observed a significant loss of α-smooth muscle actin, which indicates a difference between RhoA knockout fibroblasts and classic cancer-associated fibroblasts. In 3D collagen matrix, RhoA knockout reduced fibroblast branching and meshwork formation and resulted in more compactly clustered tumor-cell colonies in coculture with PC3 cells, which might boost tumor stem-like properties. Coculturing RhoA knockout fibroblasts and PC3 cells induced expression of proinflammatory genes in both. Inflammatory mediators may induce tumor cell stemness. Network enrichment analysis of transcriptomic changes, however, revealed that the Rho signaling pathway per se was significantly triggered only after coculturing with tumor cells. Taken together, our findings in vivo and in vitro indicate that Rho signaling governs the inhibitory effects by fibroblasts on tumor-cell growth.
The newly launched Affinity Binder Knockdown Initiative encourages antibody suppliers and users to join this public-private partnership, which uses crowdsourcing to collect characterization data on antibodies. Researchers are asked to share validation data from experiments where gene-silencing or gene-editing techniques (such as siRNA or CRISPR) have been used to verify antibody binding. The initiative is launched under the aegis of Antibodypedia, a database designed to allow comparisons and scoring of publicly available antibodies towards human protein targets. What is known about an antibody is the foundation of the scoring and ranking system in Antibodypedia.
Antibodies are crucial for the study of human proteins and have been defined as one of the three pillars in the human chromosome-centric Human Proteome Project (CHPP). In this article the chromosome-centric structure has been used to analyze the availability of antibodies as judged by the presence within the portal Antibodypedia, a database designed to allow comparisons and scoring of publicly available antibodies toward human protein targets. This public database displays antibody data from more than one million antibodies toward human protein targets. A summary of the content in this knowledge resource reveals that there exist more than 10 antibodies to over 70% of all the putative human genes, evenly distributed over the 24 human chromosomes. The analysis also shows that at present, less than 10% of the putative human protein-coding genes (n = 1882) predicted from the genome sequence lack antibodies, suggesting that focused efforts from the antibody-based and mass spectrometry-based proteomic communities should be encouraged to pursue the analysis of these missing proteins. We show that Antibodypedia may be used to track the development of available and validated antibodies to the individual chromosomes, and thus the database is an attractive tool to identify proteins with no or few antibodies yet generated.
Quantitative optical microscopy, an emerging, transformative approach to single-cell biology, has seen dramatic methodological advancements over the past few years. However, its impact has been hampered by challenges in the areas of data generation, management, and analysis. Here we outline these technical and cultural challenges and provide our perspective on the trajectory of this field, ushering in a new era of quantitative, data-driven microscopy. We also contrast it to the three decades of enormous advances in the field of genomics that have significantly enhanced the reproducibility and wider adoption of a plethora of genomic approaches.
Information on protein localization on the subcellular level is important to map and characterize the proteome and to better understand cellular functions of proteins. Here we report on a pilot study of 466 proteins in three human cell lines aimed to allow large scale confocal microscopy analysis using protein-specific antibodies. Approximately 3000 high resolution images were generated, and more than 80% of the analyzed proteins could be classified in one or multiple subcellular compartment(s). The localizations of the proteins showed, in many cases, good agreement with the Gene Ontology localization prediction model. This is the first large scale antibody-based study to localize proteins into subcellular compartments using antibodies and confocal microscopy. The results suggest that this approach might be a valuable tool in conjunction with predictive models for protein localization.
An attractive path forward in proteomics is to experimentally annotate the human protein complement of the genome in a gene-centric manner. Using antibodies, it might be possible to design protein-specific probes for a representative protein from every protein-coding gene and to subsequently use the antibodies for systematic analysis of cellular distribution and subcellular localization of proteins in normal and disease tissues. A new version (4.0) of the Human Protein Atlas has been developed in a gene-centric manner with the inclusion of all human genes and splice variants predicted from genome efforts together with a visualization of each protein with characteristics such as predicted membrane regions, signal peptide, and protein domains and new plots showing the uniqueness (sequence similarity) of every fraction of each protein toward all other human proteins. The new version is based on tissue profiles generated from 6120 antibodies with more than five million immunohistochemistry-based images covering 5067 human genes, corresponding to ~25% of the human genome. Version 4.0 includes a putative list of members in various protein classes, both functional classes, such as kinases, transcription factors, G-protein-coupled receptors, etc., and project-related classes, such as candidate genes for cancer or cardiovascular diseases. The exact antigen sequence for the internally generated antibodies has also been released together with a visualization of the application-specific validation performed for each antibody, including a protein array assay, Western blot analysis, immunohistochemistry, and, for a large fraction, immunofluorescence-based confocal microscopy. New search functionalities have been added to allow complex queries regarding protein expression profiles, protein classes, and chromosome location. The new version of the protein atlas thus is a resource for many areas of biomedical research, including protein science and biomarker discovery.
Glioblastoma is the most common primary brain tumor in adults with low average survival time after diagnosis. In order to improve glioblastoma treatment, new drug-accessible targets need to be identified. Cell surface glycoproteins are prime drug targets due to their accessibility at the surface of cancer cells. To overcome the limited availability of suitable antibodies for cell surface protein detection, we performed a comprehensive mass spectrometric investigation of the glioblastoma surfaceome. Our combined cell surface capturing analysis of primary ex vivo glioblastoma cell lines in combination with established glioblastoma cell lines revealed 633 N-glycoproteins, which vastly extends the known data of surfaceome drug targets at subcellular resolution. We provide direct evidence of common glioblastoma cell surface glycoproteins and an approximate estimate of their abundances, information that could not be derived from genomic and/or transcriptomic glioblastoma studies. Apart from our pharmaceutically valuable repertoire of already and potentially drug-accessible cell surface glycoproteins, we built a mass-spectrometry-based toolbox enabling directed, sensitive, and repetitive glycoprotein measurements for clinical follow-up studies. The included Skyline Glioblastoma SRM assay library provides an elevated starting point for parallel testing of the abundance level of the detected glioblastoma surfaceome members in future drug perturbation experiments.
Our previous studies revealed that the expression of the 19-kDa protein prenylated Rab acceptor 1 domain family, member 2 (PRAF2) is elevated in cancer tissues of the breast, colon, lung, and ovary, when compared to noncancerous tissues of paired samples. PRAF2 mRNA expression also correlated with several genetic and clinical features and is a candidate prognostic marker in the pediatric cancer neuroblastoma. The PRAF2-related proteins, PRAF1 and PRAF3, play multiple roles in cellular processes, including endo/exocytic vesicle trafficking and glutamate uptake. PRAF2 shares a high sequence homology with these family members, but its function remains unknown. In this study, we examined PRAF2 mRNA and protein expression in 20 different human cancer types using Affymetrix microarray and human tissue microarray (TMA) analyses, respectively. In addition, we investigated the subcellular distribution of PRAF2 by immunofluorescence microscopy and cell fractionation studies. PRAF2 mRNA and protein expression was elevated in several cancer tissues with highest levels in malignant glioma. At the molecular level, we detected native PRAF2 in small, vesicle-like structures throughout the cytoplasm as well as in and around cell nuclei of U-87 malignant glioma cells. We further found that monomeric and dimeric forms of PRAF2 are associated with different cell compartments, suggesting possible functional differences. Importantly, PRAF2 down-regulation by RNA interference significantly reduced the cell viability, migration, and invasiveness of U-87 cells. This study shows that PRAF2 expression is elevated in various tumors with exceptionally high expression in malignant gliomas, and PRAF2 therefore presents a candidate molecular target for therapeutic intervention. (Cancer Sci 2010).
The cell cycle coordinates core functions such as replication and cell division. However, cell-cycle-regulated transcription in the control of non-core functions, such as cell identity maintenance through specific transcription factors (TFs) and signalling pathways, remains unclear. Here, we provide a resource consisting of mapped transcriptomes in unsynchronized HeLa and U2OS cancer cells sorted for cell cycle phase by Fucci reporter expression. We developed a novel algorithm for data analysis that enables efficient visualization and data comparisons and identified cell cycle synchronization of Notch signalling and TFs associated with development. Furthermore, the cell cycle synchronizes with the circadian clock, providing a possible link between developmental transcriptional networks and the cell cycle. In conclusion, we find that cell cycle synchronized transcriptional patterns are temporally compartmentalized and more complex than previously anticipated, involving genes that control cell identity and development.
An important topic of discussion in proteomics is the degree of correlation of RNA and protein levels in cells, tissues and organs. In this study, the differences in protein and mRNA levels for a number of selected gene targets were investigated across six human cell lines using quantitative proteomics and next generation sequencing-based transcriptomics. The copy numbers of 32 proteins were determined using an absolute quantitative proteomics approach (PrEST-SILAC), where heavy isotope-labeled protein fragments were used as internal standards. A cross evaluation of protein copy numbers determined by mass spectrometry and staining profiles using immunohistochemistry showed good correlation. The mRNA levels were determined using RNA sequencing based on digital counting of sequencing reads and the levels determined as FPKM values. Comparison of the relative variations in mRNA and protein levels for individual genes across the six cell lines showed correlation between protein and mRNA levels, including six genes with high variability in expression levels in the six cell lines resulting in an average correlation of 0.9 (Spearman's rank coefficient). In summary, the analysis of the selected protein targets supports the conclusion that the translation rate across cell lines correlates for a particular gene, suggesting that individual protein levels can be predicted from the respective mRNA levels by defining the relation between protein and mRNA, specific for each human gene.
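The per-gene comparison summarized above relies on Spearman's rank coefficient, which is the Pearson correlation computed on ranks. A minimal, dependency-free Python sketch follows; the function names and the six FPKM and copy-number values are hypothetical and are not data from the study.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for one gene measured in six cell lines.
fpkm = [2.0, 15.0, 7.5, 40.0, 1.0, 22.0]
protein_copies = [1.1e4, 9.0e4, 6.0e4, 3.0e5, 8.0e3, 7.0e4]
rho = spearman(fpkm, protein_copies)
```

Working on ranks makes the coefficient insensitive to the very different scales of FPKM values and protein copy numbers, which is one reason a rank-based measure suits this kind of comparison.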
All human diseases involve proteins, yet our current tools to characterize and quantify them are limited. To better elucidate proteins across space, time, and molecular composition, we provide a projection of more than 10 years for technologies to meet the challenges that protein biology presents. With a broad perspective, we discuss grand opportunities to transition the science of proteomics into a more propulsive enterprise. Extrapolating recent trends, we describe a next generation of approaches to define, quantify, and visualize the multiple dimensions of the proteome, thereby transforming our understanding and interactions with human disease in the coming decade.
Imaging is a powerful approach for studying protein expression and has the advantage over other methodologies in providing spatial information in situ at the single-cell level. Using immunofluorescence and confocal microscopy, detailed information of subcellular distribution of proteins can be obtained. While adherent cells of different tissue origin are relatively easy to prepare for imaging applications, non-adherent cells of hematopoietic origin present a challenge due to their poor attachment to surfaces and subsequent loss of a substantial fraction of the cells. Still, these cell types represent an important part of the human proteome and express genes that are not expressed in adherent cell types. In the era of cell mapping efforts, overcoming the challenge with suspension cells for imaging applications would enable systematic profiling of hematopoietic cells. In this work, we successfully established an immunofluorescence protocol for preparation of suspension cell lines, peripheral blood mononuclear cells (PBMC) and human platelets on an adherent surface. The protocol is based on a multi-well plate format with automated sample preparation, allowing for robust high throughput imaging applications. In combination with confocal microscopy, the protocol enables systematic exploration of protein localization to all major subcellular structures.
The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.
In tumor tissues, hypoxia is a commonly observed feature resulting from rapidly proliferating cancer cells outgrowing their surrounding vasculature network. Transformed cancer cells are known to exhibit phenotypic alterations, enabling continuous proliferation despite a limited oxygen supply. The four-step isogenic BJ cell model enables studies of defined steps of tumorigenesis: the normal, immortalized, transformed, and metastasizing stages. By transcriptome profiling under atmospheric and moderate hypoxic (3% O2) conditions, we observed that despite being highly similar, the four cell lines of the BJ model responded strikingly differently to hypoxia. Besides corroborating many of the known responses to hypoxia, we demonstrate that the transcriptome adaptation to moderate hypoxia resembles the process of malignant transformation. The transformed cells displayed a distinct capability of metabolic switching, reflected in reversed gene expression patterns for several genes involved in oxidative phosphorylation and glycolytic pathways. By profiling the stage-specific responses to hypoxia, we identified ASS1 as a potential prognostic marker in hypoxic tumors. This study demonstrates the usefulness of the BJ cell model for highlighting the interconnection of pathways involved in malignant transformation and hypoxic response.
After a century of research, the human centrosome continues to fascinate. Based on immunofluorescence and confocal microscopy, an extensive inventory of the protein components of the human centrosome, and the centriolar satellites, with the important contribution of over 300 novel proteins localizing to these compartments is presented. A network of candidate centrosome proteins involved in ubiquitination, including six interaction partners of the Kelch-like protein 21, and an additional network of protein phosphatases, together supporting the suggested role of the centrosome as an interactive hub for cell signaling, is identified. Analysis of multi-localization across cellular organelles analyzed within the Human Protein Atlas (HPA) project shows how multi-localizing proteins are particularly overrepresented in centriolar satellites, supporting the dynamic nature and wide range of functions for this compartment. In summary, the spatial dissection of the human centrosome and centriolar satellites described here provides a comprehensive knowledgebase for further exploration of their proteomes.
The transformation of normal cells to malignant, metastatic tumor cells is a multistep process caused by the sequential acquisition of genetic changes. To identify these changes, we compared the transcriptomes and levels and distribution of proteins in a four-stage cell model of isogenically matched normal, immortalized, transformed, and metastatic human cells, using deep transcriptome sequencing and immunofluorescence microscopy. The data show that ~6% (n = 1,357) of the human protein-coding genes are differentially expressed across the stages in the model. Interestingly, the majority of these genes are down-regulated, linking malignant transformation to dedifferentiation. The up-regulated genes are mainly components that control cellular proliferation, whereas the down-regulated genes consist of proteins exposed on or secreted from the cell surface. As many of the identified gene products control basic cellular functions that are defective in cancers, the data provide candidates for follow-up studies to investigate their functional roles in tumor formation. When we further compared the expression levels of four of the identified proteins in clinical cancer cohorts, similar differences were observed between benign and cancer cells, as in the cell model. This shows that this comprehensive demonstration of the molecular changes underlying malignant transformation is a relevant model to study the process of tumor formation.
One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.
Autophagy is one of the major intracellular catabolic pathways, but little is known about the composition of autophagosomes. To study the associated proteins, we isolated autophagosomes from human breast cancer cells using two different biochemical methods and three stimulus types: amino acid deprivation or rapamycin or concanamycin A treatment. The autophagosome-associated proteins were dependent on stimulus, but a core set of proteins was stimulus-independent. Remarkably, proteasomal proteins were abundant among the stimulus-independent common autophagosome-associated proteins, and the activation of autophagy significantly decreased the cellular proteasome level and activity, supporting interplay between the two degradation pathways. A screen of yeast strains defective in the orthologs of the human genes encoding a common set of autophagosome-associated proteins revealed several regulators of autophagy, including subunits of the retromer complex. The combined spatiotemporal proteomic and genetic data sets presented here provide a basis for further characterization of autophagosome biogenesis and cargo selection.
Autoimmune diseases disproportionately affect females more than males. The XX sex chromosome complement is strongly associated with susceptibility to autoimmunity. Xist long non-coding RNA (lncRNA) is expressed only in females to randomly inactivate one of the two X chromosomes to achieve gene dosage compensation. Here, we show that the Xist ribonucleoprotein (RNP) complex comprising numerous autoantigenic components is an important driver of sex-biased autoimmunity. Inducible transgenic expression of a non-silencing form of Xist in male mice introduced Xist RNP complexes and sufficed to produce autoantibodies. Male SJL/J mice expressing transgenic Xist developed more severe multi-organ pathology in a pristane-induced lupus model than wild-type males. Xist expression in males reprogrammed T and B cell populations and chromatin states to more resemble wild-type females. Human patients with autoimmune diseases displayed significant autoantibodies to multiple components of XIST RNP. Thus, a sex-specific lncRNA scaffolds ubiquitous RNP components to drive sex-biased immunity.
The combination of immuno-based methods and mass spectrometry detection has great potential in the field of quantitative proteomics. Here, we describe a new method (immuno-SILAC) for the absolute quantification of proteins in complex samples based on polyclonal antibodies and stable isotope-labeled recombinant protein fragments to allow affinity enrichment prior to mass spectrometry analysis and accurate quantification. We took advantage of the antibody resources publicly available from the Human Protein Atlas project covering more than 80% of all human protein-coding genes. Epitope mapping revealed that a majority of the polyclonal antibodies recognized multiple linear epitopes, and based on these results, a semi-automated method was developed for peptide enrichment using polyclonal antibodies immobilized on protein A-coated magnetic beads. A protocol based on the simultaneous multiplex capture of more than 40 protein targets showed that approximately half of the antibodies enriched at least one functional peptide detected in the subsequent mass spectrometry analysis. The approach was further developed to also generate quantitative data via the addition of heavy isotope-labeled recombinant protein fragment standards prior to trypsin digestion. Here, we show that we were able to use small amounts of antibodies (50 ng per target) in this manner for efficient multiplex analysis of quantitative levels of proteins in a human HeLa cell lysate. The results suggest that polyclonal antibodies generated via immunization of recombinant protein fragments could be used for the enrichment of target peptides to allow for rapid mass spectrometry analysis taking advantage of a substantial reduction in sample complexity. The possibility of building up a proteome-wide resource for immuno-SILAC assays based on publicly available antibody resources is discussed.
An important issue for molecular biology is to establish whether transcript levels of a given gene can be used as proxies for the corresponding protein levels. Here, we have developed a targeted proteomics approach for a set of human non-secreted proteins based on parallel reaction monitoring to measure, at steady-state conditions, absolute protein copy numbers across human tissues and cell lines and compared these levels with the corresponding mRNA levels using transcriptomics. The study shows that the transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor independent of the tissue type is introduced, thus significantly enhancing the predictability of protein copy numbers from RNA levels. The results show that the RTP ratio varies significantly with a few hundred copies per mRNA molecule for some genes to several hundred thousands of protein copies per mRNA molecule for others. In conclusion, our data suggest that transcriptome analysis can be used as a tool to predict the protein copy numbers per cell, thus forming an attractive link between the field of genomics and proteomics.
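The idea of a gene-specific RNA-to-protein (RTP) conversion factor can be made concrete with a short Python sketch. The abstract does not state how the factor is estimated, so taking the median protein-to-mRNA ratio across samples is an assumption made here for illustration, as are the function names and the example numbers.

```python
from statistics import median


def rtp_factor(protein_copies, mrna_levels):
    """Estimate a gene-specific RTP factor across samples.

    Assumption: the factor is taken as the median protein/mRNA ratio over
    all samples with a nonzero mRNA level; the source does not specify
    the estimator.
    """
    ratios = [p / m for p, m in zip(protein_copies, mrna_levels) if m > 0]
    if not ratios:
        raise ValueError("need at least one sample with nonzero mRNA")
    return median(ratios)


def predict_protein_copies(mrna_level, rtp):
    """Predict protein copies per cell from an mRNA level and the RTP factor."""
    return rtp * mrna_level


# Hypothetical measurements for one gene in three tissues or cell lines.
protein = [50_000, 120_000, 20_000]   # protein copies per cell
mrna = [10.0, 25.0, 4.0]              # mRNA level (same unit in all samples)

rtp = rtp_factor(protein, mrna)
predicted = predict_protein_copies(8.0, rtp)
```

Because the factor is fitted per gene, a gene with a few hundred protein copies per mRNA molecule and a gene with several hundred thousand would each get their own `rtp` value, which is what allows RNA levels to predict protein copy numbers despite the poor overall correlation.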
There is a great need for standardized validation methods for antibody specificity and selectivity. Here, we describe the use of orthogonal methods in which the specificity of an antibody in a particular application is determined based on correlation of protein abundance across several samples using an antibody-independent method. We show that pair-wise correlation between orthogonal samples can be used to score the specificity of antibodies in a standardized manner using a test panel of human cell lines. Here, we investigated two independent methods for validation of antibodies in Western blot applications, namely transcriptomics and targeted proteomics, and we show that the two methods yield similar, but not identical, results. The orthogonal methods can also be used to investigate on- and off-target binding for antibodies with multiple bands in the Western blot assay. In conclusion, orthogonal methods for antibody validation provide an attractive strategy for systematic validation of antibodies in a quantitative manner.
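The scoring principle described above, correlating an antibody-based signal with an antibody-independent measurement across a panel of cell lines, can be sketched in Python as follows. The use of Pearson correlation, the dictionary input format, the function name, and the example intensities are assumptions for illustration; the abstract does not specify the correlation measure or any acceptance threshold.

```python
from math import sqrt


def orthogonal_score(antibody_signal, reference_levels):
    """Pearson correlation between an antibody-based signal and an
    antibody-independent measurement, over the samples present in both.

    Both arguments are dicts mapping a sample name to a numeric level
    (hypothetical input format).
    """
    shared = sorted(set(antibody_signal) & set(reference_levels))
    if len(shared) < 3:
        raise ValueError("need at least three shared samples")
    x = [antibody_signal[name] for name in shared]
    y = [reference_levels[name] for name in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("no variation across samples; correlation undefined")
    return cov / sqrt(var_x * var_y)


# Hypothetical Western blot band intensities and RNA levels per cell line.
western = {"A431": 1.0, "HeLa": 2.0, "U2OS": 3.0, "U251MG": 4.0}
rna = {"A431": 10.0, "HeLa": 20.0, "U2OS": 30.0, "U251MG": 40.0, "PC3": 5.0}
score = orthogonal_score(western, rna)
```

For an antibody giving several bands, the same function could be applied to each band separately, so that a band tracking the reference measurement is distinguished from one that does not; the variance check matters because a target expressed at the same level in every cell line cannot be scored this way.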
A gene-centric Human Proteome Project has been proposed to characterize the human protein-coding genes in a chromosome-centered manner to understand human biology and disease. Here, we report on the protein evidence for all genes predicted from the genome sequence based on manual annotation from literature (UniProt), antibody-based profiling in cells, tissues and organs and analysis of the transcript profiles using next-generation sequencing in human cell lines of different origins. We estimate that there is good evidence for protein existence for 69% (n = 13985) of the human protein-coding genes, while 23% have evidence only on the RNA level and 7% still lack experimental evidence. Analysis of the expression patterns shows few tissue-specific proteins and approximately half of the genes expressed in all the analyzed cells. The status for each gene with regard to protein evidence is visualized in a chromosome-centric manner as part of a new version of the Human Protein Atlas (www.proteinatlas.org).
The subcellular locations of proteins are closely related to their function and constitute an essential aspect for understanding the complex machinery of living cells. A systematic effort has been initiated to map the protein distribution in three functionally different cell lines with the aim to provide a subcellular localization index for at least one representative protein from all human protein-encoding genes. Here, we present the results of over 4,000 proteins mapped to 16 subcellular compartments. The results indicate a ubiquitous protein expression with a majority of the proteins found in all three cell lines and a large portion localized to two or more compartments. The inter-relationships between the subcellular compartments are visualized in a protein-compartment network based on all detected proteins. Hierarchical clustering was performed to determine how closely related the organelles are in terms of protein constituents and compare the proteins detected in each cell type. Our results show distinct organelle proteomes, well conserved across the cell types, and demonstrate that biochemically similar organelles are grouped together.
The mitochondrial human proteome project (mt-HPP) was initiated by the Italian HPP group as a part of both the chromosome-centric initiative (C-HPP) and the "biology and disease driven" initiative (B/D-HPP). In recent years several reports highlighted how mitochondrial biology and disease are regulated by specific interactions with non-mitochondrial proteins. Thus, it is of great relevance to extend our present view of the mitochondrial proteome not only to those proteins that are encoded by or transported to mitochondria, but also to their interactors that take part in mitochondrial function. Here, we propose a graphical representation of the functional mitochondrial proteome by retrieving mitochondrial proteins from the NeXtProt database and adding to the network their interactors as annotated in the IntAct database. Notably, the network may represent a reference to map all the proteins that are currently being identified in mitochondrial proteomics studies.
Cellular heterogeneity is an important biological phenomenon observed across space and time in human tissues. Imaging-based spatial proteomic technologies can provide fruitful new readouts of phenotypic states for individual cells at subcellular resolution, which may help unravel the roles of non-genetic cellular heterogeneity in tumorigenesis and drug resistance.
Development of molecules with the ability to selectively inhibit particular protein-protein interactions is important in providing tools for understanding cell biology. In this work, we describe efforts to select small Ras- and Raf-specific three-helix bundle affibody binding proteins capable of inhibiting the interaction between H-Ras and Raf-1, from a combinatorial library displayed on bacteriophage. Target-specific variants with typically high nanomolar or low micromolar affinities (K_D) could be selected successfully against both proteins, as shown by dot blot, ELISA and real-time biospecific interaction analyses. Affibody molecule variants selected against H-Ras were shown to bind epitopes overlapping each other at a site that differed from that at which H-Ras interacts with Raf-1. In contrast, an affibody molecule isolated during selection against Raf-1 was shown to effectively inhibit the interaction between H-Ras and Raf-1 in a dose-dependent manner. Possible intracellular applications of the selected affibody molecules are discussed.
Huge amounts of data are generated in genome-wide experiments designed to investigate diseases with complex genetic causes. Follow-up of all potential leads produced by such experiments is currently cost-prohibitive and time-consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently, a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large-scale in silico benchmark. An experimental validation of predictions made by MaxLink has, however, been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.
DeepImageJ is a user-friendly solution that enables the generic use of pre-trained deep learning models for biomedical image analysis in ImageJ. The deepImageJ environment gives access to the largest bioimage repository of pre-trained deep learning models (BioImage Model Zoo). Hence, nonexperts can easily perform common image processing tasks in life-science research with deep learning-based tools including pixel and object classification, instance segmentation, denoising or virtual staining. DeepImageJ is compatible with existing state-of-the-art solutions and it is equipped with utility tools for developers to include new models. Very recently, several training frameworks have adopted the deepImageJ format to deploy their work in one of the most widely used software packages in the field (ImageJ). Beyond its direct use, we expect deepImageJ to contribute to the broader dissemination and reuse of deep learning models in life sciences applications and bioimage informatics.
Tissues and organs are composed of distinct cell types that must operate in concert to perform physiological functions. Efforts to create high-dimensional biomarker catalogs of these cells have been largely based on single-cell sequencing approaches, which lack the spatial context required to understand critical cellular communication and correlated structural organization. To probe in situ biology with sufficient depth, several multiplexed protein imaging methods have been recently developed. Though these technologies differ in strategy and mode of immunolabeling and detection tags, they commonly utilize antibodies directed against protein biomarkers to provide detailed spatial and functional maps of complex tissues. As these promising antibody-based multiplexing approaches become more widely adopted, new frameworks and considerations are critical for training future users, generating molecular tools, validating antibody panels, and harmonizing datasets. In this Perspective, we provide essential resources, key considerations for obtaining robust and reproducible imaging data, and specialized knowledge from domain experts and technology developers.
A method is described to generate and validate antibodies based on mapping the linear epitopes of a polyclonal antibody followed by sequential epitope-specific capture using synthetic peptides. Polyclonal antibodies directed towards four proteins, RBM3, SATB2, ANLN, and CNDP1, potentially involved in human cancers, were selected and antibodies to several non-overlapping epitopes were generated and subsequently validated by Western blot, immunohistochemistry, and immunofluorescence. For all four proteins, a dramatic difference in functionality could be observed for these monospecific antibodies directed to the different epitopes. In each case, at least one antibody was obtained with full functionality across all applications, while other epitope-specific fractions showed little or no functionality. These results present a path forward to use the mapped binding sites of polyclonal antibodies to generate epitope-specific antibodies, providing an attractive approach for large-scale efforts to characterize the human proteome by antibodies.
This paper summarizes the recent activities of the Chromosome-Centric Human Proteome Project (C-HPP) consortium, which develops new technologies to identify yet-to-be-annotated proteins (termed "missing proteins") in biological samples that lack sufficient experimental evidence at the protein level for confident protein identification. The C-HPP also aims to identify new protein forms that may be caused by genetic variability, post-translational modifications, and alternative splicing. Proteogenomic data integration forms the basis of the C-HPP's activities; therefore, we have summarized some of the key approaches and their roles in the project. We present new analytical technologies that improve the chemical space and lower detection limits coupled to bioinformatics tools and some publicly available resources that can be used to improve data analysis or support the development of analytical assays. Most of this paper's content has been compiled from posters, slides, and discussions presented in the series of C-HPP workshops held during 2014. All data (posters, presentations) used are available at the C-HPP Wiki (http://c-hpp.webhosting.rug.nl/) and in the Supporting Information.
The development of a reference atlas of the healthy human body requires automated image segmentation of major anatomical structures across multiple organs based on spatial bioimages generated from various sources with differences in sample preparation. We present the setup and results of the Hacking the Human Body machine learning algorithm development competition hosted by the Human Biomolecular Atlas (HuBMAP) and the Human Protein Atlas (HPA) teams on the Kaggle platform. We create a dataset containing 880 histology images with 12,901 segmented structures, engaging 1,175 teams from 78 countries in community-driven, open-science development of machine learning models. Tissue variations in the dataset pose a major challenge to the teams, which they overcome by using color normalization techniques and combining vision transformers with convolutional models. The best model will be productized in the HuBMAP portal to process tissue image datasets at scale in support of Human Reference Atlas construction.
Centrioles are microtubule-based scaffolds that are essential for the formation of centrosomes, cilia, and flagella with important functions throughout the cell cycle, in physiology and during development. The ability to purify centriole-containing organelles on a large scale, combined with advances in protein identification using mass spectrometry-based proteomics, has revealed multiple centriole-associated proteins that are conserved during evolution in eukaryotes. Despite these advances, the molecular basis for the plethora of processes coordinated by cilia and centrosomes is not fully understood. Considering the complexity and dynamics of centriole-related proteomes and the first-pass analyses reported so far, it is likely that further insight will come from more thorough proteome analyses under various cellular and physiological conditions. To this end, we here describe methods to isolate centrosomes from human cells and strategies to selectively identify and study the properties of the associated proteins using quantitative mass spectrometry-based proteomics.