We describe a novel method for transcript profiling based on high-throughput parallel sequencing of signature tags using a non-gel-based microtiter plate format. The method relies on the identification of cDNA clones by pyrosequencing of the region corresponding to the 3'-end of the mRNA preceding the poly(A) tail. Simultaneously, the method can be used for gene discovery, since tags corresponding to unknown genes can be further characterized by extended sequencing. The protocol was validated using a model system for human atherosclerosis. Two 3'-tagged cDNA libraries, representing macrophages and foam cells, which are key components in the development of atherosclerotic plaques, were constructed using a solid phase approach. The libraries were analyzed by pyrosequencing, giving on average 25 bases. As a control, conventional expressed sequence tag (EST) sequencing using slab gel electrophoresis was performed. Homology searches were used to identify the genes corresponding to each tag. Comparisons with EST sequencing showed identical, unique matches in the majority of cases when the pyrosignature was at least 18 bases. A visualization tool was developed to facilitate differential analysis using a virtual chip format. The analysis resulted in identification of genes with possible relevance for development of atherosclerosis. The use of the method for automated massive parallel signature sequencing is discussed.
PR-39, a proline/arginine-rich peptide antibiotic, was purified from pig intestine and later shown to originate in the bone marrow. Intending to isolate a clone for a human counterpart to PR-39, we synthesized a PCR probe derived from the PR-39 gene. However, when this probe was used to screen a human bone marrow cDNA library, eight clones were obtained with information for another putative human peptide antibiotic, designated FALL-39 after the first four residues. FALL-39 is a 39-residue peptide lacking cysteine and tryptophan. All human peptide antibiotics previously isolated (or predicted) belong to the defensin family and contain three disulfide bridges. The clone for prepro-FALL-39 encodes a cathelin-like precursor protein with 170 amino acid residues. We have postulated a dibasic processing site for the mature FALL-39 and chemically synthesized the putative peptide. In basal medium E, synthetic FALL-39 was highly active against Escherichia coli and Bacillus megaterium. Residues 13-34 in FALL-39 can be predicted to form a perfect amphipathic helix, and CD spectra showed that medium E induced 30% helix formation in FALL-39. RNA blot analyses disclosed that the gene for FALL-39 is expressed mainly in human bone marrow and testis.
This report describes a single-step extension approach suitable for high-throughput single-nucleotide polymorphism typing applications. The method relies on extension of paired allele-specific primers and we demonstrate that the reaction kinetics were slower for mismatched configurations compared with matched configurations. In our approach we employ apyrase, a nucleotide degrading enzyme, to allow accurate discrimination between matched and mismatched primer-template configurations. This apyrase-mediated allele-specific extension (AMASE) protocol allows incorporation of nucleotides when the reaction kinetics are fast (matched 3'-end primer) but degrades the nucleotides before extension when the reaction kinetics are slow (mismatched 3'-end primer). Thus, AMASE circumvents the major limitation of previous allele-specific extension assays in which slow reaction kinetics will still give rise to extension products from mismatched 3'-end primers, hindering proper discrimination. It thus represents a significant improvement of the allele-extension method. AMASE was evaluated by a bioluminometric assay in which successful incorporation of unmodified nucleotides is monitored in real-time using an enzymatic cascade.
Squamous cell carcinoma (SCC) of the skin represents a group of neoplasms associated with exposure to UV light. Recently, we obtained data suggesting that invasive skin cancer and its precursors derive from one original neoplastic clone. Here, the analysis was extended by loss of heterozygosity (LOH) analysis in the chromosome 9q22.3 region. A total of 85 samples, taken from twenty-two sections of sun-exposed sites and corresponding to normal epidermis, morphologically normal cells with positive immunostaining for the p53 protein (p53 patches), dysplasias, cancer in situ (CIS) and SCC of the skin, were analysed. Overall, about 70% of p53 patches had mutations in the p53 gene but no LOH in the p53 gene or the 9q22.3 region. Approximately 70% of the dysplasias showed p53 mutations, of which about 40% had LOH in the p53 region but not in the 9q22.3 region. In contrast, about 65% of SCC and CIS displayed LOH in the 9q22.3 region, as well as frequent (80%) mutations and/or LOH in the p53 gene. These findings strongly suggest that alterations in the p53 gene are an early event in the progression towards SCC, whereas malignant development involves LOH and alterations in at least one (or several) tumor suppressor genes located in chromosome 9q22.3.
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico-driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as on background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data are available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.
Background: CD36 is a membrane glycoprotein involved in a variety of cellular processes such as lipid transport, immune regulation, hemostasis, adhesion, angiogenesis and atherosclerosis. It is expressed in many tissues and cell types, with a tissue specific expression pattern that is a result of a complex regulation for which the molecular mechanisms are not yet fully understood. There are several alternative mRNA isoforms described for the gene. We have investigated the expression patterns of five alternative first exons of the CD36 gene in several human tissues and cell types, to better understand the molecular details behind its regulation.
Results: We have identified one novel alternative first exon of the CD36 gene, and confirmed the expression of four previously known alternative first exons of the gene. The alternative transcripts are all expressed in more than one human tissue and their expression patterns vary highly in skeletal muscle, heart, liver, adipose tissue, placenta, spinal cord, cerebrum and monocytes. All alternative first exons are upregulated in THP-1 macrophages in response to oxidized low density lipoproteins. The alternative promoters lack TATA-boxes and CpG islands. The upstream region of exon 1b contains several features common to housekeeping gene and monocyte-specific gene promoters.
Conclusion: Tissue-specific expression patterns of the alternative first exons of CD36 suggest that the alternative first exons of the gene are regulated individually and tissue specifically. At the same time, the fact that all first exons are upregulated in THP-1 macrophages in response to oxidized low density lipoproteins may suggest that the alternative first exons are coregulated in this cell type and environmental condition. The molecular mechanisms regulating CD36 thus appear to be unusually complex, which might reflect the multifunctional role of the gene in different tissues and cellular conditions.
Objectives: To analyze the early gene expression in macrophages accompanying the phenotypic changes into foam cells upon exposure to oxidized low-density lipoprotein. To identify candidate genes and markers for further studies into the pathogenesis of atherosclerosis. Methods: Cells of the monocytic cell line THP-1 were activated by PMA and exposed to oxidized low-density lipoprotein. Gene expression profiles were investigated after 24 h, using a solid phase cDNA representational difference analysis (RDA) method and shotgun sequencing. Results were verified by microarray hybridization, and analyzed in the virtual chip display of a novel software tool for transcript profile exploration. Results: By comparing transcript profiles of exposed/unexposed cells, 1,984 transcript sequences, representing a total of 921 genes with altered expression levels in response to oxidized low-density lipoprotein exposure, were identified. Genes that are central to cell cycle control and proliferation, inflammatory response, and of pathways not previously implicated in atherosclerosis were identified. The data obtained are also made available on-line at http://biobase.biotech.kth.se/thp1a for further exploration. Conclusion: The identification of new candidate genes for atherosclerotic disease through RDA-based transcript profiling facilitates further functional genomic studies in coronary artery disease. Candidate genetic polymorphism markers of potential clinical relevance can be identified by filtering information in genome variation databases through the virtual chip analysis of the transcript profiles and subsequently tested in association studies.
Monitoring of differential gene expression is an important step towards understanding gene function. We describe a comparison of the representational difference analysis (RDA) subtraction process with corresponding microarray analysis. The subtraction steps are followed in a quantitative manner using a shotgun cloning and sequencing procedure that includes over 1900 gene sequences. In parallel, the enriched transcripts are spotted onto microarrays, facilitating large-scale hybridization analysis of the representations and the difference products. We show by the shotgun procedure that there is a high diversity of gene fragments represented in the iterative RDA products (92-67% singletons) with a low number of shared sequences (<9%) between subsequent subtraction cycles. A nonredundant set of 1141 RDA clones was immobilized on glass slides and the majority of these clones (97%) gave repeated good fluorescent signals in a subsequent hybridization of the labelled and amplified original cDNA. We observed only a low number of false positives (<2%) and a more than twofold differential expression for 32% (363) of the immobilized RDA clones. In conclusion, we show that by random sequencing of the difference products we obtained an accurate transcript profile of the individual steps and that large-scale confirmation of the obtained transcripts can be achieved by microarray analysis.
Various approaches to the study of differential gene expression are applied to compare cell lines and tissue samples in a wide range of biological contexts. The compromise between focusing on only the important genes in certain cellular processes and achieving a complete picture is critical for the selection of strategy. We demonstrate how global microarray technology can be used for the exploration of the differentially expressed genes extracted through representational difference analysis (RDA). The subtraction of ubiquitous gene fragments from the two samples was demonstrated using cDNA microarrays including more than 32 000 spotted, PCR-amplified human clones. Hybridizations indicated the expression of 9100 of the microarray elements in a macrophage/foam cell atherosclerosis model system, of which many were removed during the RDA process. The stepwise subtraction procedure was demonstrated to yield an efficient enrichment of gene fragments overrepresented in either sample (18% in the representations, 86% after the first subtraction, and 88% after the second subtraction), many of which were impossible to detect in the starting material. Interestingly, the method allowed for the observation of the differential expression of several members of the low-abundant nuclear receptor gene family. We also observed a certain background level in the difference products of nondifferentially expressed gene fragments, warranting a verification strategy for selected candidate genes. The differential expression of several genes was verified by real-time PCR.
The potential for Grid technologies in applied bioinformatics is largely unexplored. We have developed a model for solving computationally demanding bioinformatics tasks in distributed Grid environments, designed to ease the usability for scientists unfamiliar with Grid computing. With a script-based implementation that uses a strategy of temporary installations of databases and existing executables on remote nodes at submission, we propose a generic solution that does not rely on predefined Grid runtime environments and that can easily be adapted to other bioinformatics tasks suitable for parallelization. This implementation has been successfully applied to whole proteome sequence similarity analyses and to genome-wide genotype simulations, where computation time was reduced from years to weeks. We conclude that computational Grid technology is a useful resource for solving compute-intensive tasks in genetics and proteomics using existing algorithms.
In genetics, with increasing data sizes and more advanced algorithms for mining complex data, a point is reached where increased computational capacity or alternative solutions become unavoidable. Most contemporary methods for linkage analysis are based on the Lander-Green hidden Markov model (HMM), which scales exponentially with the number of pedigree members. In whole genome linkage analysis, genotype simulations become prohibitively time consuming to perform on single computers. We have developed 'Grid-Allegro', a Grid-aware implementation of the Allegro software, by which several thousands of genotype simulations can be performed in parallel in a short time. With temporary installations of the Allegro executable and datasets on remote nodes at submission, the need for predefined Grid run-time environments is circumvented. We evaluated the performance, efficiency and scalability of this implementation in a genome scan on Swedish multiplex Alzheimer's disease families. We demonstrate that 'Grid-Allegro' allows for the full exploitation of the features available in Allegro for genome-wide linkage. The implementation of existing bioinformatics applications on Grids (Distributed Computing) represents a cost-effective alternative for addressing highly resource-demanding and data-intensive bioinformatics tasks, compared to acquiring and setting up clusters of computational hardware in house (Parallel Computing), a resource not available to most geneticists today.
For several applications and algorithms used in applied bioinformatics, a bottleneck in terms of computational time may arise when they are scaled up to facilitate analyses of large datasets and databases. Re-codification, algorithm modification or sacrifices in sensitivity and accuracy may be necessary to accommodate the limited computational capacity of single workstations. Grid computing offers an alternative model for solving massive computational problems by parallel execution of existing algorithms and software implementations. We present the implementation of a Grid-aware model for solving computationally intensive bioinformatic analyses, exemplified by a blastp sliding window algorithm for whole proteome sequence similarity analysis, and evaluate the performance in comparison with a local cluster and a single workstation. Our strategy involves temporary installations of the BLAST executable and databases on remote nodes at submission, accommodating dynamic Grid environments as it avoids the need for predefined runtime environments (preinstalled software and databases at specific Grid nodes). Importantly, the implementation is generic: the BLAST executable can be replaced by other software tools to facilitate analyses suitable for parallelisation. This model should be of general interest in applied bioinformatics. Scripts and procedures are freely available from the authors.
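The parallel execution described above depends on dividing the workload into independent batches, one per Grid job. A minimal sketch of that partitioning step is given here; it is an illustration only and not the authors' scripts. The function name `split_into_jobs` and the use of contiguous, near-equal batches are assumptions made for the example, and the submission step itself is deliberately left out.

```python
def split_into_jobs(items, n_jobs):
    """Partition `items` into at most `n_jobs` contiguous batches of near-equal size.

    Hypothetical helper: each returned batch would be shipped to a remote node
    together with the executable and database it needs.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    items = list(items)
    base, extra = divmod(len(items), n_jobs)
    batches = []
    start = 0
    for i in range(n_jobs):
        # The first `extra` batches receive one additional item.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than jobs: stop rather than emit empty batches
        batches.append(items[start:start + size])
        start += size
    return batches


if __name__ == "__main__":
    # Ten hypothetical query identifiers spread over three jobs.
    for batch in split_into_jobs(range(10), 3):
        print(batch)
```

Because every batch is self-contained, the same partitioning applies whether the executable on the remote node is BLAST or another tool.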
In the post-genome era, there is a great need for protein-specific affinity reagents to explore the human proteome. Antibodies are suitable as reagents, but generation of antibodies with low cross-reactivity to other human proteins requires careful selection of antigens. Here we show the results from a proteome-wide effort to map linear epitopes based on uniqueness relative to the entire human proteome. The analysis was based on a sliding window sequence similarity search using short windows (8, 10, and 12 amino acid residues). A comparison of exact string matching (Hamming distance) and a heuristic method (BLAST) was performed, showing that the heuristic method combined with a grid strategy allows for whole proteome analysis with high accuracy and feasible run times. The analysis shows that it is possible to find unique antigens for a majority of the human proteins, with relatively strict rules involving low sequence identity of the possible linear epitopes. The implications for human antibody-based proteomics efforts are discussed.
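The sliding window search with Hamming distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the toy sequences and the `max_mismatch` uniqueness rule are hypothetical, and the brute-force comparison shown here is exactly the kind of computation that becomes costly at whole proteome scale.

```python
from typing import Iterator, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b))


def windows(seq: str, size: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every window of `size` residues in `seq`."""
    for start in range(len(seq) - size + 1):
        yield start, seq[start:start + size]


def unique_windows(target: str, others: Sequence[str], size: int,
                   max_mismatch: int) -> List[int]:
    """Start positions of windows in `target` for which no window in `others`
    lies within `max_mismatch` mismatches (assumed uniqueness rule)."""
    unique = []
    for start, win in windows(target, size):
        too_similar = any(
            hamming(win, other_win) <= max_mismatch
            for other in others
            for _, other_win in windows(other, size)
        )
        if not too_similar:
            unique.append(start)
    return unique


if __name__ == "__main__":
    # Toy sequences, 8-residue windows, exact matches disallowed.
    print(unique_windows("ACDEFGHIKL", ["ACDEFGHIAA"], size=8, max_mismatch=0))
```

Raising `max_mismatch` makes the uniqueness criterion stricter, so fewer windows qualify as candidate epitopes.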
Rare variant association tests (RVAT) have been developed to study the contribution of rare variants widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants in testing units and filtering variants to retain only the most likely causal ones. In the exome, genes are natural testing units and variants are usually filtered based on their functional consequences. However, when dealing with whole-genome sequence (WGS) data, both steps are challenging. No natural biological unit is available for aggregating rare variants. Sliding window procedures have been proposed to circumvent this difficulty; however, they are blind to biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data: "RAVA-FIRST" (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations, which are referred to as "CADD regions". (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with sub-scores computed for each genomic category within each CADD region. On both simulations and real data, RAVA-FIRST was found to outperform other WGS-based RVAT. Applying it to a WGS dataset of venous thromboembolism patients, we identified an intergenic region on chromosome 18 enriched for rare variants in early-onset patients. This region, which was missed by standard sliding window procedures, is included in a TAD region that contains a strong candidate gene. RAVA-FIRST enables new investigations of rare non-coding variants in complex diseases, facilitated by its implementation in the R package Ravages.
In this study, we have applied and evaluated a modified cDNA representational difference analysis (RDA) protocol based on magnetic bead technology to study the molecular effects of a candidate drug (N,N'-diacetyl-L-cystine, DiNAC) in a model for atherosclerosis. Alterations in a gene expression profile induced by DiNAC were investigated in a human monocytic cell line (THP-1) differentiated into macrophage-like cells by lipopolysaccharide and further exposed to DiNAC. Three rounds of subtraction have been performed and the difference products from the second and third rounds have been characterized in detail by analysis of over 1000 gene sequences. Two protocols for analysis of the subtraction products have been evaluated, a shotgun approach and size selection of both distinct fragments and band-patterned smear. We demonstrate that in order to obtain a representative view of the most abundant gene fragments, the shotgun procedure is preferred. The obtained sequences were analyzed against the UniGene and Expressed Gene Anatomy Database (EGAD) databases and the results were visualized and analyzed with the ExProView software, enabling rapid pair-wise comparison and identification of individual genes or functional groups of genes with altered expression levels. The identified differentially expressed gene sequences comprised both genes with known involvement in atherosclerosis or cholesterol biosynthesis and genes previously not implicated in these processes. A solid-phase shotgun RDA protocol, combined with virtual chip monitoring, thus provides new starting points for the characterization of novel candidate drugs.
Objectives: Atherosclerotic plaques are known to develop and progress where the endothelium is subjected to turbulent blood flow. We have applied cDNA representational difference analysis (RDA) to study vascular gene expression in mouse aorta in a model for atherosclerosis. Methods: Gene expression profiles were investigated in plaque-prone and plaque-resistant localizations in the ascending aorta and arch in 8-week-old ApoE-/- and LDLR-/- mice. Total RNA was extracted and two rounds of subtraction were performed; the difference products were characterized in detail by shotgun cloning and analysis of more than 2,700 gene sequences. Results: The identified differentially expressed gene sequences include both genes with known involvement in vascular gene expression and genes previously not implicated in vascular processes. For example, CD36 and caveolin, previously reported for their participation in the progression of atherosclerosis, were found to have an increased expression in vessel localizations thought to be especially susceptible to plaque formation. Conclusions: This report provides new in vivo information on expressed genes that can be useful for further investigations of the molecular mechanisms in focal localization of atherosclerosis.
Background: Family history of venous thromboembolism (VTE) has been suggested to be more useful in risk assessment than thrombophilia testing. Objectives: We investigated established genetic susceptibility variants for association with VTE and evaluated a genetic risk score in isolation and combined with known trigger factors, including family history of VTE. Patients/Method: A total of 18 single nucleotide polymorphisms (SNPs) selected from the literature were genotyped in 2835 women participating in a Swedish nationwide case-control study (the ThromboEmbolism Hormone Study [TEHS]). Association with VTE was assessed by odds ratios (ORs) with 95% confidence interval (CI) using logistic regression. Clinical and genetic predictors that contributed significantly to the fit of the logistic regression model were included in the prediction models. SNP-SNP interactions were investigated and incorporated into the models if found significant. Risk scores were evaluated by calculating the area under the receiver-operating characteristics curve (AUC). Results: Seven SNPs (F5 rs6025, F2 rs1799963, ABO rs514659, FGG rs2066865, F11 rs2289252, PROC rs1799810 and KNG1 rs710446) with four SNP-SNP interactions contributed to the genetic risk score for VTE, with an AUC of 0.66 (95% CI, 0.64-0.68). After adding clinical risk factors, which included family history of VTE, the AUC reached 0.84 (95% CI, 0.82-0.85). The goodness of fit of the genetic and combined scores improved when significant SNP-SNP interaction terms were included. Conclusion: Prediction of VTE in high-risk individuals was more accurate when a combination of clinical and genetic predictors with SNP-SNP interactions was included in a risk score.
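As an illustration of how such a risk score can be formed and evaluated, the sketch below builds a linear predictor with main effects and pairwise interaction terms, and computes the AUC as the fraction of case/control pairs in which the case has the higher score. It is a hedged example only: the function names, predictor names and all weights are hypothetical, and the study's fitted coefficients are not reproduced here.

```python
def linear_predictor(genotypes, weights, interactions, intercept=0.0):
    """Logistic-regression linear predictor with main effects and
    pairwise interaction terms (all coefficients are hypothetical)."""
    score = intercept
    for name, weight in weights.items():
        score += weight * genotypes[name]
    for (a, b), weight in interactions.items():
        # An interaction contributes the product of the two predictors.
        score += weight * genotypes[a] * genotypes[b]
    return score


def auc(case_scores, control_scores):
    """AUC as the proportion of case/control pairs ranked correctly;
    tied pairs count as one half."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Toy scores: 8 of the 9 case/control pairs are ranked correctly.
    print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, which gives a reference point for the reported values of 0.66 and 0.84.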
There is a clear clinical need for high-specificity plasma biomarkers for predicting risk of venous thromboembolism (VTE), but thus far, such markers have remained elusive. Utilizing affinity reagents from the Human Protein Atlas project and multiplexed immunoassays, we extensively analyzed plasma samples from 2 individual studies to identify candidate protein markers associated with VTE risk. We screened plasma samples from 88 VTE cases and 85 matched controls, collected as part of the Swedish "Venous Thromboembolism Biomarker Study," using suspension bead arrays composed of 755 antibodies targeting 408 candidate proteins. We identified significant associations between VTE occurrence and plasma levels of human immunodeficiency virus type I enhancer binding protein 1 (HIVEP1), von Willebrand factor (VWF), glutathione peroxidase 3 (GPX3), and platelet-derived growth factor β (PDGFB). For replication, we profiled plasma samples of 580 cases and 589 controls from the French FARIVE study. These results confirmed the association of VWF and PDGFB with VTE after correction for multiple testing, whereas only weak trends were observed for HIVEP1 and GPX3. Although plasma levels of VWF and PDGFB correlated modestly (ρ ~ 0.30) with each other, they were independently associated with VTE risk in a joint model in FARIVE (VWF P < .001; PDGFB P = .002). PDGF was verified as the target of the capture antibody by immunocapture mass spectrometry and sandwich enzyme-linked immunosorbent assay. In conclusion, we demonstrate that high-throughput affinity plasma proteomic profiling is a valuable research strategy to identify potential candidate biomarkers for thrombosis-related disorders, and our study suggests a novel association of PDGFB plasma levels with VTE.
Genetic associations for the recurrence of venous thromboembolism (VTE) are not well described. Our aim was to investigate whether common genetic variants, previously found to contribute to the prediction of first-time thrombosis in women, were associated with risk of recurrence. The Thromboembolism Hormone Study (TEHS) is a Swedish nationwide case-control study (2002-2009). A cohort of 1,010 women with first-time VTE was followed up until a recurrent event, death or November 2011. The genetic variants in F5 rs6025, F2 rs1799963, ABO rs514659, FGG rs2066865, F11 rs2289252, PROC rs1799810 and KNG1 rs710446 were assessed together with clinical variables. Recurrence rate was calculated as the number of events over the accumulated patient-time. Cumulative recurrence was calculated by Kaplan-Meier curve. A Cox proportional-hazards model was used to estimate hazard ratios (HR) and 95% confidence intervals (95% CI) between groups. A total of 101 recurrent events occurred during a mean follow-up time of five years. The overall recurrence rate was 20 per 1,000 person-years (95% CI; 16-24). The recurrence rate was highest in women with unprovoked first event and obesity. Carriers of the risk alleles of F5 rs6025 (HR=1.7 (95% CI; 1.1-2.6)) and F11 rs2289252 (HR=1.8 (95% CI; 1.1-3.0)) had significantly higher rates of recurrence compared to non-carriers. The cumulative recurrence was 2.5-fold larger in carriers of both F5 rs6025 and F11 rs2289252 than in non-carriers at five years follow-up. In conclusion, F5 rs6025 and F11 rs2289252 contributed to the risk of recurrent VTE and the combination is of potential clinical relevance for risk prediction.
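The recurrence rate calculation (events over accumulated patient-time) can be worked through with the figures given above. The sketch below makes two assumptions that are not stated in the abstract: the accumulated follow-up is approximated as 1,010 women times a mean of five years (5,050 person-years), and the confidence interval uses a normal approximation to the Poisson distribution. The function name is hypothetical, and the study may have used an exact accumulated time and a different interval method.

```python
import math


def rate_per_1000(events, person_years, z=1.96):
    """Incidence rate per 1,000 person-years with an approximate 95% CI.

    Assumes the event count is Poisson distributed, so its standard
    error is sqrt(events); this is an assumption of the example.
    """
    if events < 0 or person_years <= 0:
        raise ValueError("events must be >= 0 and person_years > 0")
    scale = 1000.0 / person_years
    rate = events * scale
    half_width = z * math.sqrt(events) * scale
    return rate, rate - half_width, rate + half_width


if __name__ == "__main__":
    # 101 recurrent events; person-years approximated as 1,010 x 5 (assumed).
    rate, lower, upper = rate_per_1000(101, 1010 * 5)
    print(round(rate), round(lower), round(upper))
```

Under these assumptions the result rounds to 20 per 1,000 person-years with an interval of roughly 16 to 24, consistent with the reported figures.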
Introduction: We investigated whether genetic variations robustly associated with coronary artery disease are also associated with risk of venous thromboembolism in a well-defined, female case-control study (n = 2753) from Sweden. Materials and Methods: 39 single nucleotide polymorphisms in 32 loci associated with coronary artery disease in genome-wide association studies were identified in a literature search and genotyped in the ThromboEmbolism Hormone Study (TEHS). Association with venous thromboembolism was assessed by logistic regression. Results: Only rs579459 in the ABO locus demonstrated a significant association with VTE. A tentative association between ANRIL and VTE in the discovery analysis failed to replicate in a meta-analysis of 4 independent cohorts (total n = 7181). Conclusions: It appears that only the ABO locus is a shared risk factor for coronary artery disease and VTE.
Endothelial cells line blood vessels and regulate hemostasis, inflammation, and blood pressure. Proteins critical for these specialized functions tend to be predominantly expressed in endothelial cells across vascular beds. Here, we present a systems approach to identify a panel of human endothelial-enriched genes using global, body-wide transcriptomics data from 124 tissue samples from 32 organs. We identified known and unknown endothelial-enriched gene transcripts and used antibody-based profiling to confirm expression across vascular beds. The majority of identified transcripts could be detected in cultured endothelial cells from various vascular beds, and we observed maintenance of relative expression in early passage cells. In summary, we describe a widely applicable method to determine cell-type-specific transcriptome profiles in a whole-organism context, based on differential abundance across tissues. We identify potential vascular drug targets or endothelial biomarkers and highlight candidates for functional studies to increase understanding of the endothelium in health and disease.
Background: Intraplaque hemorrhage (IPH) is a hallmark of atherosclerotic plaque instability. Biliverdin reductase B (BLVRB) is enriched in plasma and plaques from patients with symptomatic carotid atherosclerosis and functionally associated with IPH. Objective: We explored the biomarker potential of plasma BLVRB through (1) its correlation with IPH in carotid plaques assessed by magnetic resonance imaging (MRI), and with recurrent ischemic stroke, and (2) its use for monitoring pharmacotherapy targeting IPH in a preclinical setting. Methods: Plasma BLVRB levels were measured in patients with symptomatic carotid atherosclerosis from the PARISK study (n = 177, 5 year follow-up) with and without IPH as indicated by MRI. Plasma BLVRB levels were also measured in a mouse vein graft model of IPH at baseline and following antiangiogenic therapy targeting vascular endothelial growth factor receptor 2 (VEGFR-2). Results: Plasma BLVRB levels were significantly higher in patients with IPH (737.32 ± 693.21 vs. 520.94 ± 499.43 mean fluorescent intensity (MFI), p = 0.033), but had no association with baseline clinical and biological parameters. Plasma BLVRB levels were also significantly higher in patients who developed recurrent ischemic stroke (1099.34 ± 928.49 vs. 582.07 ± 545.34 MFI, HR = 1.600, CI [1.092-2.344]; p = 0.016). Plasma BLVRB levels were significantly reduced following prevention of IPH by anti-VEGFR-2 therapy in mouse vein grafts (1189 ± 258.73 vs. 1752 ± 366.84 MFI; p = 0.004). Conclusions: Plasma BLVRB was associated with IPH and increased risk of recurrent ischemic stroke in patients with symptomatic low- to moderate-grade carotid stenosis, indicating the capacity to monitor the efficacy of IPH-preventive pharmacotherapy in an animal model. Together, these results suggest the utility of plasma BLVRB as a biomarker for atherosclerotic plaque instability.
Background: CD36 is a multiligand receptor involved in various metabolic pathways, including cellular uptake of long-chain fatty acids. Defective function or expression of CD36 can result in dyslipidemia or insulin resistance. We have previously shown that CD36 expression is female-predominant in rat liver. In the present study, hormonal and nutritional regulation of hepatic CD36 expression was examined in male and female rats. Since alternative transcription start sites have been described in murine and human Cd36, we investigated whether alternative CD36 transcripts are differentially regulated in rat liver during these conditions.
Results: Sequence information of the rat Cd36 5'-UTR was extended, showing that the gene structure of Cd36 in rat is similar to that previously described in mouse with at least two alternative first exons. The rat Cd36 exon 1a promoter was sequenced and found to be highly similar to murine and human Cd36. We show that alternative first exon usage is involved in the female-predominant expression of CD36 in rat liver and during certain hormonal states that induce CD36 mRNA abundance. Estrogen treatment or continuous infusion of growth hormone (GH) in male rats induced CD36 expression preferentially through the exon 1a promoter. Old age was associated with increased CD36 expression in male rats, albeit without any preferential first exon usage. Intermittent GH treatment in old male rats reversed this effect. Mild starvation (12 hours without food) reduced CD36 expression in female liver, whereas its expression was increased in skeletal muscle.
Conclusion: The results obtained in this study confirm and extend our previous observation that GH is an important regulator of hepatic CD36, and depending on the mode of treatment (continuous or intermittent) the gene might be either induced or repressed. We suggest that the effects of continuous GH secretion in females (which is stimulatory) and intermittent GH secretion in males (which is inhibitory) explain the sex-different expression of this gene. Furthermore, a female-specific repression of hepatic CD36 in response to food deprivation was found, which was in contrast to a stimulatory effect in skeletal muscle. This demonstrates a tissue-specific regulation of Cd36.
Background: Precision medicine approaches aim to tackle diseases on an individual level through molecular profiling. Despite the growing knowledge about diseases and the reported diversity of molecular phenotypes, the descriptions of human health on an individual level have been far less elaborate. Methods: To provide insights into the longitudinal protein signatures of well-being, we profiled blood plasma collected over one year from 101 clinically healthy individuals using multiplexed antibody assays. After applying an antibody validation scheme, we utilized > 700 protein profiles for in-depth analyses of the individuals’ short-term health trajectories. Findings: We found signatures of circulating proteomes to be highly individual-specific. Considering technical and longitudinal variability, we observed that 49% of the protein profiles were stable over one year. We also identified eight networks of proteins in which 11–242 proteins covaried over time. For each participant, there were unique protein profiles of which some could be explained by associations to genetic variants. Interpretation: This observational and non-interventional study identified noticeable diversity among clinically healthy subjects, and facets of individual-specific signatures emerged by monitoring the variability of the circulating proteomes over time. To enable more personal and hence more precise assessments of health states, longitudinal profiling of circulating proteomes can provide a valuable component for precision medicine approaches. Funding: This work was supported by the Erling Persson Foundation, the Swedish Heart and Lung Foundation, the Knut and Alice Wallenberg Foundation, Science for Life Laboratory, and the Swedish Research Council.
The intermediate filament protein nestin is expressed during embryonic development, but considered largely restricted to areas of regeneration in the adult. Here, we perform a body-wide transcriptome and protein-profiling analysis to reveal that nestin is constitutively, and highly selectively, expressed in adult human endothelial cells (EC), independent of proliferative status. Correspondingly, we demonstrate that it is not a marker for tumour EC in multiple malignancy types. Imaging of EC from different vascular beds reveals that nestin subcellular distribution is shear-modulated. siRNA inhibition of nestin increases EC proliferation, and nestin expression is reduced in atherosclerotic plaque neovessels. eQTL analysis reveals an association between SNPs linked to cardiovascular disease and reduced aortic EC nestin mRNA expression. Our study challenges the dogma that nestin is a marker of proliferation, and provides insight into its regulation and function in EC. Furthermore, our systems-based approach can be applied to investigate body-wide expression profiles of any candidate protein.
Changes in the endothelium of the cerebral vasculature can contribute to inflammatory, thrombotic, and malignant disorders. The importance of defining cell-type-specific genes and their modification in disease is increasingly recognized. Here, we develop a bioinformatics-based approach to identify normal brain cell-enriched genes, using bulk RNA sequencing (RNA-seq) data from 238 normal human cortex samples from 2 independent cohorts. We compare endothelial cell-enriched gene profiles with astrocyte, oligodendrocyte, neuron, and microglial cell profiles. Endothelial changes in malignant disease are explored using RNA-seq data from 516 lower-grade gliomas and 401 glioblastomas. Lower-grade gliomas appear to be an "endothelial intermediate" between normal brain and glioblastoma. We apply our method for the prediction of glioblastoma-specific endothelial biomarkers, providing potential diagnostic or therapeutic targets. In summary, we provide a roadmap of endothelial cell identity in normal and malignant brain, using a method developed to resolve bulk RNA-seq into constituent cell-type-enriched profiles.
A State of the Art lecture titled "Proteomics in Thrombosis Research" was presented at the ISTH Congress in 2021. In clinical practice, there is a need for improved plasma biomarker-based tools for diagnosis and risk prediction of venous thromboembolism (VTE). Analysis of blood, to identify plasma proteins with potential utility for such tools, could enable an individualized approach to treatment and prevention. Technological advances to study the plasma proteome on a large scale allow broad screening for the identification of novel plasma biomarkers, both by targeted and nontargeted proteomics methods. However, assay limitations need to be considered when interpreting results, with orthogonal validation required before conclusions are drawn. Here, we review and provide perspectives on the application of affinity- and mass spectrometry-based methods for the identification and analysis of plasma protein biomarkers, with potential application in the field of VTE. We also provide a future perspective on discovery strategies and emerging technologies for targeted proteomics in thrombosis research. Finally, we summarize relevant new data on this topic, presented during the 2021 ISTH Congress.
Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We investigated ASE through RNA-sequencing of primary white blood cells from eight human individuals before and after the controlled induction of an inflammatory response, and detected condition-dependent and static ASE at 211 and 13021 variants, respectively. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We observed condition-dependent ASE related to the inflammatory response in 19 genes, and static ASE in 1389 genes. Allele-specific expression was confirmed by validation of variants through real-time quantitative RT-PCR, with RNA-seq and RT-PCR ASE effect-size correlations r = 0.67 and r = 0.94 for static and condition-dependent ASE, respectively.
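The idea of testing a single variant for allelic imbalance can be illustrated with a minimal sketch. This is not the GeneiASE method, whose details are not given in the abstract; it is a plain exact binomial test under the assumption that, without ASE, reads at a heterozygous variant come from either allele with probability 0.5. The function names and read counts are hypothetical.

```python
from math import comb


def ase_effect_size(ref_reads: int, alt_reads: int) -> float:
    """Deviation of the reference-allele read fraction from the balanced value 0.5."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("variant has no read coverage")
    return ref_reads / total - 0.5


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one variant.

    Assumption (not from the source): under no ASE, each read carries the
    reference allele with probability p_ref, independently of other reads.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return 1.0
    pmf = [comb(total, k) * p_ref**k * (1 - p_ref) ** (total - k)
           for k in range(total + 1)]
    observed = pmf[ref_reads]
    # Sum the probability of every outcome that is at most as likely as the observed one.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at two heterozygous variants.
balanced_p = binomial_two_sided_p(10, 10)
skewed_p = binomial_two_sided_p(20, 0)
print(balanced_p, skewed_p, ase_effect_size(20, 0))
```

A per-variant test like this ignores overdispersion and does not combine variants within a gene, which is the part a gene-level method has to address.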
Background: Neutrophil Extracellular Traps (NETs) are key mediators of immunothrombotic mechanisms and defective clearance of NETs from the circulation underlies an array of thrombotic, inflammatory, infectious, and autoimmune diseases. Efficient NET degradation depends on the combined activity of two distinct DNases, DNase1 and DNase1-like 3 (DNase1L3) that preferentially digest double-stranded DNA (dsDNA) and chromatin, respectively.
Methods: Here, we engineered a dual-active DNase with combined DNase1 and DNase1L3 activities and characterized the enzyme for its NET degrading potential in vitro. Furthermore, we produced a mouse model with transgenic expression of the dual-active DNase and analyzed body fluids of these animals for DNase1 and DNase1L3 activities. We systematically substituted 20 amino acid stretches in DNase1 that were not conserved among DNase1 and DNase1L3 with homologous DNase1L3 sequences.
Results: We found that the ability of DNase1L3 to degrade chromatin is embedded into three discrete areas of the enzyme's core body, not the C-terminal domain as suggested by the state-of-the-art. Further, combined transfer of the aforementioned areas of DNase1L3 to DNase1 generated a dual-active DNase1 enzyme with additional chromatin degrading activity. The dual-active DNase1 mutant was superior to native DNase1 and DNase1L3 in degrading dsDNA and chromatin, respectively. Transgenic expression of the dual-active DNase1 mutant in hepatocytes of mice lacking endogenous DNases revealed that the engineered enzyme was stable in the circulation, released into serum and filtered to the bile but not into the urine.
Conclusion: Therefore, the dual-active DNase1 mutant is a promising tool for neutralization of DNA and NETs with potential therapeutic applications for interference with thromboinflammatory disease states.
The intercalated disc (ICD) occupies a central position in the transmission of force, electrical continuity and chemical communication between cardiomyocytes. Changes in its structure and composition are strongly implicated in heart failure. ICD functions include: maintenance of electrical continuity across the ICD; physical links between membranes and the cytoskeleton; intercellular adhesion; maintenance of ICD structure and function; and growth. About 200 known proteins are associated with ICDs, 40% of which change in disease. We systematically reviewed cardiac immunohistochemical data on the Human Protein Atlas (HPA) web site, ExPASy protein binding data and published papers on ICDs. We identified 43 proteins not previously reported, and confirmed 37 proteins that have previously been described. In addition, 102 proteins not present on the HPA web site were described in ICDs in the literature. We group these into clusters of functionally interacting proteins, demonstrating that ICDs play a key role in cardiomyocyte function.
Global classification of the human proteins with regards to spatial expression patterns across organs and tissues is important for studies of human biology and disease. Here, we used a quantitative transcriptomics analysis (RNA-Seq) to classify the tissue-specific expression of genes across a representative set of all major human organs and tissues and combined this analysis with antibody-based profiling of the same tissues. To present the data, we launch a new version of the Human Protein Atlas that integrates RNA and protein expression data corresponding to 80% of the human protein-coding genes with access to the primary data for both the RNA and the protein analysis on an individual gene level. We present a classification of all human protein-coding genes with regards to tissue-specificity and spatial expression pattern. The integrative human expression map can be used as a starting point to explore the molecular constituents of the human body.
A gene-centric Human Proteome Project has been proposed to characterize the human protein-coding genes in a chromosome-centered manner to understand human biology and disease. Here, we report on the protein evidence for all genes predicted from the genome sequence based on manual annotation from literature (UniProt), antibody-based profiling in cells, tissues and organs and analysis of the transcript profiles using next generation sequencing in human cell lines of different origins. We estimate that there is good evidence for protein existence for 69% (n = 13985) of the human protein-coding genes, while 23% have only evidence on the RNA level and 7% still lack experimental evidence. Analysis of the expression patterns shows few tissue-specific proteins and approximately half of the genes expressed in all the analyzed cells. The status for each gene with regards to protein evidence is visualized in a chromosome-centric manner as part of a new version of the Human Protein Atlas (www.proteinatlas.org).
The peptide FA-LL-37, previously termed FALL-39, was originally predicted from an ORF of a cDNA clone isolated from a human bone marrow library. This peptide was synthesized and found to have antibacterial activity. We have now characterized and sequenced the complete gene for FA-LL-37, termed FALL39. It is a compact gene of 1963 bp with four exons. Exons 1-3 code for a signal sequence and the cathelin region. Exon 4 contains the information for the mature antibacterial peptide. Our results indicate that FALL39 is the only member of the cathelin gene family present in the human genome. Potential binding sites for acute-phase-response factors are identified in the promoter and in intron 2. A possible role for the cytokine interleukin-6 in the regulation of FALL39 is discussed. Anti-(FA-LL-37) IgG located the peptide in granulocytes and we isolated the mature peptide from these cells after degranulation. Structural analysis determined the mature peptide to be LL-37. To obtain LL-37 for antibacterial assays, synthetic FA-LL-37 was degraded with dipeptidyl-peptidase I. This analysis showed that mature LL-37 is a potent antibacterial peptide.
To understand renal functions and disease, it is important to define the molecular constituents of the various compartments of the kidney. Here, we used comparative transcriptomic analysis of all major organs and tissues in the human body, in combination with kidney tissue microarray-based immunohistochemistry, to generate a comprehensive description of the kidney-specific transcriptome and proteome. A special emphasis was placed on the identification of genes and proteins that were elevated in specific kidney subcompartments. Our analysis identified close to 400 genes that had elevated expression in the kidney, as compared to the other analysed tissues, and these were further subdivided, depending on expression levels, into tissue enriched, group enriched or tissue enhanced. Immunohistochemistry allowed us to identify proteins with distinct localisation to the glomeruli (n=11), proximal tubules (n=120), distal tubules (n=9) or collecting ducts (n=8). Among the identified kidney-elevated transcripts, we found several proteins not previously characterised or identified as elevated in kidney. This description of the kidney-specific transcriptome and proteome provides a resource for basic and clinical research to facilitate studies to understand kidney biology and disease.
Despite recognizing aging as a common risk factor of many human diseases, little is known about its molecular traits. To identify age-associated proteins circulating in human blood, we screened 156 individuals aged 50–92 using exploratory and multiplexed affinity proteomics assays. Profiling eight additional study sets (N = 3,987), performing antibody validation, and conducting a meta-analysis revealed a consistent age association (P = 6.61 × 10⁻⁶) for circulating histidine-rich glycoprotein (HRG). Sequence variants of HRG influenced how the protein was recognized in the immunoassays. Indeed, only the HRG profiles affected by rs9898 were associated with age and predicted the risk of mortality (HR = 1.25 per SD; 95% CI = 1.12–1.39; P = 6.45 × 10⁻⁵) during a follow-up period of 8.5 yr after blood sampling (IQR = 7.7–9.3 yr). Our affinity proteomics analysis found associations between the particular molecular traits of circulating HRG with age and all-cause mortality. The distinct profiles of this multipurpose protein could serve as an accessible and informative indicator of the physiological processes related to biological aging.
Objective: Endothelial cell (EC) dysfunction is a well-established response to cardiovascular disease risk factors, such as smoking and obesity. Risk factor exposure can modify EC signaling and behavior, leading to arterial and venous disease development. Here, we aimed to identify biomarker panels for the assessment of EC dysfunction, which could be useful for risk stratification or to monitor treatment response. Approach and Results: We used affinity proteomics to identify EC proteins circulating in plasma that were associated with cardiovascular disease risk factor exposure. Two hundred sixteen proteins, which we previously predicted to be EC-enriched across vascular beds, were measured in plasma samples (N=1005) from the population-based SCAPIS (Swedish Cardiopulmonary Bioimage Study) pilot. Thirty-eight of these proteins were associated with body mass index, total cholesterol, low-density lipoprotein, smoking, hypertension, or diabetes. Sex-specific analysis revealed that associations predominantly observed in female- or male-only samples were most frequently with the risk factors body mass index, or total cholesterol and smoking, respectively. We show a relationship between individual cardiovascular disease risk, calculated with the Framingham risk score, and the corresponding biomarker profiles. Conclusions: EC proteins in plasma could reflect vascular health status.
Macrophages play a critical role in innate immunity, and the expression of early response genes orchestrates much of the initial response of the immune system. Macrophages undergo extensive transcriptional reprogramming in response to inflammatory stimuli such as lipopolysaccharide (LPS). To identify gene transcription regulation patterns involved in early innate immune responses, we used two genome-wide approaches: gene expression profiling and chromatin immunoprecipitation-sequencing (ChIP-seq) analysis. We examined the effect of 2 h of LPS stimulation on early gene expression and its relation to chromatin remodeling (H3 acetylation; H3Ac) and promoter binding of Sp1 and RNA polymerase II phosphorylated at serine 5 (S5P RNAPII), which is a marker for transcriptional initiation. Our results indicate novel and alternative gene regulatory mechanisms for certain proinflammatory genes. We identified two groups of upregulated inflammatory genes with respect to chromatin modification and promoter features. One group, including highly up-regulated genes such as tumor necrosis factor (TNF), was characterized by H3Ac, high CpG content and lack of TATA boxes. The second group, containing inflammatory mediators (interleukins and CCL chemokines), was up-regulated upon LPS stimulation despite lacking H3Ac in their annotated promoters, which were low in CpG content but did contain TATA boxes. Genome-wide analysis showed that few H3Ac peaks were unique to either +/-LPS condition. However, within these, an unpacking/expansion of already existing H3Ac peaks was observed upon LPS stimulation. In contrast, a significant proportion of S5P RNAPII peaks (approx 40%) was unique to either condition. Furthermore, data indicated a large portion of previously unannotated TSSs, particularly in LPS-stimulated macrophages, where only 28% of unique S5P RNAPII peaks overlap annotated promoters.
The regulation of the inflammatory response appears to occur in a very specific manner at the chromatin level for specific genes and this study highlights the level of fine-tuning that occurs in the immune response.
Venous thromboembolism (VTE) is a common, multi-causal disease with potentially serious short- and long-term complications. In clinical practice, there is a need for improved plasma biomarker-based tools for VTE diagnosis and risk prediction. Here we show, using proteomics profiling to screen plasma from patients with suspected acute VTE, and several case-control studies for VTE, how Complement Factor H Related 5 protein (CFHR5), a regulator of the alternative pathway of complement activation, is a VTE-associated plasma biomarker. In plasma, higher CFHR5 levels are associated with increased thrombin generation potential and recombinant CFHR5 enhanced platelet activation in vitro. GWAS analysis of ~52,000 participants identifies six loci associated with CFHR5 plasma levels, but Mendelian randomization does not demonstrate causality between CFHR5 and VTE. Our results indicate an important role for the regulation of the alternative pathway of complement activation in VTE and that CFHR5 represents a potential diagnostic and/or risk predictive plasma biomarker.
Systematic exploration of the dynamic human plasma proteome enables the discovery of novel protein biomarkers. Using state-of-the-art technologies holds the promise to facilitate a better diagnosis and risk prediction of diseases. Cardiovascular disease (CVD) pathophysiology is characterized by an imbalance in processes such as vascular inflammation, endothelial dysfunction, or lipid profiles, among others. Such processes have a direct impact on the dynamic and complex composition of blood and hence the plasma proteome. Therefore, the plasma proteome is an excellent exploratory source for biomarker research, particularly for CVD. We describe the protocol for performing the discovery of protein biomarker candidates using the suspension bead array technology. The process does not require depletion steps to remove abundant proteins and consumes only a few microliters of sample from the body fluid of interest. The approach is scalable to measure many analytes as well as large numbers of samples. Moreover, we describe a bead-assisted antibody-labeling process that helps to develop quantitative assays for validation purposes and facilitate the translation of the identified candidates into clinical studies.
Advances in molecular profiling have opened up the possibility to map the expression of genes in cells, tissues, and organs in the human body. Here, we combined single-cell transcriptomics analysis with spatial antibody-based protein profiling to create a high-resolution single-cell type map of human tissues. An open access atlas has been launched to allow researchers to explore the expression of human protein-coding genes in 192 individual cell type clusters. An expression specificity classification was performed to determine the number of genes elevated in each cell type, allowing comparisons with bulk transcriptomics data. The analysis highlights distinct expression clusters corresponding to cell types sharing similar functions, both within the same organs and between organs.
Background and aims: Unstable carotid atherosclerosis causes stroke, but methods to identify patients and lesions at risk are lacking. We recently found enrichment of genes associated with calcification in carotid plaques from asymptomatic patients. Here, we hypothesized that calcification represents a stabilising feature of plaques and investigated how macro-calcification, as estimated by computed tomography (CT), correlates with gene expression profiles in lesions. Methods: Plaque calcification was measured in pre-operative CT angiographies. Plaques were sorted into high- and low-calcified, profiled with microarrays, followed by bioinformatic analyses. Immunohistochemistry and qPCR were performed to evaluate the findings in plaques and arteries with medial calcification from chronic kidney disease patients. Results: Smooth muscle cell (SMC) markers were upregulated in high-calcified plaques and calcified plaques from symptomatic patients, whereas macrophage markers were downregulated. The most enriched processes in high-calcified plaques were related to SMCs and extracellular matrix (ECM) organization, while inflammation, lipid transport and chemokine signaling were repressed. These findings were confirmed in arteries with high medial calcification. Proteoglycan 4 (PRG4) was identified as the most upregulated gene in association with plaque calcification and found in the ECM, SMA+ and CD68+/TRAP+ cells. Conclusions: Macro-calcification in carotid lesions correlated with a transcriptional profile typical for stable plaques, with altered SMC phenotype and ECM composition and repressed inflammation. PRG4, previously not described in atherosclerosis, was enriched in the calcified ECM and localized to activated macrophages and smooth muscle-like cells. This study strengthens the notion that assessment of calcification may aid evaluation of plaque phenotype and stroke risk.
Background: There is an imperative need for SNP genotyping technologies that are cost-effective per sample with retained high accuracy, throughput and flexibility. We have developed a microarray-based technique and compared it to Pyrosequencing. In protease-mediated allele-specific extension (PrASE), the protease constrains the elongation reaction and thus prevents incorrect nucleotide incorporation at primers with mismatched 3'-termini.
Results: The assay is automated for 48 genotyping reactions in parallel followed by a tag-microarray detection system. A script automatically visualizes the results in cluster diagrams and assigns the genotypes. Ten polymorphic positions suggested as prothrombotic genetic variations were analyzed with Pyrosequencing and PrASE technologies in 442 samples and 99.8% concordance was achieved. In addition to accuracy, the robustness and reproducibility of the technique have been investigated.
Conclusion: The results of this study strongly indicate that the PrASE technology can offer significant improvements in terms of accuracy and robustness and thereby an increased number of typeable SNPs.
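How a concordance figure between two genotyping methods can be computed is sketched below. The abstract does not define the measure, so this assumes concordance is the fraction of sample/position pairs called by both methods on which the genotypes agree, with uncalled pairs excluded. The function names and the toy genotype calls are hypothetical and are not data from the study.

```python
def normalize(genotype: str) -> str:
    """Sort the alleles so that 'GA' and 'AG' compare as the same genotype."""
    return "".join(sorted(genotype.upper()))


def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of (sample, position) pairs on which two methods agree.

    Assumption: pairs left uncalled (None or missing) by either method
    are excluded from the denominator.
    """
    compared = [key for key in calls_a
                if calls_a[key] is not None and calls_b.get(key) is not None]
    if not compared:
        raise ValueError("no sample/position pair was called by both methods")
    agreeing = sum(normalize(calls_a[key]) == normalize(calls_b[key])
                   for key in compared)
    return agreeing / len(compared)


# Hypothetical toy calls keyed by (sample, position).
pyro_calls = {("s1", "snp1"): "AG", ("s1", "snp2"): "CC",
              ("s2", "snp1"): "AA", ("s2", "snp2"): None}
prase_calls = {("s1", "snp1"): "GA", ("s1", "snp2"): "CC",
               ("s2", "snp1"): "AG", ("s2", "snp2"): "CT"}
print(concordance(pyro_calls, prase_calls))
```

Excluding uncalled pairs keeps accuracy separate from call rate, which matters here because the conclusion distinguishes accuracy from the number of typeable SNPs.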