Constructing a complete map of the human proteome is a vital part of understanding the human body. Such a map could benefit mankind to the extent that many severe diseases could be fully understood and hence treated with appropriate methods.
In this study, immunohistochemical (IHC) data from ~6000 proteins, 65 cell types in 48 tissues and 47 cell lines have been used to investigate the human proteome with regard to protein expression and localization. To analyze such a large data set, different statistical methods and algorithms have been applied, and by using these tools, interesting features of the proteome were found. Using all available IHC data from 65 cell types in 48 tissues, it was found that the amount of tissue-specific protein expression was surprisingly small, and the general impression from the analysis is that almost all proteins are present at all times in the cellular environment. Rather than tissue-specific protein expression, the localization and minor concentration fluctuations of the proteins in the cell are responsible for molecular interaction and tissue-specific cellular behavior. However, if a quarter of all proteins are used to distinguish different tissue types, a proportion of these proteins have characteristic expression profiles that define clusters of tissues of the same kind and embryonic origin.
The estimation of expression levels using IHC is a labor-intensive method that suffers from large variation between manual annotators. An automated image analysis software tool was developed to circumvent this problem. The automated software was shown to be more robust than manual annotators, and its quantification of expressed protein levels in the stained images was in the same range as the manual annotations.
A more thorough investigation of the stained image estimations made by the automated software revealed a significant correlation between the estimated protein expression and the cell size parameters provided by the software. To make it feasible to compare protein expression levels across different cell lines without the cell size bias, a normalization procedure was implemented and evaluated. When the normalization procedure was applied to the protein expression data, the correlation between protein expression values and cell size was minimized, and hence comparisons between cell lines with regard to protein expression are possible.
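The abstract does not specify how the normalization was carried out. As an illustration only, one common way to remove a size bias is to regress the expression estimates on cell size and keep the residuals. The sketch below assumes a linear relationship; the function name `regress_out` and the example values are hypothetical and are not taken from the thesis.

```python
def regress_out(values, covariate):
    """Remove the linear dependence of `values` on `covariate`.

    Assumption: the size bias is approximately linear. Returns the
    residuals of an ordinary least-squares fit, which by construction
    have zero covariance with the covariate.
    """
    if len(values) != len(covariate) or len(values) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    n = len(values)
    mean_c = sum(covariate) / n
    mean_v = sum(values) / n
    var_c = sum((c - mean_c) ** 2 for c in covariate)
    if var_c == 0:
        # Constant covariate: nothing to regress out, just center the values.
        return [v - mean_v for v in values]
    slope = sum((c - mean_c) * (v - mean_v)
                for c, v in zip(covariate, values)) / var_c
    intercept = mean_v - slope * mean_c
    return [v - (intercept + slope * c) for v, c in zip(values, covariate)]


def covariance_sum(x, y):
    """Unnormalized covariance, used here only to check the result."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


# Hypothetical per-cell-line values: cell size and raw expression estimate.
cell_size = [10.0, 12.0, 15.0, 20.0, 25.0]
raw_expression = [1.1, 1.5, 1.4, 2.3, 2.6]

normalized = regress_out(raw_expression, cell_size)
```

After this step the residuals in `normalized` are uncorrelated with `cell_size`, which mirrors the stated goal of minimizing the correlation between expression values and cell size; the actual procedure used in the study may differ.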
In addition, the normalized protein expression data were used to investigate the degree of correlation between mRNA and protein levels for 1065 gene products. Using two independent microarray data sets to estimate RNA levels, and normalized protein data measured by the automated software to estimate protein levels, a mean correlation of ~0.3 was found. This result indicates that a significant proportion of the manufactured antibodies, when used in an IHC setup, do provide an accurate measurement of protein expression levels.
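A mean correlation of this kind can be computed by correlating, for each gene product, its RNA levels with its protein levels across cell lines and then averaging the per-gene coefficients. The sketch below assumes Pearson correlation and a dictionary layout keyed by gene; both are assumptions, since the abstract does not name the correlation measure, and the gene names and values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def mean_gene_correlation(rna, protein):
    """Average the per-gene RNA/protein correlation over shared genes.

    `rna` and `protein` map a gene name to a list of levels, one per
    cell line, in the same cell-line order. Genes with an undefined
    correlation (constant profile) are skipped.
    """
    coefficients = []
    for gene in sorted(rna.keys() & protein.keys()):
        r = pearson(rna[gene], protein[gene])
        if r == r:  # NaN is the only value not equal to itself
            coefficients.append(r)
    if not coefficients:
        raise ValueError("no gene with a defined correlation")
    return sum(coefficients) / len(coefficients)


# Hypothetical levels for two genes across four cell lines.
rna_levels = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
protein_levels = {"GENE_A": [2.0, 4.0, 6.0, 8.0], "GENE_B": [4.0, 3.0, 2.0, 1.0]}

mean_r = mean_gene_correlation(rna_levels, protein_levels)
```

In this toy input one gene is perfectly correlated and the other perfectly anti-correlated, so the mean is zero; real data would give a distribution of per-gene coefficients whose mean summarizes the overall agreement.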
Using antibodies directed towards human proteins, plasma samples were investigated with regard to metabolic dysfunctions. Since plasma is a complex sample, the protocol for quantifying expressed proteins was optimized. The protocol was evaluated using a suspension bead microarray and certain known characteristics of the dataset. These expected characteristics were found in the subsequent analysis, which showed that the protocol was functional. Using the same experimental outline will facilitate future applications, e.g. biomarker discovery.
Stockholm: KTH, 2008. xi, 53 p.
Immunohistochemistry, protein expression, Antibody, Tissue microarray, protein quantification, RNA and protein correlation
2008-10-10, Oscar Klein auditorium, Roslagstullsbacken 21, floor 4, Stockholm, 09:00 (English)