The epitope space of the human proteome
KTH, School of Biotechnology (BIO).
KTH, School of Biotechnology (BIO).
KTH, School of Biotechnology (BIO). ORCID iD: 0000-0003-0996-1644
KTH, School of Biotechnology (BIO). ORCID iD: 0000-0001-8993-048X
2008 (English). In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 17, no 4, p. 606-613. Article in journal (Refereed), Published.
Abstract [en]

In the post-genome era, there is a great need for protein-specific affinity reagents to explore the human proteome. Antibodies are suitable as reagents, but generation of antibodies with low cross-reactivity to other human proteins requires careful selection of antigens. Here we show the results from a proteome-wide effort to map linear epitopes based on uniqueness relative to the entire human proteome. The analysis was based on a sliding window sequence similarity search using short windows (8, 10, and 12 amino acid residues). A comparison of exact string matching (Hamming distance) and a heuristic method (BLAST) was performed, showing that the heuristic method combined with a grid strategy allows for whole-proteome analysis with high accuracy and feasible run times. The analysis shows that it is possible to find unique antigens for a majority of the human proteins, with relatively strict rules involving low sequence identity of the possible linear epitopes. The implications for human antibody-based proteomics efforts are discussed.
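The sliding window comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy sequences, and the choice to report the highest identity per window are assumptions made for illustration, and a brute-force scan like this would be far too slow for a whole proteome, which is the reason the abstract gives for turning to a heuristic method and a grid.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b))


def windows(seq: str, size: int) -> List[str]:
    """All contiguous windows of the given size, in sequence order."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def max_identity_per_window(query_id: str,
                            proteome: Dict[str, str],
                            size: int = 8) -> List[float]:
    """For each window of the query protein, return the highest fraction of
    identical residues found against any window of any OTHER protein.

    Hypothetical helper: low values suggest a window that is unique
    relative to the rest of the (toy) proteome.
    """
    others = [w for pid, seq in proteome.items() if pid != query_id
              for w in windows(seq, size)]
    result = []
    for w in windows(proteome[query_id], size):
        best = max((size - hamming(w, o) for o in others), default=0)
        result.append(best / size)
    return result


# Toy "proteome" with made-up sequences (assumption, not real data).
toy = {"A": "MKTAYIAKQR", "B": "MKTAYIAKGG", "C": "WWWWWWWWWW"}
print(max_identity_per_window("A", toy, size=8))
```

Windows of 8, 10, or 12 residues, as in the abstract, can be tried by changing `size`.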

Place, publisher, year, edition, pages
2008. Vol. 17, no 4, 606-613 p.
Keyword [en]
proteomics; antigen; epitope; sequence similarity; antibody; grid; B-CELL EPITOPES; ANTIGENIC DETERMINANTS; DATA-BANK; PREDICTION; ANTIBODIES; MYOGLOBIN; PEPTIDES; SEQUENCE; LIBRARIES; PROTEINS
National Category
Industrial Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-8255
DOI: 10.1110/ps.073347208
ISI: 000254197800002
Scopus ID: 2-s2.0-41649104373
OAI: oai:DiVA.org:kth-8255
DiVA: diva2:13529
Note
QC 20100705. Available from: 2008-04-22. Created: 2008-04-22. Last updated: 2017-12-14. Bibliographically approved.
In thesis
1. Selection of antigens for antibody-based proteomics
2008 (English). Doctoral thesis, comprehensive summary (Other scientific).
Abstract [en]

The human genome is predicted to contain ~20,500 protein-coding genes. The encoded proteins are the key players in the body, but the functions and localizations of most proteins are still unknown. Antibody-based proteomics has great potential for exploration of the protein complement of the human genome, but there are antibodies only to a very limited set of proteins. The Human Proteome Resource (HPR) project was launched in August 2003, with the aim to generate high-quality specific antibodies towards the human proteome, and to use these antibodies for large-scale protein profiling in human tissues and cells.

The goal of the work presented in this thesis was to evaluate whether antigens can be selected, in a high-throughput manner, to enable generation of specific antibodies towards one protein from every human gene. A computationally intensive analysis of potential epitopes in the human proteome was performed and showed that it should be possible to find unique epitopes for most human proteins. The result from this analysis was implemented in a new web-based visualization tool for antigen selection. Predicted protein features important for antigen selection, such as transmembrane regions and signal peptides, are also displayed in the tool. The antigens used in HPR are named protein epitope signature tags (PrESTs). A genome-wide analysis combining different protein features revealed that it should be possible to select unique PrESTs, 50 amino acids long, for ~80% of the human protein-coding genes.

The PrESTs are transferred from the computer to the laboratory by design of PrEST-specific PCR primers. A study of the success rate in PCR cloning of the selected fragments demonstrated the importance of controlled GC-content in the primers for specific amplification. The PrEST protein is produced in bacteria and used for immunization and subsequent affinity purification of the resulting sera to generate mono-specific antibodies. The antibodies are tested for specificity and approved antibodies are used for tissue profiling in normal and cancer tissues. A large-scale analysis of the success rates for different PrESTs in the experimental pipeline of the HPR project showed that the total success rate from PrEST selection to an approved antibody is 31%, and that this rate is dependent on PrEST length. A second PrEST on a target protein is somewhat less likely to succeed in the HPR pipeline if the first PrEST is unsuccessful, but the analysis shows that it is valuable to select several PrESTs for each protein, to enable generation of at least two antibodies, which can be used to validate each other.
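The role of GC-content in primer design mentioned above can be made concrete with a short sketch. The thesis abstract does not state which GC range was used, so the 40-60% window below is only a commonly quoted rule of thumb taken as an assumption, and the function names and example primers are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a DNA primer (case-insensitive)."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must not be empty")
    if set(seq) - set("ACGT"):
        raise ValueError("primer may only contain A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def within_gc_range(primer: str, low: float = 0.40, high: float = 0.60) -> bool:
    """True if the primer's GC fraction lies in [low, high].

    The default bounds are an assumed rule of thumb, not values
    taken from the thesis.
    """
    return low <= gc_content(primer) <= high


# Made-up example primers (assumption, not real PrEST primers).
for p in ("ATGCATGCATGC", "ATATATATATAT"):
    print(p, gc_content(p), within_gc_range(p))
```

A filter of this kind would be one of several checks in a primer design step; melting temperature and self-complementarity are not covered here.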

Place, publisher, year, edition, pages
Stockholm: KTH, 2008. 65 p.
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2008:5
Keyword
epitope, antigen, antibody, affinity, protein, proteome, proteomics, bioinformatics, prediction, primer design, sequence similarity
National Category
Industrial Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-4706
ISBN: 978-91-7178-930-3
Public defence
2008-05-09, F3, Lindstedtsvägen 26, Stockholm, 10:00
Note
QC 20100705. Available from: 2008-04-22. Created: 2008-04-22. Last updated: 2010-09-15. Bibliographically approved.
2. Grid and High-Performance Computing for Applied Bioinformatics
2007 (English). Doctoral thesis, comprehensive summary (Other scientific).
Abstract [en]

The beginning of the twenty-first century has been characterized by an explosion of biological information. The avalanche of data grows daily as a consequence of advances in molecular biology, genomics, and proteomics. The challenge for today's biologists lies in decoding this huge and complex body of data in order to achieve a better understanding of how our genes shape who we are, how our genome evolved, and how we function.

Without annotation and data mining, the information provided by, for example, high-throughput genomic sequencing projects is not very useful. Bioinformatics is the application of computer science and technology to the management and analysis of biological data, in an effort to address biological questions. The work presented in this thesis has focused on the use of Grid and High Performance Computing for solving computationally expensive bioinformatics tasks, where, due to the very large amount of available data and the complexity of the tasks, new solutions are required for efficient data analysis and interpretation.

Three major research topics are addressed. The first is the use of grids for distributing the execution of sequence-based proteomic analysis, with applications in optimal epitope selection and in a proteome-wide effort to map the linear epitopes in the human proteome. The second is the application of grid technology in genetic association studies, which enabled the analysis of thousands of simulated genotypes. The third is the development and application of an economics-based model for grid-job scheduling and resource administration.

The grid-based technology developed in this work resulted in the successful tagging and linking of chromosome regions in Alzheimer's disease, proteome-wide mapping of linear epitopes, and the development of market-based resource allocation in grids for scientific applications.

Place, publisher, year, edition, pages
Stockholm: KTH, 2007
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2007:9
Keyword
Grid computing, bioinformatics, genomics, proteomics
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-4573
ISBN: 978-91-7178-782-8
Public defence
2007-12-21, FD5, AlbaNova, Roslagstullsbacken 21, Stockholm, 10:00
Note
QC 20100622. Available from: 2007-12-10. Created: 2007-12-10. Last updated: 2012-03-20. Bibliographically approved.

By author/editor
Berglund, Lisa; Andrade, Jorge; Odeberg, Jacob; Uhlén, Mathias