Integrative annotation of 21,037 human genes validated by full-length cDNA clones
2004 (English). In: PLoS biology, ISSN 1544-9173, E-ISSN 1545-7885, Vol. 2, no 6, 856-875 p. Article in journal (Refereed), Published
Abstract [en]

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes.
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
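
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustrative consistency check, not part of the published analysis: the variable names and the `percent` helper are hypothetical, and the counts are copied from the abstract as quoted. It confirms that 1,377 loci is about 6.5% of the 21,037 gene candidates and totals the coding-region variants listed among the 72,027 mapped SNPs and indels.

```python
# Counts copied from the abstract; all names here are illustrative assumptions.
TOTAL_GENE_CANDIDATES = 21_037
LOCI_WITHOUT_GOOD_ORF = 1_377
MAPPED_VARIANTS = 72_027
CODING_VARIANTS = {
    "nonsynonymous SNPs": 13_215,
    "nonsense SNPs": 315,
    "indels": 452,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of gene candidates lacking a good protein-coding open reading frame.
no_orf_pct = percent(LOCI_WITHOUT_GOOD_ORF, TOTAL_GENE_CANDIDATES)

# Coding-region variants, summed across the three categories in the abstract.
coding_total = sum(CODING_VARIANTS.values())
coding_pct = percent(coding_total, MAPPED_VARIANTS)

if __name__ == "__main__":
    print(f"Loci without a good ORF: {no_orf_pct}% of gene candidates")
    print(f"Coding-region variants: {coding_total} ({coding_pct}% of mapped variants)")
```

The first value matches the 6.5% stated in the abstract; the coding-region share is a derived figure that the abstract does not itself report.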

Place, publisher, year, edition, pages
2004. Vol. 2, no 6, 856-875 p.
Keyword [en]
complementary DNA; microsatellite DNA; protein; RNA; alternative RNA splicing; article; cellular distribution; controlled study; data base; gene cassette; gene deletion; gene function; gene identification; gene insertion; gene location; gene locus; gene mapping; gene sequence; gene structure; genetic analysis; genetic polymorphism; genetic transcription; genetic variability; genome analysis; human; human genetics; international cooperation; metabolism; molecular cloning; nucleotide sequence; open reading frame; organization; phenotype; prediction; protein structure; qualitative analysis; reliability; RNA analysis; sequence analysis; single nucleotide polymorphism; structure analysis; validation process; biology; gene; genetic database; genetics; genome; Internet; methodology; physiology; protein tertiary structure; insertion sequences; Alternative Splicing; Computational Biology; Databases, Genetic; DNA, Complementary; Genes; Genome, Human; Humans; Microsatellite Repeats; Open Reading Frames; Polymorphism, Genetic; Polymorphism, Single Nucleotide; Protein Structure, Tertiary
National Category
Biochemistry and Molecular Biology
URN: urn:nbn:se:kth:diva-157181
DOI: 10.1371/journal.pbio.0020162
ISI: 000222380400025
PubMedID: 15103394
ScopusID: 2-s2.0-4344623260
OAI: diva2:770626

QC 20141211

Available from: 2014-12-11. Created: 2014-12-08. Last updated: 2014-12-11. Bibliographically approved.

By author/editor
Unneberg, Per