Transcript identification by analysis of short sequence tags - influence of tag length, restriction site and transcript database
KTH, Superseded Departments, Biotechnology.
KTH, Superseded Departments, Biotechnology.
2003 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 31, no 8, 2217-2226 p. Article in journal (Refereed) Published
Abstract [en]

There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90-95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3′ tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed to 79-83%, where the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
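The in silico tag extraction and uniqueness measurement described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The recognition site `CATG`, the function names `three_prime_tag` and `tag_uniqueness`, and the toy sequences are all assumptions made for illustration. Here `tag_len` counts only the bases downstream of the site, whereas the 16-17 bp figure in the abstract includes the recognition site.

```python
from collections import Counter


def three_prime_tag(transcript, site="CATG", tag_len=10):
    """Return the tag_len bases directly downstream of the 3'-most
    occurrence of `site`, or None when the site is missing or lies
    too close to the 3' end to yield a full-length tag."""
    pos = transcript.rfind(site)  # 3'-most recognition site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = transcript[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def tag_uniqueness(transcripts, site="CATG", tag_len=10):
    """Fraction of taggable transcripts whose 3' tag is shared with
    no other transcript in the collection."""
    tags = {}
    for name, seq in transcripts.items():
        tag = three_prime_tag(seq, site, tag_len)
        if tag is not None:
            tags[name] = tag
    if not tags:
        return 0.0
    counts = Counter(tags.values())
    unique = sum(1 for tag in tags.values() if counts[tag] == 1)
    return unique / len(tags)


# Toy sequences invented for illustration (not RefSeq data).
toy = {
    "tx_a": "GGGCATGAAACCCGGGTTTACGTACGT",
    "tx_b": "TTTCATGAAACCCGGGTAAAAAAAAAA",  # same first 10 tag bases as tx_a
    "tx_c": "ACGTACGTACGTCATGTTTTGGGGCCCCAA",
    "tx_d": "ACGTACGTAAAA",  # lacks the recognition site
}

for length in (10, 14):
    print(length, tag_uniqueness(toy, tag_len=length))
```

In this toy set, `tx_a` and `tx_b` share their 10-base tag and only separate at 14 bases. `tx_d` has no site and is excluded from the calculation, mirroring the transcripts the abstract reports as lacking a recognition site.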

Place, publisher, year, edition, pages
2003. Vol. 31, no 8, 2217-2226 p.
National Category
Industrial Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-5170; DOI: 10.1093/nar/gkg313; ISI: 000182161400026; PubMedID: 12682372; OAI: oai:DiVA.org:kth-5170; DiVA: diva2:7976
Available from: 2004-10-08 Created: 2004-10-08 Last updated: 2017-12-04 Bibliographically approved
In thesis
1. Computational approaches for in-depth analysis of cDNA sequence tags
2004 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Major recent improvements in biotechnology have led to an accelerated production of DNA sequences. The completion of the human genome sequence, along with the genomes of more than two hundred other species, has marked the arrival of the genome era. The ultimate goal is to understand the structure and function of genomes and their genes. This thesis has focused on the computational analysis of complementary DNA (cDNA) sequences. These are copies of mRNA transcripts that correspond to the coding regions of genomes.

Studying the expression patterns of genes is essential for understanding gene function. Many gene expression profiling techniques generate short sequence tags that derive from transcripts. A pilot study was performed to assess the feasibility of using the pyrosequencing platform for gene expression analysis. In most cases (≈ 85%), the sequences generated by pyrosequencing were long enough (> 18 nucleotides) to uniquely identify the corresponding transcripts through database searches. Aspects of transcript identification by short sequence tags were further investigated in a number of public databases, revealing that a tag length of 16-17 nucleotides was sufficient for unique identification.

Longer transcript representations are obtained from expressed sequence tag (EST) sequencing. Method development for the analysis and maintenance of large EST data sets has been performed on data from poplar, which is a tree of commercial interest to the forest biotechnology industry. In 2003, a large EST sequencing project reached > 100 000 reads, providing a unique resource for tree biology research. ESTs have been grouped into clusters and singletons that represent potential genes. Preliminary analyses have estimated gene content in Populus to be very similar to that of the model organism Arabidopsis thaliana.

EST data collections provide a rich source for mining polymorphisms. A software application was developed and applied to EST data from two Populus species, and candidate single nucleotide polymorphisms (SNPs) were recorded. A study of genetic variation between the species revealed a striking similarity, with orthologous pairs being > 98% identical on the protein level.

Keywords: cDNA, EST, gene expression, SNP, SAGE, polymorphism, assembly, clustering, DNA sequencing, pyrosequencing, mRNA transcript, orthology, tree biotechnology, restriction enzyme

Place, publisher, year, edition, pages
Bioteknologi, 2004
Keyword
Biotechnology, cDNA, EST, gene expression, SNP, SAGE, polymorphism, Bioteknik
National Category
Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-23 (URN); 91-7283-837-X (ISBN)
Public defence
2004-10-08, E1, KTH, Lindstedsvägen 3, Stockholm, 13:00
Opponent
Supervisors
Available from: 2004-10-08 Created: 2004-10-08 Last updated: 2012-03-21 Bibliographically approved

By author/editor
Unneberg, Per; Larsson, Magnus