Computational approaches for in-depth analysis of cDNA sequence tags
KTH, Superseded Departments, Biotechnology.
2004 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Major recent improvements in biotechnology have led to an accelerated production of DNA sequences. The completion of the human genome sequence, along with the genomes of more than two hundred other species, has marked the arrival of the genome era. The ultimate goal is to understand the structure and function of genomes and their genes. This thesis focuses on the computational analysis of complementary DNA (cDNA) sequences, which are copies of mRNA transcripts and therefore correspond to the expressed (transcribed) regions of genomes.

Studying the expression patterns of genes is essential for understanding gene function. Many gene expression profiling techniques generate short sequence tags derived from transcripts. A pilot study was performed to assess the feasibility of using the pyrosequencing platform for gene expression analysis. In most cases (≈ 85%), the sequences generated by pyrosequencing were long enough (> 18 nucleotides) to uniquely identify the corresponding transcripts through database searches. Aspects of transcript identification by short sequence tags were further investigated in a number of public databases, revealing that a tag length of 16-17 nucleotides was sufficient for unique identification.
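The uniqueness criterion can be illustrated with a minimal sketch. It assumes that each transcript is represented by a single tag string that starts at the recognition site, and it measures uniqueness only within the supplied tag set rather than through database searches such as BLAST, so it is a simplification of the analyses summarized above. The function names and the toy tags are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def unique_fraction(tags: Dict[str, str], length: int) -> float:
    """Fraction of transcripts whose tag, truncated to `length` nt, is unique in the set."""
    if not tags:
        return 0.0
    # Truncate every tag from its 5' end (the recognition-site side) to `length` nt
    truncated = {tid: tag[:length] for tid, tag in tags.items()}
    counts = Counter(truncated.values())
    # A transcript is uniquely identified if no other transcript shares its truncated tag
    unique = sum(1 for t in truncated.values() if counts[t] == 1)
    return unique / len(truncated)


def minimal_unique_length(tags: Dict[str, str], max_len: int) -> Optional[int]:
    """Smallest tag length at which every tag in the set is unique, or None."""
    for length in range(1, max_len + 1):
        if unique_fraction(tags, length) == 1.0:
            return length
    return None
```

For a toy set such as `{"tx1": "CATGAAAC", "tx2": "CATGAAAG", "tx3": "CATGTTTT"}`, the first two tags only become distinguishable at their last base, so `minimal_unique_length` returns 8; closely related transcripts produce the same effect on a larger scale.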

Longer transcript representations are obtained from expressed sequence tag (EST) sequencing. Methods for the analysis and maintenance of large EST data sets were developed using data from poplar, a tree of commercial interest to the forest biotechnology industry. In 2003, a large EST-sequencing project reached more than 100 000 reads, providing a unique resource for tree biology research. The ESTs have been grouped into clusters and singletons that represent potential genes. Preliminary analyses estimate the gene content of Populus to be very similar to that of the model organism Arabidopsis thaliana.
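Grouping ESTs into clusters and singletons is often done by transitively linking overlapping reads. The sketch below is an assumption, not the procedure used in this work: two ESTs are linked if they share an exact k-mer (k = 20 by default, an arbitrary choice), and linked groups are merged with a union-find structure. It ignores reverse complements and sequencing errors, which real EST clustering pipelines have to handle.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (empty if the read is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: Dict[str, str], k: int = 20) -> Tuple[List[List[str]], List[str]]:
    """Group ESTs sharing at least one exact k-mer; return (clusters, singletons)."""
    parent = {eid: eid for eid in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index each k-mer to the ESTs containing it; ESTs sharing a k-mer are merged
    index: Dict[str, List[str]] = {}
    for eid, seq in ests.items():
        for km in kmers(seq, k):
            index.setdefault(km, []).append(eid)
    for members in index.values():
        for other in members[1:]:
            union(members[0], other)

    groups: Dict[str, List[str]] = {}
    for eid in ests:
        groups.setdefault(find(eid), []).append(eid)
    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons
```

Because linking is transitive, two ESTs that do not overlap can still end up in the same cluster through a third read that overlaps both, which is the usual intent when reconstructing partial transcripts.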

EST data collections provide a rich source for mining polymorphisms. A software application was developed and applied to EST data from two Populus species, and candidate single nucleotide polymorphisms (SNPs) were recorded. A study of genetic variation between the species revealed a striking similarity, with orthologous pairs being > 98% identical at the protein level.
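Percent identity between orthologous pairs can be computed from a pairwise alignment. The sketch assumes the two protein sequences are already aligned (gaps written as `-`) and counts identity only over columns where neither sequence has a gap; other conventions, for example dividing by the full alignment length, give somewhat different values. The function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where both sequences have a residue
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)
```

For example, `percent_identity("MKV-LA", "MKVTLS")` compares five ungapped columns, four of which match, and returns 80.0.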

Keywords: cDNA, EST, gene expression, SNP, SAGE, polymorphism, assembly, clustering, DNA sequencing, pyrosequencing, mRNA transcript, orthology, tree biotechnology, restriction enzyme

Place, publisher, year, edition, pages
Bioteknologi, 2004.
Keyword [en]
Biotechnology, cDNA, EST, gene expression, SNP, SAGE, polymorphism
Keyword [sv]
Bioteknik
National Category
Industrial Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-23
ISBN: 91-7283-837-X (print)
OAI: oai:DiVA.org:kth-23
DiVA: diva2:7980
Public defence
2004-10-08, E1, KTH, Lindstedtsvägen 3, Stockholm, 13:00
Available from: 2004-10-08. Created: 2004-10-08. Last updated: 2012-03-21. Bibliographically approved.
List of papers
1. Gene expression analysis by signature pyrosequencing
2002 (English). In: Gene, ISSN 0378-1119, E-ISSN 1879-0038, Vol. 289, no 1-2, 31-39 p. Article in journal (Refereed). Published.
Abstract [en]

We describe a novel method for transcript profiling based on high-throughput parallel sequencing of signature tags using a non-gel-based microtiter plate format. The method relies on the identification of cDNA clones by pyrosequencing of the region corresponding to the 3'-end of the mRNA preceding the poly(A) tail. Simultaneously, the method can be used for gene discovery, since tags corresponding to unknown genes can be further characterized by extended sequencing. The protocol was validated using a model system for human atherosclerosis. Two 3'-tagged cDNA libraries, representing macrophages and foam cells, which are key components in the development of atherosclerotic plaques, were constructed using a solid phase approach. The libraries were analyzed by pyrosequencing, yielding on average 25 bases per read. As a control, conventional expressed sequence tag (EST) sequencing using slab gel electrophoresis was performed. Homology searches were used to identify the genes corresponding to each tag. Comparisons with EST sequencing showed identical, unique matches in the majority of cases when the pyrosignature was at least 18 bases. A visualization tool was developed to facilitate differential analysis using a virtual chip format. The analysis resulted in the identification of genes with possible relevance for the development of atherosclerosis. The use of the method for automated massive parallel signature sequencing is discussed.
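The differential analysis behind the virtual chip amounts to comparing tag counts between two libraries. The sketch below assumes tags have already been identified, scales counts to tags per million and adds a pseudocount of 1 before taking a log2 ratio; these normalization choices are assumptions rather than the published tool's method, and the function name and example libraries are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def differential_tags(lib_a: Iterable[str], lib_b: Iterable[str],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Return {tag: log2(normalized count in B / normalized count in A)}."""
    counts_a, counts_b = Counter(lib_a), Counter(lib_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both libraries must contain at least one tag")
    result = {}
    for tag in counts_a.keys() | counts_b.keys():
        # Scale to tags per million so libraries of different depth are comparable;
        # the pseudocount avoids division by zero for tags seen in only one library
        norm_a = counts_a[tag] / total_a * 1e6 + pseudocount
        norm_b = counts_b[tag] / total_b * 1e6 + pseudocount
        result[tag] = math.log2(norm_b / norm_a)
    return result
```

With a macrophage library `["T1", "T1", "T1", "T2"]` as A and a foam cell library `["T1", "T2", "T2", "T2"]` as B, `T1` receives a negative score and `T2` a positive one of equal magnitude; plotting such scores per tag is one way to lay out a virtual chip.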

Keyword
3'-tagged cDNA library, virtual chip, atherosclerosis, DNA sequencing
National Category
Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-5169 (URN)
10.1016/S0378-1119(02)00548-6 (DOI)
000177867000005
Note
QC 20100915. Available from: 2004-10-08. Created: 2004-10-08. Last updated: 2017-12-04. Bibliographically approved.
2. Transcript identification by analysis of short sequence tags: influence of tag length, restriction site and transcript database
2003 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 31, no 8, 2217-2226 p. Article in journal (Refereed). Published.
Abstract [en]

Several gene expression profiling techniques use restriction enzymes to generate short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of the tags generated in an experiment. We have also investigated various aspects of the in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. Between 90% and 95% of 10 bp tags are unique, and this fraction increases only slightly with longer tags because of the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3′ tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST-based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed in 79-83% of cases, with the selection of representative sequences in the UniGene clusters being the main cause of disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
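The tag-extraction step that these profiling methods rely on can be sketched as follows. The sketch assumes the NlaIII recognition site `CATG` (the anchoring enzyme commonly used in SAGE), takes the tag to start at the 3'-most site and to include it (matching the convention above of counting the recognition site in the tag length), and treats a trailing run of at least eight A's as the poly(A) tail. The function name, parameter names and defaults are illustrative.

```python
from typing import Optional


def three_prime_tag(transcript: str, site: str = "CATG", tag_len: int = 10,
                    min_polya: int = 8) -> Optional[str]:
    """Tag starting at the 3'-most recognition site, or None if none can be taken."""
    seq = transcript.upper()
    # Treat a trailing run of >= min_polya A's as the poly(A) tail and remove it.
    # Simplification: A's immediately upstream of the tail are removed as well.
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) < min_polya:
        stripped = seq  # no recognizable poly(A) tail
    pos = stripped.rfind(site)
    if pos == -1:
        return None  # transcript lacks the recognition site
    tag = stripped[pos:pos + tag_len]
    # A site closer to the tail than tag_len bases cannot yield a full-length tag
    return tag if len(tag) == tag_len else None
```

A transcript without the site, or with its 3'-most site closer to the poly(A) tail than the tag length, yields `None`, corresponding to the two failure cases quantified in the abstract.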

National Category
Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-5170 (URN)
10.1093/nar/gkg313 (DOI)
000182161400026
12682372 (PubMedID)
Available from: 2004-10-08. Created: 2004-10-08. Last updated: 2017-12-04. Bibliographically approved.
3. A Populus EST resource for plant functional genomics
2004 (English). In: Proceedings of the National Academy of Sciences of the United States of America, ISSN 0027-8424, E-ISSN 1091-6490, Vol. 101, no 38, 13951-13956 p. Article in journal (Refereed). Published.
Abstract [en]

Trees represent a life form of paramount importance for terrestrial ecosystems and human societies because of their ecological structure and physiological function and their provision of energy and industrial materials. The genus Populus is the internationally accepted model for molecular tree biology. We have analyzed 102,019 Populus ESTs that clustered into 11,885 clusters and 12,759 singletons. We also provide >4,000 assembled full clone sequences to serve as a basis for the upcoming annotation of the Populus genome sequence. A public web-based EST database (POPULUSDB) provides digital expression profiles for 18 tissues that comprise the majority of differentiated organs. The coding content of the Populus and Arabidopsis genomes shows very high similarity, indicating that differences between these annual and perennial angiosperm life forms result primarily from differences in gene regulation. The high similarity between Populus and Arabidopsis will allow studies of Populus to directly benefit from the detailed functional genomic information generated for Arabidopsis, enabling detailed insights into tree development and adaptation. These data will also be valuable for functional genomic efforts in Arabidopsis.
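A digital expression profile is essentially a table of EST counts per cluster and tissue library. The sketch assumes each EST has already been assigned to a cluster and labeled with its source tissue, and normalizes counts to ESTs per 10,000 reads of each library; this scale, the function name and the tissue names are assumptions, not necessarily what POPULUSDB uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def digital_expression(assignments: Iterable[Tuple[str, str]],
                       library_sizes: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Per-cluster profiles: {cluster_id: {tissue: ESTs per 10,000 library reads}}."""
    # Count ESTs per (cluster, tissue) pair
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cluster_id, tissue in assignments:
        counts[cluster_id][tissue] += 1
    profiles = {}
    for cluster_id, per_tissue in counts.items():
        # Normalize by library size so deeply and shallowly sequenced tissues compare fairly;
        # tissues without ESTs in this cluster get 0.0
        profiles[cluster_id] = {
            tissue: 10_000 * per_tissue.get(tissue, 0) / size
            for tissue, size in library_sizes.items()
        }
    return profiles
```

Normalizing by library size matters because tissue libraries are rarely sequenced to the same depth; raw counts would otherwise favor the most deeply sequenced tissues.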

Keyword
gene-expression, draft sequence, arabidopsis, poplar, evolution, biology
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-23752 (URN)
10.1073/pnas.0401641101 (DOI)
000224069800046
2-s2.0-4644282569 (Scopus ID)
Note
QC 20100525. QC 20110916. Available from: 2010-08-10. Created: 2010-08-10. Last updated: 2017-12-12. Bibliographically approved.
4. SNP discovery using advanced algorithms and neural networks
2005 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no 10, 2528-2530 p. Article in journal (Refereed). Published.
Abstract [en]

Forage is an application that uses two neural networks to detect single nucleotide polymorphisms (SNPs). Potential SNP candidates are identified in multiple alignments. Each candidate is then represented by a vector of features, which the networks classify as SNP or monomorphic. A validated dataset of SNPs was constructed from experimentally verified SNP data and used for network training and method evaluation.
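Candidate detection and feature extraction of the kind described for Forage can be sketched as below. Only the overall shape is taken from the abstract (candidates found in multiple alignments, each described by a feature vector); the candidate rule (at least two reads supporting each of two bases) and the four features are hypothetical, and the trained neural networks that Forage uses for classification are not reproduced.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def snp_candidates(alignment: List[str], min_minor: int = 2) -> Iterator[Tuple[int, List[float]]]:
    """Yield (column, features) for candidate SNP columns in an alignment of reads.

    Reads are equal-length uppercase strings with '-' for gaps. Hypothetical features:
    [depth, minor allele count, minor allele frequency, gap fraction].
    """
    if not alignment:
        return
    for col in range(len(alignment[0])):
        column = [read[col] for read in alignment]
        bases = Counter(b for b in column if b in "ACGT")
        if len(bases) < 2:
            continue  # monomorphic or empty column
        minor_n = bases.most_common(2)[1][1]  # count of the second most common base
        if minor_n < min_minor:
            continue  # a single discordant read is more likely a sequencing error
        depth = sum(bases.values())
        gap_fraction = column.count("-") / len(column)
        yield col, [depth, minor_n, minor_n / depth, gap_fraction]
```

Each yielded feature vector would then be passed to a trained classifier that labels the column as SNP or monomorphic.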

Keyword
access to information, algorithm, article, artificial neural network, forage, gene frequency, information processing, priority journal, single nucleotide polymorphism, statistical analysis, validation process
National Category
Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-5172 (URN)
10.1093/bioinformatics/bti354 (DOI)
000229285600053
15746291 (PubMedID)
2-s2.0-19544386177 (Scopus ID)
Note
QC 20100929. Updated from manuscript to article (20100929). Previous title: "SNP discovery using advanced algorithms and neural networks". Available from: 2004-10-08. Created: 2004-10-08. Last updated: 2017-12-04. Bibliographically approved.
5. Unravelling differences in the transcriptome of two closely related Populus species
(English) Manuscript (Other academic)
National Category
Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-5173 (URN)
Note
QC 20110214. Available from: 2004-10-08. Created: 2004-10-08. Last updated: 2011-02-15. Bibliographically approved.

Author
Unneberg, Per