Detection and quantitative estimation of spurious double stranded DNA formation during reverse transcription in bacteria using tagRNA-seq
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. (Computational Biological Physics, CBP)
INRA, UMR1319 Micalis, Domaine de Vilvert, F-78352, Jouy-en-Josas, France; AgroParisTech, UMR Micalis, Domaine de Vilvert, F-78350, Jouy-en-Josas, France.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. (Computational Biological Physics, CBP)
2015 (English). In: RNA Biology, ISSN 1547-6286, E-ISSN 1555-8584. Article in journal (Refereed), Published
Abstract [en]

Standard RNA-seq has a well-known tendency to generate "ghost" antisense reads due to the formation of spurious second-strand cDNA during the sequencing process. We recently reported on a novel variant of RNA-seq, coined "tagRNA-seq", introduced to distinguish primary from processed transcripts in bacteria. Incidentally, the additional information provided by the tag is also well suited to detecting true antisense RNA transcripts and quantifying spurious antisense signals in a sample. We briefly explain how to perform such a detection and illustrate it on previously published datasets.

Place, publisher, year, edition, pages
Taylor & Francis, 2015.
Keyword [en]
tagRNA-seq, spurious second strand cDNA, antisense RNA, complementary DNA, transcriptome, transcript discovery
National Category
Bioinformatics and Systems Biology; Microbiology
Research subject
Biotechnology; Biological Physics
Identifiers
URN: urn:nbn:se:kth:diva-171378
DOI: 10.1080/15476286.2015.1071010
ISI: 000361473300018
Scopus ID: 2-s2.0-84949803433
OAI: oai:DiVA.org:kth-171378
DiVA: diva2:843538
Note

QC 20150811

Available from: 2015-07-29. Created: 2015-07-29. Last updated: 2017-12-04. Bibliographically approved.
In thesis
1. Data Analysis and Next Generation Sequencing : Applications in Microbiology.
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Next Generation Sequencing (NGS) is a new technology that has revolutionized the way we study living organisms. Where previously only a few genes could be studied at a time through targeted direct probing, NGS makes it possible to perform measurements on a whole genome at once. The drawback is that the amount of data generated in the process is large, and extracting useful information from it requires new processing and analysis methods.

The main contribution of this thesis is the development of a novel experimental method coined tagRNA-seq, which combines 5’tagRACE, a previously developed technique, with RNA-sequencing technology. Briefly, tagRNA-seq makes it possible to identify the 5’ ends of RNAs in bacteria and to directly probe their type, primary or processed, by ligating short RNA sequences, the tags, to the 5’ ends of RNA molecules. We used the method to directly probe for transcription start and processing sites in two bacterial species, Escherichia coli and Enterococcus faecalis. It was also used to study polyadenylation in E. coli, where the ability to identify processed RNA molecules proved useful for separating direct and indirect regulatory effects of this mechanism. We also demonstrate how data from tagRNA-seq experiments can be used to increase confidence in the discovery of antisense transcripts in bacteria. Analyses of RNA-seq data obtained in the context of these experiments revealed subtle artifacts in the coverage signal towards gene ends, which we were able to explain and quantify based on Kolmogorov’s broken stick model. We also found evidence for the circularization of a few RNA transcripts, both in our own datasets and in publicly available data.
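The read classification step that tagRNA-seq relies on can be sketched as follows. This is a minimal Python illustration, not the thesis pipeline: it assumes two hypothetical tag sequences (`PRIMARY_TAG`, `PROCESSED_TAG`, placeholders rather than the real tagRNA-seq tags) and exact matching of the tag at the start of each read; the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical tag sequences for illustration only; the real tagRNA-seq tags differ.
PRIMARY_TAG = "ACGTAC"    # placeholder tag marking primary 5' ends (transcription start sites)
PROCESSED_TAG = "TGCATG"  # placeholder tag marking processed 5' ends (processing sites)


def classify_read(read: str) -> tuple[str, str]:
    """Return (category, insert): the 5' end type implied by the tag and the read without its tag.

    Assumes exact matching of the tag at the start of the read. Neither placeholder
    tag is a prefix of the other, so the order of the two checks does not matter.
    """
    if read.startswith(PRIMARY_TAG):
        return "primary", read[len(PRIMARY_TAG):]
    if read.startswith(PROCESSED_TAG):
        return "processed", read[len(PROCESSED_TAG):]
    # In this sketch, reads without a tag prefix (e.g. internal fragments) are untagged.
    return "untagged", read


def count_categories(reads: list[str]) -> Counter:
    """Count how many reads fall into each 5' end category."""
    return Counter(classify_read(r)[0] for r in reads)


if __name__ == "__main__":
    reads = ["ACGTACGGATTC", "TGCATGCCAGT", "GGGTTTAAC"]
    print(count_categories(reads))
```

Exact prefix matching keeps the example short; a pipeline working on real reads would likely need to tolerate sequencing errors in the tag, which this sketch does not attempt.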

Designing the tags used in tagRNA-seq led us to the problem of words absent from a text. We focus on a particular subset of these, the minimal absent words (MAWs), and develop a theory providing a complete description of their size distribution in random text. We also show that MAWs in the genomes of viruses and living organisms almost always behave differently from those of random text in the tail of the distribution, and that MAWs from this tail are closely related to sequences present in the genome that preferentially appear in regions with important regulatory functions.
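To make the notion of a minimal absent word concrete: a word w is a MAW of a text if w does not occur in the text while every proper factor of w does. Equivalently, w is absent but both w without its last letter and w without its first letter occur. The brute-force Python sketch below enumerates the MAWs of a short text over a given alphabet. It only illustrates the definition, is far too slow for whole genomes, and is not the method used in the thesis; the function names are hypothetical.

```python
def factors(text: str, k: int) -> set[str]:
    """All distinct substrings of length k in text (the empty word for k = 0)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minimal_absent_words(text: str, alphabet: str = "ACGT") -> list[str]:
    """Brute-force enumeration of the minimal absent words of text over alphabet.

    w is a MAW if w does not occur in text but w[:-1] and w[1:] both do.
    A MAW is at most len(text) + 1 letters long, which bounds the search.
    """
    maws = []
    present_prev = factors(text, 0)  # {""}: the empty word always occurs
    for k in range(1, len(text) + 2):
        present_k = factors(text, k)
        # Every MAW of length k extends an occurring (k-1)-factor by one letter.
        for prefix in sorted(present_prev):
            for letter in alphabet:
                w = prefix + letter
                if w not in present_k and w[1:] in present_prev:
                    maws.append(w)
        present_prev = present_k
    # Report MAWs by increasing length, then lexicographically.
    return sorted(maws, key=lambda w: (len(w), w))
```

For example, `minimal_absent_words("AAC", alphabet="AC")` returns `["CA", "CC", "AAA"]`: each of these words is absent from "AAC", while removing either its first or its last letter yields a word that occurs.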

Finally, and independently of tagRNA-seq, we propose a new approach to the problem of bacterial community reconstruction in metagenomics, based on techniques from compressed sensing. We provide a novel algorithm that competes with state-of-the-art techniques in the field.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. xviii, 154 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2015:15
Keyword
RNA-seq, tagRNA-seq, primary and processed RNA, Enterococcus faecalis, Complex transcription, Metagenomics, 5'tagRACE, minimal absent words, compressed sensing, bacterial community reconstruction
National Category
Bioinformatics (Computational Biology); Microbiology; Other Biological Topics; Genetics
Research subject
Biological Physics
Identifiers
urn:nbn:se:kth:diva-173219 (URN)
978-91-7595-699-2 (ISBN)
Public defence
2015-10-30, FA32, Roslagstullsbacken 21, Stockholm, 14:00 (English)
Note

QC 20150930

Available from: 2015-09-30. Created: 2015-09-07. Last updated: 2015-11-06. Bibliographically approved.

Open Access in DiVA

fulltext (1351 kB), 129 downloads
File information
File name: FULLTEXT01.pdf; File size: 1351 kB; Checksum: SHA-512
66bf250c09a6a96cfce4cba7f8f66319aa4e89cf80bcd4d386c4c2083145109ae2cd5a0703902719414127cd04d6f535b2ec9026e396735756d4e3bbae03c2ae
Type: fulltext; Mimetype: application/pdf

By author/editor
Innocenti, Nicolas; Aurell, Erik
