kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.ORCID iD: 0000-0002-6937-9245
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. Department of Medicine Huddinge, Center for Hematology and Regenerative Medicine (HERM), Karolinska Institute, 141 52, Flemingsberg, Sweden.ORCID iD: 0000-0002-2575-0807
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology.ORCID iD: 0000-0002-8879-9245
2024 (English)In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 25, no 1, article id 54Article in journal (Refereed) Published
Abstract [en]

Background: Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms. Results: We present the de novo transcript isoform assembler ClusTrast, which takes short read RNA-seq data as input, assembles a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. For recall, ClusTrast was on top in the lower end of expression levels (<15% percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35–69% for the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58–81%) were reconstructed with contigs that exhibited polymorphism, measuring on a subset of reliably predicted contigs. ClusTrast recall increased when using a union of assembled transcripts from more than one assembly tool as primary assembly. Conclusion: We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.

Place, publisher, year, edition, pages
Springer Nature , 2024. Vol. 25, no 1, article id 54
Keywords [en]
De novo transcriptome assembly, Guiding contigs, Isoform assembly, Recall/sensitivity, RNA-seq
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-343497DOI: 10.1186/s12859-024-05663-3ISI: 001155913600002PubMedID: 38302873Scopus ID: 2-s2.0-85183701397OAI: oai:DiVA.org:kth-343497DiVA, id: diva2:1837870
Note

Not duplicate with DiVA 1810720

QC 20240215

Available from: 2024-02-15 Created: 2024-02-15 Last updated: 2024-02-27Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textPubMedScopus

Authority records

Westrin, Karl JohanKretzschmar, Warren W.Emanuelsson, Olof

Search in DiVA

By author/editor
Westrin, Karl JohanKretzschmar, Warren W.Emanuelsson, Olof
By organisation
Gene TechnologyScience for Life Laboratory, SciLifeLab
In the same journal
BMC Bioinformatics
Bioinformatics (Computational Biology)

Search outside of DiVA

GoogleGoogle Scholar

doi
pubmed
urn-nbn

Altmetric score

doi
pubmed
urn-nbn
Total: 47 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf