Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Gap filling as exact path length problem
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.ORCID iD: 0000-0001-7378-2320
2015 (English)In: Research in Computational Molecular Biology. 19th Annual International Conference, RECOMB 2015. Proceedings, 2015, 281-292 p.Conference paper, Published paper (Refereed)
Abstract [en]

One of the last steps in a genome assembly project is filling the gaps between consecutive contigs in the scaffolds. This problem can be naturally stated as finding an s-t path in a directed graph whose sum of arc costs belongs to a given range (the estimate on the gap length). Here s and t are any two contigs flanking a gap. This problem is known to be NP-hard in general. Here we derive a simpler dynamic programming solution than already known, pseudo-polynomial in the maximum value of the input range. We implemented various practical optimizations to it, and compared our exact gap filling solution experimentally to popular gap filling tools. Summing over all the bacterial assemblies considered in our experiments, we can in total fill 28% more gaps than the best previous tool and the gaps filled by our method span 80% more sequence. Furthermore, the error level of the newly introduced sequence is comparable to that of the previous tools.

Place, publisher, year, edition, pages
2015. 281-292 p.
Series
Lecture Notes in Computer Science, 9029
Keyword [en]
Directed graphs, Filling, Molecular biology, Polynomials, Arc costs, Error levels, Gap filling, Gap length, Genome assembly, NP-hard, Path length, Programming solutions, Dynamic programming
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-168327DOI: 10.1007/978-3-319-16706-0_29ISI: 000361983900029Scopus ID: 2-s2.0-84926361099ISBN: 9783319167053 (print)OAI: oai:DiVA.org:kth-168327DiVA: diva2:818345
Conference
Research in Computational Molecular Biology. 19th Annual International Conference, RECOMB 2015, 12-15 April 2015, Warsaw, Poland
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20150608

Available from: 2015-06-08 Created: 2015-06-02 Last updated: 2015-10-26Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Authority records BETA

Sahlin, Kristoffer

Search in DiVA

By author/editor
Sahlin, Kristoffer
By organisation
Computational Biology, CBScience for Life Laboratory, SciLifeLab
Bioinformatics (Computational Biology)

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 51 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf