Change search
ReferencesLink to record
Permanent link

Direct link
Gap filling as exact path length problem
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.ORCID iD: 0000-0001-7378-2320
2015 (English)In: Research in Computational Molecular Biology. 19th Annual International Conference, RECOMB 2015. Proceedings, 2015, 281-292 p.Conference paper (Refereed)
Abstract [en]

One of the last steps in a genome assembly project is filling the gaps between consecutive contigs in the scaffolds. This problem can be naturally stated as finding an s-t path in a directed graph whose sum of arc costs belongs to a given range (the estimate on the gap length). Here s and t are any two contigs flanking a gap. This problem is known to be NP-hard in general. Here we derive a simpler dynamic programming solution than already known, pseudo-polynomial in the maximum value of the input range. We implemented various practical optimizations to it, and compared our exact gap filling solution experimentally to popular gap filling tools. Summing over all the bacterial assemblies considered in our experiments, we can in total fill 28% more gaps than the best previous tool and the gaps filled by our method span 80% more sequence. Furthermore, the error level of the newly introduced sequence is comparable to that of the previous tools.

Place, publisher, year, edition, pages
2015. 281-292 p.
, Lecture Notes in Computer Science, 9029
Keyword [en]
Directed graphs, Filling, Molecular biology, Polynomials, Arc costs, Error levels, Gap filling, Gap length, Genome assembly, NP-hard, Path length, Programming solutions, Dynamic programming
National Category
Bioinformatics (Computational Biology)
URN: urn:nbn:se:kth:diva-168327DOI: 10.1007/978-3-319-16706-0_29ISI: 000361983900029ScopusID: 2-s2.0-84926361099ISBN: 9783319167053OAI: diva2:818345
Research in Computational Molecular Biology. 19th Annual International Conference, RECOMB 2015, 12-15 April 2015, Warsaw, Poland
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience

QC 20150608

Available from: 2015-06-08 Created: 2015-06-02 Last updated: 2015-10-26Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Sahlin, Kristoffer
By organisation
Computational Biology, CBScience for Life Laboratory, SciLifeLab
Bioinformatics (Computational Biology)

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 42 hits
ReferencesLink to record
Permanent link

Direct link