Change search
ReferencesLink to record
Permanent link

Direct link
Gap Filling as Exact Path Length Problem
2016 (English)In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 23, no 5, 347-361 p.Article in journal (Refereed) PublishedText
Abstract [en]

One of the last steps in a genome assembly project is filling the gaps between consecutive contigs in the scaffolds. This problem can be naturally stated as finding an s-t path in a directed graph whose sum of arc costs belongs to a given range (the estimate on the gap length). Here s and t are any two contigs flanking a gap. This problem is known to be NP-hard in general. Here we derive a simpler dynamic programming solution than already known, pseudo-polynomial in the maximum value of the input range. We implemented various practical optimizations to it, and compared our exact gap-filling solution experimentally to popular gap-filling tools. Summing over all the bacterial assemblies considered in our experiments, we can in total fill 76% more gaps than the best previous tool, and the gaps filled by our method span 136% more sequence. Furthermore, the error level of the newly introduced sequence is comparable to that of the previous tools. The experiments also show that our exact approach does not easily scale to larger genomes, where the problem is in general difficult for all tools.

Place, publisher, year, edition, pages
Mary Ann Liebert Inc , 2016. Vol. 23, no 5, 347-361 p.
Keyword [en]
de novo assembly, dynamic programming, gap filling, graph algorithms
National Category
Bioinformatics (Computational Biology)
URN: urn:nbn:se:kth:diva-188450DOI: 10.1089/cmb.2015.0197ISI: 000376080500006PubMedID: 26959081ScopusID: 2-s2.0-84969221176OAI: diva2:936175
Swedish Research Council, 2010-4634

QC 20160613

Available from: 2016-06-13 Created: 2016-06-10 Last updated: 2016-06-13Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textPubMedScopus
In the same journal
Journal of Computational Biology
Bioinformatics (Computational Biology)

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 6 hits
ReferencesLink to record
Permanent link

Direct link