Gap filling as exact path length problem
2015 (English)In: Research in Computational Molecular Biology. 19th Annual International Conference, RECOMB 2015. Proceedings, 2015, 281-292 p.Conference paper (Refereed)
One of the last steps in a genome assembly project is filling the gaps between consecutive contigs in the scaffolds. This problem can be naturally stated as finding an s-t path in a directed graph whose sum of arc costs belongs to a given range (the estimate on the gap length). Here s and t are any two contigs flanking a gap. This problem is known to be NP-hard in general. Here we derive a simpler dynamic programming solution than already known, pseudo-polynomial in the maximum value of the input range. We implemented various practical optimizations to it, and compared our exact gap filling solution experimentally to popular gap filling tools. Summing over all the bacterial assemblies considered in our experiments, we can in total fill 28% more gaps than the best previous tool and the gaps filled by our method span 80% more sequence. Furthermore, the error level of the newly introduced sequence is comparable to that of the previous tools.
Place, publisher, year, edition, pages
2015. 281-292 p.
, Lecture Notes in Computer Science, 9029
Directed graphs, Filling, Molecular biology, Polynomials, Arc costs, Error levels, Gap filling, Gap length, Genome assembly, NP-hard, Path length, Programming solutions, Dynamic programming
Bioinformatics (Computational Biology)
IdentifiersURN: urn:nbn:se:kth:diva-168327DOI: 10.1007/978-3-319-16706-0_29ISI: 000361983900029ScopusID: 2-s2.0-84926361099ISBN: 9783319167053OAI: oai:DiVA.org:kth-168327DiVA: diva2:818345
Research in Computational Molecular Biology. 19th Annual International Conference, RECOMB 2015, 12-15 April 2015, Warsaw, Poland
FunderScience for Life Laboratory - a national resource center for high-throughput molecular bioscience
QC 201506082015-06-082015-06-022015-10-26Bibliographically approved