Classification of DNA sequences using Bloom filters
2010 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 26, no 13, 1595-1600 p. Article in journal (Refereed). Published.
Motivation: New generation sequencing technologies produce increasingly complex datasets that demand new, efficient and specialized sequence analysis algorithms. Often, only the 'novel' sequences in a complex dataset are of interest, and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences.
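The abstract does not describe how FACS works internally, so the following is only a minimal sketch of the general idea named in the title: classifying a query sequence against a reference by looking up its k-mers in a Bloom filter. It is not the authors' implementation. The class and function names (`BloomFilter`, `kmers`, `build_reference_filter`, `classify`), the word size `k`, the filter size, the number of hash functions, and the `min_fraction` threshold are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Simple Bloom filter over strings (illustrative, not the FACS code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit
        # halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Return all overlapping words of length k in seq."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_reference_filter(reference, k, num_bits=1 << 20, num_hashes=4):
    """Insert every k-mer of the reference sequence into a Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for word in kmers(reference, k):
        bf.add(word)
    return bf


def classify(read, bf, k, min_fraction=0.8):
    """Return True if enough of the read's k-mers are found in the filter.

    min_fraction is a hypothetical threshold, not a value from the paper.
    """
    words = kmers(read, k)
    if not words:
        return False  # read shorter than k: cannot be classified
    hits = sum(1 for word in words if word in bf)
    return hits / len(words) >= min_fraction
```

In this sketch a read is reported as belonging to the reference when at least `min_fraction` of its k-mers are present in the filter. Because a Bloom filter can give false positives but no false negatives, a read copied exactly from the reference is always accepted, while an unrelated read is rejected with high probability when the filter is sized generously relative to the number of reference k-mers.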
Place, publisher, year, edition, pages
2010. Vol. 26, no 13, 1595-1600 p.
Biochemistry and Molecular Biology
Identifiers: URN: urn:nbn:se:kth:diva-27282; DOI: 10.1093/bioinformatics/btq230; ISI: 000278967500003; ScopusID: 2-s2.0-77954187316; OAI: oai:DiVA.org:kth-27282; DiVA: diva2:377544
QC 20101214. 2010-12-14, 2010-12-09, 2011-11-15. Bibliographically approved.