Optimization and Application extension for a Bloom filter based sequence classifier
KTH, School of Computer Science and Communication (CSC).
2013 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]


As sequencing technologies develop, ever more sequencing reads are generated and used in genomics research. This raises a critical problem: how can these data be processed rapidly and accurately? The Bloom filter, a data structure first proposed in 1970, has been increasingly adopted in the bioinformatics field for its relatively high storage efficiency and fast lookup speed. FACS [1], an application of the Bloom filter technique, is a rapid and accurate sequence classifier. However, several bottlenecks have restricted its use. For instance, it supports neither large query files nor FASTQ format files.

This report therefore introduces an improved FACS system. The improvements include a hashing system for FACS, support for large (>2 GB) and compressed query files, support for FASTQ files, and a more user-friendly interface, among others. Moreover, the new parallelized FACS system (FACS 2.0) is introduced and evaluated. For the sequence decontamination task, the evaluation shows that FACS 2.0 is at least 10 times faster than the original FACS system, Fastq_screen [7] and Deconseq [8], while being equally accurate. Finally, the report analyzes the possibility of developing an adapter trimmer based on the FACS system.
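A Bloom filter records set membership in a bit array using several hash functions. Lookups are therefore fast and memory-light, at the cost of occasional false positives but never false negatives. The following is a minimal Python sketch of how a Bloom filter based read classifier can work. It assumes a read is scored by the fraction of its k-mers found in a filter built from a reference sequence. All names are hypothetical and chosen for illustration: `BloomFilter`, `build_filter`, `match_fraction` and `classify`. The same applies to the SHA-256 double-hashing scheme, the k-mer length and the 0.5 threshold. None of these are taken from FACS or FACS 2.0.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Bit array with k hash functions: false positives possible, false negatives impossible."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        self.size = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing (illustrative choice): derive k bit indices from two
        # 64-bit halves of a single SHA-256 digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # An item is reported present only if all k of its bits are set
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference: str, k: int, fp_rate: float = 0.001) -> BloomFilter:
    """Insert all k-mers of a reference sequence into a new Bloom filter."""
    bf = BloomFilter(max(1, len(reference) - k + 1), fp_rate)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def match_fraction(read: str, bf: BloomFilter, k: int) -> float:
    """Fraction of the read's k-mers reported present in the filter."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return 0.0
    return sum(1 for kmer in read_kmers if kmer in bf) / len(read_kmers)


def classify(read: str, bf: BloomFilter, k: int, threshold: float = 0.5) -> bool:
    """Assign the read to the reference if its match fraction reaches the threshold."""
    return match_fraction(read, bf, k) >= threshold


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTTACGATCGATCGGATCCA"
    bf = build_filter(reference, k=11)
    # This read is a substring of the reference, so every k-mer is found
    print(match_fraction("TAGCCGATAGGCTTACGATC", bf, 11))  # 1.0
```

Bloom filters never give false negatives, so a read copied from the reference always scores 1.0. An unrelated read may occasionally score above zero because of false positives.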

Keywords: Bloom filter; Decontamination; Adapter trimming; Parallelization; Large query file support (compressed and uncompressed)

National Category: Computer Science
URN: urn:nbn:se:kth:diva-142011
OAI: diva2:699439
Educational program: Master of Science in Engineering - Computer Science and Technology
Available from: 2014-03-13
Created: 2014-02-27
Last updated: 2014-03-13
Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (2122 kB, application/pdf)
