Optimization and Application extension for a Bloom filter based sequence classifier
KTH, School of Computer Science and Communication (CSC).
2013 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

With the development of sequencing technologies, ever more sequencing reads are generated and used in genomics research, which raises a critical problem: how can these data be processed rapidly and accurately? The Bloom filter, a data structure first developed in 1970, has been increasingly reused in bioinformatics for its relatively high storage efficiency and fast access speed. FACS [1], an application of the Bloom filter technique, is a rapid and accurate sequence classifier. However, several bottlenecks have restricted its usage; for instance, it supports neither large query files nor fastq format files. This report therefore introduces an improved FACS system, which includes a hashing system for FACS, support for large query files (>2GB) and compressed files, support for fastq files, and changes that make the FACS system more user friendly, among others. Moreover, the new parallelized FACS system (FACS 2.0) is introduced and evaluated to show that, for sequence decontamination, FACS 2.0 is at least 10 times faster than, and as accurate as, the original FACS system, Fastq_screen [7] and Deconseq [8]. Last but not least, the possibility of developing an adapter trimmer based on the FACS system is also analyzed in this report.

Keywords: Bloom filter; Decontamination; Adapter trimming; Parallelization; Large query file support (compressed and normal)
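
As an illustration of the general idea behind a Bloom filter based sequence classifier, the following is a minimal sketch and not the FACS implementation. It assumes that a reference sequence is split into k-mers, that the k-mers are stored in a Bloom filter, and that a read is scored by the fraction of its k-mers found in the filter. The names `BloomFilter`, `build_filter` and `match_fraction`, the SHA-256 based double hashing, and the default sizes are all hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with num_hashes derived hash positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference, k, num_bits=1 << 16, num_hashes=4):
    """Insert all k-mers of the reference into a new Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def match_fraction(read, bf, k):
    """Fraction of the read's k-mers that the filter reports as present."""
    total = 0
    hits = 0
    for kmer in kmers(read, k):
        total += 1
        if kmer in bf:
            hits += 1
    return hits / total if total else 0.0


reference = "ACGTACGTTAGCCGATAGGCTTAACG"
bf = build_filter(reference, k=6)
score = match_fraction(reference[3:15], bf, k=6)  # read taken from the reference
```

In this sketch, a read would be assigned to the reference when `match_fraction` exceeds some chosen threshold. That threshold, like the k-mer length, is an assumption of the example and would need tuning on real data.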

Place, publisher, year, edition, pages
2013.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-142011
OAI: oai:DiVA.org:kth-142011
DiVA: diva2:699439
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2014-03-13
Created: 2014-02-27
Last updated: 2014-03-13
Bibliographically approved

Open Access in DiVA

fulltext (2122 kB), 191 downloads
File information
File name: FULLTEXT01.pdf
File size: 2122 kB
Checksum (SHA-512): a8f67baf9ebd423a2b5bbcbc900af2f2601efc6880d7562e05c265587e10c18f9d5a65cf7a114c7ee0fa47ad7781572f72ebd256fbb051e69a360ec95c446dae
Type: fulltext
Mimetype: application/pdf
