ERNE-BS5: Aligning BS-treated sequences by multiple hits on a 5-letters alphabet
2012 (English). In: BCB '12 Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, ACM, 2012, 12-19 p. Conference paper (Refereed)
Cytosine methylation is a DNA modification that has a great impact on the regulation of gene expression and important implications for the biology and health of several living beings, including humans. Bisulfite conversion followed by next-generation sequencing (BS-seq) of DNA is the gold standard technique used to detect DNA methylation at single-base resolution on a genome scale through the identification of 5-methylcytosine (5-mC). However, by converting unmethylated cytosines into thymines, BS-seq poses computational challenges to read alignment and aggravates the issue of multiple hits due to the ambiguity raised by the reduced sequence complexity. Here we present ERNE-BS5 (Extended Randomized Numerical alignEr - BiSulfite 5), an aligning program developed to efficiently map BS-treated reads against large genomes (e.g., human). To achieve this goal we have implemented three different ideas: (i) we use a 5-letter alphabet for storing methylation information, (ii) we use a weighted context-aware Hamming distance to identify a T coming from an unmethylated C context, and (iii) we use an iterative process to position multiple-hit reads starting from a preliminary map built using single-hit alignments. The map is corrected and extended at each cycle using the alignments added in the previous iteration. ERNE-BS5 is based on a new, improved version of the rNA aligning software with a more efficient core. ERNE (Extended Randomized Numerical alignEr) is a short string alignment package whose goal is to provide an all-inclusive set of tools to handle short reads. ERNE comprises ERNE-MAP, ERNE-DMAP, ERNE-FILTER, ERNE-VISUAL, and, from now on, ERNE-BS5. ERNE is free software distributed under an open source license (GPL v3) and can be downloaded at: http://erne.sourceforge.net.
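The alignment difficulty described in the abstract can be illustrated with a minimal sketch. It is not the ERNE-BS5 implementation: the function names `bs_hamming` and `best_hits`, the `ct_cost` and `max_cost` parameters, and the toy sequences are all hypothetical, and only forward-strand C-to-T conversion is modelled (no 5-letter alphabet, sequence context, or iterative repositioning). It shows a Hamming-style distance in which a read T opposite a reference C is charged a reduced weight, and a brute-force scan that reports every lowest-cost position.

```python
def bs_hamming(read: str, ref: str, ct_cost: float = 0.0) -> float:
    """Hamming-style distance tolerant of bisulfite C->T changes.

    Assumption: a read T over a reference C costs `ct_cost` (hypothetical
    weight); every other mismatch costs 1.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    cost = 0.0
    for r, g in zip(read, ref):
        if r == g:
            continue
        if g == "C" and r == "T":
            cost += ct_cost  # plausibly an unmethylated C read as T
        else:
            cost += 1.0      # ordinary mismatch
    return cost


def best_hits(read: str, genome: str, max_cost: float = 1.0,
              ct_cost: float = 0.0):
    """Scan every window and return (lowest cost, list of positions).

    Returns (None, []) when no window has cost <= max_cost.
    """
    best, positions = None, []
    for i in range(len(genome) - len(read) + 1):
        c = bs_hamming(read, genome[i:i + len(read)], ct_cost)
        if c > max_cost:
            continue
        if best is None or c < best:
            best, positions = c, [i]
        elif c == best:
            positions.append(i)
    return best, positions


if __name__ == "__main__":
    read = "ATTGA"  # toy read; could derive from ACCGA after conversion
    # A plain Hamming distance (ct_cost=1.0) places the read at the wrong spot.
    print(best_hits(read, "ACCGATTTGA", ct_cost=1.0))
    print(best_hits(read, "ACCGATTTGA", ct_cost=0.0))
    # Reduced complexity creates a second, equally good hit.
    print(best_hits(read, "ACCGAATTGA", ct_cost=0.0))
```

In the last call both position 0 (a converted ACCGA) and position 5 (a literal ATTGA) have cost 0, which is the multiple-hit ambiguity the abstract refers to; resolving it needs further information, such as the single-hit map the abstract mentions.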
Place, publisher, year, edition, pages
ACM, 2012. 12-19 p.
Algorithms, Computational challenges, Context-Aware, DNA Methylation, DNA modification, Free software, Gold standards, Iterative process, Open sources, String alignment, Alignment, Alkylation, Gene encoding, Gene expression, Hamming distance, Iterative methods, Methylation, Open systems, Bioinformatics
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-108002
DOI: 10.1145/2382936.2382938
ScopusID: 2-s2.0-84869466321
ISBN: 978-145031670-5
OAI: oai:DiVA.org:kth-108002
DiVA: diva2:580351
2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012, 7 October 2012 through 10 October 2012, Orlando, FL
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Swedish e-Science Research Center
QC 20121221
2012-12-21, 2012-12-19, 2013-04-08
Bibliographically approved