Structural Variation Detection with Read Pair Information: An Improved Null Hypothesis Reduces Bias
KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0001-7378-2320
KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, Centres, SeRC - Swedish e-Science Research Centre. Stockholm University, Sweden. ORCID iD: 0000-0001-5341-1733
2017 (English) In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 24, no 6, 581-589 p. Article in journal (Refereed) Published
Abstract [en]

Reads from paired-end and mate-pair libraries are often used to find structural variation in genomes, and one common approach is to use their fragment length for detection. After aligning read pairs to the reference, read pair distances are analyzed for statistically significant deviations. However, previously proposed methods are based on a simplified model of observed fragment lengths that does not agree with data. We show how this model limits the statistical analysis used to identify variants and propose a new model, adapted from one we previously introduced for contig scaffolding, that agrees with data. From this model, we derive an improved null hypothesis that, when applied in the variant caller CLEVER, reduces the number of false positives and corrects a bias that contributes to more deletion calls than insertion calls. We advise developers of variant callers that use statistical fragment length-based methods to adapt the concepts in our proposed model and null hypothesis.
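As a rough illustration of the fragment-length approach the abstract describes, the sketch below tests whether the read pair distances observed at one locus deviate from the library's expected fragment length. It implements only a generic simplified null hypothesis (observed lengths assumed to follow the library-wide mean and standard deviation), which is the kind of model the abstract argues against; it is not the improved null hypothesis of the paper and not CLEVER's actual code. The function names, parameters, and the normal approximation are assumptions made for this example.

```python
import math
from statistics import mean


def fragment_length_z_score(observed_lengths, library_mean, library_sd):
    """z-score of the mean read pair distance at one locus.

    Simplified null (an assumption of this sketch): the lengths observed at
    the locus are drawn from the library-wide distribution with mean
    `library_mean` and standard deviation `library_sd`.
    """
    n = len(observed_lengths)
    if n == 0:
        raise ValueError("need at least one read pair")
    standard_error = library_sd / math.sqrt(n)
    return (mean(observed_lengths) - library_mean) / standard_error


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: nine read pairs whose mapped distance averages 330 bp,
# from a library with mean 300 bp and standard deviation 30 bp.
z = fragment_length_z_score([330] * 9, library_mean=300, library_sd=30)
p = two_sided_p_value(z)
```

Under this sketch, a large positive z-score means the pairs map farther apart on the reference than expected, which suggests a deletion in the sample; a large negative z-score suggests an insertion.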

Place, publisher, year, edition, pages
Mary Ann Liebert, 2017. Vol. 24, no 6, 581-589 p.
Keyword [en]
Genomes, Algorithms, Alignment, Sequence, Indels
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-209893
DOI: 10.1089/cmb.2016.0124
ISI: 000402997500011
Scopus ID: 2-s2.0-85020397976
OAI: oai:DiVA.org:kth-209893
DiVA: diva2:1115576
Funder
Swedish Research Council, 2010-4634
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Swedish e-Science Research Center
Note

QC 20170627

Available from: 2017-06-27 Created: 2017-06-27 Last updated: 2017-07-03 Bibliographically approved

By author/editor
Sahlin, Kristoffer; Arvestad, Lars