Correcting bias from stochastic insert size in read pair data—applications to structural variation detection and genome assembly
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. (Arvestad) ORCID iD: 0000-0001-7378-2320
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
2015 (English), Manuscript (preprint) (Other academic)
Place, publisher, year, edition, pages
2015.
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-173585
DOI: 10.1101/023929
OAI: oai:DiVA.org:kth-173585
DiVA: diva2:853716
Note

QS 2015

Available from: 2015-09-14; Created: 2015-09-14; Last updated: 2016-02-02; Bibliographically approved
In thesis
1. Algorithms and statistical models for scaffolding contig assemblies and detecting structural variants using read pair data
2015 (English), Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Advances in throughput from Next Generation Sequencing (NGS) methods have provided new ways to study molecular biology. The increased amount of data enables genome-wide studies of structural variation, transcription, translation and genome composition. Not only is the scale of each experiment large; lower cost and faster turnaround have also increased the frequency with which new experiments are conducted. This growth in data increases the demand for efficient and robust algorithms, which is a major computational challenge. Computationally efficient algorithms are crucial for coping with the amount of data, and efficiency is relatively easy to verify by measuring runtime and memory consumption. However, because NGS data are not only large but also contain several artifacts, the difficulty lies in verifying that an algorithm gives accurate results and is robust across different data sets.

This thesis focuses on the modeling assumptions made about mate-pair and paired-end reads when scaffolding contig assemblies or detecting structural variants. Both genome assembly and structural variation detection are difficult problems, partly because of their computational complexity and partly because of various kinds of noise and artifacts in the input data. Constructing methods that address all artifacts and parameters in the data is difficult, if not impossible, and end-to-end pipelines often rely on several simplifications. Instead of tackling these difficult problems all at once, a large part of this thesis concentrates on smaller subproblems within scaffolding and structural variation detection. By identifying and modeling parts of the problem where other algorithms make simplifications, we obtain improved solutions to the corresponding full problems.

The first paper presents an improved model for estimating gap sizes, and hence contig placement, in the scaffolding problem. The second paper introduces a new scaffolder for large, complex genomes, and the third paper extends this scaffolding method to account for paired-end contamination in mate-pair libraries. The fourth paper investigates the detection of structural variants using fragment length information and corrects a commonly assumed null-hypothesis distribution used in this detection.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. x, 59 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2015:14
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-173580 (URN)
978-91-7595-677-0 (ISBN)
Public defence
2015-10-01, Atrium, Nobels väg 12B, Stockholm, 10:00 (English)
Note

QC 20150915

Available from: 2015-09-15; Created: 2015-09-14; Last updated: 2015-09-15; Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text: bioRxiv


Authors
Sahlin, Kristoffer; Frånberg, Mattias; Arvestad, Lars
