Pre-analysis of Nanopore Data for DNA Base Calling
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2022 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Nanopore sequencing is a relatively new DNA sequencing method that measures the current across a nanopore in a membrane as each nucleotide of the DNA passes through the pore. A base caller can then determine the sequence of nucleotides in the DNA from the resulting current signal. The goal of this project was to create a machine learning model that estimates the accuracy rate (identity score) of the sequenced DNA from the electric current signal and other data available through nanopore sequencing. The machine learning models were trained on a dataset of samples from E. coli bacteria that had been sequenced with nanopore sequencing. A linear regression model and several neural networks were created. The best-performing model was a neural network with a mean squared error (MSE) of 6.12 × 10⁻⁴, compared to a variance in the dataset of 2.11 × 10⁻³. The MSE is roughly 29% of the dataset variance, which indicates that the model predicts identity scores considerably better than a constant prediction of the mean.
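
The comparison between the MSE and the dataset variance can be made concrete with a short calculation. The sketch below is not taken from the thesis. It uses only the two figures reported in the abstract and assumes, for illustration, that the variance was computed on the same identity scores on which the MSE was evaluated. Under that assumption, a model that always predicts the mean would have an MSE equal to the variance, so 1 − MSE/variance gives the fraction of variance the model explains. The function name `explained_variance_fraction` is hypothetical.

```python
import math


def explained_variance_fraction(mse: float, variance: float) -> float:
    """Return 1 - MSE/variance, an R^2-style score.

    Assumes (not confirmed by the source) that `variance` is the variance
    of the same target values on which `mse` was measured.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    return 1.0 - mse / variance


# Figures reported in the abstract.
reported_mse = 6.12e-4
reported_variance = 2.11e-3

# Ratio of model error to the error of a constant mean predictor.
baseline_ratio = reported_mse / reported_variance
r2_like = explained_variance_fraction(reported_mse, reported_variance)

# Root-mean-square error and standard deviation, in identity-score units.
rmse = math.sqrt(reported_mse)
std_dev = math.sqrt(reported_variance)

print(f"MSE / variance     = {baseline_ratio:.3f}")  # prints 0.290
print(f"explained fraction = {r2_like:.3f}")         # prints 0.710
print(f"RMSE = {rmse:.4f}, standard deviation = {std_dev:.4f}")
```

Read this way, the reported neural network would account for about 71% of the variance in identity score, provided the assumption about how the variance was computed holds.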

Abstract [sv]

Nanopore sequencing is a relatively new DNA sequencing method that measures the current across a nanoscopic pore in a membrane as each DNA nucleotide passes through the pore. From the resulting electrical signal, the sequence of nucleotides in the DNA can be determined using a base caller. The goal of this project was to create a machine learning model that could determine the degree of accuracy of the sequenced DNA using the electrical current signal and other types of data available from nanopore sequencing. The dataset used to train the machine learning models consisted of samples from an E. coli bacterium that had been sequenced with nanopore sequencing. In this project, a linear regression model and several different neural networks were created. The best-performing model was a neural network with a mean squared error (MSE) of 6.12 × 10⁻⁴, compared to a dataset variance of 2.11 × 10⁻³. The low MSE shows that the model can effectively estimate the degree of accuracy of the read DNA sequence.

Place, publisher, year, edition, pages
2022, p. 629-636
Series
TRITA-EECS-EX ; 2022:178
Keywords [en]
Nanopore sequencing, DNA sequencing, bioinformatics, machine learning, neural networks, linear regression, supervised learning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-323735
OAI: oai:DiVA.org:kth-323735
DiVA, id: diva2:1736062
Projects
Kandidatexjobb i elektroteknik 2022, KTH, Stockholm
Available from: 2023-02-10 Created: 2023-02-10

Open Access in DiVA

fulltext (146281 kB), 540 downloads
File information
File name: FULLTEXT01.pdf
File size: 146281 kB
Checksum (SHA-512):
6ef8ac5f57bfa731be6b63752e35a8ad576eee3fa90434c6241186b62dff01f689b0a22454b6d600dd2dfa06cfe9b879bd9a8193673765e6a41ac9b51260faf0
Type: fulltext
Mimetype: application/pdf
