A Lempel-Ziv Complexity Analysis of SARS-CoV-2
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
2022 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

COVID-19 has had an enormous impact on society, and in a world where digitalization is more prevalent than ever, we have access to large amounts of computing power and algorithms. Lempel-Ziv complexity is a measure of particular interest: it quantifies the complexity of a string and has already been applied to multiple biological problems. In this thesis we evaluated whether the Lempel-Ziv complexity of the virus SARS-CoV-2 has changed since 2019. To do this, we implemented two programs: the first selects randomized data from GISAID, and the second runs the algorithm on 1,000 FASTA-formatted files at a time. We showed that the complexity had generally decreased since 2019. This, however, may have been the result of evolving sequencing methods, as large spreads in the data could be observed for the Delta and Omicron variants.
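
As a rough illustration of the measure described above, the following is a minimal sketch of Lempel-Ziv complexity applied to a FASTA-formatted sequence. It assumes the 1976 (LZ76) exhaustive-history definition; the abstract does not state which Lempel-Ziv variant the thesis used. The function names `lz76_complexity` and `fasta_to_sequence` are hypothetical and are not taken from the thesis.

```python
def lz76_complexity(sequence: str) -> int:
    """Count the phrases in the LZ76 exhaustive-history parsing of `sequence`.

    Assumption: LZ76 is used here for illustration only. Each phrase is the
    shortest substring starting at `start` that cannot be copied from the
    text preceding its own last character.
    """
    n = len(sequence)
    start = 0
    phrases = 0
    while start < n:
        length = 1
        # Extend the candidate phrase while it still occurs earlier in the text.
        while (start + length <= n
               and sequence[start:start + length] in sequence[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases


def fasta_to_sequence(fasta_text: str) -> str:
    """Join the sequence lines of a single-record FASTA string.

    Header lines (starting with '>') and blank lines are skipped, and the
    result is upper-cased. Hypothetical helper; real GISAID files may need
    extra handling, for example of ambiguous bases such as 'N'.
    """
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line.upper() for line in lines if line and not line.startswith(">"))


if __name__ == "__main__":
    record = ">example_header\nACGTACGT\nTTGACA\n"  # made-up record, not GISAID data
    print(lz76_complexity(fasta_to_sequence(record)))
```

This direct version does substring searches inside nested loops, so it is slow on full genomes of roughly 30,000 bases. A batch run over 1,000 files would likely need a faster formulation, such as one based on suffix structures.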

Abstract [sv]

Covid-19 has had an enormous impact on society, and in a world where digitalization is more established than ever, we have access to large amounts of computing power and algorithms. Lempel-Ziv complexity is an algorithm of particular interest that measures the complexity of a string and has already been applied to several problems in biology. In this thesis we evaluated whether the Lempel-Ziv complexity of the virus SARS-CoV-2 has changed since 2019. To carry out this project, two programming solutions were implemented: the first to select random data from GISAID and the second to execute the algorithm on 1,000 FASTA-formatted files at a time. We showed that the complexity had generally decreased since 2019. However, this may have been the result of changing sequencing methods, since large spreads in the data could be observed for the Delta and Omicron variants.

Place, publisher, year, edition, pages
2022. p. 35
Series
TRITA-EECS-EX ; 2022:470
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319745
OAI: oai:DiVA.org:kth-319745
DiVA, id: diva2:1701710
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2022-10-10 Created: 2022-10-07 Last updated: 2022-10-10 Bibliographically approved

Open Access in DiVA

fulltext (1555 kB), 210 downloads
File information
File name: FULLTEXT01.pdf
File size: 1555 kB
Checksum (SHA-512): 4f46240cafc373fc4fce340c675bb05c867351c920c6144c71ba308ff693caeeb7bdf1d2dedf29d4d32a968b81b8ffd3ea6a45e35d72cea92e423d79b6c37e66
Type: fulltext
Mimetype: application/pdf
