Data Complexity and its effect on Classification Accuracy in Multi Class Classification Problems: A study using synthetic datasets
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
2022 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Datakomplexitet och dess påverkan på klassificerares träffsäkerhet i multiklass-problem (Swedish)
Abstract [en]

This study investigates how the performance of a selection of machine learning classifiers is affected by data complexity, as measured by F1, N1, N2, and N3, in a multi-class classification setting. The study uses synthetic datasets that span the range of possible complexity levels for each complexity measure, which allows the desired level of complexity to be targeted for each dataset. The number of dimensions of the datasets was inspired by the Fashion-MNIST benchmark dataset. The study finds that classifier accuracy decreases as dataset complexity increases, that the robustness of the accuracies decreases as dataset complexity increases, and that N1 and N3 are the measures whose descriptive power best reflects real-world performance.
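
As an illustration of what one of these measures captures, the sketch below computes N3 on two synthetic three-class datasets, assuming its usual definition as the leave-one-out error rate of a 1-nearest-neighbour classifier. The helper names `make_clusters` and `n3`, the Gaussian cluster generator, and the sizes used (4 dimensions, 40 points per class) are assumptions made for this example. They are not taken from the thesis, whose dataset generation procedure is not described in this record.

```python
import math
import random


def make_clusters(n_classes, n_per_class, n_dims, separation, seed=0):
    """Draw one unit-variance Gaussian cluster per class.

    Class c is centred at c * separation on every axis (an assumption
    for this example, not the thesis's generator).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            points.append([rng.gauss(c * separation, 1.0) for _ in range(n_dims)])
            labels.append(c)
    return points, labels


def n3(points, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # leave the query point itself out
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / len(points)


# Well-separated classes versus classes drawn from the same distribution.
easy = n3(*make_clusters(3, 40, 4, separation=10.0))
hard = n3(*make_clusters(3, 40, 4, separation=0.0))
print(f"N3 well separated: {easy:.2f}, N3 fully overlapping: {hard:.2f}")
```

With less separation between the class centres, more points have a nearest neighbour from another class, so N3 rises and the dataset counts as more complex by this measure.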

Abstract [sv, translated to English]

This study investigates how the performance of a selection of machine learning classifiers is affected by the complexity measures F1, N1, N2, and N3 of multi-class datasets. The study uses synthetic datasets that cover the entire range of possible complexity values for each complexity measure. This makes it possible to have datasets with exactly the complexity value desired for each dataset. The number of dimensions in the datasets was inspired by the Fashion-MNIST dataset. The study shows that classifier accuracy decreases as dataset complexity increases, that the robustness of the accuracy decreases as dataset complexity increases, and that N1 and N3 are the measures that best reflect reality.

Place, publisher, year, edition, pages
2022. p. 27
Series
TRITA-EECS-EX ; 2022:453
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319589
OAI: oai:DiVA.org:kth-319589
DiVA, id: diva2:1700888
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2022-10-05
Created: 2022-10-04
Last updated: 2022-10-05
Bibliographically approved

Open Access in DiVA

Full text (1317 kB), 1431 downloads
File information
File name: FULLTEXT01.pdf
File size: 1317 kB
Checksum SHA-512:
7045b9d480b548964c1ba80cb7184e5815b3ec8e428c689370d9a214264597b84c525a6ddb5d99940e8619dfc1f2749300c49c6564e9df097b5d3b252798679b
Type: fulltext
Mimetype: application/pdf
