Comparing Ensemble Methods with Individual Classifiers in Machine Learning for Diabetes Detection
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
2022 (English)
Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Jämförelse av ensemblemetoder med individuella klassifierare i maskininlärning för att detektera diabetes (Swedish)
Abstract [en]

Diabetes is a common disease characterized by several health markers, which can be used in machine learning to help predict whether an individual has diabetes. This study investigates whether combining individual Naive Bayes and decision tree classifiers through the stacking and soft voting ensemble methods improves diabetes detection accuracy compared with the individual classifiers. The investigation compared the average classification accuracy of each method over 1000 runs. The dataset used was the Pima Indian Diabetes Database, which consists of 768 samples. A significance test was used to determine whether the ensemble methods achieved statistically significantly better classification accuracy than the individual classifiers. In our results, stacking had the highest average accuracy at 75.568%, followed by soft voting at 74.773%, Naive Bayes at 74.752%, and the decision tree at 72.988%. However, the significance test found that the difference in average accuracy between the ensemble methods and the individual classifiers was not statistically significant at the 5% significance level. We also concluded that classification accuracy alone is not a sufficient measure of performance for diagnosing disease, and that other metrics such as precision and recall should be included in the performance analysis. Further investigations of ensemble learning for diabetes detection could include more classifiers in the ensemble methods and use other, larger datasets.
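
The soft voting idea and the remark about accuracy being insufficient can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`soft_vote`, `accuracy`, `precision`, `recall`) are hypothetical, the probabilities below are made-up illustrative values rather than results from the Pima Indian Diabetes Database, and only the Python standard library is assumed. Soft voting is shown here in its common unweighted form, where the positive-class probabilities of the individual classifiers are averaged and then thresholded.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average each sample's positive-class probability across
    classifiers, then predict 1 where the average reaches the threshold."""
    averaged = [mean(sample_probs) for sample_probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Fraction of true positives that the classifier actually found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up example: 1 = diabetic, 0 = not diabetic (assumed labels).
y_true = [1, 0, 1, 0, 0, 1]
# Hypothetical positive-class probabilities from two base classifiers.
nb_probs = [0.9, 0.4, 0.4, 0.2, 0.6, 0.3]  # stand-in for Naive Bayes
dt_probs = [0.6, 0.1, 0.7, 0.3, 0.3, 0.2]  # stand-in for a decision tree

y_pred = soft_vote([nb_probs, dt_probs])
print("predictions:", y_pred)
print("accuracy:   ", round(accuracy(y_true, y_pred), 3))
print("precision:  ", round(precision(y_true, y_pred), 3))
print("recall:     ", round(recall(y_true, y_pred), 3))
```

In this toy example the ensemble misses one of the three positive cases, so recall is noticeably lower than accuracy. That gap is the kind of information an accuracy-only comparison hides, which matters when a false negative means an undetected disease.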

Abstract [sv]

Diabetes är en vanligt förekommande sjukdom som kännetecknas av olika hälsomarkörer. Dessa markörer kan användas i maskininlärning för att assistera i detekteringen av diabetes. Målet med denna studie är att undersöka om kombinationen av individuella Naive Bayes och decision tree klassificerare med hjälp av stacking och soft voting ensemblemetoderna kommer att förbättra korrektheten av detektering av diabetes jämfört med de individuella klassificerarna. Denna undersökning utfördes genom att jämföra medelvärdena av korrektheten för samtliga klassificerare, baserat på 1000 körningar. Den använda datamängden var det så kallade Pima Indian Diabetes Database som innehåller 768 prover. Ett signifikanstest genomfördes för att bestämma om ensemblemetoderna genererade statistiskt signifikant bättre klassificeringskorrekthet jämfört med de individuella klassificerarna. I våra resultat hade stackingmetoden den högsta medelkorrektheten på 75.568%. Soft voting följde därefter med en korrekthet på 74.773%. Naive Bayes och decision tree klassificerarna var 74.752% respektive 72.988% korrekta. Dock visade det statistiska signifikanstestet att differensen i korrektheten mellan ensemblemetoderna och de individuella klassificerarna inte var statistiskt signifikant vid en signifikansnivå på 5%. Slutsatsen var också att klassificeringskorrekthet inte enskilt var tillräckligt för att mäta prestanda av sjukdomsdiagnostisering, och att andra mått som de så kallade precision och recall borde inkluderas i analysen av prestandan. Vidare undersökningar av ensemblemetoder för diabetesdetektering skulle kunna inkludera fler klassificerare i ensemblemetoderna och att använda andra större datamängder.

Place, publisher, year, edition, pages
2022, p. 34
Series
TRITA-EECS-EX ; 2022:475
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319817
OAI: oai:DiVA.org:kth-319817
DiVA, id: diva2:1701964
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2022-10-10
Created: 2022-10-08
Last updated: 2022-10-10
Bibliographically approved

Open Access in DiVA

fulltext (729 kB), 983 downloads
File information
File name: FULLTEXT01.pdf
File size: 729 kB
Checksum (SHA-512): 442ec2cc2e25b819b2b9289f6aac3c599d7446ec03616206a8e5082f613271dd1ff5a12b3f81cb100ae51e83fdaede1697126e63ef5be9f02d108335ee8a8b2b
Type: fulltext
Mimetype: application/pdf
