A comparison of machine learning algorithms in their ability to predict pancreatic cancer
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
2022 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
En jämförelse av maskininlärningsalgoritmers förmåga att förutspå bukspottkörtelcancer (Swedish)
Abstract [en]

Pancreatic cancer is an uncommon but lethal disease with no obvious biomarkers for its early stages. Machine learning has been used to predict the disease, with limited success. Survey data has been of special interest because of its large size and accessibility. However, only a few machine learning algorithms, notably neural networks, have been applied successfully to this type of data. This study tested and compared several machine learning algorithms in order to find ones that perform as well as, or better than, the algorithms already known to perform well. Health survey data from two survey studies, NHIS and PLCO, were combined into a dataset of 2 216 867 samples with 22 features, of which 1 031 samples were patients diagnosed with pancreatic cancer. A logistic regression model, a neural network, a decision tree, and a support vector machine were trained on this dataset using cross-validation and then evaluated on a test partition. Neural networks were found to be superior in most use cases. Logistic regression can, however, perform similarly to neural networks when applied to survey data and can thus be a simpler alternative. The decision tree achieved results similar to the neural network on some metrics but fell short in precision. The support vector machine performed worse than the other three algorithms. This could be because the support vector machine could not be trained on the whole dataset, owing to its poor performance on larger datasets.
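
The dataset figures in the abstract imply a positive class of roughly 0.05% of all samples, which helps explain why precision is the hardest metric to achieve. The following is a minimal sketch of that arithmetic. The sample counts come from the abstract, but the 80% sensitivity and 80% specificity operating point is a hypothetical chosen for illustration, not a result reported in the thesis, and the function names are likewise illustrative.

```python
# Sample counts as stated in the abstract.
N_SAMPLES = 2_216_867
N_POSITIVE = 1_031


def prevalence(n_positive: int, n_samples: int) -> float:
    """Fraction of samples that belong to the positive class."""
    return n_positive / n_samples


def precision_at(sensitivity: float, specificity: float, prev: float) -> float:
    """Precision (positive predictive value) derived from Bayes' rule."""
    true_positive_rate = sensitivity * prev
    false_positive_rate = (1.0 - specificity) * (1.0 - prev)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


prev = prevalence(N_POSITIVE, N_SAMPLES)

# Hypothetical operating point (assumption, not taken from the thesis).
ppv = precision_at(sensitivity=0.80, specificity=0.80, prev=prev)

print(f"prevalence: {prev:.4%}")
print(f"precision at 80% sensitivity / 80% specificity: {ppv:.4%}")
```

Under these assumed rates the precision stays well below 1%, because false positives drawn from the very large negative class outnumber the true positives. A classifier can therefore look strong on sensitivity and specificity while still having low precision on data this imbalanced.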

Abstract [sv]

Bukspottkörtelcancer är en ovanlig men dödlig sjukdom utan uppenbara biomarkörer för dess tidiga stadier. Maskininlärning har tidigare använts för att förutspå sjukdomen med begränsad framgång. Hälsoenkätdata har varit av speciellt intresse på grund av dess storlek och tillgänglighet. Dock har endast ett fåtal maskininlärningsalgoritmer, speciellt neurala nätverk, framgångsrikt tillämpats på denna typen av data. Flertalet olika maskininlärningsalgoritmer prövades i detta kandidatarbete för att finna alternativa algoritmer med likvärdig eller bättre prestanda än redan beprövade. Hälsoenkätdata från två olika enkätstudier, NHIS och PLCO, kombinerades till ett sammansatt dataset med 22 prediktorer med 2 216 867 stickprov med 1 031 patienter med diagnostiserad bukspottkörtelcancer. En logistisk regresserare, ett neuralt nätverk, ett beslutsträd och en stödvektormaskin tränades på datasetet med korsvalidering och utvärderades sedan på en testpartition. Det fanns att neurala nätverk i de flesta användningsfall är överlägsna. Dock kan logistisk regression ha likvärdig prestanda när det tillämpas på enkätdata och kan därav vara ett enklare alternativ. Beslutsträdet gav likvärdiga resultat som det neurala nätverket i vissa parametrar men fick betydligt sämre precision. Stödvektormaskinen presterade dessutom sämre än de tre tidigare nämnda algoritmerna. Detta antogs vara ett resultat av att stödvektormaskinen inte kunde tränas på hela datasetet på grund av dess begränsade prestanda på stora dataset.

Place, publisher, year, edition, pages
2022. p. 22
Series
TRITA-EECS-EX ; 2022:473
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319815
OAI: oai:DiVA.org:kth-319815
DiVA, id: diva2:1701962
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2022-10-10
Created: 2022-10-08
Last updated: 2022-10-10
Bibliographically approved

Open Access in DiVA

fulltext (429 kB), 279 downloads
File information
File name: FULLTEXT01.pdf
File size: 429 kB
Checksum SHA-512:
96d3127e130ca5b5db20a9e45829f65bfafb478d857b11c933230f2ddc0d3f6d6721b5e9a8623b06f482fc2ad8fa72d439181083e2d701528fa20443bebc49e4
Type: fulltext
Mimetype: application/pdf
