Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Fraud Detection on Unlabeled Data with Unsupervised Machine Learning
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Biomedical Engineering and Health Systems, Health Informatics and Logistics.
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Biomedical Engineering and Health Systems, Health Informatics and Logistics.
2018 (English)Independent thesis Basic level (university diploma), 10 credits / 15 HE creditsStudent thesisAlternative title
Bedrägeridetektering på omärkt data med oövervakad maskininlärning (Swedish)
Abstract [en]

A common problem in systems handling user interaction was the risk for fraudulent behaviour. As an example, in a system with credit card transactions it could have been a person using a another user's account for purchases, or in a system with advertisment it could be bots clicking on ads. These malicious attacks were often disguised as normal interactions and could be difficult to detect. It was especially challenging when working with datasets that did not contain so called labels, which showed if the data point was fraudulent or not. This meant that there were no data that had previously been classified as fraud, which in turn made it difficult to develop an algorithm that could distinguish between normal and fraudulent behavior. In this thesis, the area of anomaly detection was explored with the intent of detecting fraudulent behavior without labeled data. Three neural network based prototypes were developed in this study. All three prototypes were some sort of variation of autoencoders. The first prototype which served as a baseline was a simple three layer autoencoder, the second prototype was a novel autoencoder which was called stacked autoencoder, the third prototype was a variational autoencoder. The prototypes were then trained and evaluated on two different datasets which both contained non fraudulent and fraudulent data. In this study it was found that the proposed stacked autoencoder architecture achieved better performance scores in recall, accuracy and NPV in the tests that were designed to simulate a real world scenario.

Abstract [sv]

Ett vanligt problem med användares interaktioner i ett system var risken för bedrägeri. För ett system som hanterarade dataset med kreditkortstransaktioner så kunde ett exempel vara att en person använde en annans identitet för kortköp, eller i system som hanterade reklam så skulle det kunna ha varit en automatiserad mjukvara som simulerade interaktioner. Dessa attacker var ofta maskerade som normala interaktioner och kunde därmed vara svåra att upptäcka. Inom dataset som inte har korrekt märkt data så skulle det vara speciellt svårt att utveckla en algoritm som kan skilja på om interaktionen var avvikande eller inte. I denna avhandling så utforskas ämnet att upptäcka anomalier i dataset utan specifik data som tyder på att det var bedrägeri. Tre prototyper av neurala nätverk användes i denna studie som tränades och utvärderades på två dataset som innehöll både data som sade att det var bedrägeri och inte bedrägeri. Den första prototypen som fungerade som en bas var en simpel autoencoder med tre lager, den andra prototypen var en ny autoencoder som har fått namnet staplad autoencoder och den tredje prototypen var en variationell autoencoder. För denna studie så gav den föreslagna staplade autoencodern bäst resultat för återkallelse, noggrannhet och NPV i de test som var designade att efterlikna ett verkligt scenario.

Place, publisher, year, edition, pages
2018. , p. 62
Series
TRITA-CBH-GRU ; 2018:36
Keywords [en]
Anomaly detection, fraud detection, unsupervised machine learning, autoencoder
Keywords [sv]
Anomali detektering, bedrägeri detektering, oövervakad maskininlärning, autoencoder
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-230592OAI: oai:DiVA.org:kth-230592DiVA, id: diva2:1217521
External cooperation
Tele2
Subject / course
Computer and Systems Sciences
Educational program
Bachelor of Science in Engineering - Computer Engineering
Supervisors
Examiners
Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2018-06-13Bibliographically approved

Open Access in DiVA

fulltext(1630 kB)391 downloads
File information
File name FULLTEXT01.pdfFile size 1630 kBChecksum SHA-512
0d4467f64dfc0965aee757758480df0fbaeaca81dc3408edefa9693fc4533775b6662a8561d9ae132fd234c8ad2c51b6456ea8b66eadbc2e6826f57ae5688219
Type fulltextMimetype application/pdf

By organisation
Health Informatics and Logistics
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 391 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 2183 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf