Predicting Default Probability in Credit Risk using Machine Learning Algorithms
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2020 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Predicting Default Probability in Credit Risk using Machine Learning Algorithms (Swedish)
Abstract [en]

This thesis explores internally developed models for measuring the probability of default (PD) in credit risk. While regulators place restrictions on modelling practices and thereby inhibit the advance of risk measurement, the fields of data science and machine learning continue to advance. The trade-off between stricter regulation of internally developed models and the advancement of data analytics was investigated by comparing the performance of the benchmark method for estimating PD, logistic regression, with that of the machine learning methods decision trees, random forest, gradient boosting and artificial neural networks (ANN). The data was supplied by SEB and contained 45 variables and 24 635 samples.
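The comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` in place of the SEB data, which is not public; the sample size, class imbalance, model settings and the helper name `compare_models` are all illustrative assumptions, not the thesis's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(n_samples=2000, n_features=45, seed=0):
    """Fit five classifiers on synthetic data and return their test-set AUC."""
    # Synthetic stand-in for a credit portfolio: about 10% "defaults" (class 1).
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,
        weights=[0.9, 0.1],
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Hyperparameters are arbitrary illustrative choices.
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "ann": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pd_hat = model.predict_proba(X_test)[:, 1]  # estimated default probability
        scores[name] = roc_auc_score(y_test, pd_hat)
    return scores


if __name__ == "__main__":
    for name, auc in sorted(compare_models().items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} AUC = {auc:.3f}")
```

The ranking of models on synthetic data says nothing about the ranking on the thesis's data; the sketch only shows the shape of such a comparison.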

As machine learning techniques become increasingly complex in pursuit of better performance, this often comes at the expense of model interpretability. An exploratory analysis was therefore carried out with the objective of measuring variable importance in the machine learning techniques. The findings from the exploratory analysis were compared with the results from established benchmark methods for measuring variable importance.
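The abstract does not name the variable importance methods that were used. As one possible illustration, the sketch below applies permutation importance, a commonly used model-agnostic technique, to a random forest with scikit-learn. The synthetic data, the model and the helper name `rank_features` are assumptions made for this example and are not taken from the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_features(seed=0):
    """Return feature indices ordered from most to least important."""
    # With shuffle=False the informative features occupy the first columns
    # (here columns 0, 1 and 2); the remaining five columns are pure noise.
    X, y = make_classification(
        n_samples=1000,
        n_features=8,
        n_informative=3,
        n_redundant=0,
        shuffle=False,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    # Importance of a feature = mean drop in held-out AUC when that
    # feature's column is randomly permuted.
    result = permutation_importance(
        model, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=seed
    )
    order = result.importances_mean.argsort()[::-1]
    return [int(i) for i in order]


if __name__ == "__main__":
    print("features ranked by permutation importance:", rank_features())
```

Because the importance is measured as a drop in a performance score on held-out data, the same procedure can be applied to any of the model types mentioned above, which makes the resulting rankings comparable across models.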

The results of this study show that logistic regression outperformed the machine learning techniques on the model performance measure AUC, with a score of 0.906. The findings from the exploratory analysis increased the interpretability of the machine learning techniques and were validated by the results from the benchmark methods.
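To make the performance measure concrete: AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted PD than a randomly chosen non-defaulter, with ties counted as one half. The sketch below computes it directly from that definition; the function name `pairwise_auc` and the four-observation example are illustrative assumptions, not data from the thesis.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]  # defaulters
    neg = [s for s, y in zip(scores, y_true) if y == 0]  # non-defaulters
    if not pos or not neg:
        raise ValueError("AUC needs at least one observation of each class")
    # A pair counts 1 if the defaulter has the higher score, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four (defaulter, non-defaulter) pairs are ordered correctly.
    print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this reading, a score of 0.906 means that about nine out of ten such pairs are ordered correctly by the model.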

Abstract [sv] (translated to English)

This thesis has examined internally developed models for estimating the probability of default (PD) in credit risk. While new regulations place restrictions on methods for modelling credit risk and to some extent inhibit the development of risk measurement, more advanced machine learning methods for risk measurement are being developed at the same time. The trade-off between stricter regulation of internally developed models and advances in data analysis has therefore been examined by comparing the model performance of the reference method logistic regression for estimating PD with the machine learning techniques decision trees, Random Forest, Gradient Boosting and artificial neural networks (ANN). The data comes from SEB and consists of 45 variables and 24 635 observations.

As machine learning techniques become more complex in order to improve performance, this often comes at the expense of the model's interpretability. An exploratory analysis was therefore carried out with the aim of measuring the importance of explanatory variables in the machine learning techniques. The results of the exploratory analysis were compared with results from established methods for measuring variable significance.

The results of the study show that logistic regression performed better than the machine learning techniques based on the performance measure AUC, which measured 0.906. The results of the exploratory analysis of the importance of explanatory variables increased the interpretability of the machine learning techniques. The results were also validated against the outcome of the established methods for measuring variable significance.

Place, publisher, year, edition, pages
2020.
Series
TRITA-SCI-GRU ; 2020:186
Keywords [en]
Credit risk, default probability, machine learning, logistic regression, Basel framework
Keywords [sv]
Kreditrisk, fallissemangssannolikhet, maskininlärning, logistisk regression, baselregelverk
National Category
Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-275656
OAI: oai:DiVA.org:kth-275656
DiVA, id: diva2:1437874
External cooperation
SEB
Subject / course
Financial Mathematics
Educational program
Master of Science - Applied and Computational Mathematics
Supervisors
Examiners
Available from: 2020-06-09. Created: 2020-06-09. Last updated: 2022-06-26. Bibliographically approved.

Open Access in DiVA

fulltext (5559 kB), 3481 downloads
File information
File name: FULLTEXT01.pdf
File size: 5559 kB
Checksum (SHA-512): 6adb0b5d7f59b894a3903a4498d48413ffa6929cb33d8e0487e83c2f5954c55153f1b778ef515d698cbdb4277853d6943451fa0689942ed2513240ca59563854
Type: fulltext
Mimetype: application/pdf

Total: 3491 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 5191 hits