Two-Stage Logistic Regression Models for Improved Credit Scoring
KTH, School of Computer Science and Communication (CSC).
2015 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Alternative title
Två-stegs logistiska regressioner för förbättrad credit scoring (Swedish)
Abstract [en]

This thesis investigates two-stage regularized logistic regressions applied to the credit scoring problem. Credit scoring refers to the practice of estimating the probability that a customer will default if given credit.

The data was supplied by Klarna AB and contains a larger number of observations than the data sets used in many other research papers on credit scoring. In this thesis, a two-stage regression refers to two staged regressions where some kind of information from the first regression is used in the second regression to improve the overall performance. In the best-performing models, the first stage was trained on alternative labels, namely payment status at earlier dates than the conventional one. The predictions were then used as input to, or to segment, the second stage. This gave a Gini increase of approximately 0.01. Using conventional score cutoffs or distance to a decision boundary to segment the population did not improve performance.
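The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the data is synthetic, the L2 penalty, learning rate, and all names (`fit_logreg`, `y_early`, `y_final`, `g_two_stage`) are assumptions made for illustration, and the Gini coefficient is assumed to be the usual credit-scoring definition, 2 * AUC - 1. Stage 1 is fitted on an early payment-status label, and its predicted probability is appended as an extra input to stage 2, which is fitted on the conventional label.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob(w, x):
    # w[0] is the bias, w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logreg(X, y, l2=0.01, lr=0.5, epochs=300):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = prob(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w[0] -= lr * grad[0] / n  # bias is not penalized
        for j in range(d):
            w[j + 1] -= lr * (grad[j + 1] / n + l2 * w[j + 1])
    return w


def gini(y, scores):
    """Gini = 2 * AUC - 1, with AUC computed pairwise (ties count half)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return 2.0 * wins / (len(pos) * len(neg)) - 1.0


# Synthetic customers (purely illustrative, not Klarna data).
rng = random.Random(0)
X, y_early, y_final = [], [], []
for _ in range(400):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    risk = sigmoid(1.5 * x[0] - 1.0 * x[1] - 1.0)
    early = 1 if rng.random() < risk else 0  # unpaid at an earlier date
    final = 1 if early and rng.random() < 0.6 else 0  # unpaid at the conventional date
    X.append(x)
    y_early.append(early)
    y_final.append(final)

X_train, X_test = X[:300], X[300:]

# Stage 1: train on the alternative (early) label.
w1 = fit_logreg(X_train, y_early[:300])

# Stage 2: append the stage-1 prediction as an input, train on the conventional label.
X_train2 = [x + [prob(w1, x)] for x in X_train]
X_test2 = [x + [prob(w1, x)] for x in X_test]
w2 = fit_logreg(X_train2, y_final[:300])

g_two_stage = gini(y_final[300:], [prob(w2, x) for x in X_test2])
print("two-stage Gini on held-out data:", round(g_two_stage, 3))
```

The segmentation variant mentioned in the abstract would instead split customers on the stage-1 score and fit a separate second-stage model per segment. Whether either variant improves on a single-stage model depends on the data, so this toy example makes no claim about the size of any gain.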

Abstract [sv]

This thesis has investigated two-stage regularized logistic regressions for estimating the credit score of consumers. A credit score is a measure of creditworthiness and measures the probability that a person will not repay their credit. The data comes from Klarna AB and contains more observations than much other research on creditworthiness. In this thesis, a two-stage regression means a regression model consisting of two stages, where information from the first stage is used in the second stage to improve overall performance. The best-performing models use an alternative variable in the first stage, payment status at an earlier date than the conventional one, to segment, or as a variable in, the second stage. This gave a Gini increase of approximately 0.01. Simpler segmentation methods, such as score cutoffs or distance to a decision boundary, did not improve performance.

Place, publisher, year, edition, pages
2015. p. 52
Keywords [en]
Machine Learning, Credit Scoring, Two-stage Logistic Regressions
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-160551
OAI: oai:DiVA.org:kth-160551
DiVA, id: diva2:790367
Educational program
Master of Science - Machine Learning
Available from: 2015-05-28
Created: 2015-02-24
Last updated: 2022-06-23
Bibliographically approved

Open Access in DiVA

fulltext (9397 kB), 13192 downloads
File information
File name: FULLTEXT01.pdf
File size: 9397 kB
Checksum SHA-512: b46f2c77645a62079398b46d8f0adc326566fda915f8882d2b28f6342a9df4a36f6e3bc793e6f13bb79ce2105a4bf0d6cbedc7c56aaca15377a7052859212b6c
Type: fulltext
Mimetype: application/pdf
