kth.sePublications KTH
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Clustering of Unevenly Spaced Mixed Data Time Series
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2023 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler (Swedish)
Abstract [en]

This thesis explores the feasibility of clustering mixed data and unevenly spaced time series for customer segmentation. The proposed method implements the Gower dissimilarity as the local distance function in dynamic time warping to calculate dissimilarities between mixed data time series. The time series are then clustered with k−medoids and the clusters are evaluated with the silhouette score and t−SNE. The study further investigates the use of a time warping regularisation parameter. It is derived that implementing time as a feature has the same effect as penalising time warping, andtherefore time is implemented as a feature where the feature weight is equivalent to a regularisation parameter.

The results show that the proposed method successfully identifies clusters in customer transaction data provided by Nordea. Furthermore, the results show a decrease in the silhouette score with an increase in the regularisation parameter, suggesting that the time at which a transaction occurred might not be of relevance to the given dataset. However, due to the method’s high computational complexity, it is limited to relatively small datasets and therefore a need exists for a more scalable and efficient clustering technique.

Abstract [sv]

Denna uppsats utforskar klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler för kundsegmentering. Den föreslagna metoden implementerar Gower dissimilaritet som avståndsfunktionen i dynamic time warping för att beräkna dissimilaritet mellan tidsserierna. Tidsserierna klustras sedan med k-medoids och klustren utvärderas med silhouette score och t-SNE. Studien undersökte vidare användningen av en regulariserings parameter. Det härledes att implementering av tid som en egenskap hade samma effekt som att bestraffa dynamic time warping, och därför implementerades tid som en egenskap där dess vikt är ekvivalent med en regulariseringsparameter. 

Resultaten visade att den föreslagna metoden lyckades identifiera kluster i transaktionsdata från Nordea. Vidare visades det att silhouette score minskade då regulariseringsparametern ökade, vilket antyder att tiden transaktion då en transaktion sker inte är relevant för det givna datan. Det visade sig ytterligare att metoden är begränsad till reltaivt små dataset på grund av dess höga beräkningskomplexitet, och därför finns det behov av att utforksa en mer skalbar och effektiv klusteringsteknik.

Place, publisher, year, edition, pages
2023. , p. 43
Series
TRITA-SCI-GRU ; 2023:386
Keywords [en]
mixed data time series, unevenly spaced time series, clustering, dynamic time warping, Gower dissimilarity, time warping regularisation
Keywords [sv]
numeriska och kategoriska tidsserier, ojämnt fördelade tidsserier, kluster analys, dynamic time warping, Gower dissimilaritet, regularisering av tidsförvränging
National Category
Other Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-340421OAI: oai:DiVA.org:kth-340421DiVA, id: diva2:1816921
External cooperation
Nordea Bank Abp filial SE
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Supervisors
Examiners
Available from: 2023-12-04 Created: 2023-12-04 Last updated: 2023-12-04Bibliographically approved

Open Access in DiVA

fulltext(1321 kB)463 downloads
File information
File name FULLTEXT01.pdfFile size 1321 kBChecksum SHA-512
bbdfe55401a3142d2c8b26f7f9cfa5545dd46a421620b83f1806e3394a0728d41dd553cb5b4a507bb74d2ea277d1bfaf0ebc609f43fb5df2d2cf30b6d83211b9
Type fulltextMimetype application/pdf

By organisation
Mathematical Statistics
Other Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 463 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 561 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf