Evaluation of Machine Learning Methods for Time Series Forecasting on E-commerce Data
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2022 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Utvärdering av Maskininlärningsmodeller för tidsserie-prognotisering på e-handels data (Swedish)
Abstract [en]

Within demand forecasting, and specifically within e-commerce, the available data often contain erratic behaviour that is difficult to explain. This contradicts common assumptions of classical approaches to time series analysis. Yet classical and naive approaches are still commonly used. Machine learning could be used to alleviate such problems. This thesis, carried out together with the Swedish fintech company QLIRO AB, evaluates four models: an MLR (Multiple Linear Regression) model, a classical Box-Jenkins model (SARIMAX), an XGBoost model, and an LSTM (Long Short-Term Memory) network. The provided data consist of aggregated total daily reservations by e-merchants within the Nordic market from 2014. Some data preprocessing was required, and a smoothed version of the data set was created for comparison. Each model was constructed according to its specific requirements but with similar feature engineering. Evaluation was then made on a monthly level with a forecast horizon of 30 days during 2021. The results show that the MLR and XGBoost models provide the most consistent results and have the added benefit of being easy to use. After these two, the LSTM network showed the best results for November and December on the original data set but the worst overall. It nevertheless performed well on the smoothed data set, where it was comparable to the first two models. The SARIMAX model performed worst of all the models considered in this thesis and was not as easy to implement.

Abstract [sv]

Inom efterfrågeprognoser, och specifikt inom området e-handel, innehåller den tillhandahållna informationen ofta oberäkneliga beteenden som är svåra att förklara. Detta motsäger vanliga antaganden inom tidsserier som används för de mer klassiska tillvägagångssätten. Ändå är klassiska och naiva metoder fortfarande vanliga. Maskininlärning skulle kunna användas för att lindra sådana problem. Detta examensarbete utvärderar fyra modeller tillsammans med det svenska fintechföretaget QLIRO AB. Mer specifikt en MLR-modell (Multiple Linear Regression), en klassisk Box-Jenkins-modell (SARIMAX), en XGBoost-modell och ett LSTM-nätverk (Long Short-Term Memory). Den tillhandahållna informationen består av aggregerade dagliga reservationer från e-handlare inom den nordiska marknaden från 2014. Viss dataförbehandling krävdes och en utjämnad version av datamängden skapades för jämförelse. Varje modell konstruerades enligt deras specifika krav men med liknande feature engineering. Utvärderingen gjordes sedan på månadsnivå med en prognoshorisont på 30 dagar under 2021. Resultaten visar att både MLR och XGBoost ger de mest pålitliga resultaten tillsammans med fördelar som att vara lätta att använda. Efter dessa visar LSTM-nätverket de bästa resultaten för november och december på den ursprungliga datamängden men sämst totalt sett. Ändå visar den god prestanda på den utjämnade datamängden och var sedan jämförbar med de två första modellerna. SARIMAX var den sämst presterande av alla jämförda modeller och inte lika lätt att implementera.

Place, publisher, year, edition, pages
2022. p. 67
Series
TRITA-SCI-GRU ; 2022:309
Keywords [en]
Thesis, Time Series, Machine Learning, E-commerce, Demand Forecasting, Multiple Linear Regression, SARIMAX, XGBoost, LSTM, Model Evaluation
Keywords [sv]
Examensarbete, tidsserier, maskininlärning, e-handel, efterfrågeprognoser, multipel linjär regression, SARIMAX, XGBoost, LSTM, modellutvärdering
National Category
Other Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-322495
OAI: oai:DiVA.org:kth-322495
DiVA, id: diva2:1719774
External cooperation
Qliro AB
Subject / course
Financial Mathematics
Educational program
Master of Science - Industrial Engineering and Management
Supervisors
Examiners
Available from: 2023-02-02
Created: 2022-12-16
Last updated: 2023-02-02
Bibliographically approved

Open Access in DiVA

fulltext (1938 kB), 1753 downloads
File information
File name: FULLTEXT01.pdf
File size: 1938 kB
Checksum (SHA-512): b79c010839ae3b0381f7e667aef3d07c0f7d04041359acc12fc5add8e8ea9f9d23c4bc3e864564d10261a4d16c8cb1b95ac14ed425e1655062b9f5fa76219b99
Type: fulltext
Mimetype: application/pdf
