kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Discover patterns within train log data using unsupervised learning and network analysis
KTH, School of Electrical Engineering and Computer Science (EECS).
2022 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

With the development of information technology in recent years, log analysis has gradually become a hot research topic. However, manual log analysis requires specialized knowledge and is a time-consuming task. Therefore, more and more researchers are searching for ways to automate log analysis. In this project, we explore methods for train log analysis using natural language processing and unsupervised machine learning. Multiple language models are used in this project to extract word embeddings, one of which is the traditional language model TF-IDF, and the other three are the very popular transformer-based model, BERT, and its variants, the DistilBERT and the RoBERTa. In addition, we also compare two unsupervised clustering algorithms, the DBSCAN and the Mini-Batch k-means. The silhouette coefficient and Davies-Bouldin score are utilized for evaluating the clustering performance. Moreover, the metadata of the train logs is used to verify the effectiveness of the unsupervised methods. Apart from unsupervised learning, network analysis is applied to the train log data in order to explore the connections between the patterns, which are identified by train control system experts. Network visualization and centrality analysis are investigated to analyze the relationship and, in terms of graph theory, importance of the patterns. In general, this project provides a feasible direction to conduct log analysis and processing in the future.

Abstract [sv]

I och med informationsteknologins utveckling de senaste åren har logganalys gradvis blivit ett hett forskningsämne. Manuell logganalys kräver dock specialistkunskap och är en tidskrävande uppgift. Därför söker fler och fler forskare efter sätt att automatisera logganalys. I detta projekt utforskar vi metoder för tåglogganalys med hjälp av naturlig språkbehandling och oövervakad maskininlärning. Flera språkmodeller används i detta projekt för att extrahera ordinbäddningar, varav en är den traditionella språkmodellen TF-IDF, och de andra tre är den mycket populära transformatorbaserade modellen, BERT, och dess varianter, DistilBERT och RoBERTa. Dessutom jämför vi två oövervakade klustringsalgoritmer, DBSCAN och Mini-Batch k-means. Siluettkoefficienten och Davies-Bouldin-poängen används för att utvärdera klustringsprestandan. Dessutom används tågloggarnas metadata för att verifiera effektiviteten hos de oövervakade metoderna. Förutom oövervakad inlärning tillämpas nätverksanalys på tågloggdata för att utforska sambanden mellan mönstren, som identifieras av experter på tågstyrsystem. Nätverksvisualisering och centralitetsanalys undersöks för att analysera sambandet och grafteoriskt betydelsen av mönstren mönstren. I allmänhet ger detta projekt en genomförbar riktning för att genomföra logganalys och bearbetning i framtiden.

Place, publisher, year, edition, pages
2022. , p. 47
Series
TRITA-EECS-EX ; 2022:708
Keywords [en]
Log analysis, Natural language processing, Unsupervised learning, Clustering, Network analysis
Keywords [sv]
Logganalys, Bearbetning av naturligt språk, Oövervakat lärande, Clustering, Nätverksanalys
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-321464OAI: oai:DiVA.org:kth-321464DiVA, id: diva2:1711127
External cooperation
Devoteam Creative Tech AB
Subject / course
Computer Science
Educational program
Master of Science - Computer Science
Supervisors
Examiners
Available from: 2022-11-22 Created: 2022-11-15 Last updated: 2022-11-22Bibliographically approved

Open Access in DiVA

fulltext(2425 kB)605 downloads
File information
File name FULLTEXT01.pdfFile size 2425 kBChecksum SHA-512
d22706a40adfde07f44e21b6e94d69635fa5f3fa70cb87529cd3b558613da0689f7b122f8f35464805879a4afce73657430f96276d568195213d19ffe4e622e5
Type fulltextMimetype application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 607 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 592 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf