Clustering Generic Log Files Under Limited Data Assumptions
KTH, School of Computer Science and Communication (CSC).
2016 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Klustring av generiska loggfiler under begränsade antaganden (Swedish)
Abstract [en]

Complex computer systems are often prone to anomalous or erroneous behavior, which can lead to costly downtime while the systems are diagnosed and repaired. One source of information for diagnosing these errors and anomalies is log files, which are often generated in large volumes and in diverse formats. However, the size and semi-structured nature of the log files make manual analysis generally infeasible. Some automation is therefore desirable to sift through the log files and find the source of the anomalies or errors. This project aimed to develop a generic algorithm that could cluster diverse log files in accordance with domain expertise. The results show that the developed algorithm agrees well with manual clustering, even under more relaxed data assumptions.

Abstract [sv] (translated)

Complex computer systems are often prone to anomalous or erroneous behavior, which can lead to costly downtime while the systems are diagnosed and repaired. One source of information for fault diagnosis is log files, which are often generated in large quantities and of different types. Given the size and semi-structured appearance of the log files, manual analysis becomes infeasible. Some automation is desirable to sift through the log files so that the source of the errors and anomalies becomes easier to detect. This project aimed to develop a general algorithm that can cluster dissimilar log files in accordance with domain expertise. The results show that the algorithm agrees well with manual clustering, even with fewer assumptions about the data.

Keyword [en]
machine learning, cluster analysis, log file analysis
National Category
Computer Science
URN: urn:nbn:se:kth:diva-189642
OAI: diva2:948252
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2016-08-18
Created: 2016-07-10
Last updated: 2016-08-18
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1633 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

