Big Data Archiving with Splunk and Hadoop
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2013 (English). Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Splunk is software that handles large amounts of data every day. Because data volumes grow constantly, old data has to be phased out to keep the software from running slowly. However, some of Splunk's customers have retention policies that require data to be stored for longer than Splunk can offer.

This thesis investigates how to build a solution for archiving large amounts of data. We present the problems involved in archiving data, the properties of the data being archived, and the types of file systems suited to archiving. By carefully considering data safety and reliability, and by using the Apache Hadoop project to support multiple distributed file systems, we create a flexible, reliable and scalable archiving solution.

Abstract [sv] (translated from Swedish)

Splunk is software that handles large amounts of data every day. Because the data volume grows over time, old data needs to be moved out of the program so that it does not become sluggish. However, some of Splunk's customers have data retention policies that require the data to be stored longer than Splunk can offer.

This report investigates how large amounts of data can be stored. We present the problems involved in archiving data, the properties of the data to be archived, and the types of file systems suited to archiving.

We create a flexible, reliable and scalable archiving solution by carefully studying data safety and reliability, and by using Apache Hadoop to support multiple distributed file systems.

Place, publisher, year, edition, pages
2013.
Series
Trita-CSC-E, ISSN 1653-5715 ; 13:101
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-137374
OAI: oai:DiVA.org:kth-137374
DiVA: diva2:678828
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2013-12-13. Bibliographically approved.

Open Access in DiVA

Big Data Archiving with Splunk and Hadoop (1498 kB), 708 downloads
File information
File name: FULLTEXT01.pdf
File size: 1498 kB
Checksum (SHA-512): 2ba5f803842695abe9047c946fd4799e72760ca912791c5d8ad9071ad4412c7e908a4b990b6b5c3020d2384e732574b2506188156ff24450fa998b9949d25997
Type: fulltext
Mimetype: application/pdf

Other links

http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2013/rapporter13/ergenekon_emre_berge_OCH_eriksson_petter_13006.pdf
Total: 708 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 530 hits