Searching Metadata in Hadoop
KTH, School of Information and Communication Technology (ICT).
2015 (English). Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The rapid expansion of the internet has led to the Big Data era. Companies that provide services dealing with Big Data face two major issues: i) storing petabytes of data and ii) manipulating this data. The open source Hadoop ecosystem, and particularly its distributed file system HDFS, addresses the former by providing persistent storage for unprecedented amounts of data. For the latter, there are many approaches to data analytics, from map-reduce jobs to information retrieval and data discovery.

This thesis provides a novel approach to information discovery, first by providing the means to create, manage and associate metadata with HDFS files, and second by searching for files through their metadata using Elasticsearch. The work is composed of three parts. The first is the metadata designer/manager, which is the AngularJS front end. The second is the J2EE back end, which enables the front end to perform all management actions on metadata using websockets. The third is the indexing of data into Elasticsearch, the distributed and scalable open source search engine. Our work has shown that this approach works and that it greatly helps in finding information in the vast sea of data in HDFS.
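As a rough illustration of the idea of attaching metadata to HDFS paths and then searching by that metadata, the following minimal Python sketch builds the kind of JSON document and query body that could be sent to Elasticsearch. It is not the thesis implementation: the field names (`path`, `metadata`), the example file paths, and the helper functions are all hypothetical, and the local search function is only a naive stand-in so the sketch runs without a cluster.

```python
import json


def build_index_doc(hdfs_path, metadata):
    """Return a JSON-serialisable document describing one HDFS file.

    The document layout (a "path" plus a nested "metadata" object) is an
    assumption made for this sketch, not the schema used in the thesis.
    """
    return {"path": hdfs_path, "metadata": dict(metadata)}


def build_match_query(field, value):
    """Return an Elasticsearch query-DSL body matching one metadata field."""
    return {"query": {"match": {f"metadata.{field}": value}}}


def search_local(docs, field, value):
    """Naive local stand-in for the search: exact match on a metadata field."""
    return [d["path"] for d in docs if d["metadata"].get(field) == value]


# Hypothetical files and metadata, for illustration only.
docs = [
    build_index_doc("/projects/genomics/sample1.bam",
                    {"owner": "alice", "type": "alignment"}),
    build_index_doc("/projects/genomics/sample2.vcf",
                    {"owner": "bob", "type": "variants"}),
]

if __name__ == "__main__":
    print(json.dumps(build_match_query("owner", "alice")))
    print(search_local(docs, "owner", "alice"))
```

In a real deployment the documents would be indexed into an Elasticsearch cluster and the query body sent to its search endpoint; here both steps are replaced by in-memory data so the shape of the documents and queries is easy to inspect.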

Place, publisher, year, edition, pages
2015. 72 p.
Series
TRITA-ICT-EX, 2015:97
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-177467
OAI: oai:DiVA.org:kth-177467
DiVA: diva2:872937
Available from: 2015-11-25. Created: 2015-11-20. Last updated: 2017-06-15. Bibliographically approved.

Open Access in DiVA

fulltext (578 kB), 22 downloads
File information
File name: FULLTEXT01.pdf
File size: 578 kB
Checksum (SHA-512): 2201c0880555c5810e692a41d67f8d69463146edcd9364bcaf4f8b581b39c6f2dd873c7c824e1086383d2c6b20011cc6d23b946a5fb83b2ca7f71285e3f00116
Type: fulltext
Mimetype: application/pdf
