Searching Metadata in Hadoop
KTH, School of Information and Communication Technology (ICT).
2015 (English). Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

The rapid expansion of the internet has led to the Big Data era. Companies that provide services dealing with Big Data face two major issues: i) storing petabytes of data and ii) manipulating this data. The open source Hadoop ecosystem, and particularly its distributed file system HDFS, addresses the former issue by providing persistent storage for unprecedented amounts of data. For the latter, there are many approaches to data analytics, from map-reduce jobs to information retrieval and data discovery.

This thesis provides a novel approach to information discovery, first by providing the means to create, manage and associate metadata with HDFS files, and second by searching for files through their metadata using Elasticsearch. The work is composed of three parts. The first is the metadata designer/manager, which is the AngularJS front end. The second is the J2EE back end, which enables the front end to perform all management actions on metadata using websockets. The third is the indexing of data into Elasticsearch, the distributed and scalable open source search engine. Our work has shown that this approach works and that it greatly helps in finding information in the vast sea of data in HDFS.
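
As an illustration only, the indexing and search steps described in the abstract might look roughly like the sketch below. It is not the thesis implementation: the index name, field names, file paths and the choice of the HDFS path as document id are all hypothetical, and the request shapes follow the current Elasticsearch bulk and query DSL formats, which may differ from the version used in 2015. The code only builds the request bodies and does not contact a cluster.

```python
import json


def bulk_index_body(index, files):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    `files` maps an HDFS path to a dict of metadata fields. Using the path
    as the document id is an assumption made for this sketch.
    """
    lines = []
    for path, metadata in sorted(files.items()):
        # Action line: tells Elasticsearch which index and id to write to.
        lines.append(json.dumps({"index": {"_index": index, "_id": path}}))
        # Source line: the document itself (path plus its metadata fields).
        lines.append(json.dumps({"path": path, **metadata}, sort_keys=True))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def metadata_query(**fields):
    """Build a query DSL body matching files whose metadata has all given values."""
    must = [{"match": {name: value}} for name, value in sorted(fields.items())]
    return {"query": {"bool": {"must": must}}}


# Hypothetical example data; none of these names come from the thesis.
files = {
    "/projects/genomics/run1.csv": {"owner": "alice", "sample": "blood"},
    "/projects/genomics/run2.csv": {"owner": "bob", "sample": "saliva"},
}
body = bulk_index_body("hdfs-metadata", files)
query = metadata_query(owner="alice", sample="blood")
```

Under these assumptions, `body` could be sent to the `_bulk` endpoint with content type `application/x-ndjson`, and `query` serialized as JSON and sent to the `_search` endpoint of the same index.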

Place, publisher, year, edition, pages
2015. 72 p.
TRITA-ICT-EX, 2015:97
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-177467
OAI: diva2:872937
Available from: 2015-11-25 Created: 2015-11-20 Last updated: 2015-11-25 Bibliographically approved

Open Access in DiVA

No full text

