Searching Metadata in Hadoop
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
The rapid expansion of the internet has led to the Big Data era. Companies that provide services which deal with Big Data have to face two major issues: i) storing petabytes of data and ii) manipulating this data. On the one end the open source Hadoop ecosystem and particularly its distributed file system HDFS comes to take care of the former issue, by providing a persistent storage for unprecedented amounts of data. For the latter, there are many approaches when it comes to data analytics – from map-reduce jobs to information retrieval and data discovery.
This thesis provides a novel approach to information discovery firstly by providing the means to create, manage and associate metadata to HDFS files and secondly searching for files through their metadata using Elasticsearch. The work is composed of three parts: The first one is the metadata designer/manager, which is the AngularJS front end. The second part is the J2EE back end which enables the front end to perform all the managing actions on metadata using websockets. The third part is the indexing of data into Elasticsearch, the distributed and scalable open source search engine. Our work has shown that this approach works and it greatly helps finding information in the vast sea of data in the HDFS.
Place, publisher, year, edition, pages
2015. , 72 p.
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-177467OAI: oai:DiVA.org:kth-177467DiVA: diva2:872937