Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-6578-3902
2017 (English). In: 2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017) / [ed] Lee, K; Liu, L, IEEE COMPUTER SOC, 2017, p. 2525-2528. Conference paper (Refereed)
Abstract [en]

Hadoop is a popular system for storing, managing, and processing large volumes of data, but it has only bare-bones internal support for metadata, because metadata is a bottleneck and less metadata means more scalability. The result is a scalable platform with rudimentary access control that is neither user- nor developer-friendly. Moreover, metadata services built on Hadoop, such as SQL-on-Hadoop, access control, data provenance, and data governance, are necessarily implemented as eventually consistent services, which increases development effort and makes the software more brittle. In this paper, we present a new project-based multi-tenancy model for Hadoop, built on a new distribution of Hadoop that provides a distributed database backend for the metadata layer of the Hadoop Distributed Filesystem (HDFS). We extend Hadoop's metadata model to introduce projects, datasets, and project-users as new core concepts that enable a user-friendly, UI-driven Hadoop experience. Because our metadata service is backed by a transactional database, developers can easily extend metadata by adding new tables, and they can ensure the strong consistency of the extended metadata using both transactions and foreign keys.
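
The last point of the abstract, extending metadata with new tables that stay consistent through transactions and foreign keys, can be illustrated with a minimal sketch. It is not taken from the paper: it uses Python's built-in sqlite3 module as a stand-in for the distributed database, and the table names (inodes, dataset_tags) and helper functions are hypothetical.

```python
import sqlite3


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory database with a core table and an extension table.

    Assumption: 'inodes' stands in for core file system metadata and
    'dataset_tags' for metadata added later by a developer.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE inodes (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE dataset_tags (
            inode_id INTEGER NOT NULL
                REFERENCES inodes(id) ON DELETE CASCADE,
            tag      TEXT NOT NULL,
            PRIMARY KEY (inode_id, tag)
        );
        """
    )
    return conn


def add_dataset(conn: sqlite3.Connection, inode_id: int, name: str, tags: list[str]) -> None:
    """Insert an inode and its tags in one transaction (all or nothing)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO inodes (id, name) VALUES (?, ?)", (inode_id, name))
        conn.executemany(
            "INSERT INTO dataset_tags (inode_id, tag) VALUES (?, ?)",
            [(inode_id, tag) for tag in tags],
        )


def delete_dataset(conn: sqlite3.Connection, inode_id: int) -> None:
    """Delete an inode; the foreign key cascade removes its tags as well."""
    with conn:
        conn.execute("DELETE FROM inodes WHERE id = ?", (inode_id,))


def tags_for(conn: sqlite3.Connection, inode_id: int) -> list[str]:
    """Return the tags attached to an inode, sorted alphabetically."""
    rows = conn.execute(
        "SELECT tag FROM dataset_tags WHERE inode_id = ? ORDER BY tag", (inode_id,)
    )
    return [row[0] for row in rows]
```

In this sketch the foreign key rejects a tag that points at a nonexistent inode, the cascade removes tags when their inode is deleted, and the transaction ensures that a failed tag insert does not leave a half-created dataset behind. An eventually consistent metadata service would have to reproduce each of these guarantees in application code.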

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2017. p. 2525-2528
Series
IEEE International Conference on Distributed Computing Systems, ISSN 1063-6927
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-217460
DOI: 10.1109/ICDCS.2017.41
ISI: 000412759500274
Scopus ID: 2-s2.0-85027275789
ISBN: 978-1-5386-1791-5 (print)
OAI: oai:DiVA.org:kth-217460
DiVA, id: diva2:1158011
Conference
37th IEEE International Conference on Distributed Computing Systems (ICDCS), JUN 05-08, 2017, Atlanta, GA
Note

QC 20171117

Available from: 2017-11-17 Created: 2017-11-17 Last updated: 2018-01-13. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records BETA

Ismail, Mahmoud; Dowling, Jim
