Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Scaling HDFS to more than 1 million operations per second with HopsFS
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.ORCID iD: 0000-0002-6578-3902
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.ORCID iD: 0000-0002-1672-6899
Show others and affiliations
2017 (English)In: Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Institute of Electrical and Electronics Engineers Inc. , 2017, p. 683-688Conference paper, Published paper (Refereed)
Abstract [en]

HopsFS is an open-source, next generation distribution of the Apache Hadoop Distributed File System(HDFS) that replaces the main scalability bottleneck in HDFS, single node in-memory metadata service, with a no-sharedstate distributed system built on a NewSQL database. By removing the metadata bottleneck in Apache HDFS, HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower clientlatencies for large clusters. In this paper, we detail the techniques and optimizations that enable HopsFS to surpass 1 million file system operations per second-at least 16 times higher throughput than HDFS. In particular, we discuss how we exploit recent high performance features from NewSQL databases, such as application defined partitioning, partition-pruned index scans, and distribution aware transactions. Together with more traditional techniques, such as batching and write-Ahead caches, we show how many incremental optimizations have enabled a revolution in distributed hierarchical file system performance.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc. , 2017. p. 683-688
Keywords [en]
Distributed File System, File System Design, High-performance file systems, NewSQL, Cluster computing, Computer software, Distributed database systems, File organization, Grid computing, Metadata, Open systems, Distributed file systems, Distributed systems, File systems, Hadoop distributed file system (HDFS), Incremental optimization, Metadata services, Traditional techniques, Distributed computer systems
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-216289DOI: 10.1109/CCGRID.2017.117ISI: 000426912900075Scopus ID: 2-s2.0-85027463411ISBN: 9781509066100 (print)OAI: oai:DiVA.org:kth-216289DiVA, id: diva2:1164610
Conference
17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, 14 May 2017 through 17 May 2017
Note

QC 20171211

Available from: 2017-12-11 Created: 2017-12-11 Last updated: 2018-04-03Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records BETA

Ismail, MahmoudNiazi, SalmanDowling, Jim

Search in DiVA

By author/editor
Ismail, MahmoudNiazi, SalmanDowling, Jim
By organisation
Software and Computer systems, SCS
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 13 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf