Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Dela-Sharing Large Datasets between Hadoop Clusters
KTH, Skolan för informations- och kommunikationsteknik (ICT), Programvaruteknik och Datorsystem, SCS.
KTH, Skolan för informations- och kommunikationsteknik (ICT), Programvaruteknik och Datorsystem, SCS.
2017 (engelsk)Inngår i: Proceedings - International Conference on Distributed Computing Systems, Institute of Electrical and Electronics Engineers (IEEE), 2017, s. 2533-2536, artikkel-id 7980225Konferansepaper (Fagfellevurdert)
Abstract [en]

Big data has, in recent years, revolutionised an evergrowing number of fields, from machine learning to climate science to genomics. The current state-of-the-art for storing large datasets is either object stores or distributed filesystems, with Hadoop being the dominant open-source platform for managing 'Big Data'. Existing large-scale storage platforms, however, lack support for the efficient sharing of large datasets over the Internet. Those systems that are widely used for the dissemination of large files, like BitTorrent, need to be adapted to handle challenges such as network links with both high latency and high bandwidth, and scalable storage backends that are optimised for streaming and not random access. In this paper, we introduce Dela, a peer-to-peer data-sharing service integrated into the Hops Hadoop platform that provides an end-to-end solution for dataset sharing. Dela is designed for large-scale storage backends and data transfers that are both non-intrusive to existing TCP network traffic and provide higher network throughput than TCP on high latency, high bandwidth network links, such as transatlantic network links. Dela provides a pluggable storage layer, implementing two alternative ways for clients to access shared data: stream processing of data as it arrives with Kafka, and traditional offline access to data using the Hadoop Distributed Filesystem. Dela is the first step for the Hadoop platform towards creating an open dataset ecosystem that supports user-friendly publishing, searching, and downloading of large datasets.

sted, utgiver, år, opplag, sider
Institute of Electrical and Electronics Engineers (IEEE), 2017. s. 2533-2536, artikkel-id 7980225
Emneord [en]
Big Data, BitTorrent, Dataset sharing, Hadoop, Peer-to-peer
HSV kategori
Identifikatorer
URN: urn:nbn:se:kth:diva-212447DOI: 10.1109/ICDCS.2017.199ISI: 000412759500276Scopus ID: 2-s2.0-85027245167ISBN: 9781538617915 (tryckt)OAI: oai:DiVA.org:kth-212447DiVA, id: diva2:1135129
Konferanse
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, J.W. Marriott Hotel, Atlanta, United States, 5 June 2017 through 8 June 2017
Merknad

QC 20170822

Tilgjengelig fra: 2017-08-22 Laget: 2017-08-22 Sist oppdatert: 2018-01-13bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekstScopus

Søk i DiVA

Av forfatter/redaktør
Ormenisan, Alexandru-AdrianDownling, Jim
Av organisasjonen

Søk utenfor DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric

doi
isbn
urn-nbn
Totalt: 27 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf