Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
A programming model and foundation for lineage-based distributed computation
KTH, School of Electrical Engineering and Computer Science (EECS).ORCID iD: 0000-0002-2659-5271
2018 (English)In: Journal of functional programming (Print), ISSN 0956-7968, E-ISSN 1469-7653, Vol. 28, article id e7Article in journal (Refereed) Published
Abstract [en]

The most successful systems for "big data" processing have all adopted functional APIs. We present a new programming model, we call function passing, designed to provide a more principled substrate, or middleware, upon which to build data-centric distributed systems like Spark. A key idea is to build up a persistent functional data structure representing transformations on distributed immutable data by passing well-typed serializable functions over the wire and applying them to this distributed data. Thus, the function passing model can be thought of as a persistent functional data structure that is distributed, where transformations performed on distributed data are stored in its nodes rather than the distributed data itself. One advantage of this model is that failure recovery is simplified by design - data can be recovered by replaying function applications atop immutable data loaded from stable storage. Deferred evaluation is also central to our model; by incorporating deferred evaluation into our design only at the point of initiating network communication, the function passing model remains easy to reason about while remaining efficient in time and memory. Moreover, we provide a complete formalization of the programming model in order to study the foundations of lineage-based distributed computation. In particular, we develop a theory of safe, mobile lineages based on a subject reduction theorem for a typed core language. Furthermore, we formalize a progress theorem that guarantees the finite materialization of remote, lineage-based data. Thus, the formal model may serve as a basis for further developments of the theory of data-centric distributed programming, including aspects such as fault tolerance. We provide an open-source implementation of our model in and for the Scala programming language, along with a case study of several example frameworks and end-user programs written atop this model.

Place, publisher, year, edition, pages
Cambridge University Press, 2018. Vol. 28, article id e7
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-225189DOI: 10.1017/S0956796818000035ISI: 000427266900001OAI: oai:DiVA.org:kth-225189DiVA, id: diva2:1194878
Note

QC 20180404

Available from: 2018-04-04 Created: 2018-04-04 Last updated: 2018-04-04Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records BETA

Haller, Philipp

Search in DiVA

By author/editor
Haller, Philipp
By organisation
School of Electrical Engineering and Computer Science (EECS)
In the same journal
Journal of functional programming (Print)
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 116 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf