WorkflowDSL: Scalable Workflow Execution with Provenance for Data Analysis Applications
Fernando, Tharidu. KTH.
Gureev, Nikita. KTH.
Matskin, Mihhail. KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. ORCID iD: 0000-0002-4722-0823
2018 (English). In: Proceedings - International Computer Software and Applications Conference, IEEE Computer Society, 2018, p. 774-779. Conference paper, Published paper (Refereed)
Abstract [en]

Data analysis projects typically use different programming languages at different stages (from Python for prototyping to C++ for meeting runtime constraints), and each stage is often handled by different experts. This creates a need for a data processing framework that is reusable across multiple programming languages and supports collaboration among experts. In this work, we discuss the implementation of a framework that uses a Domain Specific Language (DSL), called WorkflowDSL, which enables domain experts to collaborate on fine-tuning workflows. The framework supports parallel execution without any specialized code. It also provides a provenance-capturing framework that enables users to analyse past executions and retrieve the complete lineage of any generated data item. A graph database is used for storing provenance data, and the advantages of using a graph database compared to relational databases are demonstrated. Experiments performed using a real-world scientific workflow from the bioinformatics domain and industrial data analysis models show that users were able to execute workflows efficiently when using WorkflowDSL for workflow composition and Python for task implementations. Moreover, we show that capturing provenance data can be useful for analysing past workflow executions.
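
The idea of retrieving the complete lineage of a data item from a provenance graph can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not show WorkflowDSL syntax or the provenance schema, so the in-memory dictionary below merely stands in for a graph database, and the function names (`record`, `lineage`), task names, and file names are all hypothetical.

```python
from collections import deque

# Hypothetical provenance store: each generated item maps to the task
# that produced it and the inputs that task consumed.
def record(graph, output, task, inputs):
    graph[output] = {"task": task, "inputs": list(inputs)}

def lineage(graph, item):
    """Return every upstream item that `item` was derived from (breadth-first)."""
    seen, order = set(), []
    queue = deque([item])
    while queue:
        current = queue.popleft()
        # Items with no entry are raw inputs and have no parents.
        for parent in graph.get(current, {}).get("inputs", []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order

# Illustrative three-step workflow (names are assumptions, not from the paper).
provenance = {}
record(provenance, "aligned.bam", "align", ["reads.fastq", "reference.fa"])
record(provenance, "variants.vcf", "call_variants", ["aligned.bam", "reference.fa"])
record(provenance, "report.html", "summarise", ["variants.vcf"])

print(lineage(provenance, "report.html"))
```

A lineage query is a transitive traversal over "was derived from" edges, which is the kind of query a graph database expresses directly, whereas a relational schema would typically need recursive joins.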

Place, publisher, year, edition, pages
IEEE Computer Society, 2018. p. 774-779
Keywords [en]
Data analysis workflows, Lineage, Parallel execution, Provenance, Application programs, C++ (programming language), Graph Databases, Information analysis, Problem oriented languages, Software prototyping, Domain specific language (DSL), Parallel executions, Relational Database, Scientific workflows, Work-flows, Workflow composition, Data handling
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-247215
DOI: 10.1109/COMPSAC.2018.00115
Scopus ID: 2-s2.0-85055419742
ISBN: 9781538626665 (print)
OAI: oai:DiVA.org:kth-247215
DiVA, id: diva2:1304976
Conference
42nd IEEE Computer Software and Applications Conference, COMPSAC 2018, 23 July 2018 through 27 July 2018
Note

QC 20190415

Available from: 2019-04-15 Created: 2019-04-15 Last updated: 2019-04-15 Bibliographically approved

Open Access in DiVA

No full text in DiVA

By author/editor
Fernando, Tharidu; Gureev, Nikita; Matskin, Mihhail