WorkflowDSL: Scalable Workflow Execution with Provenance for Data Analysis Applications
KTH.
KTH.
KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer Systems, SCS. ORCID iD: 0000-0002-4722-0823
2018 (English). In: Proceedings - International Computer Software and Applications Conference, IEEE Computer Society, 2018, pp. 774-779. Conference paper, Published paper (Refereed)
Abstract [en]

Data analysis projects typically use different programming languages (from Python for prototyping to C++ for meeting runtime constraints) at different stages, handled by different experts. This creates a need for a data processing framework that is reusable across multiple programming languages and supports collaboration among experts. In this work, we discuss the implementation of a framework that uses a Domain Specific Language (DSL), called WorkflowDSL, which enables domain experts to collaborate on fine-tuning workflows. The framework supports parallel execution without any specialized code. It also provides a provenance capturing framework that enables users to analyse past executions and retrieve the complete lineage of any generated data item. A graph database is used for storing provenance data, and its advantages compared to relational databases are demonstrated. Experiments performed using a real-world scientific workflow from the bioinformatics domain and industrial data analysis models show that users were able to execute workflows efficiently when using WorkflowDSL for workflow composition and Python for task implementations. Moreover, we show that capturing provenance data can be useful for analysing past workflow executions.

Place, publisher, year, edition, pages
IEEE Computer Society, 2018. pp. 774-779
Keywords [en]
Data analysis workflows, Lineage, Parallel execution, Provenance, Application programs, C++ (programming language), Graph Databases, Information analysis, Problem oriented languages, Software prototyping, Domain specific language (DSL), Parallel executions, Relational Database, Scientific workflows, Work-flows, Workflow composition, Data handling
Identifiers
URN: urn:nbn:se:kth:diva-247215
DOI: 10.1109/COMPSAC.2018.00115
Scopus ID: 2-s2.0-85055419742
ISBN: 9781538626665 (print)
OAI: oai:DiVA.org:kth-247215
DiVA, id: diva2:1304976
Conference
42nd IEEE Computer Software and Applications Conference, COMPSAC 2018, 23 July 2018 through 27 July 2018
Note

QC 20190415

Available from: 2019-04-15. Created: 2019-04-15. Last updated: 2019-04-15. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors
Fernando, Tharidu; Gureev, Nikita; Matskin, Mihhail