WorkflowDSL: Scalable Workflow Execution with Provenance for Data Analysis Applications
KTH.
KTH.
KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer Systems, SCS. ORCID iD: 0000-0002-4722-0823
2018 (English). In: Proceedings - International Computer Software and Applications Conference, IEEE Computer Society, 2018, pp. 774-779. Conference paper, Published paper (Refereed)
Abstract [en]

Data analysis projects typically use different programming languages at different stages (from Python for prototyping to C++ for meeting runtime constraints), and each stage is handled by different experts. This creates a need for a data processing framework that is reusable across multiple programming languages and supports collaboration among experts. In this work, we discuss the implementation of a framework that uses a Domain Specific Language (DSL), called WorkflowDSL, which enables domain experts to collaborate on fine-tuning workflows. The framework supports parallel execution without any specialized code. It also provides a provenance capturing framework that enables users to analyse past executions and retrieve the complete lineage of any generated data item. A graph database is used for storing provenance data, and the advantages of using a graph database compared to relational databases are demonstrated. Experiments performed using a real-world scientific workflow from the bioinformatics domain and industrial data analysis models show that users were able to execute workflows efficiently when using WorkflowDSL for workflow composition and Python for task implementations. Moreover, we show that capturing provenance data can be useful for analysing past workflow executions.

Place, publisher, year, edition, pages
IEEE Computer Society, 2018. pp. 774-779
Keywords [en]
Data analysis workflows, Lineage, Parallel execution, Provenance, Application programs, C++ (programming language), Graph Databases, Information analysis, Problem oriented languages, Software prototyping, Domain specific language (DSL), Parallel executions, Relational Database, Scientific workflows, Work-flows, Workflow composition, Data handling
National subject category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-247215
DOI: 10.1109/COMPSAC.2018.00115
Scopus ID: 2-s2.0-85055419742
ISBN: 9781538626665 (print)
OAI: oai:DiVA.org:kth-247215
DiVA, id: diva2:1304976
Conference
42nd IEEE Computer Software and Applications Conference, COMPSAC 2018, 23 July 2018 through 27 July 2018
Note

QC 20190415

Available from: 2019-04-15. Created: 2019-04-15. Last updated: 2019-04-15. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors
Fernando, Tharidu; Gureev, Nikita; Matskin, Mihhail