BiobankCloud: A platform for the secure storage, sharing, and processing of large biomedical data sets
2016 (English). In: 1st International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2015 and Workshop on Big-Graphs Online Querying, Big-O(Q) 2015 held in conjunction with 41st International Conference on Very Large Data Bases, VLDB 2015, Springer, 2016, p. 89-105. Conference paper, Published paper (Refereed)
Abstract [en]

Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache Yarn, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind a secure 2-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes pre-defined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.

Place, publisher, year, edition, pages
Springer, 2016. p. 89-105
Keywords [en]
Biological materials, Data handling, Engines, Genes, Information management, Open source software, Open systems, RNA, Storage (materials), Adaptive scheduling, Biomedical data analysis, Computational bottlenecks, Next-generation sequencing, Parallel processing, Scientific workflow engines, Storage requirements, Transcriptome analysis, Digital storage
Identifiers
URN: urn:nbn:se:kth:diva-195513
DOI: 10.1007/978-3-319-41576-5_7
ISI: 000387957300007
Scopus ID: 2-s2.0-84977528973
ISBN: 9783319415758 (print)
OAI: oai:DiVA.org:kth-195513
DiVA, id: diva2:1045827
Conference
31 August 2015 through 4 September 2015
Note

QC 20161110

Available from: 2016-11-10. Created: 2016-11-03. Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

By author/editor
Dowling, Jim; Gholami, Ali; Hakimzadeh, Kamal; Ismail, Mahmoud; Laure, Erwin; Niazi, Salman