Large-scale virtual screening on public cloud resources with Apache Spark
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). ORCID iD: 0000-0001-6877-3702
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). ORCID iD: 0000-0002-9901-9857
2017 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article id 15. Article in journal (Refereed). Published
Abstract [en]

Background: Structure-based virtual screening is an in-silico method for screening a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it is a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI) and rely on hardware with a low failure rate and a fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources while providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.

Results: We developed a method to run existing docking-based screening software on distributed cloud resources using the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 million compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment.

Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve makes it possible to try out the method on relatively small libraries first and then scale to larger libraries.
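Illustrative sketch (not part of the published record): the MapReduce pattern described in the abstract can be outlined in plain Python, without Spark. This is only a sketch under stated assumptions. `mock_dock_score` is a hypothetical stand-in for an external docking program, and `screen_partition`, `merge_top_k`, and `screen` are names invented here; none of them come from the authors' implementation. The map step scores each partition of the library independently, and the reduce step merges the per-partition results into the overall best hits.

```python
import heapq
from functools import reduce


def mock_dock_score(smiles: str) -> float:
    """Hypothetical stand-in for a real docking program (lower is better)."""
    return -0.5 * len(smiles) + 0.1 * smiles.count("O")


def screen_partition(partition, k):
    """Map step: score every molecule in one partition and keep its k best."""
    scored = ((mock_dock_score(s), s) for s in partition)
    return heapq.nsmallest(k, scored)


def merge_top_k(left, right, k):
    """Reduce step: merge two partial top-k lists into a single top-k list."""
    return heapq.nsmallest(k, left + right)


def screen(library, n_partitions, k):
    """Split the library, map over the partitions, then reduce the results."""
    partitions = [library[i::n_partitions] for i in range(n_partitions)]
    # Each partition is independent, so a framework such as Spark could run
    # these tasks on different workers; here they simply run one by one.
    partial = [screen_partition(p, k) for p in partitions]
    return reduce(lambda a, b: merge_top_k(a, b, k), partial, [])


library = ["CCO", "c1ccccc1", "CC(=O)O", "C"]
top_hits = screen(library, n_partitions=2, k=2)
```

Because the reduce step is associative, the result does not depend on how the library is partitioned, which is what makes the task trivially parallelizable.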

Place, publisher, year, edition, pages
BioMed Central, 2017. Vol. 9, article id 15
Keyword [en]
Virtual screening, Docking, Cloud computing, Apache Spark
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-205469
DOI: 10.1186/s13321-017-0204-4
ISI: 000396830300001
PubMedID: 28316653
Scopus ID: 2-s2.0-85014678539
OAI: oai:DiVA.org:kth-205469
DiVA, id: diva2:1097896
Note

QC 20170523

Available from: 2017-05-23 Created: 2017-05-23 Last updated: 2017-05-23 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records BETA

Ahmed, Laeeq; Laure, Erwin

Search in DiVA

By author/editor
Ahmed, Laeeq; Laure, Erwin; Spjuth, Ola
By organisation
High Performance Computing and Visualization (HPCViz); Centre for High Performance Computing, PDC; SeRC - Swedish e-Science Research Centre; Computational Science and Technology (CST)