Efficient iterative virtual screening with Apache Spark and conformal prediction
KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). ORCID iD: 0000-0001-6877-3702
2018 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed), Published
Abstract [en]

Background: Docking and scoring large libraries of ligands against target proteins form the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands.

Contribution: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring'. Then another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling.

Results: We show on 4 different targets that conformal-prediction-based virtual screening (CPVS) reduces the number of docked molecules by 62.61% while retaining an accuracy of 94% for the top 30 hits, on average, with a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
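The iterative strategy described in the abstract can be illustrated with a small, single-machine sketch. This is not the authors' implementation: it replaces the SVM with a toy class-mean model, replaces the docking program with a hypothetical `dock` function on a single numeric descriptor, and omits Apache Spark entirely. The batch size, significance level `eps`, efficiency target and the definition of 'high-scoring' as the top fraction of docked scores are all assumptions chosen for illustration.

```python
import random


def dock(x, rng):
    """Hypothetical stand-in for a docking program; higher score = better."""
    return x + rng.gauss(0.0, 0.1)


def p_value(calib, score):
    """Conformal p-value: smoothed share of calibration scores >= score."""
    return (sum(s >= score for s in calib) + 1) / (len(calib) + 1)


def nonconformity(x, means, label):
    """Distance to the assumed class mean minus distance to the other one."""
    return abs(x - means[label]) - abs(x - means[1 - label])


def fit(docked, top_frac):
    """Toy model (not an SVM): class means plus per-class calibration scores."""
    scores = sorted(s for _, s in docked)
    threshold = scores[int(len(scores) * (1 - top_frac))]
    # Label 1 = 'high-scoring', 0 = 'low-scoring'; sort by label so the
    # even/odd split below puts both classes in both halves.
    labelled = sorted(((x, int(s >= threshold)) for x, s in docked),
                      key=lambda t: t[1])
    proper, calib = labelled[::2], labelled[1::2]
    means = {}
    for c in (0, 1):
        xs = [x for x, y in proper if y == c]
        means[c] = sum(xs) / len(xs)  # assumes batches large enough for both classes
    calib_scores = {c: [nonconformity(x, means, c) for x, y in calib if y == c]
                    for c in (0, 1)}
    return means, calib_scores


def predict_set(x, means, calib_scores, eps):
    """Labels whose p-value exceeds the significance level eps."""
    return {c for c in (0, 1)
            if p_value(calib_scores[c], nonconformity(x, means, c)) > eps}


def screen(library, batch=100, eps=0.2, target_eff=0.8, top_frac=0.2, seed=0):
    """Dock in batches, retrain, and drop ligands predicted only 'low-scoring'."""
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    docked = []
    while pool:
        new, pool = pool[:batch], pool[batch:]
        docked += [(x, dock(x, rng)) for x in new]
        if not pool:
            break
        means, calib_scores = fit(docked, top_frac)
        sets = [(x, predict_set(x, means, calib_scores, eps)) for x in pool]
        pool = [x for x, s in sets if s != {0}]  # exclude 'low-scoring' ligands
        # Efficiency here = share of single-label predictions (an assumption).
        efficiency = sum(len(s) == 1 for _, s in sets) / len(sets)
        if efficiency >= target_eff:
            # Final pass: dock only ligands predicted 'high-scoring'.
            docked += [(x, dock(x, rng)) for x, s in sets if s == {1}]
            break
    return docked


if __name__ == "__main__":
    rng = random.Random(1)
    library = [rng.random() for _ in range(1000)]
    docked = screen(library)
    print(f"docked {len(docked)} of {len(library)} ligands")
```

In a distributed setting, the two list comprehensions that call `dock` and `predict_set` are the natural places to parallelize, since each ligand is processed independently.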

Place, publisher, year, edition, pages
BioMed Central, 2018. Vol. 10, article id 8
Keyword [en]
Virtual screening, Docking, Conformal prediction, Cloud computing, Apache Spark
National Category
Chemical Sciences; Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-224683
DOI: 10.1186/s13321-018-0265-z
ISI: 000426699400001
PubMedID: 29492726
Scopus ID: 2-s2.0-85042857389
OAI: oai:DiVA.org:kth-224683
DiVA, id: diva2:1192864
Funder
Swedish e-Science Research Center; Swedish National Infrastructure for Computing (SNIC)
Note

QC 20180323

Available from: 2018-03-23. Created: 2018-03-23. Last updated: 2018-03-23. Bibliographically approved.


Authority records

Ahmed, Laeeq; Laure, Erwin
