Efficient iterative virtual screening with Apache Spark and conformal prediction
KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). ORCID iD: 0000-0001-6877-3702
2018 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed). Published
Abstract [en]

Background: Docking and scoring large libraries of ligands against target proteins form the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands.

Contribution: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling.

Results: We show on 4 different targets that conformal-prediction-based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average, with a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
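The iterative strategy described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation (that is the Spark code at the GitHub link above). Several parts are assumptions made only for illustration: `dock` is a hypothetical stand-in for a docking program, a nearest-centroid model replaces the SVM so the example needs only the Python standard library, the "top 30% of docked scores" labeling rule is invented, and the `eps` and `target_efficiency` values are arbitrary.

```python
import random

def dock(ligand):
    # Hypothetical stand-in for a docking program: higher score = better.
    return ligand[0] + ligand[1]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fit(train):
    # Nearest-centroid model (swapped in for the SVM): one centroid per label.
    model = {}
    for y in (0, 1):
        pts = [x for x, lab in train if lab == y]
        model[y] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return model

def nonconformity(model, x, y):
    # Large when x lies closer to the other label's centroid than to y's.
    return dist(x, model[y]) - dist(x, model[1 - y])

def prediction_set(model, calib_scores, x, eps):
    # Label-conditional inductive conformal prediction at significance eps.
    labels = set()
    for y in (0, 1):
        a = nonconformity(model, x, y)
        scores = calib_scores[y]
        p = (sum(s >= a for s in scores) + 1) / (len(scores) + 1)
        if p > eps:
            labels.add(y)
    return labels

def screen(library, batch_size=100, eps=0.2, target_efficiency=0.8, seed=0):
    """Return {ligand: docking score} for the ligands that were docked."""
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    docked = {}
    while pool:
        # Dock the next batch and add it to the labeled data.
        batch, pool = pool[:batch_size], pool[batch_size:]
        for lig in batch:
            docked[lig] = dock(lig)
        # Assumption: the top 30% of scores docked so far are "high-scoring" (1).
        cutoff = sorted(docked.values())[int(0.7 * len(docked))]
        labeled = [(lig, int(s >= cutoff)) for lig, s in docked.items()]
        rng.shuffle(labeled)
        split = int(0.8 * len(labeled))
        train, calib = labeled[:split], labeled[split:]
        if {y for _, y in train} != {0, 1} or {y for _, y in calib} != {0, 1}:
            continue  # both labels are needed in both splits; dock more first
        model = fit(train)
        calib_scores = {
            y: [nonconformity(model, x, y) for x, lab in calib if lab == y]
            for y in (0, 1)
        }
        sets = {lig: prediction_set(model, calib_scores, lig, eps) for lig in pool}
        # Efficiency here = fraction of single-label prediction sets.
        singles = sum(len(s) == 1 for s in sets.values())
        efficiency = singles / len(pool) if pool else 1.0
        # Exclude ligands predicted as low-scoring only.
        pool = [lig for lig in pool if sets[lig] != {0}]
        if efficiency >= target_efficiency:
            # Assumption: once the model is efficient enough, dock everything
            # that was not confidently predicted low-scoring, then stop.
            for lig in pool:
                docked[lig] = dock(lig)
            pool = []
    return docked

if __name__ == "__main__":
    rng = random.Random(1)
    library = [(rng.random(), rng.random()) for _ in range(2000)]
    docked = screen(library)
    print(f"docked {len(docked)} of {len(library)} ligands")
```

Ligands are represented here as hashable feature tuples, so duplicates in the library collapse into one entry. In a Spark setting, the per-ligand `dock` calls and the prediction step over `pool` are the parts that would naturally be distributed across workers.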

Place, publisher, year, edition, pages
BioMed Central, 2018. Vol. 10, article id 8
Keywords [en]
Virtual screening, Docking, Conformal prediction, Cloud computing, Apache Spark
National Category
Chemical Sciences; Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-224683; DOI: 10.1186/s13321-018-0265-z; ISI: 000426699400001; PubMedID: 29492726; Scopus ID: 2-s2.0-85042857389; OAI: oai:DiVA.org:kth-224683; DiVA, id: diva2:1192864
Funder
Swedish e-Science Research Center; Swedish National Infrastructure for Computing (SNIC)
Note

QC 20180323

Available from: 2018-03-23. Created: 2018-03-23. Last updated: 2018-03-23. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Ahmed, Laeeq; Laure, Erwin
