Using iterative MapReduce for parallel virtual screening
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). ORCID iD: 0000-0001-6877-3702
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). ORCID iD: 0000-0002-9901-9857
2013 (English). In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), IEEE Computer Society, 2013, pp. 27-32. Conference paper, Published paper (Refereed)
Abstract [en]

Virtual screening is a technique in chemoinformatics used for drug discovery by searching large libraries of molecule structures. Virtual screening often uses support vector machines (SVM), a supervised machine learning technique used for regression and classification analysis. Virtual screening using SVM not only involves huge datasets, but is also computationally expensive, with a complexity that can grow to at least O(n^2). SVM-based applications most commonly use MPI, which becomes complex and impractical with large datasets. As an alternative to MPI, MapReduce and its different implementations have been used successfully on commodity clusters to analyse very large datasets. Because virtual screening involves large libraries of molecule structures, it is a good candidate for MapReduce. In this paper we present a MapReduce implementation of SVM-based virtual screening using Spark, an iterative MapReduce programming model. We show that our implementation has good scaling behaviour and opens up the possibility of using huge public cloud infrastructures efficiently for virtual screening.

Place, publisher, year, edition, pages
IEEE Computer Society, 2013. pp. 27-32
Keywords [en]
Big Data, Chemoinformatics, MapReduce, Parallel SVM, Spark
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-146956
DOI: 10.1109/CloudCom.2013.99
ISI: 000352079100005
Scopus ID: 2-s2.0-84899736110
ISBN: 978-0-7695-5095-4 (print)
OAI: oai:DiVA.org:kth-146956
DiVA, id: diva2:726873
Conference
5th IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2013; Bristol; United Kingdom; 2 December 2013 through 5 December 2013
Note

QC 20140619

Available from: 2014-06-19. Created: 2014-06-18. Last updated: 2018-01-11. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Search further in DiVA

By author/editor
Ahmed, Laeeq; Edlund, Åke; Laure, Erwin
By organisation
High Performance Computing and Visualization (HPCViz)
Computer Sciences
