Poster - 3D Tixels: A highly efficient algorithm for GPU/CPU-acceleration of molecular dynamics on heterogeneous parallel architectures
KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. ORCID iD: 0000-0003-0603-5514
KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. ORCID iD: 0000-0002-7498-7763
KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. ORCID iD: 0000-0002-2734-2794
2011 (English). In: SC - Proc. High Perform. Comput. Networking, Storage Anal. Companion, Co-located SC, 2011, 71-72 p. Conference paper, Published paper (Refereed)
Abstract [en]

Several GPU-based algorithms have been developed to accelerate biomolecular simulations, but although they provide benefits over single-core implementations, they have not been able to surpass the performance of state-of-the-art SIMD CPU implementations (e.g. GROMACS), let alone scale efficiently. Here, we present a heterogeneous parallelization that utilizes both CPU and GPU resources efficiently. A novel fixed-particle-number sub-cell algorithm for non-bonded force calculation was developed. Because the algorithm uses the SIMD width as its algorithmic work unit, it is intrinsically future-proof: it can be adapted to future hardware. The CUDA non-bonded kernel implementation achieves up to 60% work-efficiency, 1.5 IPC, and 95% L1 cache utilization. On the CPU, OpenMP-parallelized, SSE-accelerated code runs concurrently with GPU execution. Fully automated dynamic load balancing is employed, both between processes and between CPU and GPU. We achieve a threefold speedup compared to equivalent GROMACS CPU code and show good strong and weak scaling. To the best of our knowledge, this is the fastest GPU molecular dynamics implementation presented to date.
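
The abstract does not spell out the sub-cell algorithm, so the following is only an illustrative sketch of the general idea of grouping particles into clusters of a fixed size and evaluating non-bonded forces per cluster pair. It is not the authors' implementation. Everything in it is an assumption made for illustration: the cluster size of 4 standing in for the SIMD width, the sort-based clustering, the bounding-box pruning, the Lennard-Jones potential with unit parameters, and the absence of periodic boundaries. All function names are hypothetical.

```python
import math

CLUSTER_SIZE = 4  # assumed stand-in for the hardware SIMD width


def make_clusters(positions, size=CLUSTER_SIZE):
    """Group particle indices into clusters of exactly `size` slots.

    Particles are sorted by coordinates as a crude spatial ordering; the
    last cluster is padded with None so all clusters have equal length.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    order += [None] * (-len(order) % size)
    return [order[k:k + size] for k in range(0, len(order), size)]


def bounding_box(cluster, positions):
    pts = [positions[i] for i in cluster if i is not None]
    lo = tuple(min(p[d] for p in pts) for d in range(3))
    hi = tuple(max(p[d] for p in pts) for d in range(3))
    return lo, hi


def box_distance(a, b):
    """Minimum distance between two axis-aligned boxes."""
    (alo, ahi), (blo, bhi) = a, b
    gaps = [max(0.0, alo[d] - bhi[d], blo[d] - ahi[d]) for d in range(3)]
    return math.sqrt(sum(g * g for g in gaps))


def cluster_pair_list(clusters, positions, cutoff):
    """Keep only cluster pairs whose bounding boxes lie within the cutoff."""
    boxes = [bounding_box(c, positions) for c in clusters]
    n = len(clusters)
    return [(a, b) for a in range(n) for b in range(a, n)
            if box_distance(boxes[a], boxes[b]) <= cutoff]


def lj_pair_force(pi, pj, cutoff):
    """Lennard-Jones force on i due to j (epsilon = sigma = 1)."""
    d = [pi[k] - pj[k] for k in range(3)]
    r2 = sum(x * x for x in d)
    if r2 == 0.0 or r2 > cutoff * cutoff:
        return (0.0, 0.0, 0.0)
    inv6 = 1.0 / r2 ** 3
    scale = 24.0 * inv6 * (2.0 * inv6 - 1.0) / r2
    return tuple(scale * x for x in d)


def accumulate(forces, positions, i, j, cutoff):
    f = lj_pair_force(positions[i], positions[j], cutoff)
    for k in range(3):
        forces[i][k] += f[k]
        forces[j][k] -= f[k]  # Newton's third law


def forces_clustered(positions, cutoff):
    clusters = make_clusters(positions)
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for a, b in cluster_pair_list(clusters, positions, cutoff):
        # Each cluster pair is one fixed size x size block of work.
        for i in clusters[a]:
            for j in clusters[b]:
                if i is None or j is None:
                    continue  # padding slots contribute nothing
                if a == b and j <= i:
                    continue  # count each intra-cluster pair once
                accumulate(forces, positions, i, j, cutoff)
    return forces


def forces_brute(positions, cutoff):
    """All-pairs reference used to check the clustered version."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            accumulate(forces, positions, i, j, cutoff)
    return forces
```

In this sketch every entry of the pair list represents the same amount of work regardless of local particle density, which is the property that makes a fixed-size unit map naturally onto SIMD lanes. Changing `CLUSTER_SIZE` is the only adjustment needed to target a different width.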

Place, publisher, year, edition, pages
2011. 71-72 p.
Keyword [en]
GPGPU, GPU, Heterogeneous architectures, Molecular dynamics, Multi-level parallelization, Biomolecular Simulation, Cache utilization, Efficient algorithm, Force calculation, GPU-based algorithms, Multi-level, Parallelizations, State of the art, Algorithms, Application programming interfaces (API), Core levels, Parallel architectures, Program processors
National Category
Biochemistry and Molecular Biology
Identifiers
URN: urn:nbn:se:kth:diva-149917
DOI: 10.1145/2148600.2148637
Scopus ID: 2-s2.0-84859097973
ISBN: 9781450310307 (print)
OAI: oai:DiVA.org:kth-149917
DiVA: diva2:742361
Conference
2011 High Performance Computing Networking, Storage and Analysis, SC'11, Co-located with SC'11, 12 November 2011 through 18 November 2011, Seattle, WA
Note

QC 20140901

Available from: 2014-09-01 Created: 2014-08-28 Last updated: 2014-09-01 Bibliographically approved

Open Access in DiVA

No full text


Authority records BETA

Páll, Szilárd; Hess, Berk; Lindahl, Erik
