Comparing the Performance of SYCL Runtimes for Molecular Dynamics Applications
KTH, School of Engineering Sciences (SCI), Applied Physics, Biophysics.
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. ORCID iD: 0000-0003-0603-5514
2023 (English). In: International Workshop on OpenCL (IWOCL ’23), ACM Digital Library, 2023, article id 6
Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

SYCL is a cross-platform, royalty-free standard for programming a wide range of hardware accelerators. It is a powerful and convenient way to write standard C++17 code that can take full advantage of available devices. There are already multiple SYCL implementations targeting a wide range of platforms, from embedded systems to HPC clusters. Since several implementations can target the same hardware, application developers and users must know how to choose the most fitting runtime for their needs. In this talk, we will compare the runtime performance of two major SYCL runtimes targeting GPUs, oneAPI DPC++ and Open SYCL [3], to the native implementations for the purposes of GROMACS, a high-performance molecular dynamics engine.

Molecular dynamics (MD) applications were among the earliest adopters of GPU acceleration, with force calculations being an obvious target for offloading. MD is an iterative algorithm where, in its most basic form, on each step the forces acting between particles are computed, and then the equations of motion are integrated. As the computational power of GPUs grew, the strong scaling problem became apparent: the biophysical systems modeled with molecular dynamics typically have fixed sizes, and the goal is to perform more time steps, each taking less than a millisecond of wall time. This places high demands on the underlying GPU framework, requiring it to schedule multiple small tasks efficiently and with minimal overhead, so that CPU and GPU work can overlap for large systems and the GPU stays occupied for smaller systems. Another requirement is that application developers have control over the scheduling to optimize for external dependencies, such as MPI communication.

GROMACS is a widely used MD engine, supporting a wide range of hardware and software platforms, from laptops to the largest supercomputers [1]. Portability and performance across multiple architectures have always been among the primary goals of the project, necessary to keep the code not only efficient but also maintainable. The initial support for NVIDIA accelerators, using CUDA, was added to GROMACS in 2010. Since then, heterogeneous parallelization has been a major target for performance optimization, not limited to NVIDIA devices: support was later added for GPUs from other vendors, as well as for Xeon Phi accelerators. GROMACS initially adopted SYCL in its 2021 release to replace its previous GPU portability layer, OpenCL [2]. In further releases, the number of offloading modes supported by the SYCL backend steadily increased. As of GROMACS 2023, SYCL support in GROMACS achieved near feature parity with CUDA while allowing the use of a single code base to target the GPUs of all three major vendors with minimal specialization.

While this clearly supports the portability promise of modern SYCL implementations, the performance of such portable code remains an open question, especially given the strict requirements of MD algorithms. In this talk, we compare the performance of GROMACS across a wide range of system sizes when using the oneAPI DPC++ and Open SYCL runtimes on high-performance NVIDIA, AMD, and Intel GPUs. Besides the analysis of individual kernel performance, we focus on the runtime overhead and the efficiency of task scheduling when compared to a highly optimized implementation using the native frameworks. We also discuss the possible sources of suboptimal performance and the number of vendor-specific code branches, such as intrinsics or workarounds for compiler bugs, required to achieve optimal performance.

Place, publisher, year, edition, pages
ACM Digital Library, 2023. article id 6
Keywords [en]
SYCL; GROMACS; heterogeneous acceleration; molecular dynamics; performance-portability
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-351151
DOI: 10.1145/3585341.3585350
ISBN: 9798400707452 (print)
OAI: oai:DiVA.org:kth-351151
DiVA, id: diva2:1886365
Conference
International Workshop on OpenCL (IWOCL'23), Cambridge, United Kingdom, April 18-20, 2023
Note

QC 20240802

Available from: 2024-07-31 Created: 2024-07-31 Last updated: 2024-08-02 Bibliographically approved

Authority records

Alekseenko, Andrey; Pall, Szilard
