Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM
2025 (English). In: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops, Association for Computing Machinery (ACM), 2025, p. 1314-1329.
Conference paper, Published paper (Refereed)
Abstract [en]
Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-size problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millisecond range, making scalability on heterogeneous supercomputers challenging. MPI’s CPU-centric nature introduces additional latencies on GPU-resident applications’ critical path, hindering GPU utilization and scalability. To address these limitations, we present an NVSHMEM-based GPU kernel-initiated redesign of the GROMACS domain decomposition halo-exchange algorithm. Highly tuned GPU kernels fuse data packing and communication, leveraging hardware latency-hiding for fine-grained overlap. We employ kernel fusion across overlapped data forwarding communication phases and utilize the asynchronous copy engine over NVLink to optimize latency and bandwidth. Our GPU-resident formulation greatly increases communication-computation overlap, improving GROMACS strong scaling performance across NVLink by up to 1.5x (intra-node) and 2x (multi-node), and up to 1.3x multi-node over NVLink+InfiniBand. This demonstrates the profound benefits of GPU-initiated communication for strong-scaling a broad range of latency-sensitive applications.
Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025. p. 1314-1329
Keywords [en]
GPU, GPU-initiated communication, GROMACS, halo exchange, molecular dynamics, NVSHMEM
National Category
Chemical Sciences
Identifiers
URN: urn:nbn:se:kth:diva-373943
DOI: 10.1145/3731599.3767508
Scopus ID: 2-s2.0-105023373270
OAI: oai:DiVA.org:kth-373943
DiVA, id: diva2:2020984
Conference
2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops, St. Louis, United States of America, Nov 16 2025 - Nov 21 2025
Note
Part of ISBN 9798400718717
QC 20251212
2025-12-12 2025-12-12 2025-12-12 Bibliographically approved