ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0009-0003-4387-367X
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0009-0003-6504-7109
Lawrence Livermore National Laboratory, Livermore, USA. ORCID iD: 0000-0003-4229-5735
Lawrence Livermore National Laboratory, Livermore, USA. ORCID iD: 0000-0003-4977-814X
2026 (English). In: Euro-Par 2025: Parallel Processing - 31st European Conference on Parallel and Distributed Processing, Proceedings, Springer Nature, 2026, p. 33-47
Conference paper, Published paper (Refereed)
Abstract [en]

Vector architectures are essential for boosting computing throughput. ARM provides the Scalable Vector Extension (SVE) as its next-generation, vector-length-agnostic extension beyond traditional fixed-length SIMD. This work provides a first study of the maturity and readiness of exploiting ARM and SVE in HPC. Using selected hardware performance events on the ARM-based Nvidia Grace processor and analytical models, we derive new metrics that quantify how effectively SVE vectorization reduces the number of executed instructions and delivers performance speedup. We further propose an adapted roofline model that combines vector length and data elements to identify potential performance bottlenecks. Finally, we propose a decision tree for classifying SVE-driven performance gains in applications.

Place, publisher, year, edition, pages
Springer Nature, 2026, p. 33-47
National Category
Computer Systems
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-370460
DOI: 10.1007/978-3-031-99857-7_3
Scopus ID: 2-s2.0-105014494119
OAI: oai:DiVA.org:kth-370460
DiVA, id: diva2:2002085
Conference
31st International Conference on Parallel and Distributed Computing, Euro-Par 2025, Dresden, Germany, August 25-29, 2025
Note

Part of ISBN 9783031998560

QC 20250929

Available from: 2025-09-29. Created: 2025-09-29. Last updated: 2025-09-29. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Shi, Ruimin; Schieffer, Gabin; Peng, Bo

Authors
Shi, Ruimin; Schieffer, Gabin; Gokhale, Maya; Lin, Pei Hung; Patel, Hiren; Peng, Bo

Organisation
Computational Science and Technology (CST)
