At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads
RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, Japan, 650-0047.
RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, Japan, 650-0047.
Intel Corporation, 2111 NE 25th Ave, Hillsboro, Oregon, United States, 97124.
RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, Japan, 650-0047.
2023 (English). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 20, no 4, article id 57. Article in journal (Refereed). Published.
Abstract [en]

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper-bound in performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and conclude an average boost of 9.56× for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

Place, publisher, year, edition, pages
Association for Computing Machinery, 2023. Vol. 20, no 4, article id 57
Keywords [en]
3D-stacked memory, Emerging architecture study, gem5 simulation, proxy-applications
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-342395
DOI: 10.1145/3629520
ISI: 001153375300012
Scopus ID: 2-s2.0-85181487217
OAI: oai:DiVA.org:kth-342395
DiVA, id: diva2:1828907
Note

QC 20240118

Available from: 2024-01-17. Created: 2024-01-17. Last updated: 2024-02-26. Bibliographically approved.

Authority records

Podobas, Artur

By organisation
Computational Science and Technology (CST); Software and Computer systems, SCS