At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads
2023 (English). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 20, no 4, article id 57. Article in journal (Refereed), Published
Abstract [en]
Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper-bound in performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and conclude an average boost of 9.56× for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.
Place, publisher, year, edition, pages
Association for Computing Machinery, 2023. Vol. 20, no 4, article id 57
Keywords [en]
3D-stacked memory, Emerging architecture study, gem5 simulation, proxy-applications
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-342395
DOI: 10.1145/3629520
ISI: 001153375300012
Scopus ID: 2-s2.0-85181487217
OAI: oai:DiVA.org:kth-342395
DiVA, id: diva2:1828907
Note
QC 20240118
2024-01-17, 2024-01-17, 2024-02-26. Bibliographically approved