Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0001-6408-3333
KTH.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0002-5020-1631
2021 (English). In: ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

For many, Graphics Processing Units (GPUs) provide a source of reliable computing power. Recently, Nvidia introduced its 9th generation HPC-grade GPU, the Ampere 100 (A100), claiming significant performance improvements over previous generations, particularly for AI workloads, and introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non-AI benchmarks, and can we expect it to deliver the application improvements we have grown used to with previous GPU generations? In this paper, we benchmark the A100 GPU and compare it to four previous generations of GPUs, with a particular focus on empirically quantifying our derived performance expectations. We find that the A100 delivers a smaller performance increase than previous generations for the well-known Rodinia benchmark suite. We show that some of these performance anomalies can be remedied through clever use of the new data-movement features, which we microbenchmark, demonstrating where (and, more importantly, how) they should be used.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021.
Keywords [en]
Benchmarking, Computer graphics, Program processors, Architectural features, Asynchronous data, Benchmark suites, Data movements, Micro-benchmark, Performance anomaly, Performance expectations, Reliable computing, Graphics processing unit
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-310411
DOI: 10.1145/3468044.3468053
Scopus ID: 2-s2.0-85109396133
OAI: oai:DiVA.org:kth-310411
DiVA, id: diva2:1648591
Conference
11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2021, 21 June 2021 through 23 June 2021, Online, Germany.
Note

Part of proceedings ISBN: 978-1-4503-8549-7

QC 20220331

Available from: 2022-03-31. Created: 2022-03-31. Last updated: 2024-03-15. Bibliographically approved.

Authority records

Svedin, Martin; Chien, Wei Der; Chikafa, Gibson; Jansson, Niclas; Podobas, Artur
