INPG: Accelerating Critical Section Access with In-network Packet Generation for NoC Based Many-Cores
KTH, School of Engineering Sciences (SCI), Physics. ORCID iD: 0000-0001-9448-5595
2018 (English). In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE Computer Society, 2018, p. 15-26
Conference paper, Published paper (Refereed)
Abstract [en]

As recent studies have shown, the serialized competition overhead of entering a critical section limits the performance of multi-threaded shared-variable applications on NoC-based many-cores more than the execution of the critical section itself. We show that the invalidation-acknowledgement delay for cache coherence between the home node storing the critical-section lock and the cores running competing threads is the leading cause of high competition overhead in lock spinning. Lock spinning is used in various spin-lock primitives (such as the ticket lock, ABQL, and the MCS lock) and in the spinning phase of the queue spin-lock (QSL) in advanced operating systems. To reduce this lock coherence overhead, we propose in-network packet generation (iNPG), which turns passive 'normal' NoC routers that only transmit packets into active 'big' routers that can also generate packets. Instead of all coherence maintenance being performed at the home node, big routers deployed nearer to the competing threads generate packets that perform early invalidation-acknowledgement for failing threads before their requests reach the home node. This shortens the protocol round-trip delay and thus significantly reduces competition overhead across locking primitives. We evaluate iNPG in Gem5 using PARSEC and SPEC OMP2012 programs with five different locking primitives. Compared with a state-of-the-art technique for accelerating critical-section access, the experimental results show that iNPG effectively reduces lock coherence overhead, speeding up critical-section access by 1.35x on average and 2.03x at maximum, and consequently improving program Region-of-Interest (ROI) runtime by 7.8% on average and 14.7% at maximum.
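The central claim above is that answering failing lock requests nearer to the competing cores shortens the coherence round trip. The toy hop-count model below illustrates only that geometric intuition; it is not the paper's protocol or evaluation. It assumes a 2D mesh with dimension-order (XY) routing and a hypothetical rule that the first big router on a failing request's path replies on the home node's behalf. It ignores router pipeline latency, contention, and the actual invalidation-acknowledgement message sequence. The function names, the mesh coordinates, and the big-router placement are all illustrative assumptions.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in a 2D mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited under dimension-order (XY) routing, src and dst included."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:  # travel along X first
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:  # then along Y
        y += step_y
        path.append((x, y))
    return path


def hops(src: Coord, dst: Coord) -> int:
    """Number of links traversed between src and dst under XY routing."""
    return len(xy_route(src, dst)) - 1


def baseline_round_trip(core: Coord, home: Coord) -> int:
    """A failing lock request travels all the way to the home node and back."""
    return 2 * hops(core, home)


def early_reply_round_trip(core: Coord, home: Coord, big_routers: Set[Coord]) -> int:
    """Hypothetical early reply: the first 'big' router on the request path
    (excluding the requesting core's own router) answers the failing request.
    Falls back to the home node when no big router lies on the path."""
    for router in xy_route(core, home)[1:]:
        if router in big_routers:
            return 2 * hops(core, router)
    return baseline_round_trip(core, home)


if __name__ == "__main__":
    home = (3, 3)                     # hypothetical home node of the lock line
    big_routers = {(2, 0), (1, 3)}    # hypothetical big-router placement
    cores = [(0, 0), (0, 3), (3, 1)]  # hypothetical competing cores
    base = sum(baseline_round_trip(c, home) for c in cores)
    early = sum(early_reply_round_trip(c, home, big_routers) for c in cores)
    print(f"baseline total: {base} hops, early-reply total: {early} hops")
    # prints "baseline total: 22 hops, early-reply total: 10 hops"
```

In this toy placement, the two cores whose XY paths cross a big router see shorter round trips, while the core at (3, 1) gains nothing because no big router lies on its path. This is consistent with the abstract's point that big routers help when they are deployed near the competing threads. Real savings would also depend on router latency and contention, which the model leaves out.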

Place, publisher, year, edition, pages
IEEE Computer Society, 2018. p. 15-26
Series
International Symposium on High-Performance Computer Architecture-Proceedings, ISSN 1530-0897
Keywords [en]
Cache Coherency, CMP, Critical Section, In Network Packet Generation, Network on Chip, Synchronisation Primitive
National Category
Communication Systems
Identifiers
URN: urn:nbn:se:kth:diva-228571
DOI: 10.1109/HPCA.2018.00012
ISI: 000440297700002
Scopus ID: 2-s2.0-85046805697
ISBN: 9781538636596 (print)
OAI: oai:DiVA.org:kth-228571
DiVA, id: diva2:1210452
Conference
24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018, Hotel Pyramide Congress Center, Vienna, Austria, 24 February 2018 through 28 February 2018
Note

QC 20180528

Available from: 2018-05-28 Created: 2018-05-28 Last updated: 2018-08-16
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Yao, Yuan; Lu, Zhonghai