Efficient Memory Access and Synchronization in NoC-based Many-core Processors
KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
2019 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

In NoC-based many-core processors, the memory subsystem and the synchronization mechanism are two key design aspects: exploiting parallelism and pursuing higher performance require both optimized memory management and efficient synchronization. We are therefore motivated to study efficient memory access and synchronization through three topics, namely efficient on-chip memory organization, fair shared memory access, and efficient many-core synchronization.

One major way to optimize memory performance is to construct a suitable and efficient memory organization. A distributed memory organization is better suited to NoC-based many-core processors because it scales well. We consider it essential to support Distributed Shared Memory (DSM) because of the large body of legacy code and the ease of programming. Therefore, we first adopt a microcoded approach to address DSM issues, aiming for hardware performance while retaining the flexibility of programs. Second, we further optimize DSM performance by reducing the virtual-to-physical address translation overhead. Besides general-purpose memory organizations such as DSM, there are special-purpose memory organizations that optimize application-specific memory access. We choose the Fast Fourier Transform (FFT) as the target application and propose a multi-bank data memory specialized for FFT computation.

In 3D NoC-based many-core processors, processor cores and memories reside at different locations (center, corner, edge, etc.) of different layers, so memory accesses behave differently because their communication distances differ. As the network size increases, the difference in communication distance among memory accesses grows, resulting in unfair memory access performance among processor cores. This unfairness may lead to high latencies for some memory accesses, negatively affecting overall system performance. We are therefore motivated to study on-chip memory and DRAM access fairness in 3D NoC-based many-core processors by narrowing the round-trip latency difference of memory accesses and reducing the maximum memory access latency.
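
The distance effect described above can be illustrated with a small sketch. It is not taken from the thesis: it assumes a regular 3D mesh, minimal (Manhattan) routing, hop count as a stand-in for latency, and a single memory placed at a corner node. The function names `hop_distance` and `round_trip_spread` are hypothetical.

```python
from itertools import product


def hop_distance(a, b):
    """Manhattan hop count between two nodes of a 3D mesh (minimal routing assumed)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def round_trip_spread(dims, memory_node):
    """Return (min, max) round-trip hop counts from every mesh node to memory_node."""
    nodes = product(*(range(d) for d in dims))
    round_trips = [2 * hop_distance(node, memory_node) for node in nodes]
    return min(round_trips), max(round_trips)


if __name__ == "__main__":
    # Assumed example: memory at corner (0, 0, 0) of a growing cubic mesh.
    for size in (2, 4, 8):
        low, high = round_trip_spread((size, size, size), (0, 0, 0))
        print(f"{size}x{size}x{size} mesh: round trip from {low} to {high} hops")
```

Under these assumptions the gap between the nearest and farthest core widens as the mesh grows, which is the unfairness the paragraph refers to.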

Barrier synchronization is used to synchronize the execution of parallel processor cores. Conventional barrier synchronization approaches such as master-slave, all-to-all, tree-based, and butterfly are algorithm oriented. As many processor cores are networked on a single chip, contended synchronization requests may cause a large performance penalty. Motivated by this, and in contrast to the algorithm-based approaches, we take another direction, exploiting efficient communication, to address the barrier synchronization problem. We propose cooperative communication as a means and combine it with the master-slave and all-to-all algorithms to achieve efficient many-core barrier synchronization. In addition, we conduct a multi-FPGA implementation case study of fast many-core barrier synchronization.
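
For readers unfamiliar with the master-slave scheme named above, the following is a minimal software sketch using Python threads. It only illustrates the conventional algorithm (slaves report arrival, the master releases them) and does not model the cooperative communication proposed in the thesis or any NoC hardware. The class `MasterSlaveBarrier` and the function `run_demo` are hypothetical names.

```python
import threading


class MasterSlaveBarrier:
    """Centralized barrier: slaves signal the master, the master releases each slave."""

    def __init__(self, num_slaves):
        self.num_slaves = num_slaves
        # Counts arrival signals sent from slaves to the master.
        self.arrived = threading.Semaphore(0)
        # One release semaphore per slave, so a fast slave cannot take
        # a release token intended for a slower one when the barrier is reused.
        self.release = [threading.Semaphore(0) for _ in range(num_slaves)]

    def slave_wait(self, slave_id):
        self.arrived.release()             # tell the master "I have arrived"
        self.release[slave_id].acquire()   # block until the master releases me

    def master_wait(self):
        for _ in range(self.num_slaves):   # gather phase: wait for every slave
            self.arrived.acquire()
        for sem in self.release:           # release phase: wake every slave
            sem.release()


def run_demo(num_slaves=4):
    """Run one barrier episode and return the ordered event log."""
    barrier = MasterSlaveBarrier(num_slaves)
    log = []

    def slave(i):
        log.append(("before", i))
        barrier.slave_wait(i)
        log.append(("after", i))

    threads = [threading.Thread(target=slave, args=(i,)) for i in range(num_slaves)]
    for t in threads:
        t.start()
    barrier.master_wait()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

Every slave sends its arrival signal to the same master, so the master becomes a point of contention as the core count grows; this is the kind of contended synchronization traffic the paragraph above is concerned with.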

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2019. p. 164
Series
TRITA-EECS-AVL ; 2019:2
Keywords [en]
Many-core, Network-on-Chip, Distributed Shared Memory, Microcode, Virtual-to-physical Address Translation, Memory Access Fairness, Barrier Synchronization, Cooperative Communication
National Category
Embedded Systems; Computer Systems
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-240951
ISBN: 978-91-7873-051-3 (print)
OAI: oai:DiVA.org:kth-240951
DiVA, id: diva2:1275389
Public defence
2019-02-01, Sal A, Electrum, Kungl Tekniska högskolan, Kistagången 16, Kista, Stockholm, 09:00 (English)
Opponent
Supervisors
Note

QC 20190107

Available from: 2019-01-07 Created: 2019-01-06 Last updated: 2019-01-07 Bibliographically approved

Open Access in DiVA

Doctoral_Thesis_Xiaowen_Chen_20190106.pdf (12691 kB), 88 downloads
File information
File name: FULLTEXT01.pdf
File size: 12691 kB
Checksum (SHA-512): 919c8008c44f4947d8a52f4ee5f4f9527d471a61b6a10dba593af37a84c00f900c42fd424ad47497734a5812c4728e248883f854b03505d72e657d471f8f9145
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Chen, Xiaowen
