Architecture Support and Scalability Analysis of Memory Consistency Models in Network-on-Chip based Systems
KTH, School of Information and Communication Technology (ICT), Electronic Systems.
2013 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Shared memory systems should support parallelization at the computation (multi-core), communication (Network-on-Chip, NoC) and memory architecture levels to exploit the potential performance benefits. Parallel systems that support the shared memory abstraction, in both the general-purpose and application-specific domains, confront the critical issue of memory consistency. The issue arises because unconstrained memory operations lead to unexpected behavior of shared memory systems. Memory consistency models enforce ordering constraints on the memory operations so that the shared memory system behaves as expected. The intuitive Sequential Consistency (SC) model enforces strict ordering constraints on the memory operations and does not take advantage of system optimizations in either hardware or software. Alternatively, the relaxed memory consistency models relax the ordering constraints on the memory operations and exploit these optimizations to enhance system performance at a reasonable cost. The purpose of this thesis is twofold. First, novel architecture support is provided for different memory consistency models, namely SC, Total Store Ordering (TSO), Partial Store Ordering (PSO), Weak Consistency (WC), Release Consistency (RC) and Protected Release Consistency (PRC), in NoC-based multi-core (McNoC) systems. The PRC model is proposed as an extension of the RC model which provides additional reordering and relaxation in the memory operations. Second, the scalability analysis of these memory consistency models is performed in the McNoC systems.

Architecture support for these memory consistency models is provided in the McNoC platforms. Each configurable McNoC platform uses a packet-switched 2-D mesh NoC with a deflection routing policy, distributed shared memory (DSM), distributed locks and a customized processor interface. The memory consistency models/protocols are implemented in the customized processor interfaces, which are developed to integrate the processors with the rest of the system. The realization schemes for the memory consistency models are based on novel transaction counter and address stack-based approaches. The transaction counter is used in each node of the network to keep track of the outstanding memory operations issued by a processor in the system. The address stack is used in each node of the network to keep track of the addresses of the outstanding memory operations issued by a processor in the system. These hardware structures are used in the processor interface to enforce the required global orders under these different memory consistency models. The realization scheme of the PRC model additionally uses an acquire counter to further classify the data operations as unprotected and protected operations.
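As a rough illustration of how a transaction counter and an address stack can gate the issue of memory operations, the following minimal sketch models a single processor interface in software. It is not the thesis's hardware design: the class name `ProcessorInterface`, its fields, and the simplified issue rules (SC allows one outstanding operation at a time; WC lets data operations overlap but makes a synchronization operation wait until the counter is zero) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorInterface:
    """Toy model of a per-node interface (hypothetical name and policy)."""
    model: str                                  # "SC" or "WC"
    counter: int = 0                            # transaction counter: outstanding operations
    stack: list = field(default_factory=list)   # address stack: their addresses
    sync_pending: bool = False                  # True while a synchronization op is in flight

    def can_issue(self, is_sync: bool = False) -> bool:
        if self.model == "SC":
            # Assumed SC rule: the previous operation must have completed.
            return self.counter == 0
        if self.model == "WC":
            # Assumed WC rule: nothing passes an in-flight synchronization op,
            # and a synchronization op waits for all earlier operations.
            if self.sync_pending:
                return False
            return self.counter == 0 if is_sync else True
        raise ValueError(f"unsupported model: {self.model}")

    def issue(self, addr: int, is_sync: bool = False) -> bool:
        """Try to issue an operation; return False if it must stall."""
        if not self.can_issue(is_sync):
            return False
        self.counter += 1
        self.stack.append(addr)
        self.sync_pending = is_sync
        return True

    def complete(self, addr: int) -> None:
        """Record the acknowledgement of an outstanding operation."""
        self.stack.remove(addr)
        self.counter -= 1
        if self.counter == 0:
            self.sync_pending = False
```

Under these assumed rules, a WC interface accepts two back-to-back data operations, while an SC interface stalls the second until the first is acknowledged. That overlap is the kind of relaxation the relaxed models exploit.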

The scalability analysis of these memory consistency models is based on different workloads which are developed and mapped onto networks of various sizes. The scalability study is conducted in McNoC systems with 1 to 64 cores, with various applications using different problem sizes and traffic patterns. Performance metrics such as execution time, performance, speedup, overhead and efficiency are evaluated as a function of the network size. The experiments are conducted with both synthetic and application workloads. The experimental results under different application workloads show that the average execution time under the relaxed memory consistency models decreases relative to the SC model. The specific numbers are highly sensitive to the application and depend on how well it matches the architecture. This study shows that the performance improvement of the relaxed memory consistency models over the SC model depends on the computation-to-communication ratio, traffic patterns, data-to-synchronization ratio and the problem size. The performance improvement of the PRC and RC models over the SC model tends to be higher than 50% as observed in the experiments, when the system is further scaled up.
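The abstract does not define the metrics it lists. The sketch below uses the common textbook definitions of speedup, efficiency and parallel overhead, plus a relative reduction in execution time against a baseline model. These formulas and the function names are assumptions, and the thesis may define the metrics differently.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup on n cores: single-core time divided by n-core time."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Efficiency: speedup normalized by the number of cores."""
    return speedup(t1, tn) / n


def overhead(t1: float, tn: float, n: int) -> float:
    """Parallel overhead: total core-time spent beyond the single-core time."""
    return n * tn - t1


def reduction_over_baseline(t_base: float, t_model: float) -> float:
    """Fractional reduction in execution time relative to a baseline (e.g. SC)."""
    return (t_base - t_model) / t_base
```

With purely hypothetical timings, `speedup(100.0, 25.0)` gives 4.0 and `efficiency(100.0, 25.0, 8)` gives 0.5. A model that runs in 150 time units against a baseline of 200 has a reduction of 0.25, i.e. 25%.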

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013, xviii, 143 p.
Series
Trita-ICT-ECS AVH, ISSN 1653-6363 ; 12:11
Keyword [en]
Memory consistency, Protected release consistency, Distributed shared memory, Network-on-Chip, Scalability
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-117700, ISBN: 978-91-7501-617-7 (print), OAI: oai:DiVA.org:kth-117700, DiVA: diva2:602640
Public defence
2013-03-13, Sal E, Forum, Isafjordsgatan 39, Kista, 09:00 (English)
Opponent
Supervisors
Note

QC 20130204

Available from: 2013-02-04 Created: 2013-02-02 Last updated: 2013-02-04 Bibliographically approved
List of papers
1. Scalability of Weak Consistency in NoC based Multicore Architectures
2010 (English) In: IEEE INT SYMP CIRC SYST PROC, New York: IEEE, 2010, 3497-3500 p. Conference paper, Published paper (Refereed)
Abstract [en]

In Multicore Network-on-Chip, it is preferable to realize distributed but shared memory (DSM) in order to reuse the huge amount of legacy code and to ease programming. Within DSM systems, memory consistency is a critical issue since it affects not only performance but also the correctness of programs. In this paper, we investigate the scalability of the weak consistency model, which may be implemented using a transaction counter. The experimental results compare synchronization latencies for various network sizes, topologies and lock positions in the network. Average synchronization latency rises exponentially for mesh and torus topologies as the network size grows. However, torus improves the synchronization latency in comparison to mesh. For the mesh topology, the average synchronization latency is also slightly affected by the lock position with respect to the network center.
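One way to build intuition for the topology and lock-position effects is to compare the average minimal hop count from every node to the node holding the lock. The sketch below does this for a k x k mesh and torus. Hop count is only a proxy chosen here for illustration: it ignores contention and any extra hops added by the routing policy, and it is not the latency measure used in the paper. All function names are hypothetical.

```python
def mesh_hops(a, b):
    """Minimal hop count between two nodes of a 2-D mesh (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, k):
    """Minimal hop count in a k x k torus, where each dimension wraps around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)


def average_hops(k, lock, torus=False):
    """Average minimal hop count from all k*k nodes to the lock's node."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    hops = [torus_hops(n, lock, k) if torus else mesh_hops(n, lock) for n in nodes]
    return sum(hops) / len(nodes)
```

For a 4 x 4 mesh, a lock in the corner `(0, 0)` averages 3.0 hops while a lock near the center at `(1, 1)` averages 2.0. In a 4 x 4 torus the corner lock also averages 2.0, since wraparound links make every position equivalent.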

Place, publisher, year, edition, pages
New York: IEEE, 2010
Series
IEEE International Symposium on Circuits and Systems, ISSN 0271-4302
Keyword
Synchronization, Scalability, Memory consistency, Distributed shared memory
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-32140 (URN), 10.1109/ISCAS.2010.5537833 (DOI), 000287216003180 (), 2-s2.0-77955996105 (Scopus ID), 978-142445308-5 (ISBN)
Conference
International Symposium on Circuits and Systems Nano-Bio Circuit Fabrics and Systems (ISCAS 2010)
Note
QC 20110407
Available from: 2011-04-07 Created: 2011-04-07 Last updated: 2013-02-04 Bibliographically approved
2. Scalability of Relaxed Consistency Models in NoC based Multicore Architectures
2009 (English) In: SIGARCH Computer Architecture News, ISSN 0163-5964, E-ISSN 1943-5851, Vol. 37, no 5, 8-15 p. Article in journal (Other academic) Published
Abstract [en]

This paper studies the realization of relaxed memory consistency models in network-on-chip based distributed shared memory (DSM) multi-core systems. Within DSM systems, memory consistency is a critical issue since it affects not only the performance but also the correctness of programs. We investigate the scalability of the relaxed consistency models (weak, release consistency) implemented by using transaction counters. Our experimental results compare the average and maximum code, synchronization and data latencies of the two consistency models for various network sizes with regular mesh topologies. The observed latencies rise for both consistency models as the network size grows. However, the scaling behaviors are different. With the release consistency model these latencies grow significantly slower than with weak consistency, due to better optimization potential by means of overlapping, reordering and program order relaxations. Release consistency improves the performance by 15.6% and 26.5% on average in the code and consistency latencies over the weak consistency model for the specific application, as the system grows from a single core to 64 cores. The latency of data transactions grows 2.2 times faster on average with the weak consistency model than with the release consistency model when the system scales from a single core to 64 cores.

Place, publisher, year, edition, pages
ACM Press, 2009
Keyword
Synchronization, Scalability, Memory consistency, Distributed shared memory
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-62118 (URN), 10.1145/1755235.1755238 (DOI)
Projects
Mosart
Note

QC 20120126. QC 20160209

Available from: 2012-01-26 Created: 2012-01-18 Last updated: 2017-12-08
3. Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems
2011 (English) In: Proceedings of 16th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC) 2011, IEEE Press, 2011, 154-159 p. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies the realization and performance comparison of the sequential and weak consistency models in network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems. Memory consistency constrains the order of shared memory operations for the expected behavior of the multi-core systems. Both consistency models are realized in the NoC based multi-core systems. The performance of the two consistency models is compared for various sizes of networks using regular mesh topologies and a deflection routing algorithm. The results show that weak consistency improves the performance by 46.17% and 33.76% on average in the code and consistency latencies over the sequential consistency model, due to relaxation in the program order, as the system grows from a single core to 64 cores.

Place, publisher, year, edition, pages
IEEE Press, 2011
Keyword
NoC, consistency scalability, distributed shared memory, lock position, mesh topology, multicore architecture, network-on-chip, synchronization latency, torus topology, transaction counter, distributed shared memory systems, multiprocessor interconnection networks, network-on-chip, parallel architectures, performance evaluation, synchronisation, transaction processing
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-62136 (URN), 10.1109/ASPDAC.2011.5722176 (DOI), 000299427300043 (), 2-s2.0-79952920046 (Scopus ID)
Conference
16th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC 2011)
Projects
Mosart
Note

© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Qc 20120201

Available from: 2012-02-01 Created: 2012-01-18 Last updated: 2013-02-04 Bibliographically approved
4. Memory Architecture and Management in an NoC Platform
2011 (English) In: Scalable Multi-core Architectures: Design Methodologies and Tools / [ed] Axel Jantsch and Dimitrios Soudris, Springer, 2011, 1, 3-28 p. Chapter in book (Refereed)
Abstract [en]

The memory organization and the management of the memory space is a critical part of every NoC based platform design. We propose a Data Management Engine (DME), that is a block of programmable hardware and part of every processing element. It off-loads the processing element (CPU, DSP, etc.) by managing the memory space, memory access and the communication over the on-chip network. The DME’s main functions are virtual address translation, private and shared memory management, cache coherence protocol, support for memory consistency models, synchronization and protection mechanisms for shared memory communication. The DME is fully programmable and configurable thus allowing for customized support for high level data management functions such as dynamic memory allocation and abstract data types. This chapter describes the main concepts, design and functionality of the DME and presents case studies illustrating its usage and performance.

Place, publisher, year, edition, pages
Springer, 2011 Edition: 1
Keyword
Network on Chip, SoC Architecture, Memory Organization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-62158 (URN), 9781441967770 (ISBN)
Projects
MOSART
Note
QC 20120110
Available from: 2012-01-20 Created: 2012-01-18 Last updated: 2013-02-04 Bibliographically approved
5. Realization and Scalability of Release and Protected Release Consistency Models in NoC based Systems
2011 (English) In: Proceeding of 14th Euromicro Conference on Digital System Design, 2011, Oulu: IEEE Computer Society, 2011, 47-54 p. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies the realization and scalability of release and protected release consistency models in Network-on-Chip (NoC) based Distributed Shared Memory (DSM) multi-core systems. The protected release consistency (PRC) model is proposed as an extension of the release consistency (RC) model and provides further relaxation in the shared memory operations. The realization schemes of the RC and PRC models use a transaction counter in each node of the NoC based multi-core (McNoC) systems. Further, we study the scalability of these RC and PRC models and evaluate their performance in the McNoC platform. A configurable NoC based platform with a 2D mesh topology and a deflection routing algorithm is used in the tests. We experiment with both synthetic and application workloads. The performance of the RC and PRC models is compared using sequential consistency (SC) as the baseline. The experiments show that the average code execution time for the PRC model in an 8x8 network (64 cores) is reduced by 30.5% over SC, and by 6.5% over the RC model. Average data execution time in the 8x8 network for the PRC model is reduced by almost 37% over SC and by 8.8% over RC. The increase in area for PRC over RC is about 880 gates in the network interface (1.7%).

Place, publisher, year, edition, pages
Oulu: IEEE Computer Society, 2011
Keyword
Network-on-Chip, Distributed shared memory, Memory consistency, Protected release consistency, Scalability
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-62185 (URN), 10.1109/DSD.2011.11 (DOI), 2-s2.0-80054986781 (Scopus ID), 978-1-4577-1048-3 (ISBN)
Conference
14th Euromicro Conference on Digital System Design, (DSD 2011)
Note
© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. QC 20120201

Available from: 2012-02-01 Created: 2012-01-18 Last updated: 2013-02-04 Bibliographically approved
6. Architecture Support and Comparison of Three Memory Consistency Models in NoC based Syst
2012 (English) In: Proceedings of 15th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD 2012), IEEE Computer Society, 2012, 304-311 p. Conference paper, Published paper (Refereed)
Abstract [en]

We propose novel hardware support for three relaxed memory models, Release Consistency (RC), Partial Store Ordering (PSO) and Total Store Ordering (TSO), in Network-on-Chip (NoC) based distributed shared memory multicore systems. The RC model is realized by using a Transaction Counter and an Address Stack based approach, while the PSO and TSO models are realized by using a Write Transaction Counter and a Write Address Stack based approach. In the experiments, we use a configurable platform based on a 2D mesh NoC using a deflection routing policy. The results show that under synthetic workloads, the average execution time for the RC, PSO and TSO models in an 8x8 network (64 cores) is reduced by 35.8%, 22.7% and 16.5%, respectively, over the Sequential Consistency (SC) model. The average speedup for the RC, PSO and TSO models in the 8x8 network under different application workloads is increased by 34.3%, 10.6% and 8.9%, respectively, over the SC model. The area cost for the TSO, PSO and RC models is increased by less than 2% over the SC model at the interface to the processor.

Place, publisher, year, edition, pages
IEEE Computer Society, 2012
Keyword
Memory consistency; Release consistency; Scalability; Distributed shared memory; Network-on-Chip
National Category
Computer Systems; Embedded Systems
Identifiers
urn:nbn:se:kth:diva-95668 (URN), 10.1109/DSD.2012.27 (DOI), 2-s2.0-84872916345 (Scopus ID), 978-076954798-5 (ISBN)
Conference
15th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD 2012), September 5-8, 2012, Cesme, Izmir, Turkey
Note

QC 20121016

Available from: 2012-10-16 Created: 2012-05-28 Last updated: 2013-10-15 Bibliographically approved
7. Scalability analysis of release and sequential consistency models in NoC based multicore systems
2012 (English) In: 2012 International Symposium on System on Chip, SoC 2012, IEEE, 2012, 6376350- p. Conference paper, Published paper (Refereed)
Abstract [en]

We analyze the scalability of the Release Consistency (RC) and Sequential Consistency (SC) models, which are realized in Network-on-Chip (NoC) based distributed shared memory multicore systems. The analysis is performed on the basis of workloads mapped onto networks of different sizes with different data sets. The experiments use a configurable platform based on a 2D mesh NoC using a deflection routing algorithm. The results show that under the synthetic workloads using different distributed locks, the performance of the RC model is increased by 17.6% to 54.6% over the SC model in the 64-core system. For the application workloads, as the network size grows from 1 to 64 cores, the execution time under the RC model decreases relative to the SC model, by an amount that depends on the application and its match to the architecture. The performance improvement of the RC model over the SC model tends to be higher than 50% as observed in the experiments, when the system is further scaled up.

Place, publisher, year, edition, pages
IEEE, 2012
Keyword
Distributed shared memory, Memory consistency, Network-on-Chip, Release consistency, Scalability
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-115478 (URN), 10.1109/ISSoC.2012.6376350 (DOI), 2-s2.0-84871995990 (Scopus ID), 978-146732895-1 (ISBN)
Conference
2012 International Symposium on System on Chip, SoC 2012, 10 October 2012 through 12 October 2012, Tampere
Note

QC 20130116

Available from: 2013-01-16 Created: 2013-01-15 Last updated: 2013-02-04 Bibliographically approved
8. Scalability Analysis of Memory Consistency Models in NoC-based Distributed Shared Memory SoCs
2013 (English) In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 32, no 5, 760-773 p. Article in journal (Refereed) Published
Abstract [en]

We analyze the scalability of six memory consistency models in network-on-chip (NoC)-based distributed shared memory multicore systems: 1) protected release consistency (PRC); 2) release consistency (RC); 3) weak consistency (WC); 4) partial store ordering (PSO); 5) total store ordering (TSO); and 6) sequential consistency (SC). Their realizations are based on a transaction counter and an address-stack-based approach. The scalability analysis is based on different workloads mapped on various sizes of networks using different problem sizes. For the experiments, we use Nostrum NoC-based configurable multicore platform with a 2-D mesh topology and a deflection routing algorithm. Under the synthetic workloads, the average execution time for the PRC, RC, WC, PSO, and TSO models in the 8 x 8 network (64-cores) is reduced by 32.3%, 28.3%, 20.1%, 13.8%, and 9.9% over the SC model, respectively. For the application workloads, as the network size grows, the average execution time under these relaxed memory models decreases with respect to the SC model depending on the application and its match to the architecture. The performance improvement of the PRC and RC models over the SC model tends to be higher than 50% as observed in the experiments, when the system is further scaled up. The area cost in the network interface for the relaxed memory models is increased by less than 4% over the SC model.

Keyword
Distributed shared memory, memory consistency, network-on-chip, performance, scalability
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-117701 (URN), 10.1109/TCAD.2012.2235914 (DOI), 000318163800009 (), 2-s2.0-84876750369 (Scopus ID)
Note

QC 20130603. Updated from accepted to published.

Available from: 2013-02-02 Created: 2013-02-02 Last updated: 2017-12-06 Bibliographically approved

Open Access in DiVA

Final_Ph.D. thesis_AbdulNaeem (2837 kB), 995 downloads
File information
File name: FULLTEXT01.pdf
File size: 2837 kB
Checksum SHA-512: 5677a13cc35da49c072e164bdda612962daf0f881ab6c13cd7343cf5299ae2a262221b75feeba786e569b5cdfa10a7f9f15c2b559453560858f9f0c2c63001f7
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Naeem, Abdul
By organisation
Electronic Systems
Electrical Engineering, Electronic Engineering, Information Engineering

Total: 995 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 464 hits