Architectural Techniques for Improving Performance in Networks on Chip
KTH, School of Information and Communication Technology (ICT), Electronic Systems.
2011 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The main aim of this thesis is to propose techniques for enhancing performance in Networks on Chip (NoC). In addition, a concrete proposal for a protocol stack within our NoC platform Nostrum is presented. Nostrum inherently supports both Best Effort and Guaranteed Throughput traffic delivery. For best effort traffic it employs a deflective routing scheme, which gives the switches a small footprint and makes the network robust to disturbances. For traffic delivery with hard guarantees a TDMA-based scheme is used. The transmission process in a NoC involves several stages. In the included papers, I propose a set of strategies to enhance performance in several of these stages. The strategies are summarised as follows:

Temporally Disjoint Networks build on the observation that a physical network can potentially be seen as a set of separate networks; which one a packet enters depends on when it enters the physical network. As a consequence, the different networks can carry different traffic types.

Looped Containers provide a means to set up virtual circuits in networks using deflective routing. High-priority container packets are inserted into the network to follow a predefined, closed route between source and destination. At the sender side the packets are loaded and sent to the destination, where they are unloaded and sent back.

Proximity Congestion Awareness reduces the load of the network by diverting packets away from congested areas. It can increase the maximum traffic load by a factor of 20.

Dual Packet Exit increases the exit bandwidth of the network, leading to a 50 percent reduction in worst-case latency and a 30 percent reduction in average latency, as well as lowered buffer usage.

Priority Based Forced Requeue prematurely lifts low-priority packets out of the network to be requeued. Packets that have not yet entered the network compete with packets inside the network, which gives tighter bounds on admission and reduces worst-case latencies by 50 percent.

Furthermore, Operational Efficiency is proposed as a measure to quantify how effective a network is; it is defined as the throughput per buffer used in the system. Increasing the injection of packets into the network to raise the system throughput has an associated cost, and this trade-off can be optimised to save energy.
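The definition of Operational Efficiency above can be written out as a small calculation. The following is a minimal sketch, assuming throughput is measured in packets per cycle and that "buffers used" is a simple count; the function name and the numbers are hypothetical and are not taken from the thesis.

```python
def operational_efficiency(throughput: float, buffers_used: int) -> float:
    """Throughput per buffer used, following the definition in the abstract.

    The unit of throughput (here: packets per cycle) is an assumption.
    """
    if buffers_used <= 0:
        raise ValueError("buffers_used must be positive")
    return throughput / buffers_used


# Hypothetical figures for two configurations of the same network.
baseline = operational_efficiency(throughput=0.40, buffers_used=80)
improved = operational_efficiency(throughput=0.44, buffers_used=80)

# Relative change in Operational Efficiency between the two configurations.
relative_gain = improved / baseline - 1.0
print(f"baseline={baseline:.4f} improved={improved:.4f} gain={relative_gain:.1%}")
```

Comparing the ratio rather than raw throughput makes the cost of extra buffering visible: a configuration that gains throughput only by using proportionally more buffers shows no gain in this measure.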

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. , xxiv, 103 p.
Series
Trita-ICT-ECS AVH, ISSN 1653-6363 ; 11:13
National Category
Communication Systems
Identifiers
URN: urn:nbn:se:kth:diva-48243
ISBN: 978-91-7501-169-1 (print)
OAI: oai:DiVA.org:kth-48243
DiVA: diva2:457089
Public defence
2011-12-08, Sal D, KTH-Forum, Isafjordsgatan 39, Kista, 13:00 (English)
Note
QC 20111124. Available from: 2011-11-24 Created: 2011-11-16 Last updated: 2012-01-16 Bibliographically approved
List of papers
1. Load Distribution with the Proximity Congestion Awareness in a Network on Chip
2003 (English) In: Design, Automation And Test In Europe Conference And Exhibition, Proceedings, LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2003, 1126-1127 p. Conference paper, Published paper (Refereed)
Abstract [en]

In Networks on Chip, NoC, very low cost and high performance switches will be of critical importance. For a regular two-dimensional NoC we propose a very simple, memoryless switch. In case of congestion, packets are emitted in a non-ideal direction, also called deflective routing. To increase the maximum tolerable load of the network, we propose a Proximity Congestion Awareness, PCA, technique, where switches use load information of neighbouring switches, called stress values, for their own switching decisions, thus avoiding congested areas. We present simulation results with random traffic which show that the PCA technique can increase the maximum traffic load by a factor of over 20.
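The switching decision described in this abstract can be illustrated with a short sketch. It assumes a 2-D mesh with four output ports per switch, and it assumes a ranking rule that is not stated in the abstract: ports that bring the packet closer to its destination are preferred, and ties are broken by the lowest stress value reported by the neighbouring switch. The names `choose_output`, `stress` and `free` are hypothetical.

```python
from typing import Dict, Set, Tuple

Coord = Tuple[int, int]

# Offsets of the four neighbouring switches in a 2-D mesh.
DIRECTIONS: Dict[str, Coord] = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def manhattan(a: Coord, b: Coord) -> int:
    """Hop distance between two switches in a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_output(pos: Coord, dest: Coord, stress: Dict[str, int], free: Set[str]) -> str:
    """Pick one of the free output ports for a packet (assumed ranking rule).

    Ports that reduce the distance to `dest` rank first; among equally good
    ports the neighbour with the lowest stress value wins. If no productive
    port is free, the packet is deflected to the least stressed free port.
    """
    def rank(direction: str):
        dx, dy = DIRECTIONS[direction]
        nxt = (pos[0] + dx, pos[1] + dy)
        productive = manhattan(nxt, dest) < manhattan(pos, dest)
        # The direction name is the final tie-breaker, for determinism.
        return (0 if productive else 1, stress[direction], direction)

    return min(free, key=rank)


stress_values = {"N": 5, "E": 1, "S": 0, "W": 0}
print(choose_output((1, 1), (3, 3), stress_values, {"N", "E", "S", "W"}))
```

In this example both N and E lead towards the destination, and the less stressed neighbour to the east is chosen. The switch itself stays memoryless: the decision depends only on the current packets and the neighbours' stress values.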

Place, publisher, year, edition, pages
LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2003
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-5696 (URN)
10.1109/DATE.2003.1253765 (DOI)
000182683800189 ()
0-7695-1870-2 (ISBN)
Conference
Design, Automation and Test in Europe Conference and Exhibition (DATE 03), MUNICH, GERMANY, MAR 03-07, 2003
Note
QC 20101122. Available from: 2006-05-11 Created: 2006-05-11 Last updated: 2011-11-24 Bibliographically approved
2. Guaranteed bandwidth using looped containers in temporally disjoint networks within the Nostrum network on chip
2004 (English) In: Design, Automation And Test In Europe Conference And Exhibition, Vols 1 And 2, Proceedings / [ed] Gielen G, Figueras J, LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2004, 890-895 p. Conference paper, Published paper (Refereed)
Abstract [en]

In today's emerging Networks-on-Chip, there is a need for different traffic classes with different Quality-of-Service guarantees. Within our NoC architecture Nostrum, we have implemented a service of Guaranteed Bandwidth (GB), and latency, in addition to the already existing service of Best-Effort (BE) packet delivery. The guaranteed bandwidth is accessed via Virtual Circuits (VCs). The VCs are implemented using a combination of two concepts that we call 'Looped Containers' and 'Temporally Disjoint Networks' (TDNs). The Looped Containers are used to guarantee access to the network, independently of the current network load and without dropping packets; the TDNs are used in order to achieve several VCs, plus ordinary BE traffic, in the network. The TDNs are a consequence of the deflective routing policy used, and give rise to an explicit time-division multiplexing within the network. To prove our concept, an HDL implementation has been synthesised and simulated. The cost in terms of additional hardware needed, as well as additional bandwidth, is very low: less than 2 percent in both cases. Simulations showed that ordinary BE traffic is practically unaffected by the VCs.
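How a looped container stays inside one temporally disjoint network can be illustrated with a toy model. The sketch below makes several assumptions that are not stated in the abstract: a 2-D mesh, exactly one hop per clock cycle, no buffering inside the switches (which yields two TDNs, given by the parity of x + y + cycle), and a sender that always has data to load. The number of TDNs in a real implementation depends on the switch design; all names are hypothetical.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]


def tdn_index(pos: Coord, cycle: int) -> int:
    """TDN occupied by a packet at `pos` in `cycle` (two-TDN assumption).

    Each hop changes x + y by one and the cycle count by one, so the
    parity of x + y + cycle never changes while a packet is moving.
    """
    return (pos[0] + pos[1] + cycle) % 2


def run_container(route: List[Coord], src: Coord, dst: Coord, cycles: int) -> Tuple[int, Set[int]]:
    """Move one container around the closed `route`, one hop per cycle.

    Returns the number of payloads delivered and the set of TDNs visited.
    """
    loaded = False
    deliveries = 0
    tdns_seen: Set[int] = set()
    for cycle in range(cycles):
        pos = route[cycle % len(route)]
        tdns_seen.add(tdn_index(pos, cycle))
        if pos == src and not loaded:
            loaded = True        # sender loads the container
        elif pos == dst and loaded:
            loaded = False       # receiver unloads; the empty container travels back
            deliveries += 1
    return deliveries, tdns_seen


# A closed route of four hops around one mesh tile.
route = [(0, 0), (1, 0), (1, 1), (0, 1)]
deliveries, tdns = run_container(route, src=(0, 0), dst=(1, 1), cycles=12)
print(deliveries, tdns)
```

Under these assumptions the container returns to the sender every four cycles, so the sender gets one guaranteed slot per loop, and the container never leaves its TDN, leaving the other TDN free for further circuits or best-effort traffic.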

Place, publisher, year, edition, pages
LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2004
Keyword
Guaranteed bandwidth (GB), Looped Containers, Network-on-chips, Temporally Disjoint Networks
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-5698 (URN)
10.1109/DATE.2004.1269001 (DOI)
000189434000165 ()
2-s2.0-3042740415 (Scopus ID)
0-7695-2085-5 (ISBN)
Conference
Design, Automation and Test in Europe Conference and Exhibition (DATE 04), Paris, FRANCE, FEB 16-20, 2004
Note
QC 20101122. Title changed from "Guaranteed Throughput using Temporally Disjoint Networks in the Nostrum platform". Available from: 2006-05-11 Created: 2006-05-11 Last updated: 2011-11-24 Bibliographically approved
3. The Nostrum Backbone: a Communication Protocol Stack for Networks on Chip
2004 (English) In: 17th International Conference On Vlsi Design, Proceedings - Design Methodologies For The Gigascale Era, LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2004, 693-696 p. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a communication protocol stack to be used in Nostrum, our Network on Chip (NoC) architecture. To aid the designer in selecting which parts of the protocols, and their respective facilities, to include, a layered approach to communication is taken. A nomenclature for describing the individual layers' interfaces and the service definitions of the layers in the protocol stack is suggested and used. The concept includes support for best effort packet delivery as well as support for guaranteed bandwidth traffic, using virtual circuits. Furthermore, an application-to-NoC adapter is defined as part of the Resource to Network Interface, and is used to communicate between the Nostrum protocol stack and the application. An industrial example has been implemented and simulated, and the results justify the suggested layered approach.

Place, publisher, year, edition, pages
LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2004
Keyword
Adaptive control systems, Computer architecture, Interfaces (computer), Microprocessor chips, Network protocols, Packet networks, Programmable logic controllers, Virtual reality
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-5697 (URN)
000189438600102 ()
2-s2.0-2342620693 (Scopus ID)
0-7695-2072-3 (ISBN)
Conference
17th International Conference on VLSI Design, Mumbai, INDIA, JAN 05-09, 2004
Note

QC 20101122. Available from: 2006-05-11 Created: 2006-05-11 Last updated: 2014-12-11 Bibliographically approved
4. A study of NoC Exit Strategies
2007 (English) In: NOCS 2007: First International Symposium on Networks-on-Chip, Proceedings, 2007, 217-217 p. Conference paper, Published paper (Refereed)
Abstract [en]

The throughput of a network is limited by several interacting components. Analysing simulation results made it clear that the component worth attacking was the exit bandwidth between the network and the connected resources. The obvious approach is to increase this bandwidth; the benefit is a higher network throughput and significantly lower buffer requirements at the entry points of the network, because worst-case scenarios now happen at a higher injection rate. The results we present show significant differences in throughput as well as in average and worst-case latency.

National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-39644 (URN)
000246800500023 ()
2-s2.0-36349018826 (Scopus ID)
978-0-7695-2773-4 (ISBN)
Conference
First International Symposium on Networks-on-Chip; Princeton, NJ; 7 May 2007 through 9 May 2007
Note
QC 20110912. Available from: 2011-09-12 Created: 2011-09-12 Last updated: 2011-11-24 Bibliographically approved
5. Increasing NoC performance and utilisation using a Dual Packet Exit strategy
2007 (English) In: DSD 2007: 10TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN ARCHITECTURES, METHODS AND TOOLS, PROCEEDINGS / [ed] Kubatova, H, LOS ALAMITOS: IEEE COMPUTER SOC, 2007, 511-518 p. Conference paper, Published paper (Refereed)
Abstract [en]

When designing a network the use of buffers is inevitable. Buffers are used at the entry points, inside and at the exits of the network. The usage of these buffers significantly changes the performance of the system as a whole. In order to enhance buffer utilisation, the concept of letting more than one packet exit the network at every switch each clock cycle is introduced: Dual Packet Exit (DPE). The approach is tried on a 4x4 and a 6x6 mesh. We demonstrate the buffers used in combination with different routing strategies for best effort performance. The results we present show a 50% reduction in worst-case latency and a 30% reduction in average latency, as well as an increased throughput from both a system and a network perspective. We define the term Operational Efficiency as a measure of the network efficiency and show that it increases by roughly 20% with the DPE technique.
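The effect of the exit capacity can be shown with a small sketch of what happens in one clock cycle at one switch. It assumes that packets arriving at their destination switch compete for a limited number of exit ports, that packets with the highest hop count exit first, and that the remaining packets are deflected back into the network. The ranking rule and the names `split_exits` and `exit_capacity` are assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Packet = Tuple[int, str]  # (hop count so far, packet id)


def split_exits(arrivals: List[Packet], exit_capacity: int) -> Tuple[List[Packet], List[Packet]]:
    """Split packets that reached their destination switch in this cycle.

    Returns (exiting, deflected). Packets that have travelled the most hops
    exit first (assumed rule); the rest stay in the network for another round.
    """
    ranked = sorted(arrivals, key=lambda packet: -packet[0])
    return ranked[:exit_capacity], ranked[exit_capacity:]


arrivals = [(3, "a"), (7, "b"), (5, "c")]

single_exit, single_deflected = split_exits(arrivals, exit_capacity=1)
dual_exit, dual_deflected = split_exits(arrivals, exit_capacity=2)

print("single exit:", single_exit, "deflected:", single_deflected)
print("dual exit:  ", dual_exit, "deflected:", dual_deflected)
```

With a capacity of one, two of the three packets are deflected even though they are already at their destination; with a capacity of two, only one is. Fewer deflections at the exit mean fewer packets occupying links and buffers, which is the intuition behind the latency figures reported above.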

Place, publisher, year, edition, pages
LOS ALAMITOS: IEEE COMPUTER SOC, 2007
Keyword
Computer networks; Electric network topology; Switching circuits; Systems analysis
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-39435 (URN)
10.1109/DSD.2007.4341516 (DOI)
000251463100074 ()
2-s2.0-47749149537 (Scopus ID)
978-0-7695-2978-3 (ISBN)
Conference
10th Euromicro Conference on Digital System Design Architectures, Methods and Tools. Lubeck, GERMANY. AUG 29-31, 2007
Note
QC 20110912. Available from: 2011-09-12 Created: 2011-09-09 Last updated: 2011-11-24 Bibliographically approved
6. Priority Based Forced Requeue to Reduce Worst-Case Latencies for Bursty Traffic
2009 (English) In: DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, 2009, 1070-1075 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we introduce Priority Based Forced Requeue to decrease worst-case latencies in NoCs offering best effort services. Forced Requeue prematurely lifts low-priority packets out of the network and requeues them outside using priority queues. The first benefit of this approach, applicable to any NoC offering best effort services, is that packets that have not yet entered the network now compete with packets inside the network, and hence tighter bounds on admission times can be given. The second benefit, which is more specific to deflective routing as in the Nostrum NoC, is that packet "reshuffling" dramatically reduces the latency inside the network for bursty traffic due to a lowered risk of collisions at the exit of the network. This paper studies Forced Requeue on a mesh with varying burst sizes and traffic scenarios. The experimental results show a 50% reduction in worst-case latency from a system perspective, thanks to a reshaped latency distribution, whilst keeping the average latency the same.
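The requeue mechanism can be sketched with a priority queue held outside the network. The sketch below assumes that a lower number means a higher priority and that packets are lifted out when their priority value exceeds a fixed threshold; the abstract does not state the actual trigger condition, so `threshold`, `RequeueBuffer` and `forced_requeue` are hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class RequeueBuffer:
    """Priority queue outside the network; lower number = higher priority (assumed)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._order = itertools.count()  # keeps FIFO order among equal priorities

    def push(self, priority: int, packet: Any) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), packet))

    def pop(self) -> Tuple[int, Any]:
        priority, _, packet = heapq.heappop(self._heap)
        return priority, packet

    def __len__(self) -> int:
        return len(self._heap)


def forced_requeue(in_network: List[Tuple[int, Any]], queue: RequeueBuffer,
                   threshold: int) -> List[Tuple[int, Any]]:
    """Lift packets with priority value above `threshold` out of the network.

    Lifted packets join `queue`, where they compete with packets that are
    still waiting for admission. Returns the packets that stay in the network.
    """
    staying = []
    for priority, packet in in_network:
        if priority > threshold:
            queue.push(priority, packet)
        else:
            staying.append((priority, packet))
    return staying


queue = RequeueBuffer()
queue.push(1, "new")  # a packet that has not yet entered the network
staying = forced_requeue([(0, "x"), (5, "y"), (3, "z")], queue, threshold=2)
admission_order = [queue.pop() for _ in range(len(queue))]
print(staying, admission_order)
```

In this example the waiting packet "new" is admitted before the two lifted packets, which illustrates how packets outside the network get to compete with those already inside it.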

Series
Design, Automation and Test in Europe Conference and Expo, ISSN 1530-1591
Keyword
Best effort services, Burst size, Bursty traffic, Low priorities, Priority queues, Priority-based, Risk perception
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-30353 (URN)
000273246700191 ()
2-s2.0-70350070728 (Scopus ID)
978-1-4244-3781-8 (ISBN)
Conference
Design, Automation and Test in Europe Conference and Exhibition, Nice, FRANCE, APR 20-24, 2009
Note
QC 20110303. Available from: 2011-03-03 Created: 2011-02-24 Last updated: 2011-11-24 Bibliographically approved
7. A network on chip architecture and design methodology
2002 (English) In: VLSI 2002: IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI - NEW PARADIGMS FOR VLSI SYSTEMS DESIGN, IEEE conference proceedings, 2002, 105-112 p. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor-like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is an m x n mesh of switches, and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources, providing physical-architectural level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block which fits into the available slot and complies with the interface of the NOC. The NOC architecture essentially is the on-chip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special purpose computation resources such as high performance multiprocessors. The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.
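The mesh topology described in this abstract can be written down as a simple adjacency structure. The sketch below assumes a plain mesh without wrap-around links, so switches on the border have fewer than four neighbouring switches, and it identifies each resource with the coordinate of its switch. The function name `build_mesh` is hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def build_mesh(m: int, n: int) -> Dict[Coord, List[Coord]]:
    """Return {switch: neighbouring switches} for an m x n mesh.

    Assumes no wrap-around: border switches have two or three neighbours.
    Each switch additionally serves exactly one resource, which is
    identified here by the same (x, y) coordinate.
    """
    mesh: Dict[Coord, List[Coord]] = {}
    for x in range(m):
        for y in range(n):
            candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            mesh[(x, y)] = [(i, j) for i, j in candidates if 0 <= i < m and 0 <= j < n]
    return mesh


mesh = build_mesh(4, 4)
# Every link is listed once from each end, so halve the total.
link_count = sum(len(neighbours) for neighbours in mesh.values()) // 2
print(len(mesh), "switches,", link_count, "switch-to-switch links")
```

A region, as defined in the abstract, could be modelled on top of this structure by removing the switches it covers from the dictionary; that step is left out here.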

Place, publisher, year, edition, pages
IEEE conference proceedings, 2002
Keyword
Communication switching, Computer architecture, Concrete, Design methodology, Field programmable gate arrays, Hardware, Network-on-a-chip, Packet switching, Shape, Switches
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-46709 (URN)
10.1109/ISVLSI.2002.1016885 (DOI)
000176274900019 ()
0-7695-1486-3 (ISBN)
Conference
IEEE Computer Society Annual Symposium on VLSI, 2002
Note

© 2002 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

QC 20111115. Available from: 2011-11-15 Created: 2011-11-04 Last updated: 2012-11-21 Bibliographically approved
8. Evaluating NoC communication backbones with simulation
2003 (English) In: Proceedings of the 21th NorChip Conference, IEEE conference proceedings, 2003, 27-30 p. Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a Network on Chip simulator that was developed to evaluate our NoC architecture Nostrum. It is shown how SystemC's features for communication refinement are used to make a highly flexible simulator. The simulator is reconfigurable so that it is possible to try different NoC platforms and different mappings of workloads. In addition to the modeling of our Nostrum architecture, a bus-based architecture is modeled as well, and the performance for a simple workload model is compared.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2003
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-48900 (URN)
Conference
The IEEE NorChip Conference, Riga, Latvia, Nov 10-11, 2003
Note

QC 20111124. Available from: 2011-11-24 Created: 2011-11-24 Last updated: 2012-11-21 Bibliographically approved

Open Access in DiVA

fulltext (3613 kB), 945 downloads
File information
File name: FULLTEXT02.pdf
File size: 3613 kB
Checksum (SHA-512): cd53badc86e7dbb2adf2d1ca2290f53848090034011611a664934872f85c3c872b38570f43b5f4239acb8dcff9874d0f88d388f5222aa9a9359d9b08371986ab
Type: fulltext
Mimetype: application/pdf

Other links

http://web.it.kth.se/~axel/papers/2011/PhD-Thesis-Mikael-Millberg.pdf

Search in DiVA

By author/editor
Millberg, Mikael
By organisation
Electronic Systems
Communication Systems
