  • 101.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Salminen, Erno
    Grecu, Cristian
    Network-on Chip Micro-Benchmarks. 2008. In: Embedded Systems Design, no September. Article in journal (Refereed)
    Abstract [en]

    The rapid development of Network-on-Chip (NoC) calls for a systematic approach to evaluate and fairly compare various NoC architectures. In this specification, we define a generic NoC architecture, a comprehensive set of synthetic workloads as micro-benchmarks, workload scenarios and evaluation criteria. These micro-benchmarks enable measuring particular properties of NoC architectures, complementing application benchmarks.

  • 102.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Feasibility analysis of messages for on-chip networks using wormhole routing. 2005. In: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2, IEEE conference proceedings, 2005, p. 960-964. Conference paper (Refereed)
    Abstract [en]

    The feasibility of a message in a network concerns whether its timing property can be satisfied without preventing any messages already in the network from meeting their timing properties. We present a novel feasibility analysis for real-time (RT) and non-real-time (NT) messages in wormhole-routed networks on chip. For RT messages, we formulate a contention tree that captures contentions in the network. For coexisting RT and NT messages, we propose a simple bandwidth partitioning method that allows us to analyze their feasibility independently.

  • 103.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Liu, Ming
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Layered switching for networks on chip. 2007. In: 2007 44th ACM/IEEE Design Automation Conference, Vols 1 And 2, 2007, p. 122-127. Conference paper (Refereed)
    Abstract [en]

    We present and evaluate a novel switching mechanism called layered switching. Conceptually, the layered switching implements wormhole on top of virtual cut-through switching. To show the feasibility of layered switching, as well as to confirm its advantages, we conducted an RTL implementation study based on a canonical wormhole architecture. Synthesis results show that our strategy suggests negligible degradation in hardware speed (1%) and area overhead (7%). Simulation results demonstrate that it achieves higher throughput than wormhole alone while significantly reducing the buffer space required at network nodes when compared with virtual cut-through.

  • 104.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Millberg, Mikael
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Bruce, Alistair
    van der Wolf, Pieter
    Henriksson, Tomas
    Flow Regulation for On-Chip Communication. 2009. In: DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, 2009, p. 578-581. Conference paper (Refereed)
    Abstract [en]

    We propose (sigma, rho)-based flow regulation as a design instrument for System-on-Chip (SoC) architects to control quality-of-service and achieve cost-effective communication, where sigma bounds the traffic burstiness and rho the traffic rate. This regulation changes the burstiness and timing of traffic flows, and can be used to decrease delay and reduce buffer requirements in the SoC infrastructure. In this paper, we define and analyze the regulation spectrum, which bounds the upper and lower limits of regulation. Experiments on a Network-on-Chip (NoC) with guaranteed service demonstrate the benefits of regulation. We conclude that flow regulation may exert a significant positive impact on communication performance and buffer requirements.

  • 105.
    Lu, Zhonghai
    KTH, Superseded Departments, Electronic Systems Design.
    Sander, Ingo
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    A case study of hardware and software synthesis in ForSyDe. 2002. In: Proceedings of the 15th International Symposium on System Synthesis, 2002. Conference paper (Refereed)
  • 106.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Refinement of A Perfectly Synchronous Communication Model onto Nostrum NoC Best-Effort Communication Service. 2005. In: Proceedings of the Forum on Design Languages, 2005. Conference paper (Refereed)
  • 107.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Refining synchronous communication onto network-on-chip best-effort services. 2006. In: Applications of Specification and Design Languages for SoCs / [ed] Vachoux, A., Dordrecht: Springer, 2006, p. 23-38. Conference paper (Refereed)
    Abstract [en]

    We present a novel approach to refine a system model specified with perfectly synchronous communication onto a network-on-chip (NoC) best-effort communication service. It is a top-down procedure with three steps, namely, channel refinement, process refinement, and communication mapping. In channel refinement, synchronous channels are replaced with stochastic channels abstracting the best-effort service. In process refinement, processes are refined in terms of interfaces and synchronization properties. Particularly, we use synchronizers to maintain local synchronization of processes and thus achieve synchronization consistency, which is a key requirement while mapping a synchronous model onto an asynchronous architecture. Within communication mapping, the refined processes and channels are mapped to an NoC architecture. Adopting the Nostrum NoC platform as target architecture, we use a digital equalizer as a tutorial example to illustrate the feasibility of our concepts.

  • 108.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Towards performance-oriented pattern-based refinement of synchronous models onto NoC communication. 2006. In: DSD 2006: 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools, Proceedings / [ed] Muthukumar V, 2006, p. 37-44. Conference paper (Refereed)
    Abstract [en]

    We present a performance-oriented refinement approach that refines a perfectly synchronous communication model onto Network-on-Chip (NoC) communication. We first identify four basic forms of NoC process interaction patterns at the process level, namely, producer-consumer, peers, client-server and multicast. We propose a three-step top-down refinement method: channel refinement, protocol refinement and channel mapping. For the producer-consumer pattern, we describe it in detail. In channel refinement, we deal with interfacing multiple clock domains and use a stochastic process to model channel delay and jitter. In protocol refinement, we show how to refine communication towards application requirements such as reliability and throughput. In channel mapping, we discuss channel convergence and channel merge arising from channel overlapping. All the refinements have been conducted and validated as an integral design phase towards implementation in ForSyDe, a formal system-level design methodology based on a synchronous model of computation.

  • 109.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Sicking, Jonas
    KTH.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Using synchronizers for refining synchronous communication onto Hardware/Software architectures. 2007. In: RSP 2007: 18th IEEE/IFIP International Workshop on Rapid System Prototyping, Proceedings, IEEE Computer Society, 2007, p. 143-149. Conference paper (Refereed)
    Abstract [en]

    We have presented a formal set of synchronization components called synchronizers for refining synchronous communication onto HW/SW codesign architectures. Such an architecture imposes asynchronous communication between HW-HW, SW-SW and HW-SW components. The synchronizers enable local synchronization, thus satisfying the synchronization requirement of a typical IP core. In this paper we present their implementations in HW, SW and HW/SW as well as their application. To validate our concepts, we conduct a case study on a Nios FPGA that comprises a processor, memory and custom logic. The final HW/SW implementation achieves equivalent performance to pure HW implementation. Our prototyping experience suggests that the synchronizers can be standardized as library modules and effectively separate the design of computation from that of communication.

  • 110.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Thid, Rikard
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Millberg, Mikael
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Nilsson, Erland
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    NNSE: Nostrum Network-on-Chip Simulation Environment. 2005. In: Proceedings of Swedish System-on-Chip Conference, Stockholm, Sweden, April 2005, 2005. Conference paper (Other academic)
    Abstract [en]

    A main challenge for Network-on-Chip (NoC) design is to select a network architecture that suits a particular application. NNSE enables analysis of the performance impact of NoC configuration parameters. It allows one to (1) configure a network with respect to topology, flow control, routing algorithm, etc.; (2) configure various regular and application-specific traffic patterns; and (3) evaluate the network with the traffic patterns in terms of latency and throughput.

  • 111.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tong, Li
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Yin, Bei
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A power efficient flit-admission scheme for wormhole-switched networks on chip. 2005. In: WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 4 / [ed] Callaos, N; Lesso, W; Palesi, M, 2005, p. 25-30. Conference paper (Refereed)
    Abstract [en]

    Reducing power consumption is a main challenge when adopting a network as a global on-chip communication interconnect, since the reduction in power dissipation should not come at the expense of degrading the system performance. We investigate power in a wormhole-switched network with a focus on the impact of flit-admission schemes, i.e., when and how the flits of packets are admitted into the network. We have proposed a novel flit-admission scheme that significantly shrinks the switch complexity while maintaining equivalent network performance. This paper investigates its influence on network power involving both switches and links. We conduct experiments on a 2D mesh network. The results show that our flit-admission scheme achieves significant power and area reduction without performance penalty. To our knowledge, our work is the first study of power dissipation on flit-admission schemes.

  • 112.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Wang, Yi
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dynamic flow regulation for IP integration on network-on-chip. 2012. In: Proceedings of the 2012 6th IEEE/ACM International Symposium on Networks-on-Chip, NoCS 2012, IEEE, 2012, p. 115-123. Conference paper (Refereed)
    Abstract [en]

    Flow regulation is a traffic shaping technique, which can be used to achieve communication performance guarantees with low buffering cost when integrating IPs to network-on-chip architectures. This paper presents dynamic flow regulation, which overcomes the rigidity of static flow regulation that pre-configures regulation parameters statically and only once. The dynamic regulation is made possible by employing a sliding window based online flow (σ, ρ) characterization technique, where σ bounds traffic burstiness and ρ reflects the average rate. The characterization method is effective and can be implemented in hardware with small area and high speed. The resulting dynamic regulation can adaptively adjust the traffic regulation strength in response to real traffic workload scenarios. As such, it makes more efficient use of the system interconnect resources, leading to significant improvement in network performance.

  • 113.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Xia, Lei
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Cluster-based simulated annealing for mapping cores onto 2D mesh networks on chip. 2008. In: 2008 IEEE Workshop On Design And Diagnostics Of Electronic Circuits And Systems, Proceedings / [ed] Straube, B; Drutarovsky, M; Renovell, M; Gramata, P; Fischerova, M, 2008, p. 92-97. Conference paper (Refereed)
    Abstract [en]

    In Network-on-Chip (NoC) application design, core-to-node mapping is an important but intractable optimization problem. In this paper, we use simulated annealing to tackle the mapping problem in 2D mesh NoCs. In particular, we combine a clustering technique with simulated annealing to speed up the convergence to near-optimal solutions. The clustering exploits the connectivity and distance relation in the network architecture as well as the locality and bandwidth requirements in the core communication graph. The annealing is cluster-aware and may be dynamically constrained within clusters. Our experiments suggest that simulated annealing can be effectively used to solve the mapping problem with a scalable size, and the combined strategy improves over plain simulated annealing in execution time by up to 30% without compromising the quality of solutions.

  • 114.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics, Electronic and embedded systems.
    Yao, Yuan
    KTH, School of Information and Communication Technology (ICT), Electronics, Electronic and embedded systems.
    Aggregate flow-based performance fairness in CMPs. 2016. In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 13, no 4, article id 53. Article in journal (Refereed)
    Abstract [en]

    In CMPs, multiple co-executing applications create mutual interference when sharing the underlying network-on-chip architecture. Such interference causes different performance slowdowns to different applications. To mitigate the unfairness problem, we treat traffic initiated from the same thread as an aggregate flow such that causal request/reply packet sequences can be allocated to resources consistently and fairly according to online profiled traffic injection rates. Our solution comprises three coherent mechanisms from rate profiling, rate inheritance, and rate-proportional channel scheduling to facilitate and realize unbiased workload-adaptive resource allocation. Full-system evaluations in GEM5 demonstrate that, compared to classic packet-centric and latest application-prioritization approaches, our approach significantly improves weighted speed-up for all multi-application mixtures and achieves nearly ideal performance fairness.

  • 115.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Yao, Yuan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Dynamic Traffic Regulation in NoC-Based Systems. 2017. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no 2, p. 556-569. Article in journal (Refereed)
    Abstract [en]

    In network-on-chip (NoC)-based systems, performance enhancement has primarily focused on the network itself, with little attention paid to controlling traffic injection at the network boundary. This is unsatisfactory because traffic may be over-injected, aggravating congestion and lowering performance. Recently, traffic regulation has been proposed as an orthogonal means for performance improvement. Rather than admitting packets as soon as possible, traffic regulation may hold back packet injection by admitting packets into the network only when the accumulated traffic volume in any time interval does not exceed a threshold. These regulation techniques are, however, often static, likely causing overregulation and underregulation. We propose dynamic traffic regulation to improve the system performance for NoC-based multi/many-processor systems-on-chip (MPSoC) and chip multi/many-core processor (CMP) designs. It can be applied to MPSoCs for intellectual property integration in an open-loop fashion by injecting traffic according to its run-time profiled characteristics. It can also be applied to CMPs in a closed-loop fashion by admitting traffic fully adaptive to the traffic and network states. Through extensive experiments and results, we show that both the open-loop and closed-loop dynamic regulation techniques can significantly improve the network and system performance.

  • 116.
    Lu, Zhonghai
    KTH, Superseded Departments (pre-2005), Electronic Systems Design.
    Yao, Yuan
    KTH, School of Engineering Sciences (SCI), Physics.
    Marginal Performance: Formalizing and Quantifying Power Over/Under Provisioning in NoC DVFS. 2017. In: IEEE Transactions on Computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 66, no 11, p. 1903-1917. Article in journal (Refereed)
  • 117.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Yao, Yuan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Jiang, Y.
    Towards stochastic delay bound analysis for network-on-chip. 2015. In: Proceedings - 2014 8th IEEE/ACM International Symposium on Networks-on-Chip, NoCS 2014, 2015, p. 64-71. Conference paper (Refereed)
    Abstract [en]

    We propose stochastic performance analysis in order to provide probabilistic quality-of-service guarantees in on-chip packet-switching networks. In contrast to deterministic analysis, which gives a per-flow absolute delay bound, stochastic analysis derives a per-flow probabilistic delay bounding function, which can be used to avoid over-dimensioning network resources. Based on stochastic network calculus, we build a basic analytic model for an on-chip router, and propose and exemplify a stochastic performance analysis flow. In experiments, we show the correctness and accuracy of our analysis, and exhibit its potential in enhancing network utilization with a relaxed delay requirement. Moreover, the benefits of such relaxation are demonstrated through a video playback application.

  • 118.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Yin, Bei
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Connection-oriented multicasting in wormhole-switched networks on chip. 2006. In: IEEE Computer Society Annual Symposium on VLSI, Proceedings - EMERGING VLSI TECHNOLOGIES AND ARCHITECTURES, 2006, p. 205-210. Conference paper (Refereed)
    Abstract [en]

    Network-on-Chip (NoC) proposes networks to replace buses as a scalable global communication interconnect for future SoC designs. However, a bus is very efficient in broadcasting. As the system size scales up to explore the chip capacity, broadcasting in NoCs must be efficiently supported. This paper presents a novel multicast scheme in wormhole-switched NoCs. In this scheme, a multicast procedure consists of establishment, communication and release phases. A multicast group can request to reserve virtual channels during establishment and has priority on arbitration of link bandwidth. This multicasting method has been effectively implemented in a mesh network with deadlock freedom. Our experiments show that the multicast technique improves throughput, and does not exhibit significant impact on unicast performance in a network with mixed unicast and multicast traffic if the network is not saturated.

  • 119.
    Lu, Zhonghai
    KTH, Superseded Departments (pre-2005), Electronic Systems Design.
    Zhao, Xueqian
    xMAS-Based QoS Analysis Methodology. 2018. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 37, no 2, p. 364-377. Article in journal (Refereed)
    Abstract [en]

    On-chip communication system design starting from a high-level model can facilitate formal verification of system properties, such as safety and deadlock freedom. Yet, analyzing its quality-of-service (QoS) property, in our context, per-flow delay bound, is an open challenge. Based on executable micro-architectural specification (xMAS) which is a formal framework modeling communication fabrics, we first present how to model a classic input-queuing virtual channel router using the xMAS primitives and then a QoS analysis methodology using network calculus (NC). Thanks to the precise semantics of the xMAS primitives, the router can be modeled in different variants, which cannot be otherwise captured by normal ad hoc box diagrams. The analysis methodology consists of three steps: 1) given network and flow knowledge, we first create a well-defined precise xMAS model for a specific application on a concrete on-chip network; 2) the specific xMAS model is then mapped to an NC graph (NCG) following a set of mapping rules; and 3) finally, existing QoS analysis techniques can be applied to analyze the NCG to obtain end-to-end delay bound per flow. We also show how to apply the technique to a typical all-to-one communication pattern on a binary-tree network and conduct an SoC case study, exemplifying the step-by-step analysis procedure and discussing the tightness of the results.

  • 120.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Zhong, Mingchen
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Evaluation of on-chip networks using deflection routing. 2006. In: Proceedings of the 16th ACM Great Lakes symposium on VLSI, Association for Computing Machinery (ACM), 2006, p. 296-301. Conference paper (Refereed)
    Abstract [en]

    Deflection routing is being proposed for networks on chips since it is simple and adaptive. A deflection switch can be much smaller and faster than a wormhole or virtual cut-through switch. A deflection-routed network has three orthogonal characteristics: topology, routing algorithm and deflection policy. In this paper we evaluate deflection networks with different topologies such as mesh, torus and Manhattan Street Network, different routing algorithms such as random, dimension XY, delta XY and minimum deflection, as well as different deflection policies such as non-priority, weighted priority and straight-through policies. Our results suggest that the performance of a deflection network is more sensitive to its topology than the other two parameters. It is less sensitive to its routing algorithm, but a routing algorithm should be minimal. A priority-based deflection policy that uses global and history-related criteria can achieve both better average-case and worst-case performance than a non-priority or priority policy that uses local and stateless criteria. These findings are important since they can guide designers to make the right decisions on the deflection network architecture, for instance, selecting a routing algorithm or deflection policy which has potentially low cost and high speed for hardware implementation.

  • 121.
    Ma, Ning
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    System design of full HD MVC decoding on mesh-based multicore NoCs. 2011. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 35, no 2, p. 217-229. Article in journal (Refereed)
    Abstract [en]

    Future multimedia applications such as full HD (1920 x 1080) multiview video coding (MVC) present great challenges for computing architectures. Even with state-of-the-art ASIC technology that can process single-view HD decoding, dealing with multiple views would require computation capacity multiplied in proportion to the number of views, which is difficult to achieve. In this paper, we explore the system-level design space for full HD MVC applications mapped onto mesh-based multicore Network-on-Chip (NoC) architectures. To this end, we establish a simulation framework capable of simulating the combination of communication networks with computing cores. We investigate two task assignment schemes: picture-level assignment and view-level assignment. With an eight-view MVC decoding, we explore the design options with respect to network size, single-core performance and link bandwidth under both task assignment schemes. Our studies show that, to achieve a certain decoding performance, the computation capability and communication capacity should be balanced in the system. Also, to realize the eight-view HD decoding, the system requires at most twice the single-core processing capacity required by single-view decoding, thanks to the parallel computation and communication enabled by the multicore NoC architectures. Our results exhibit the feasibility and potential of efficiently implementing full HD MVC decoding on multicore NoC architectures.

  • 122.
    Ma, Ning
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Blixt, Stefan
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    A 2-mW Multi-mode Router Design with Dual-core Processor in 65 nm LL CMOS for Inter-silicon Communication. 2015. Manuscript (preprint) (Other academic)
  • 123.
    Ma, Ning
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Design and Implementation of Multi-mode Routers for Large-scale Inter-core Networks. 2016. In: Integration, ISSN 0167-9260, E-ISSN 1872-7522, Vol. 53, p. 1-13. Article in journal (Other academic)
    Abstract [en]

    Constructing on-chip or inter-silicon (inter-die/inter-chip) networks to connect multiple processors extends the system capability and scalability. A key issue is implementing a flexible router that can fit into various application scenarios. This paper proposes a multi-mode adaptable router that supports both circuit and wormhole switching and supplies flexible working strategies for specific traffic patterns in diverse applications. The limitations of mono-mode switched routers are shown first, followed by algorithm exploration in the proposed router for choosing the proper working strategy in a specific network. We then present the performance improvement when applying the mixed circuit/wormhole switching mode to different applications, and analyze image decoding as a case study. The multi-mode router has been implemented with different configurations in a 65 nm CMOS technology. The one with 8-bit flit width is demonstrated together with a multi-core processor to show the feasibility. Working at 350 MHz, the average power consumption of the whole system is 22 mW.

  • 124.
    Ma, Ning
    et al.
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Implementing MVC Decoding on Homogeneous NoCs: Circuit Switching or Wormhole Switching. 2015. Conference paper (Refereed)
    Abstract [en]

    To implement multiview video decoding on network-on-chip (NoC) based homogeneous multicore architectures, the selection of the switching technique for routers is one of the most important aspects of design space exploration. Circuit switching and wormhole switching are the two most feasible switching techniques for on-chip networks. To choose the suitable switching technique, we compare the decoding speed of the whole system, the link utilization and the delay of circuit switching and wormhole switching when implementing eight-view QVGA video decoding on 4 × 4 NoCs at 30 fps. The required link bandwidths are both around 800 Mbps, with similar network utilization and delay. We conclude that, to implement multiview video decoding on homogeneous NoCs, circuit switching is more suitable considering the similar performance and lower cost compared with wormhole switching.

  • 125.
    Ma, Ning
    et al.
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Blixt, Stefan
    A Hierarchical Reconfigurable Micro-coded Multi-core Processor for IoT Applications. 2014. In: 2014 9TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE AND COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC), 2014. Conference paper (Refereed)
    Abstract [en]

    This paper presents a micro-coded multi-core processor featuring reconfigurability and scalability with high energy efficiency for IoT domain-specific applications. By simplifying the control logic and removing the pipelines, the gate count of one core is minimized to 14 K. Meanwhile, all the hardware units are directly controlled and can be reorganized by the long microinstructions. High utilization of the hardware is thus achieved when the micro programs are designed properly. Furthermore, the ISAs for both C and Java have been implemented by micro programs to supply general-purpose programmability. In addition, application-specific instructions can be developed when higher performance is demanded in specific scenarios. Depending on the performance requirement, the activity and working strategies of the cores are adjustable. Moreover, several processors can be connected to construct a network with the integrated router for even higher performance. As a case study, AES encryption is implemented using both C and micro programs. A performance improvement of more than 10 times is achieved when using micro programs on a single core, and 20 times on two cores.

  • 126.
    Ma, Ning
    et al.
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Huan, Yuxiang
    Blixt, Stefan
    A 101.4 GOPS/W Reconfigurable and Scalable Control-centric Embedded Processor for Domain-specific Applications. 2016. In: Proceedings - IEEE International Symposium on Circuits and Systems, IEEE, 2016, p. 1746-1749. Conference paper (Refereed)
    Abstract [en]

    Increasing the energy efficiency and performance while providing customizability and scalability is vital for embedded processors adapting to domain-specific applications such as the Internet of Things. In this paper, we propose a reconfigurable and scalable control-centric architecture, and implement the design, consisting of two cores and an on-chip multi-mode router, in 65 nm technology. The reconfigurability is enabled by the restructurable sequence mapping table (SMT) and thus the reorganizable functional units. Owing to the integration of the multi-mode router, an on-chip or inter-chip network for multi-/many-core computing can be composed for performance extension on demand, even in the post-fabrication stage. Control-centric design simplifies the control logic, shrinks the non-functional units and orchestrates the operations to increase the hardware utilization and reduce excessive data movement for high energy efficiency. As a result, the processor can conduct both general-purpose processing with 29% smaller code size and application-specific processing with over 10 times performance improvement when implementing AES by SMT. The dual-core processor consumes 19.7 μW/MHz with a die size of 3.5 mm². The achieved energy efficiency is 101.4 GOPS/W.

  • 127.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems. 2011. In: Proceedings of 16th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC) 2011, IEEE Press, 2011, p. 154-159. Conference paper (Refereed)
    Abstract [en]

    This paper studies the realization and performance comparison of the sequential and weak consistency models in network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems. Memory consistency constrains the order of shared memory operations for the expected behavior of the multi-core systems. Both consistency models are realized in the NoC based multi-core systems. The performance of the two consistency models is compared for various sizes of networks using regular mesh topologies and a deflection routing algorithm. The results show that the weak consistency improves the performance by 46.17% and 33.76% on average in the code and consistency latencies over the sequential consistency model, due to relaxation in the program order, as the system grows from single core to 64 cores.

  • 128.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability of Relaxed Consistency Models in NoC based Multicore Architectures. 2009. In: SIGARCH Computer Architecture News, ISSN 0163-5964, E-ISSN 1943-5851, Vol. 37, no 5, p. 8-15. Article in journal (Other academic)
    Abstract [en]

    This paper studies the realization of relaxed memory consistency models in network-on-chip based distributed shared memory (DSM) multi-core systems. Within DSM systems, memory consistency is a critical issue since it affects not only the performance but also the correctness of programs. We investigate the scalability of the relaxed consistency models (weak, release consistency) implemented by using transaction counters. Our experimental results compare the average and maximum code, synchronization and data latencies of the two consistency models for various network sizes with regular mesh topologies. The observed latencies rise for both consistency models as the network size grows. However, the scaling behaviors are different. With the release consistency model these latencies grow significantly slower than with the weak consistency model due to better optimization potential by means of overlapping, reordering and program order relaxations. The release consistency improves the performance by 15.6% and 26.5% on average in the code and consistency latencies over the weak consistency model for the specific application, as the system grows from single core to 64 cores. The latency of data transactions grows 2.2 times faster on average with a weak consistency model than with a release consistency model when the system scales from single core to 64 cores.

  • 129.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability of Weak Consistency in NoC based Multicore Architectures. 2010. In: IEEE INT SYMP CIRC SYST PROC, New York: IEEE, 2010, p. 3497-3500. Conference paper (Refereed)
    Abstract [en]

    In multicore Network-on-Chip systems, it is preferable to realize distributed but shared memory (DSM) in order to reuse the huge amount of legacy code and to ease programming. Within DSM systems, memory consistency is a critical issue since it affects not only performance but also the correctness of programs. In this paper, we investigate the scalability of the weak consistency model, which may be implemented using a transaction counter. The experimental results compare synchronization latencies for various network sizes, topologies and lock positions in the network. Average synchronization latency rises exponentially for mesh and torus topologies as the network size grows. However, torus improves the synchronization latency in comparison to mesh. For mesh topology networks, the average synchronization latency is also slightly affected by the lock position with respect to the network center.
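
The transaction counter mentioned in this abstract can be pictured with a minimal Python sketch. It is not the authors' hardware design: the names `TransactionCounter`, `Node` and `try_sync` are hypothetical, and the only rule assumed is the usual weak-consistency constraint that a synchronization operation may issue only after all earlier data transactions of that node have completed.

```python
class TransactionCounter:
    """Counts a node's outstanding (issued but not yet acknowledged) transactions."""

    def __init__(self):
        self.outstanding = 0

    def issue(self):
        self.outstanding += 1

    def complete(self):
        if self.outstanding == 0:
            raise RuntimeError("no outstanding transaction to complete")
        self.outstanding -= 1

    def is_idle(self):
        return self.outstanding == 0


class Node:
    """Hypothetical processor node that gates synchronization on the counter."""

    def __init__(self):
        self.counter = TransactionCounter()

    def issue_data(self):
        # Data reads/writes are issued without waiting for earlier ones.
        self.counter.issue()

    def acknowledge(self):
        # Called when the memory system acknowledges one data transaction.
        self.counter.complete()

    def try_sync(self):
        # A synchronization operation (e.g. a lock access) is allowed only
        # when every earlier data transaction has been acknowledged.
        return self.counter.is_idle()


node = Node()
node.issue_data()
node.issue_data()
print(node.try_sync())   # False: two transactions are still outstanding
node.acknowledge()
node.acknowledge()
print(node.try_sync())   # True: the counter is back to zero
```

In this sketch the data transactions may be acknowledged in any order; only the synchronization point waits for the counter to reach zero.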

  • 130.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Realization and Scalability of Release and Protected Release Consistency Models in NoC based Systems. 2011. In: Proceeding of 14th Euromicro Conference on Digital System Design, 2011, Oulu: IEEE Computer Society, 2011, p. 47-54. Conference paper (Refereed)
    Abstract [en]

    This paper studies the realization and scalability of release and protected release consistency models in Network-on-Chip (NoC) based Distributed Shared Memory (DSM) multi-core systems. The protected release consistency (PRC) model is proposed as an extension of the release consistency (RC) model and provides further relaxation in the shared memory operations. The realization schemes of the RC and PRC models use a transaction counter in each node of the NoC based multi-core (McNoC) systems. Further, we study the scalability of these RC and PRC models and evaluate their performance in the McNoC platform. A configurable NoC based platform with 2D mesh topology and a deflection routing algorithm is used in the tests. We experiment with both synthetic and application workloads. The performance of the RC and PRC models is compared using sequential consistency (SC) as the baseline. The experiments show that the average code execution time for the PRC model in an 8x8 network (64 cores) is reduced by 30.5% over SC, and by 6.5% over the RC model. Average data execution time in the 8x8 network for the PRC model is reduced by almost 37% over SC and by 8.8% over RC. The increase in area for the PRC over RC is about 880 gates in the network interface (1.7%).

  • 131.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Architecture Support and Comparison of Three Memory Consistency Models in NoC based Syst. 2012. In: Proceedings of 15th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD 2012), IEEE Computer Society, 2012, p. 304-311. Conference paper (Refereed)
    Abstract [en]

    We propose a novel hardware support for three relaxed memory models, Release Consistency (RC), Partial Store Ordering (PSO) and Total Store Ordering (TSO) in Network-on-Chip (NoC) based distributed shared memory multicore systems. The RC model is realized by using a Transaction Counter and an Address Stack based approach while the PSO and TSO models are realized by using a Write Transaction Counter and a Write Address Stack based approach. In the experiments, we use a configurable platform based on a 2D mesh NoC using deflection routing policy. The results show that under synthetic workloads, the average execution time for the RC, PSO and TSO models in 8x8 network (64 cores) is reduced by 35.8%, 22.7% and 16.5% respectively, over the Sequential Consistency (SC) model. The average speedup for the RC, PSO and TSO models in the 8x8 network under different application workloads is increased by 34.3%, 10.6% and 8.9%, respectively, over the SC model. The area cost for the TSO, PSO and RC models is increased by less than 2% over the SC model at the interface to the processor.

  • 132.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability Analysis of Memory Consistency Models in NoC-based Distributed Shared Memory SoCs. 2013. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 32, no 5, p. 760-773. Article in journal (Refereed)
    Abstract [en]

    We analyze the scalability of six memory consistency models in network-on-chip (NoC)-based distributed shared memory multicore systems: 1) protected release consistency (PRC); 2) release consistency (RC); 3) weak consistency (WC); 4) partial store ordering (PSO); 5) total store ordering (TSO); and 6) sequential consistency (SC). Their realizations are based on a transaction counter and an address-stack-based approach. The scalability analysis is based on different workloads mapped on various sizes of networks using different problem sizes. For the experiments, we use Nostrum NoC-based configurable multicore platform with a 2-D mesh topology and a deflection routing algorithm. Under the synthetic workloads, the average execution time for the PRC, RC, WC, PSO, and TSO models in the 8 x 8 network (64-cores) is reduced by 32.3%, 28.3%, 20.1%, 13.8%, and 9.9% over the SC model, respectively. For the application workloads, as the network size grows, the average execution time under these relaxed memory models decreases with respect to the SC model depending on the application and its match to the architecture. The performance improvement of the PRC and RC models over the SC model tends to be higher than 50% as observed in the experiments, when the system is further scaled up. The area cost in the network interface for the relaxed memory models is increased by less than 4% over the SC model.

  • 133.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability analysis of release and sequential consistency models in NoC based multicore systems. 2012. In: 2012 International Symposium on System on Chip, SoC 2012, IEEE, 2012, p. 6376350. Conference paper (Refereed)
    Abstract [en]

    We analyze the scalability of the Release Consistency (RC) and Sequential Consistency (SC) models, which are realized in Network-on-Chip (NoC) based distributed shared memory multicore systems. The analysis is performed on the basis of workloads mapped on different sizes of networks with different data sets. The experiments use a configurable platform based on a 2D mesh NoC using a deflection routing algorithm. The results show that under the synthetic workloads using different distributed locks, the performance of the RC model is increased by 17.6% to 54.6% over the SC model in the 64-core system. For the application workloads, as the network size grows from 1 to 64 cores, the execution time under the RC model decreases relative to the SC model, depending on the application and its match to the architecture. The performance improvement of the RC model over the SC model tends to be higher than 50%, as observed in the experiments, when the system is further scaled up.

  • 134.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability and Performance Evaluation of Memory Consistency Models in NoC based Multicore SoCs. 2012. Conference paper (Other academic)
  • 135. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dou, Qiang
    QoS Scheduling for NoCs: Strict Priority Queueing versus Weighted Round Robin. 2010. In: 2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, 2010, p. 52-59. Conference paper (Refereed)
    Abstract [en]

    Strict Priority Queueing (SPQ) and Weighted Round Robin (WRR) are two common scheduling techniques to achieve Quality-of-Service (QoS) while using shared resources. Based on network calculus, we build analytical models for traffic flows under SPQ and WRR scheduling in on-chip wormhole networks. With these models, we can derive per-flow end-to-end delay bound. We compare the service behavior and show that WRR is not only more fair but also more flexible for QoS provision. To exhibit the potential and flexibility enabled by WRR, we develop a weight allocation algorithm to automatically assign proper weights for individual flows to satisfy their delay constraints. In particular, the weights are assigned in a way not more than necessary, in other words, to approach flows' delay constraints in order to leave room for other flows. Our experimental results validate our analysis technique and algorithms.
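
The difference between the two scheduling disciplines named in this abstract can be illustrated with a small Python sketch. It is only a packet-level toy, assuming all packets are already queued at time zero; the function names `strict_priority` and `weighted_round_robin` are hypothetical, and neither the paper's analytical models nor its weight allocation algorithm is reproduced here.

```python
from collections import deque


def strict_priority(queues):
    """Serve queues[0] first; a lower-priority queue is served only
    when every higher-priority queue is empty."""
    qs = [deque(q) for q in queues]
    order = []
    while any(qs):
        for q in qs:
            if q:
                order.append(q.popleft())
                break  # restart from the highest-priority queue
    return order


def weighted_round_robin(queues, weights):
    """Visit the queues cyclically; queue i may send up to weights[i]
    packets per round."""
    if len(queues) != len(weights) or any(w < 1 for w in weights):
        raise ValueError("each queue needs an integer weight >= 1")
    qs = [deque(q) for q in queues]
    order = []
    while any(qs):
        for q, w in zip(qs, weights):
            for _ in range(w):
                if not q:
                    break
                order.append(q.popleft())
    return order


flows = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
print(strict_priority(flows))
# ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
print(weighted_round_robin(flows, [2, 1]))
# ['a1', 'a2', 'b1', 'a3', 'b2', 'b3']
```

Under strict priority the second flow waits until the first one drains, whereas with weights 2:1 it is served once per round, which is the kind of per-flow control the abstract refers to as flexibility.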

  • 136. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Analysis of Communication Delay Bounds for Network on Chips. 2009. In: PROCEEDINGS OF THE ASP-DAC 2009: ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 2009, 2009, p. 7-12. Conference paper (Refereed)
    Abstract [en]

    In network-on-chip, computing worst-case delay bound for packet delivery is crucial for designing predictable systems but yet an intractable problem due to complicated resource contention scenarios. In this paper, we present an analysis technique to derive the communication delay bound for individual flows. Based on a network contention model, this technique, which is topology independent, employs the network calculus theory to first compute the equivalent service curve for individual flows and then calculate their packet delay bound. To exemplify our method, we also present the derivation of a closed-form formula to calculate the delay bound for all-to-one gather communication. Our experimental results demonstrate the theoretical bounds are correct and tight.

  • 137. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Analysis of Worst-case Delay Bounds for Best-effort Communication in Wormhole Networks on Chip. 2009. In: 2009 3RD ACM/IEEE INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP, 2009, p. 44-53. Conference paper (Refereed)
    Abstract [en]

    In packet-switched network-on-chip, computing worst-case delay bounds is crucial for designing predictable and cost-effective communication systems but yet an intractable problem due to complicated resource sharing scenarios. For wormhole networks with credit-based flow control, the existence of cyclic dependency between flit delivery and credit generation further complicates the problem. Based on network calculus, we propose a technique for analyzing communication delay bounds for individual flows in wormhole networks. We first propose router service analysis models for flow control, link and buffer sharing. Based on these analysis models, we obtain a buffer-sharing analysis network, which is open-ended and captures both flow control and link sharing. Furthermore, we compute equivalent service curves for individual flows using the network contention tree model in the buffer-sharing analysis network, and then derive their delay bounds. Our experimental results verify that the theoretical bounds are correct and tight.

  • 138. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dou, Wenhua
    Analysis of Worst-Case Delay Bounds for On-Chip Packet-Switching Networks. 2010. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 29, no 5, p. 802-815. Article in journal (Refereed)
    Abstract [en]

    In network-on-chip (NoC), computing worst-case delay bounds for packet delivery is crucial for designing predictable systems but yet an intractable problem. This paper presents an analysis technique to derive per-flow communication delay bound. Based on a network contention model, this technique, which is topology independent, employs network calculus to first compute the equivalent service curve for an individual flow and then calculate its packet delay bound. To exemplify this method, this paper also presents the derivation of a closed-form formula to compute a flow's delay bound under all-to-one gather communication. Experimental results demonstrate that the theoretical bounds are correct and tight.

  • 139. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Applying Network Calculus for Performance Analysis of Self-Similar Traffic in On-Chip Networks. 2009. In: IEEE/ACM/IFIP 2009 International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS’09), 2009, p. 453-460. Conference paper (Refereed)
    Abstract [en]

    On-chip traffic of many applications exhibits self-similar characteristics. In this paper, we apply network calculus to analyze the delay and backlog bounds for self-similar traffic in networks on chips. We first prove that self-similar traffic cannot be constrained by any deterministic arrival curve. Then we prove that self-similar traffic can be constrained by deterministic linear arrival curves $\alpha_{r,b}(t) = rt + b$ ($r$: rate, $b$: burstiness) if an additional parameter, the excess probability $\varepsilon$, is used to capture its burstiness exceeding the arrival envelope. This three-parameter model, $\varepsilon$-$\alpha_{r,b}(t) = rt + b(\varepsilon)$, enables us to apply and extend the results of network calculus to analyze the performance and buffering cost of networks delivering self-similar traffic flows. Assuming the latency-rate server model for the network elements, we give closed-form equations to compute the delay and backlog bounds for self-similar traffic traversing a series of network elements. Furthermore, we describe a performance analysis flow with self-similar traffic as input. Our experimental results using real on-chip multimedia traffic traces validate our model and approach.
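
For orientation, the textbook deterministic network-calculus bounds for a linear arrival curve crossing latency-rate servers can be computed as in the Python sketch below. This is not the paper's closed-form result: the burstiness is passed in as a plain number (for the three-parameter model it would be the value b(ε) for a chosen ε, whose derivation is not reproduced here), and the function names are hypothetical. The sketch assumes each server i offers a service curve R_i·max(t − T_i, 0), that a series of such servers behaves like one server with rate min R_i and latency ΣT_i, and that the flow rate r does not exceed that rate.

```python
def concatenate(servers):
    """Collapse a series of latency-rate servers, given as (rate, latency)
    pairs, into one equivalent (rate, latency) pair."""
    if not servers:
        raise ValueError("at least one server is required")
    rate = min(R for R, _ in servers)
    latency = sum(T for _, T in servers)
    return rate, latency


def delay_bound(r, b, servers):
    """Worst-case delay for arrival curve r*t + b: T + b / R."""
    R, T = concatenate(servers)
    if r > R:
        raise ValueError("flow rate exceeds service rate; bound is infinite")
    return T + b / R


def backlog_bound(r, b, servers):
    """Worst-case backlog for arrival curve r*t + b: b + r * T."""
    R, T = concatenate(servers)
    if r > R:
        raise ValueError("flow rate exceeds service rate; bound is infinite")
    return b + r * T


# Two hypothetical routers in series: (rate in flits/cycle, latency in cycles).
path = [(1.0, 2.0), (0.5, 3.0)]
print(delay_bound(r=0.25, b=4.0, servers=path))    # 13.0
print(backlog_bound(r=0.25, b=4.0, servers=path))  # 5.25
```

Here the equivalent server has rate 0.5 and latency 5, so the delay bound is 5 + 4/0.5 = 13 cycles and the backlog bound is 4 + 0.25·5 = 5.25 flits.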

  • 140. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Applying Network Calculus for Worst-case Delay Bound Analysis in On-chip Networks. 2009. In: Proceedings of the DTIS'09 - 2009 4th IEEE International Conference on Design and Technology of Integrated Systems in Nanoscale Era / [ed] ElTahawy, H; Abadir, M; Jerraya, A; Salem, A, 2009, p. 113-118. Conference paper (Refereed)
    Abstract [en]

    In network-on-chip, computing worst-case delay bounds for packet delivery is crucial for designing predictable systems but yet an intractable problem due to complicated resource contention scenarios. In this paper, based on network calculus, we propose a technique for analyzing the communication delay bound for individual flows. The fundamental elements with the technique include three network calculus models that describe the traffic behaviors when flows are multiplexed, split, or controlled by feedback credits, respectively. Based on the basic models, we can compute the equivalent system service curve for individual flows and then calculate their packet delay bound.

  • 141. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Comparative Analysis of Worst-Case Communication Delay Bounds for 2D and 3D NoCs. 2009. In: Proceedings of Workshop on 3D Integration and Interconnect-Centric Architectures held in conjunction with 15th International Symposium on High-Performance Computer Architecture, 2009. Conference paper (Refereed)
  • 142. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    From 2D to 3D NoCs: A Case Study of the worst-case Communication Performance. 2009. In: IEEE/ACM 2009 International Conference on Computer-Aided Design (ICCAD’09), IEEE Computer Society, 2009, p. 555-562. Conference paper (Refereed)
    Abstract [en]

    Advanced integration technologies enable the construction of Network-on-Chip (NoC) from two dimensions to three dimensions. Studies have shown that 3D NoCs can improve average communication performance because of the possibility of using the additional dimension to shorten communication distance. In this paper, we present a detailed case study on worst-case communication performance in regular k-ary-2-mesh networks. Through both analysis and simulation, we show that, while 3D networks achieve better average performance, this may not be the case for worst-case performance, mainly due to constraints on vertical channels. Our analysis is based on network calculus, which allows calculating theoretical delay bounds for constrained flows traversing network elements.

  • 143. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Worst-Case Flit and Packet Delay Bounds in Wormhole Networks on Chip. 2009. In: IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, ISSN 0916-8508, E-ISSN 1745-1337, Vol. E92A, no 12, p. 3211-3220. Article in journal (Refereed)
    Abstract [en]

    We investigate per-flow flit and packet worst-case delay bounds in on-chip wormhole networks. Such investigation is essential in order to provide guarantees under worst-case conditions in cost-constrained systems, as required by many hard real-time embedded applications. We first propose analysis models for flow control, link and buffer sharing. Based on these analysis models, we obtain an open-ended service analysis model capturing the combined effect of flow control, link and buffer sharing. With the service analysis model, we compute equivalent service curves for individual flows, and then derive their flit and packet delay bounds. Our experimental results verify that our analytical bounds are correct and tight.

  • 144. Qian, Yue
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Dou, Wenhua
    Dou, Qiang
    Analyzing Credit-Based Router-to-Router Flow Control for On-Chip Networks. 2009. In: IEICE transactions on electronics, ISSN 0916-8524, E-ISSN 1745-1353, Vol. E92C, no 10, p. 1276-1283. Article in journal (Refereed)
    Abstract [en]

    Credit-based router-to-router flow control is one main link-level flow control mechanism proposed for Networks on Chip (NoCs). Based on network calculus, we analyze its performance and optimal buffer size. To model the feedback control behavior due to credits, we introduce a virtual network service element called flow controller. Then we derive its service curve, and further the system service curve. In addition, we give and prove a theorem that determines the optimal buffer size guaranteeing the maximum system service curve. Moreover, assuming the latency-rate server model for routers, we give closed-form formulas to calculate the flit delay bound and optimal buffer size. Our experiments with real on-chip traffic traces validate that our analysis is correct; delay bounds are tight and the optimal buffer size is exact.

  • 145. Qiu, Meikang
    et al.
    Ming, Zhong
    Li, Jiayin
    Liu, Shaobo
    Wang, Bin
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Three-phase time-aware energy minimization with DVFS and unrolling for Chip Multiprocessors. 2012. In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 58, no 10, p. 439-445. Article in journal (Refereed)
    Abstract [en]

    Energy consumption has been one of the most critical issues in Chip Multiprocessors (CMPs). Using Dynamic Voltage and Frequency Scaling (DVFS), a CMP system can balance performance and energy efficiency. In this paper, we propose a three-phase discrete DVFS algorithm for a CMP system dedicated to applications where the period of the applications' task graph is smaller than the deadline of tasks. In these applications, multiple task graphs are unrolled and then concatenated together to form a new task graph. The proposed DVFS algorithm is applied to the newly formed task graph to stretch tasks' execution time, lower the operating frequencies of processors, and improve system power efficiency. Experimental results show that the proposed algorithm reduces the energy dissipation by 25% on average, compared to previous DVFS approaches.
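
    The idea of stretching execution time to lower the operating frequency can be illustrated with a small sketch. This is not the three-phase algorithm of the paper, which the abstract does not spell out; it only picks the lowest discrete frequency level that still meets a deadline. The function names and the energy model (energy per cycle proportional to the square of the frequency, assuming voltage scales with frequency) are assumptions made for illustration.

```python
def pick_frequency(cycles, deadline, levels):
    """Return the lowest frequency in `levels` that finishes `cycles` by `deadline`.

    Hypothetical helper: frequencies are in cycles per time unit.
    """
    feasible = [f for f in sorted(levels) if cycles / f <= deadline]
    if not feasible:
        raise ValueError("no frequency level meets the deadline")
    return feasible[0]


def relative_energy(cycles, freq):
    """Assumed model: voltage tracks frequency, so energy per cycle ~ freq**2."""
    return cycles * freq ** 2


# A 100-cycle task with a deadline of 50 time units and three discrete levels.
chosen = pick_frequency(100, 50, [1.0, 2.0, 4.0])
saving = 1 - relative_energy(100, chosen) / relative_energy(100, 4.0)
```

    Under this assumed model, running at the chosen level instead of the highest one trades unused slack for lower energy.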

  • 146.
    Saggio, Alberto
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Du, G.
    Zhao, Xueqian
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Validating delay bounds in networks on chip: Tightness and pitfalls. 2015. In: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI, Institute of Electrical and Electronics Engineers (IEEE), 2015, p. 404-409. Conference paper (Refereed)
    Abstract [en]

    Analytical methods for estimating on-chip network performance can be very useful to accelerate and simplify the design process of Networks on Chip. However, in order to increase the confidence in these approaches it is fundamental to perform systematic studies that assess their potential. We present a methodical investigation on the tightness between analytical end-to-end delay bounds and worst-case simulation latencies in various scenarios. We first introduce our network calculus based analytical technique to derive per-flow communication delay bounds. Then, we examine the worst-case performance analysis process in NoCs outlining the major aspects that affect the tightness. Finally, experimental results confirm our deductions and allow us to provide general guidelines to avoid pitfalls in the validation process of analytical delay bounds.

  • 147.
    Sander, Ingo
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Lu, Zhonghai
    KTH, Superseded Departments, Electronic Systems Design.
    Development and application of design transformations in ForSyDe. 2003. In: IEE Proceedings - Computers and digital Techniques, ISSN 1350-2387, E-ISSN 1359-7027, Vol. 150, no 5, p. 313-320. Article in journal (Refereed)
    Abstract [en]

    The formal system design (ForSyDe) methodology has been developed for system level design. Starting with a formal specification model, which captures the functionality of the system at a high level of abstraction, it provides formal design transformation methods for a transparent refinement process of the specification model into an implementation model which is optimised for synthesis. The formal treatment of transformational design refinement is the central contribution of this article. Using the formal semantics of ForSyDe processes we introduce the term characteristic function to be able to define and classify transformations as either semantic preserving or design decision. We also illustrate how we can incorporate classical synthesis techniques that have traditionally been used with control/data-flow graphs as ForSyDe transformations. This approach avoids discontinuities as it moves design refinement into the domain of the specification model.

  • 148.
    Schamberger, Pierre
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jiang, X.
    Qiu, M.
    Modeling and power evaluation of on-chip router components in spintronics. 2012. In: Proceedings of the 2012 6th IEEE/ACM International Symposium on Networks-on-Chip, NoCS 2012, IEEE, 2012, p. 51-58. Conference paper (Refereed)
    Abstract [en]

    On-chip routers are power hungry components. Besides exploiting current CMOS-based power-saving techniques, it is also desirable to investigate the power saving potential enabled by new technologies and devices. This paper investigates the potential of exploiting the emerging spin-electronics based MTJ (Magnetic Tunnel Junction) devices with application to on-chip router modules, in particular, buffers and crossbars. To this end, we build MTJ models, design circuits based on mixed MTJ-CMOS devices, and evaluate their switching power consumption, using their pure CMOS counterparts as the baseline. Our study shows that the new technology can significantly improve power efficiency for buffers but the gain for crossbars is less clear.

  • 149.
    Shaoteng, Liu
    et al.
    KTH, School of Information and Communication Technology (ICT). KTH.
    Zhonghai, Lu
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Axel, Jantsch
    TU Wien, Vienna, Austria.
    Highway in TDM NoC. 2015. Conference proceedings (editor) (Refereed)
  • 150.
    She, Huimin
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    System-level evaluation of sensor networks deployment strategies: Coverage, lifetime and cost. 2012. In: 2012 8th International Wireless Communications And Mobile Computing Conference (IWCMC), IEEE, 2012, p. 549-554. Conference paper (Refereed)
    Abstract [en]

    In wireless sensor networks, sensor nodes can be organized either randomly or deterministically according to regular deployment patterns. Due to the trade-offs between performance and cost, evaluating the advantages and disadvantages of node deployment strategies is a fundamental issue to be solved. In this paper, we present a framework for analyzing node deployment schemes in terms of three performance metrics: coverage, lifetime, and cost. Based on the proposed coverage analysis model, energy model and cost model, we compare the performance of two node deployment schemes: rectangle mesh and uniformly random. The results show that the rectangle mesh scheme is generally better than the uniformly random scheme in terms of coverage and network lifetime. Our method can be used to evaluate the benefits of different deployment schemes and thus provide guidelines for network designers.
