  • 1. Anagnostopoulos, Iraklis
    et al.
    Xydis, Sotirios
    Bartzas, Alexandros
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Soudris, Dimitrios
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations. 2011. In: IEEE Embedded Systems Letters, ISSN 1943-0663, Vol. 3, no. 2, pp. 66-69. Journal article (Refereed)
    Abstract [en]

    Multiprocessor systems-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions regarding the constraints and the opportunities delivered by the physical distribution of multiple memory nodes of the platform. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization. Customization is enabled at application- and platform-level. Results show that customized microcoded DMM can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.

  • 2.
    Badawi, Mohammad
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Hemani, Ahmed
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Customizable Coarse-grained Energy-efficient Reconfigurable Packet Processing Architecture. 2014. In: Proceedings of the 2014 IEEE 25th International Conference on Application-specific Systems, Architectures and Processors (ASAP), IEEE, 2014, pp. 30-35. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a highly customizable and rapidly reconfigurable multi-core packet processing architecture that provides energy and area efficiency while retaining flexibility. The presented architecture, with its agile reconfigurability, permits time-critical adaptability where resources can be re-clustered at run time in a few cycles, hence maintaining efficiency if requirements of the use case change. We elaborate the flexibility and adaptability of our architecture and we report its evaluation results. For evaluation, we performed the widely used UDP/IP and we compared our proposed architecture to low-power 32-bit general purpose processors, a custom ASIC implementation and a programmable protocol processor. Compared to GPP-based solutions, our architecture is 20-34 times more energy efficient while providing 2.4-4.1 times higher throughput. While retaining the programmability, the proposed solution achieved 78% of the energy efficiency of a hardwired ASIC implementation. Compared to a programmable protocol processor, our solution has 2.6 times more throughput and requires only a third of the gate count. Lastly, we quantified the worst-case time and average-case time required for time-critical adaptability when reconfiguration occurs during real-life Voice-over-IP traffic.

  • 3.
    Badawi, Mohammad
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT).
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Hemani, Ahmed
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Elastic Management and QoS Provisioning Scheme for Adaptable Multi-core Protocol Processing Architecture. 2016. In: 19th Euromicro Conference on Digital System Design (DSD 2016), IEEE, 2016, pp. 575-583. Conference paper (Refereed)
    Abstract [en]

    Adaptable protocol processing architectures can offer quality-of-service (QoS) while improving energy efficiency and resource utilization. However, a key condition for adaptable architectures to support QoS is that the latency required for processor adaptation does not result in violating the packet processing delay bound. Moreover, adaptation latency must not cause packets to accumulate until memory becomes full and packets are dropped. In this paper, we present an elastic management scheme for an agile, adaptable multi-core protocol processing architecture to facilitate processor adaptation when QoS has to be maintained. The proposed management scheme encompasses a set of reconfigurable finite state machines (FSMs), and each is dimensioned to be associated with a single processing element (PE). During processor adaptation, the needed FSMs can rapidly be clustered to provide the control needed for the newly adapted structure. We use a real-life application to demonstrate how our proposed management scheme supports maintaining QoS during processor adaptation. We also quantify the time needed for processor adaptation as well as the reduction in energy, latency and area achieved when using our scheme.

  • 4.
    Badawi, Mohammad
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Hemani, Ahmed
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Service-Guaranteed Multi-Port Packet Memory for Parallel Protocol Processing Architecture. 2016. In: Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, pp. 408-412, article ID 7445367. Conference paper (Refereed)
    Abstract [en]

    Parallel processing architectures have been increasingly utilized due to their potential for improving performance and energy efficiency. Unfortunately, the anticipated improvement often suffers from a limitation caused by memory access latency and latency variation, which consequently impact Quality of Service (QoS). This paper presents a service-guaranteed multi-port packet memory system to boost parallelism in protocol processing architectures. In this proposed memory system, all arriving packets are guaranteed a memory space, such that a packet memory space can be allocated in a bounded number of cycles and each of its locations is accessible in a single cycle. We consider a real-time Voice over Internet Protocol (VoIP) call as a case study to evaluate our service-guaranteed memory system.

  • 5. Candaele, Bernard
    et al.
    Aguirre, Sylvain
    Sarlotte, Michel
    Anagnostopoulos, Iraklis
    Xydis, Sotirios
    Bartzas, Alexandros
    Bekiaris, Dimitris
    Soudris, Dimitrios
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Xiaowen
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Chabloz, Jean-Michel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Hemani, Ahmed
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Vanmeerbeeck, Geert
    Kreku, Jari
    Tiensyrja, Kari
    Ieromnimon, Fragkiskos
    Kritharidis, Dimitrios
    Wiefrink, Andreas
    Vanthournout, Bart
    Martin, Philippe
    Mapping Optimisation for Scalable multi-core ARchiTecture: The MOSART approach. 2010. In: Proceedings - IEEE Annual Symposium on VLSI, ISVLSI 2010, 2010, pp. 518-523. Conference paper (Refereed)
    Abstract [en]

    The project will address two main challenges of prevailing architectures: 1) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; 2) The difficulties in programming heterogeneous, multi-core platforms, in particular in dynamically managing data structures in distributed memory. MOSART aims to overcome these through a multi-core architecture with distributed memory organisation, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimised and customised together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: 1) Providing platform support for management of abstract data structures, including middleware services and a run-time data manager for NoC based communication infrastructure; 2) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.

  • 6. Candaele, Bernard
    et al.
    Aguirre, Sylvain
    Sarlotte, Michel
    Anagnostopoulos, Iraklis
    Xydis, Sotirios
    Bartzas, Alexandros
    Bekiaris, Dimitris
    Soudris, Dimitrios
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Xiaowen
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chabloz, Jean-Michel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Hemani, Ahmed
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Vanmeerbeeck, Geert
    Kreku, Jari
    Tiensyrja, Kari
    Ieromnimon, Fragkiskos
    Kritharidis, Dimitrios
    Wiefrink, Andreas
    Vanthournout, Bart
    Martin, Philippe
    The MOSART Mapping Optimization for multi-core Architectures. 2011. In: VLSI 2010 Annual Symposium, Springer Publishing Company, 2011, pp. 181-195. Conference paper (Refereed)
    Abstract [en]

    MOSART project addresses two main challenges of prevailing architectures: (i) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; (ii) The difficulties in programming heterogeneous, multi-core platforms. MOSART aims to overcome these through a multi-core architecture with distributed memory organization, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimized and customized together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: (i) Providing platform support for management of abstract data structures including middleware services and a run-time data manager for NoC based communication infrastructure; (ii) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.

  • 7.
    Chen, DeJiu
    et al.
    KTH, Skolan för industriell teknik och management (ITM), Maskinkonstruktion (Inst.), Inbyggda styrsystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik, Elektronik och inbyggda system.
    A Model-based Approach to Dynamic Self-Assessment for Automated Performance and Safety Awareness of Cyber-Physical Systems. 2017. In: Model-Based Safety and Assessment - 5th International Symposium, Trento, Italy, September 11–13, 2017 / [ed] Marco Bozzano, Yiannis Papadopoulos, Springer, 2017, Vol. LNCS 10437, pp. 227-240. Conference paper (Refereed)
    Abstract [en]

    Modern automotive vehicles represent one category of CPS (Cyber-Physical Systems) that are inherently time- and safety-critical. To justify the actions for quality-of-service adaptation and safety assurance, it is fundamental to perceive the uncertainties of system components in operation, which are caused by emergent properties, design or operation anomalies. From an industrial point of view, a further challenge is related to the usages of generic purpose COTS (Commercial-Off-The-Shelf) components, which are separately developed and evolved, often not sufficiently verified and validated for specific automotive contexts. While introducing additional uncertainties in regard to the overall system performance and safety, the adoption of COTS components constitutes a necessary means for effective product evolution and innovation. Accordingly, we propose in this paper a novel approach that aims to enable advanced operation monitoring and self-assessment in regard to operational uncertainties and thereby automated performance and safety awareness. The emphasis is on the integration of several modeling technologies, including the domain-specific modeling framework EAST-ADL, the A-G contract theory and Hidden Markov Model (HMM). In particular, we also present some initial concepts in regard to the usage performance and safety awareness for quality-of-service adaptation and dynamic risk mitigation.

  • 8. Chen, X.
    et al.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Li, Y.
    Jantsch, A.
    Zhao, Xueqian
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Chen, S.
    Guo, Y.
    Liu, Z.
    Lu, J.
    Wan, J.
    Sun, S.
    Chen, H.
    Achieving memory access equalization via round-trip routing latency prediction in 3D many-core NoCs. 2015. In: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI, IEEE, 2015, pp. 398-403. Conference paper (Refereed)
    Abstract [en]

    3D many-core NoCs are emerging architectures for future high-performance single chips due to their integration of many processor cores and memories by stacking multiple layers. In such an architecture, because processor cores and memories reside in different locations (center, corner, edge, etc.), memory accesses behave differently due to their different communication distances, and the performance (latency) gap of different memory accesses becomes larger as the network size is scaled up. This phenomenon may lead to some memory accesses suffering very high latencies, thus degrading the system performance. To achieve high performance, it is crucial to reduce the number of memory accesses with very high latencies. However, this should be done with care since shortening the latency of one memory access can worsen the latency of another as a result of shared network resources. Therefore, the goal should focus on narrowing the latency difference of memory accesses. In the paper, we address the goal by proposing to prioritize the memory access packets based on predicting the round-trip routing latencies of memory accesses. The communication distance and the number of the occupied items in the buffers in the remaining routing path are used to predict the round-trip latency of a memory access. The predicted round-trip routing latency is used as the base to arbitrate the memory access packets so that the memory access with potential high latency can be transferred as early and fast as possible, thus equalizing the memory access latencies as much as possible. Experiments with varied network sizes and packet injection rates prove that our approach can achieve the goal of memory access equalization and outperforms the classic round-robin arbitration in terms of maximum latency, average latency, and LSD. In the experiments, the maximum improvement of the maximum latency, the average latency and the LSD are 80%, 14%, and 45% respectively. © 2015 IEEE.
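
    The arbitration idea in this abstract can be illustrated with a minimal sketch. It is not the paper's predictor: the linear latency model, the per-hop and per-item cycle costs, and all names (`Packet`, `predicted_latency`, `arbitrate`) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Packet:
    name: str
    hops_remaining: int   # communication distance left on the round trip
    queued_ahead: int     # occupied buffer items on the remaining path


def predicted_latency(p: Packet, cycles_per_hop: int = 1,
                      cycles_per_item: int = 1) -> int:
    """Hypothetical linear estimate: distance cost plus queueing cost."""
    return p.hops_remaining * cycles_per_hop + p.queued_ahead * cycles_per_item


def arbitrate(candidates: Sequence[Packet]) -> Packet:
    """Grant the packet with the highest predicted round-trip latency.

    Ties go to the earliest candidate in the sequence.
    """
    return max(candidates, key=predicted_latency)
```

    Compared with round-robin, which ignores where a packet is heading, this policy favors the access that is expected to finish last, which is the equalization goal the abstract describes.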

  • 9.
    Chen, Xiamen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik, Elektronik och inbyggda system. National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik, Elektronik och inbyggda system.
    Lei, Yuanwu
    Wang, Yaohua
    Chen, Shenggang
    Multi-bit Transient Fault Control for NoC Links Using 2D Fault Coding Method. 2016. In: 2016 Tenth IEEE/ACM International Symposium on Networks-on-Chip (NOCS), IEEE, 2016. Conference paper (Refereed)
    Abstract [en]

    In deep nanometer scale, Network-on-Chip (NoC) links are more prone to multi-bit transient faults. Conventional ECC techniques bring heavy area, power, and timing overheads when correcting and detecting multiple transient faults. Therefore, a cost-effective ECC technique, named 2D fault coding method, is adopted to overcome the multi-bit transient fault issue of NoC links. Its key innovation is that the wires of a link are treated as its matrix appearance and light-weight Parity Check Coding (PCC) is performed on the matrix's two dimensions (horizontal matrix rows and vertical matrix columns). Horizontal PCCs and vertical PCCs work together to find the faults' position and then correct them by simply inverting them. The procedure of using the 2D fault coding method to protect a NoC link is proposed, its correction and detection capability is analyzed, and its hardware implementation is carried out. Comparative experiments show that the proposal can largely reduce the ECC hardware cost, have much higher fault detection coverage, maintain almost zero silent fault percentages, and have higher fault correction percentages normalized under the same area, demonstrating that it is cost-effective and suitable to the multi-bit transient fault control for NoC links.
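
    The row-and-column parity idea can be illustrated with a small software sketch. It is not the paper's hardware design: the function names, the even-parity choice, the assumption that parity bits arrive fault-free, and the policy of correcting only a single located fault (reporting everything else as detected) are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def parities(bits: Matrix) -> Tuple[List[int], List[int]]:
    """Return (row_parities, column_parities) using even parity."""
    rows = [sum(r) % 2 for r in bits]
    cols = [sum(c) % 2 for c in zip(*bits)]
    return rows, cols


def check_and_correct(received: Matrix, row_par: List[int],
                      col_par: List[int]) -> Tuple[str, Matrix]:
    """Recompute parities and compare them with the transmitted ones.

    Returns (status, matrix) with status "ok", "corrected" or "detected".
    """
    rows, cols = parities(received)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, row_par)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, col_par)) if a != b]
    if not bad_rows and not bad_cols:
        return "ok", received
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # One failing row and one failing column intersect at the faulty
        # wire; inverting that bit corrects it.
        fixed = [r[:] for r in received]
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
        return "corrected", fixed
    # Any other mismatch pattern is only flagged in this sketch.
    return "detected", received
```

    For example, flipping one bit of a 4x4 word yields "corrected" and restores the original word, while flipping two bits in the same row leaves the row parity unchanged but breaks two column parities, so the sketch reports "detected".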

  • 10.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips. 2010. In: 3rd IEEE International Conference on Computer and Electrical Engineering (ICCEE), 2010. Conference paper (Refereed)
    Abstract [en]

    Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in the paper, we propose a fast barrier synchronization mechanism, targeting Multi-core NoCs. The fast barrier synchronization mechanism includes a dedicated hardware module, named Fast Barrier Synchronizer (FBS), integrated with each processor node. It offers a set of barrier counters and can concurrently process synchronization requests issued by the local node and remote nodes via the on-chip network. The salient feature of our fast barrier synchronization mechanism is that, once the barrier condition is reached, the “barrier release” acknowledgement is routed to all processor nodes in a broadcast way in order to save chip area by avoiding storing source node information and to minimize completion time by avoiding serialization of barrier releasing. Synthesis results suggest that the FBS can run over 1 GHz in SMIC® 130nm technology with small area overhead. We implemented a FBS-enhanced multi-core NoC architecture on our FPGA platform using the Xilinx® Virtex 5 as the FPGA chip. FPGA utilization and simulation results show that our fast barrier synchronization demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.

  • 11.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Hybrid distributed shared memory space in multi-core processors. 2011. In: Journal of Software, ISSN 1796-217X, Vol. 6, no. 12 (Special Issue), pp. 2369-2378. Journal article (Refereed)
    Abstract [en]

    On multi-core processors, memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart.
    The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.

  • 12.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Xu, Bangjian
    Luo, Heng
    Multi-FPGA Implementation of a Network-on-Chip Based Many-core Architecture with Fast Barrier Synchronization Mechanism. 2010. In: Proceedings of the IEEE Norchip Conference, 2010. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a fast barrier synchronization mechanism, targeting Network-on-Chip based many-core architectures. Its salient feature is that, once the barrier condition is reached, the "barrier release" acknowledgement is routed to all processor nodes in a broadcast way in order to save area by avoiding storing source node information and to minimize completion time by eliminating serialization of barrier releasing. Then, we construct a multi-FPGA platform using Xilinx® Virtex 5 as FPGA chips and implement a NoC based many-core architecture on it. FPGA utilization and simulation results show that our mechanism demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.

  • 13.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips. 2010. In: The 3rd IEEE International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010), 2010, pp. 39-46. Conference paper (Refereed)
    Abstract [en]

    On multi-core Network-on-Chips (NoCs), memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core NoC platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart.
    The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.

  • 14.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Jantsch, A.
    Chen, S.
    Guo, Y.
    Chen, H.
    Performance analysis of homogeneous on-chip large-scale parallel computing architectures for data-parallel applications. 2015. In: Journal of Electrical and Computer Engineering, ISSN 2090-0147, E-ISSN 2090-0155, Vol. 2015, article ID 902591. Journal article (Refereed)
    Abstract [en]

    On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in the paper. Homogeneous OLPCs feature strong regularity and scalability due to their identical cores and routers. Data-parallel applications have their parallel data subsets that are handled individually by the same program running in different cores. Therefore, data-parallel applications are able to obtain good speedup in homogeneous OLPCs. The paper addresses modeling the speedup performance of homogeneous OLPCs for data-parallel applications. When establishing the speedup performance model, the network communication latency and the ways of storing data of data-parallel applications are modeled and analyzed in detail. Two abstract concepts (equivalent serial packet and equivalent serial communication) are proposed to construct the network communication latency model. The uniform and hotspot traffic models are adopted to reflect the ways of storing data. Some useful suggestions are presented during the performance model's analysis. Finally, three data-parallel applications are performed on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications onto homogeneous OLPCs.

  • 15.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Handling Shared Variable Synchronization in Multi-core Network-on-Chips with Distributed Memory. 2010. In: Proceedings: IEEE International SOC Conference, SOCC 2010, 2010, pp. 467-472. Conference paper (Refereed)
    Abstract [en]

    Parallelized shared variable applications running on multi-core Network-on-Chips (NoCs) require efficient support for synchronization, since communication is on the critical path of system performance and contended synchronization requests may cause large performance penalty. In this paper, we propose a dedicated hardware module for synchronization management. This module is called Synchronization Handler (SH), integrated with each processor-memory node on the multi-core NoCs. It uses two physical buffers to concurrently process synchronization requests issued by the local processor and remote processors via the on-chip network. One salient feature is that the two physical buffers are dynamically allocated to form multiple virtual buffers (a virtual buffer is related to a shared synchronization variable) so as to improve the buffer utilization and alleviate the head-of-line blocking. Synthesis results suggest that the SH can run over 900 MHz in 130nm technology with small area overhead. To justify the SH-enhanced multi-core NoCs, we employ synthetic workloads to evaluate synchronization cost and buffer utilization, and run synchronization-intensive applications to investigate speedup. The results show that our approach is viable.

  • 16.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Chen, Shuming
    Speedup Analysis of Data-parallel Applications on Multi-core NoCs. 2009. In: Proceedings of the IEEE International Conference on ASIC (ASICON), 2009, pp. 105-108. Conference paper (Refereed)
    Abstract [en]

    As more computing cores are integrated onto a single chip, the effect of network communication latency is becoming more and more significant on Multi-core Network-on-Chips (NoCs). For data-parallel applications, we study the model of parallel speedup by including network communication latency in Amdahl's law. The speedup analysis considers the effect of network topology, network size, traffic model and computation/communication ratio. We also study the speedup efficiency. In our Multi-core NoC platform, a real data-parallel application, i.e. matrix multiplication, is used to validate the analysis. Our theoretical analysis and the application results show that the speedup improvement is nonlinear and the speedup efficiency decreases as the system size is scaled up. Such analysis can be used to guide architects and programmers to improve parallel processing efficiency by reducing network latency with optimized network design and increasing computation proportion in the program.
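
    The idea of adding a communication term to Amdahl's law can be illustrated with a minimal sketch. The abstract does not give the paper's actual latency model, so the linear term `comm_per_core * (n - 1)` and all parameter names below are assumptions chosen only to show the qualitative trend (nonlinear speedup, falling efficiency).

```python
def speedup(n, parallel_fraction, comm_per_core):
    """Amdahl-style speedup with a hypothetical communication term.

    comm_per_core is an assumed cost (as a fraction of the single-core
    run time) that grows linearly with the number of cores n. It is an
    illustration only, not the model from the paper.
    """
    serial = 1.0 - parallel_fraction
    parallel = parallel_fraction / n
    communication = comm_per_core * (n - 1)  # zero on a single core
    return 1.0 / (serial + parallel + communication)


def efficiency(n, parallel_fraction, comm_per_core):
    """Speedup per core."""
    return speedup(n, parallel_fraction, comm_per_core) / n


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        s = speedup(n, 0.95, 0.001)
        e = efficiency(n, 0.95, 0.001)
        print(f"n={n:3d}  speedup={s:6.2f}  efficiency={e:5.2f}")
```

    With these assumed parameters the efficiency falls steadily as cores are added, and the speedup itself stops growing once the communication term dominates.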

  • 17.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Supporting Distributed Shared Memory on Multi-core Network-on-Chips Using a Dual Microcoded Controller. 2010. In: Proceedings of the conference for Design Automation and Test in Europe, 2010, pp. 39-44. Conference paper (Refereed)
    Abstract [en]

    Supporting Distributed Shared Memory (DSM) is essential for multi-core Network-on-Chips for the sake of reusing the huge amount of legacy code and easy programmability. We propose a microcoded controller as a hardware module in each node to connect the core, the local memory and the network. The controller is programmable: DSM functions such as virtual-to-physical address translation, memory access, and synchronization are realized using microcode. To enable concurrent processing of memory requests from the local and remote cores, our controller features two mini-processors, one dealing with requests from the local core and the other with requests from remote cores. Synthesis results suggest that the controller consumes 51k gates for the logic and can run up to 455 MHz in 130 nm technology. To evaluate its performance, we use synthetic and application workloads. Results show that, when the system size is scaled up, the delay overhead incurred by the controller may become less significant when compared with the network delay. In this way, the delay efficiency of our DSM solution is close to that of hardware solutions on average but still has all the flexibility of software solutions.

  • 18.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Supporting Efficient Synchronization in Multi-core NoCs Using Dynamic Buffer Allocation Technique. 2010. In: Proceedings of the IEEE Annual Symposium on VLSI, 2010, pp. 462-463. Conference paper (Refereed)
    Abstract [en]

    This paper explores a dynamic buffer allocation technique to guide a distributed synchronization architecture to support efficient synchronization on multi-core Network-on-Chips (NoCs). The synchronization architecture features two physical buffers to be able to concurrently queue and handle synchronization requests issued by the local processor and remote processors via the on-chip network. Using the dynamic buffer allocation technique, the two physical buffers are dynamically allocated to form multiple virtual buffers in order to improve buffer utilization. Experiments are carried out to evaluate buffer utilization.

  • 19.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Chen, Shenggang
    Gu, Huitao
    Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property. 2013. In: Computers & electrical engineering, ISSN 0045-7906, E-ISSN 1879-0755, Vol. 39, no. 2, pp. 596-612. Journal article (Refereed)
    Abstract [en]

    In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs a performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property, which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for data with a changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), our techniques show a performance improvement of up to 37.89% compared with the conventional DSM.

  • 20.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem. National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Guo, Yang
    Liu, Hengzhu
    Cooperative communication for efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. 2014. In: IEICE Electronics Express, ISSN 1349-2543, E-ISSN 1349-2543, Vol. 11, no. 18, p. 20140542-. Journal article (Refereed)
    Abstract [en]

    On many-core Network-on-Chips (NoCs), communication is on the critical path of system performance and contended synchronization requests may cause a large performance penalty. Unlike conventional algorithm-based approaches, the paper addresses the barrier synchronization problem from the angle of optimizing its communication performance and proposes cooperative communication as a means to achieve efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. With cooperative communication, routers collaborate with one another to accomplish a fast barrier synchronization task. The cooperative communication is implemented in our router at low cost. In comparative experiments, our approach exhibits high efficiency and good scalability.

  • 21.
    Chen, Xiaowen
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Chen, Shuming
    Liu, Hai
    Cooperative communication based barrier synchronization in on-chip mesh architectures. 2011. In: IEICE ELECTRON EXPR, ISSN 1349-2543, Vol. 8, no. 22, pp. 1856-1862. Journal article (Refereed)
    Abstract [en]

    We propose cooperative communication as a means to enable efficient and scalable barrier synchronization on mesh-based many-core architectures. Our approach is different from but orthogonal to conventional algorithm-based optimizations. It relies on collaborating routers to provide efficient gather and multicast communication. In conjunction with a master-slave algorithm, it exploits the mesh regularity to achieve efficiency. The gather and multicast functions have been implemented in our router. Synthesis results suggest marginal area overhead. With synthetic and benchmark experiments, we show that our approach significantly reduces synchronization completion time and increases speedup.

  • 22.
    Chen, Xiaowen
    et al.
    KTH, Skolan för elektro- och systemteknik (EES).
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Centra, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik.
    Liu, S.
    Chen, S.
    Round-trip DRAM access fairness in 3D NoC-based many-core systems. 2017. In: ACM Transactions on Embedded Computing Systems, ISSN 1539-9087, E-ISSN 1558-3465, Vol. 16, no. 5s, article id 162. Journal article (Refereed)
    Abstract [en]

    In 3D NoC-based many-core systems, DRAM accesses behave differently due to their different communication distances, and the latency gap between different DRAM accesses becomes bigger as the network size increases, which leads to unfair DRAM access performance among different nodes. This phenomenon may lead to high latencies for some DRAM accesses that become the performance bottleneck of the system. The paper addresses the DRAM access fairness problem in 3D NoC-based many-core systems by narrowing the latency difference of DRAM accesses as well as reducing the maximum latency. Firstly, the latency of a round-trip DRAM access is modeled and the factors causing DRAM access latency difference are discussed in detail. Secondly, the DRAM access fairness is further quantitatively analyzed through experiments. Thirdly, we propose to predict the network latency of round-trip DRAM accesses and use the predicted round-trip DRAM access time as the basis to prioritize the DRAM accesses in DRAM interfaces so that the DRAM accesses with potentially high latencies can be transferred as early and fast as possible, thus achieving fair DRAM access. Experiments with synthetic and application workloads validate that our approach can achieve fair DRAM access and outperform the traditional First-Come-First-Serve (FCFS) scheduling policy and the scheduling policies proposed by references [7] and [24] in terms of maximum latency, Latency Standard Deviation (LSD) and speedup. In the experiments, the maximum improvements in maximum latency, LSD, and speedup are 12.8%, 6.57%, and 8.3% respectively. Besides, our proposal brings very small extra hardware overhead (<0.6%) in comparison to the three counterparts.
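
    The prioritization step described above can be sketched as a priority queue that serves the request with the largest predicted round-trip latency first, in contrast to FCFS. This is only an illustration: the class name `LatencyAwareQueue`, the request identifiers, and the latency values are hypothetical, and the paper's latency predictor and DRAM interface logic are not reproduced here.

```python
import heapq
from itertools import count


class LatencyAwareQueue:
    """Serve the request with the highest predicted round-trip latency first.

    Ties are broken by arrival order, so equal predictions fall back to FCFS.
    """

    def __init__(self):
        self._heap = []
        self._arrival = count()  # monotonically increasing arrival stamp

    def push(self, request_id, predicted_round_trip):
        # heapq is a min-heap, so negate the latency to pop the largest first.
        entry = (-predicted_round_trip, next(self._arrival), request_id)
        heapq.heappush(self._heap, entry)

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = LatencyAwareQueue()
    # Hypothetical predicted round-trip latencies in cycles.
    for request_id, latency in [("a", 10), ("b", 40), ("c", 25), ("d", 40)]:
        queue.push(request_id, latency)
    print([queue.pop() for _ in range(len(queue))])
```

    Serving far-away (high-latency) requests first narrows the gap between the best and worst round-trip times, which is the fairness goal the abstract describes.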

  • 23.
    Chen, Xiaowen
    et al.
    KTH. National University of Defense Technology, KTH Royal Institute of Technology, China.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik.
    Liu, Sheng
    Chen, Shuming
    Round-trip DRAM Access Fairness in 3D NoC-based Many-core Systems. 2017. In: ACM Transactions on Embedded Computing Systems, ISSN 1539-9087, E-ISSN 1558-3465, Vol. 16, article id 162. Journal article (Refereed)
    Abstract [en]

    In 3D NoC-based many-core systems, DRAM accesses behave differently due to their different communication distances, and the latency gap between different DRAM accesses becomes bigger as the network size increases, which leads to unfair DRAM access performance among different nodes. This phenomenon may lead to high latencies for some DRAM accesses that become the performance bottleneck of the system. The paper addresses the DRAM access fairness problem in 3D NoC-based many-core systems by narrowing the latency difference of DRAM accesses as well as reducing the maximum latency. Firstly, the latency of a round-trip DRAM access is modeled and the factors causing DRAM access latency difference are discussed in detail. Secondly, the DRAM access fairness is further quantitatively analyzed through experiments. Thirdly, we propose to predict the network latency of round-trip DRAM accesses and use the predicted round-trip DRAM access time as the basis to prioritize the DRAM accesses in DRAM interfaces so that the DRAM accesses with potentially high latencies can be transferred as early and fast as possible, thus achieving fair DRAM access. Experiments with synthetic and application workloads validate that our approach can achieve fair DRAM access and outperform the traditional First-Come-First-Serve (FCFS) scheduling policy and the scheduling policies proposed by references [7] and [24] in terms of maximum latency, Latency Standard Deviation (LSD) and speedup. In the experiments, the maximum improvements in maximum latency, LSD, and speedup are 12.8%, 6.57%, and 8.3% respectively. Besides, our proposal brings very small extra hardware overhead (<0.6%) in comparison to the three counterparts.

  • 24. Chen, Y.
    et al.
    Xie, L.
    Li, J.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    A deadlock-free fault-tolerant routing algorithm based on pseudo-receiving mechanism for networks-on-chip of CMP. 2011. In: 2011 International Conference on Multimedia Technology, ICMT 2011, 2011, pp. 2825-2828. Conference paper (Refereed)
    Abstract [en]

    As CMOS technology scales down to the nanometer domain, fault tolerance is becoming a challenge for NoCs. The turn model provides a simple and efficient systematic approach to the development of deadlock-free routing algorithms. In this paper, we propose a pseudo-receiving mechanism, supported by the local processor's cache, that enables prohibited turns while keeping the routing deadlock-free. We present a fault-tolerant routing algorithm based on the pseudo-receiving mechanism for 2D meshes. The routing algorithm is livelock-free at the cost of disabling a few non-faulty links or nodes. The algorithm is applied to a single-cycle fixed output-buffered router. Experimental results show that it achieves high performance even under a high fault rate.

  • 25. Chen, Y.
    et al.
    Xie, L.
    Li, J.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Slice router: For fine-granularity fault-tolerant Networks-on-Chip. 2011. In: 2011 International Conference on Multimedia Technology, ICMT 2011, 2011, pp. 3230-3233. Conference paper (Refereed)
    Abstract [en]

    Almost all existing Networks-on-Chip (NoC) fault-tolerant schemes are based on fault-tolerant routing algorithms. In these schemes, faulty links or routers are discarded altogether. However, in most cases only a small part of the discarded link or router is faulty, so discarding the whole link or router is wasteful. In this paper, we present a slice router architecture which can be used in fine-granularity fault-tolerant NoCs. The major motivation for the slice router is to handle faulty links and routers at a finer granularity. The major idea is that a router is split into several sub-link routers, called slices. Unlike several physically independent routers, slices are coupled together at the input/output ports. The coupling of slices enables fine-granularity fault tolerance in the network. In order to evaluate the fault-tolerance capability of slice routers, we design a loosely coupled 4-slice router with a backup sub-link in each link. Each slice is a single-cycle output-buffered switch. Simulation results demonstrate its fault-tolerance capability in the presence of high fault rates. The critical latency increases by only 0.04 ns, because the configuration of slice interfaces runs in parallel with the output arbiter of the slices. Synthesis results in 65 nm technology show that the additional area overhead of a slice router is only a few logic gates compared with the non-coupled slice router.

  • 26. Chen, Yancang
    et al.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Xie, Lunguo
    Li, Jinwen
    Zhang, Minxuan
    A single-cycle output buffered router with layered switching for Networks-on-Chips. 2012. In: Computers & electrical engineering, ISSN 0045-7906, E-ISSN 1879-0755, Vol. 38, no. 4, pp. 906-916. Journal article (Refereed)
    Abstract [en]

    We present a single-cycle output buffered router based on layered switching for networks on chips (NoCs). Different from state-of-the-art NoC routers, the router has three important characteristics: (1) It employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) In contrast to input buffered architectures, it adopts an output buffered architecture; (3) It is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under a uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Synthesis results in 65 nm technology show that its critical path has only 20 logic gates, and that it reduces area by 11% compared to the input virtual-channel router with the same buffer capacity.

  • 27. Du, G.
    et al.
    Li, M.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Gao, M.
    Wang, C.
    An analytical model for worst-case reorder buffer size of multi-path minimal routing NoCs. 2014. In: Proceedings - 2014 8th IEEE/ACM International Symposium on Networks-on-Chip, NoCS 2014, IEEE, 2014, pp. 49-56. Conference paper (Refereed)
    Abstract [en]

    Reorder buffers are often needed in multi-path routing networks-on-chips (NoCs) to guarantee in-order packet delivery. However, the buffer sizes are usually over-dimensioned due to a lack of worst-case analysis, leading to unnecessarily large area overhead. Based on network calculus, we propose an analysis framework for the worst-case reorder buffer size in multi-path minimal routing NoCs. Experiments with synthetic traffic and an industry case show that our method can effectively explore the traffic splitting space, as well as the mapping effects in terms of reorder buffer size, with a maximum improvement of 36.50%.

  • 28. Du, G.
    et al.
    Ma, S.
    Li, Z.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik.
    Ouyang, Y.
    Gao, M.
    Work-in-progress: SSS: Self-aware system-on-chip using static-dynamic hybrid method. 2017. In: Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, CASES 2017, Association for Computing Machinery (ACM), 2017, article id 3125527. Conference paper (Refereed)
    Abstract [en]

    Network on chip has become the de facto communication standard for multi-core or many-core systems on chip, due to its scalability and flexibility. However, temperature is an important factor in NoC design, as it affects the overall performance of the SoC by decreasing circuit frequency, increasing energy consumption, and even shortening chip lifetime. In this paper, we propose SSS, a self-aware SoC using a static-dynamic hybrid method, which combines dynamic mapping and static mapping to reduce the hot-spot temperature for NoC-based SoCs. First, we propose monitoring the thermal distribution for self-state sensing. Then, in the static mapping stage, we calculate the optimal mapping solutions under different temperature modes using a discrete firefly algorithm to help self-decision making. Finally, in the dynamic mapping stage, we achieve dynamic mapping by configuring the NoC and the SoC sentient unit for self-optimizing. Experimental results show that SSS can reduce the peak temperature by up to 30.64%. An FPGA prototype shows the effectiveness and smartness of SSS in reducing hot-spot temperature. Keywords: self-awareness, SoC architecture, NoC.

  • 29. Du, G.
    et al.
    Zhang, C.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Saggio, A.
    Gao, M.
    Worst-case performance analysis of 2-D mesh NoCs using multi-path minimal routing. 2012. In: CODES+ISSS'12 - Proceedings of the 10th ACM International Conference on Hardware/Software-Codesign and System Synthesis, Co-located with ESWEEK, ACM, 2012, pp. 123-132. Conference paper (Refereed)
    Abstract [en]

    In Network-on-Chip (NoC), multi-path routing is often preferable to single-path routing since it can better balance workload and thus provide better performance. However, performance analysis with multi-path routing is much more difficult due to complicated contention scenarios. Based on network calculus, we study worst-case performance of deterministic multi-path minimal routing on 2-D mesh NoCs. We first present a per-flow delay bound analysis technique for multi-path routing, which extends the analysis for single-path routing but deals with traffic splitting. Then we define a contention matrix to capture network congestion status. Based on the contention matrix, we propose an effective non-uniform traffic splitting strategy to improve worst-case performance. Experiments with synthetic traffic flows and an industrial case show that our analysis can effectively explore the traffic splitting space, and verify the effectiveness of the non-uniform splitting policy.

  • 30. Du, Gaoming
    et al.
    Ou, Yanghao
    Li, Xiangyang
    Song, Ping
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Gao, Minglun
    OLITS: An Ohm's Law-like Traffic Splitting Model Based on Congestion Prediction. 2016. In: Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE conference proceedings, 2016, pp. 1000-1005. Conference paper (Refereed)
    Abstract [en]

    Through traffic splitting, multi-path routing in Network-on-Chip (NoC) outperforms single-path routing in terms of load balance and resource utilization. However, uncontrolled traffic splitting may aggravate network congestion and worsen the communication delay. We propose an Ohm's Law-like traffic splitting model aimed at application-specific NoCs. We first characterize the flow congestion by redefining a contention matrix, which contains flow parameters such as average flow rate and burstiness. We then define flow resistance as the flow congestion factor extracted from the contention matrix, and use the parallel resistance theory to predict the congestion state for every target sub-flow. Finally, the traffic splitting proportions of the parallel sub-flows are assigned according to the equivalent flow resistance. Experiments are conducted on both 2D and 3D multi-path routing NoCs. The results show that the worst-case delay bound of the target flow is significantly improved, and network congestion can be effectively balanced.
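
    The Ohm's-law analogy can be made concrete with the standard current-divider rule: across parallel resistors, each branch carries a share of the current proportional to its conductance 1/R. The sketch below applies that rule to sub-flows. How the paper derives flow resistance from its contention matrix is not given in the abstract, so the resistance values and the function names here are purely hypothetical.

```python
def equivalent_resistance(resistances):
    """Equivalent resistance of parallel branches: 1 / sum(1 / R_i)."""
    return 1.0 / sum(1.0 / r for r in resistances)


def split_proportions(resistances):
    """Share of the flow sent down each parallel path.

    Follows the current-divider rule: the share of path i is
    (1 / R_i) / sum_j (1 / R_j), so less congested (lower-resistance)
    paths receive more traffic.
    """
    if not resistances or any(r <= 0 for r in resistances):
        raise ValueError("resistances must be a non-empty list of positive values")
    conductances = [1.0 / r for r in resistances]
    total = sum(conductances)
    return [g / total for g in conductances]


if __name__ == "__main__":
    # Hypothetical flow resistances of two candidate paths.
    print(split_proportions([1.0, 3.0]))
    print(equivalent_resistance([1.0, 3.0]))
```

    A path with three times the resistance of its alternative receives one third as much traffic, which is the balancing behaviour the analogy is meant to capture.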

  • 31.
    Eslami Kiasari, Abbas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Bekooij, M.
    Burns, A.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Analytical approaches for performance evaluation of networks-on-chip. 2012. In: CASES'12 - Proceedings of the 2012 ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Co-located with ESWEEK, ACM, 2012, pp. 211-212. Conference paper (Refereed)
    Abstract [en]

    This tutorial reviews four popular mathematical formalisms - dataflow analysis, schedulability analysis, network calculus, and queueing theory - and how they have been applied to the analysis of Network-on-Chip (NoC) performance. We review the basic concepts and results of each formalism and provide examples of how they have been used in on-chip communication performance analysis. The tutorial also discusses the respective strengths and weaknesses of each formalism, their suitability for a specific purpose, and the attempts that have been made to bridge these analytical approaches. Finally, we conclude the tutorial by discussing open research issues.

  • 32.
    Eslami Kiasari, Abbas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    A Framework for Designing Congestion-Aware Deterministic Routing. 2010. In: NoCArc '10 Proceedings of the Third International Workshop on Network on Chip Architectures, 2010, pp. 45-50. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a system-level Congestion-Aware Routing (CAR) framework for designing minimal deterministic routing algorithms. CAR exploits the peculiarities of the application workload to spread the load evenly across the network. To this end, we first formulate an optimization problem of minimizing the level of congestion in the network and then use the simulated annealing heuristic to solve this problem. The proposed framework assures deadlock-free routing, even in the networks without virtual channels. Experiments with both synthetic and realistic workloads show the effectiveness of the CAR framework. Results show that maximum sustainable throughput of the network is improved by up to 205% for different applications and architectures.

  • 33.
    Eslami Kiasari, Abbas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    A Heuristic Framework for Designing and Exploring Deterministic Routing Algorithm for NoCs. 2013. In: Algorithms in Networks-on-Chip, Springer, 2013, pp. 21-39. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    In this chapter, we present a system-level framework for designing minimal deterministic routing algorithms for Networks-on-Chip (NoCs) that are customized for a set of applications. To this end, we first formulate an optimization problem of minimizing average packet latency in the network and then use the simulated annealing heuristic to solve this problem. To estimate the average packet latency we use a queueing-based analytical model which can capture the burstiness of the traffic. The proposed framework does not require virtual channels to guarantee deadlock freedom since routes are extracted from an acyclic channel dependency graph. Experiments with both synthetic and realistic workloads show the effectiveness of the approach. Results show that maximum sustainable throughput of the network is improved for different applications and architectures.

  • 34.
    Eslami Kiasari, Abbas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Mathematical formalisms for performance evaluation of networks-on-chip. 2013. In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 45, no. 3, p. 38-. Journal article (Refereed)
    Abstract [en]

    This article reviews four popular mathematical formalisms (queueing theory, network calculus, schedulability analysis, and dataflow analysis) and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip. The article discusses the basic concepts and results of each formalism and provides examples of how they have been used in Networks-on-Chip (NoCs) performance analysis. Also, the respective strengths and weaknesses of each technique and its suitability for a specific purpose are investigated. An open research issue is a unified analytical model for a comprehensive performance evaluation of NoCs. To this end, this article reviews the attempts that have been made to bridge these formalisms.

  • 35.
    Eslami Kiasari, Abbas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    An Analytical Latency Model for Networks-on-Chip. 2013. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 21, no. 1, pp. 113-123. Journal article (Refereed)
    Abstract [en]

    We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topology with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.

  • 36. Feng, C. -C
    et al.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Zhang, M. -X
    Li, J. -W
    A 1-cycle 2 GHz bufferless router for network-on-chip. 2011. In: Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, ISSN 1001-2486, Vol. 33, no. 6, pp. 42-47. Journal article (Refereed)
    Abstract [en]

    Recently, the bufferless router, which needs no buffers, has become a low-cost solution for Network-on-Chip. To improve the performance of the bufferless router, a 1-cycle high-performance bufferless router was proposed for Network-on-Chip. The router used a simple permutation network instead of the serialized switch allocator and the crossbar to achieve high performance. Compared with the virtual channel router and the baseline bufferless router, the proposed bufferless router can achieve a frequency of 2 GHz with small area cost under TSMC 65 nm technology. Simulation results under both synthetic and application workloads demonstrate that the proposed bufferless router achieves much lower average packet latency than the virtual channel router and other bufferless routers.

  • 37. Feng, C.
    et al.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Zhang, M.
    Xing, Z.
    Addressing transient and permanent faults in NoC with efficient fault-tolerant deflection router. 2013. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 21, no. 6, pp. 1053-1066. Journal article (Refereed)
    Abstract [en]

    The continuing decrease in the feature size of integrated circuits increases their susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request and forward error correction link-level error control scheme to handle transient faults, and a reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router. Synthesis results show that, compared with the FTDR router, the FTDR-H router can reduce the area by 27% in an 8×8 network. Simulation results demonstrate that under synthetic workloads, in the presence of permanent link faults, the throughput of an 8×8 network with the FTDR and FTDR-H algorithms is 14% and 23% higher on average than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the cost-based deflection routing algorithm, respectively. Under real application workloads, the FTDR-H algorithm achieves 20% fewer hops on average than the FoN algorithm. For transient faults, the performance of the FTDR router degrades gracefully even at a high fault rate. We also implement the fault-tolerant deflection router, which can achieve 400 MHz in TSMC 65-nm technology.

  • 38.
    Feng, Chaochao
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Li, Jinwen
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Zhang, Minxuan
    Evaluation of Deflection Routing on Various NoC Topologies. 2011. In: Proceedings of the IEEE International Conference on ASIC (ASICON), 2011. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose two novel deflection routing algorithms for de Bruijn and Spidergon NoCs and evaluate the performance of deflection routing on 5 NoC topologies with different synthetic traffic patterns. We also synthesize the routers for the various NoC topologies in TSMC 65 nm technology. The evaluation results illustrate that the performance of deflection routing is sensitive to the network topology and traffic pattern. The results can also guide the NoC architect in choosing a suitable NoC topology for a specific application.

  • 39.
    Feng, Chaochao
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Liao, Z.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Zhao, Z.
    Performance analysis of on-chip bufferless router with multi-ejection ports. 2015. In: Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015, IEEE conference proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    In general, a bufferless NoC router has only one local output port for ejection, which may lead to multiple arriving flits competing for that single port. In this paper, we propose a reconfigurable bufferless router in which the number of ejection ports can be configured as 2, 3 or 4. Simulation results demonstrate that the average packet latency of the routers with multiple ejection ports is 18%, 10%, 6%, 14%, 9% and 7% lower on average than that of the router with 1 ejection port under six synthetic workloads, respectively. For application workloads, the average packet latency of the router with more than two ejection ports is only slightly better than that of the router with one ejection port, a difference that can be neglected. Trading off hardware cost against performance, we conclude that there is no need to implement bufferless routers with 3 or 4 ejection ports, as the router with 2 ejection ports achieves almost the same performance.

  • 40.
    Feng, Chaochao
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Li, Jinwen
    Zhang, Minxuan
    A Reconfigurable Fault-tolerant Deflection Routing Algorithm Based on Reinforcement Learning for Networks-on-Chip. 2010. In: Proceedings of the International Workshop on Network on Chip Architectures (NoCArc), 2010. Conference paper (Refereed)
  • 41.
    Feng, Chaochao
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Li, Jinwen
    Zhang, Minxuan
    FoN: Fault-on-Neighbor aware Routing Algorithm for Networks-on-Chip. 2010. In: Proceedings - IEEE International SOC Conference, SOCC 2010, 2010, pp. 441-446. Conference paper (Refereed)
    Abstract [en]

    Reliability has become a key issue for Networks-on-Chip (NoC) as CMOS technology scales down to the nanoscale domain. This paper proposes a Fault-on-Neighbor (FoN) aware deflection routing algorithm for NoC, which makes routing decisions based on the link status of neighbor switches within 2 hops to avoid faulty links and switches. Simulation results demonstrate that in the presence of faults, the saturated throughput of the FoN switch is 13% higher on average than that of a cost-based deflection switch for an 8×8 mesh. The average hop count can be up to 1.7 lower than that of the cost-based switch. The FoN switch is also synthesized in 65 nm TSMC technology and can work at 500 MHz with small area overhead.

  • 42.
    Feng, Chaochao
    et al.
    National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Zhang, Minxuan
    A 1-Cycle 1.25 GHz Bufferless Router for 3D Network-on-Chip. 2012. In: IEICE transactions on information and systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E95D, no. 5, pp. 1519-1522. Journal article (Refereed)
    Abstract [en]

    In this paper, we propose a 1-cycle high-performance 3D bufferless router with a 3-stage permutation network. The proposed router utilizes the 3-stage permutation network instead of the serialized switch allocator and 7 x 7 crossbar to achieve a frequency of 1.25 GHz in TSMC 65 nm technology. Compared with the other two 3D bufferless routers, the proposed router occupies less area and consumes less power. Simulation results under both synthetic and application workloads illustrate that the proposed router achieves lower average packet latency than the other two 3D bufferless routers.

  • 43.
    Feng, Chaochao
    et al.
    National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Zhang, Minxuan
    Yang, Xianju
    Support Efficient and Fault-Tolerant Multicast in Bufferless Network-on-Chip. 2012. In: IEICE transactions on information and systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E95D, no. 4, pp. 1052-1061. Journal article (Refereed)
    Abstract [en]

    In this paper, we propose three Deflection-Routing-based Multicast (DRM) schemes for a bufferless NoC. The DRM scheme without packet replication (DRM_noPR) sends multicast packets through a non-deterministic path. The DRM schemes with adaptive packet replication (DRM_PR_src and DRM_PR_all) replicate multicast packets at the source or intermediate nodes according to the destination position and the state of output ports to reduce the average multicast latency. We also provide fault-tolerance support in these schemes through a reinforcement-learning-based method that reconfigures the routing table to tolerate permanently faulty links in the network. Simulation results illustrate that the DRM_PR_all scheme achieves 41%, 43% and 37% lower latency on average than the DRM_noPR scheme and 27%, 29% and 25% lower latency on average than the DRM_PR_src scheme under three synthetic traffic patterns, respectively. In addition, all three fault-tolerant DRM schemes show acceptable performance degradation at various link fault rates without any packet loss.

  • 44. Feng, Chaochao
    et al.
    Zhang, Minxuan
    Li, Jinwen
    Jiang, Jiang
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    A Low-overhead Fault-aware Deflection Routing Algorithm for 3D Network-on-Chip. 2011. In: Proceedings - 2011 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2011, 2011, pp. 19-24. Conference paper (Refereed)
  • 45. Grange, Matt
    et al.
    Weldezion, Awet Yemane
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Pamunuwa, Dinesh
    Weerasekera, Roshan
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Shippen, D.
    Physical mapping and performance study of a multi-clock 3-Dimensional Network-on-Chip mesh. 2009. In: 2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION, San Francisco: IEEE conference proceedings, 2009, pp. 345-351. Conference paper (Refereed)
    Abstract [en]

    The physical performance of a 3-Dimensional Network-on-Chip (NoC) mesh architecture employing through silicon vias (TSV) for vertical connectivity is investigated with a cycle-accurate RTL simulator. The physical latency and area impact of TSVs, switches, and the on-chip interconnect are evaluated to extract the maximum signaling speeds through the switches and the horizontal and vertical network links. The relatively low parasitics of TSVs compared to the on-chip 2-D interconnect allow for higher signaling speeds between chip layers. The system-level impact on overall network performance of clocking vertical packets at a higher rate through the TSV interconnect is simulated and reported.

  • 46. Hu, Wenmin
    et al.
    Liu, Hengzhu
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Fu, Guitao
    Self-selection pseudo-circuit: a clever crossbar pre-allocation. 2012. In: IEICE Electronics Express, ISSN 1349-2543, Vol. 9, no. 6, pp. 558-564. Journal article (Refereed)
    Abstract [en]

    This paper proposes the self-selection pseudo-circuit (SP), a simple and effective approach to increase the switch connection reuse rate and improve network performance. It especially suits networks in which performance is dominated by the number of hops. In the SP scheme, multiple switch connections are allowed to be reserved for one inport, and the flit can reuse the partial switch connection(s) based on the routing information. In an evaluation with traces from Splash-2, SP reduces the interconnection latency by up to 21.6% (16.9% on average) with a 16-core CMP configuration, and by up to 22.2% (19.5% on average) with a 64-core CMP configuration. Evaluated with synthetic traffic, the proposed scheme decreases the latency by up to 19% (16% on average).

  • 47.
    Hu, Wenmin
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem. School of Computer, National University of Defense Technology, Changsha, China.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Liu, H.
    Wang, S.
    Liu, D.
    A flexible configuration approach for fault-tolerant multicast/unicast. 2011. In: IEEE Int. Conf. Commun. Softw. Networks, ICCSN, 2011, pp. 393-396. Conference paper (Refereed)
    Abstract [en]

    A flexible approach for lookup table configuration is proposed. In this scheme, a predetermined path is set up in parallel by several unicast setup packets. Compared with other approaches, our scheme eliminates the overhead of a configuration bus by adding a small amount of logic to an existing lookup-table-based multicast router. This extension makes the setup of paths of any shape possible, which benefits fault tolerance on Network-on-Chip (NoC).

  • 48.
    Hu, Wenmin
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Liu, Hengzhu
    Power-efficient Tree-based Multicast Support for Networks-on-Chip. 2011. In: Proceedings of the Asian Pacific Design Automation Conference (ASPDAC), 2011, pp. 363-368. Conference paper (Refereed)
    Abstract [en]

    In this paper, novel hardware support for multicast on mesh Networks-on-Chip (NoC) is proposed. It supports multicast routing on tree-based paths of any shape. Two power-efficient tree-based multicast routing algorithms, Optimized tree (OPT) and Left-XY-Right-Optimized tree (LXYROPT), are also proposed. The XY tree-based (XYT) algorithm and multiple unicast copies (MUC) are also implemented on the router as baselines. As the destination size increases, OPT and LXYROPT achieve a remarkable improvement over MUC in both latency and throughput, while the average power consumption is reduced by 50% and 45%, respectively. Compared with XYT, OPT is 10% higher in latency but saves 17% in power consumption, while LXYROPT is 3% lower in latency and 8% lower in power consumption. In some cases, OPT and LXYROPT achieve power savings of up to 70% relative to XYT.

  • 49.
    Hu, Wenmin
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Liu, Hengzhu
    Zhang, Botao
    Liu, Dongpei
    Network-on-Chip Multicasting with Low Latency Path Setup. 2011. In: Proceedings of the VLSI-SoC Conference, 2011. Conference paper (Refereed)
    Abstract [en]

    A low-latency path setup approach with multiple setup packets for parallel set is presented. It reduces the header overhead compared to multiaddress encoding. Further, we propose four variants of deadlock-free multicast routing algorithms using different subpath generation methods, different destination partitioning, and channel sharing strategies. Experimental results show that the quatuor partitions path-like tree outperforms other algorithms.

  • 50. Hu, Wenmin
    et al.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Liu, Hengzhu
    Jantsch, Axel
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem.
    Multicast Path Setup Incorporating Evicting. 2012. In: Elektronika ir Elektrotechnika, ISSN 1392-1215, no. 8, pp. 101-104. Journal article (Refereed)
    Abstract [en]

    In this paper, we propose a novel multicast path setup scheme, which incorporates the evicting process. Compared with the previous work, our scheme either overcomes the limitation in evicting times or reduces the setup latency.
