  • 1.
    Al Khatib, Iyad
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Bertozzi, Davide
    Poletti, Francesco
    Benini, Luca
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Bechara, Mohamed
    Khalifeh, Hasan
    Hajjar, Mazen
    Nabiev, Rustam
    Jonsson, Sven
    Hardware/Software architecture for real-time ECG monitoring and analysis leveraging MPSoC technology. 2007. In: Transactions on High-Performance Embedded Architectures and Compilers I / [ed] Stenstrom, P; O'Boyle, M; Bodin, F; Cintra, M; McKee, SA, 2007, Vol. 4050, 239-258 p. Conference paper (Refereed)
    Abstract [en]

    High-performance chip architectures for biomedical applications are attracting considerable research and market interest. Heart diseases remain by far the main cause of death and a challenging problem for biomedical engineers to monitor and analyze. Electrocardiography (ECG) is an essential practice in heart medicine. However, ECG analysis still faces computational challenges, especially when 12 lead signals are to be analyzed in parallel, in real time, and under increasing sampling frequencies. Another challenge is the analysis of huge amounts of data that may grow to days of recordings. Nowadays, doctors use eyeball monitoring of the 12-lead ECG paper readout, which may seriously impair analysis accuracy. Our solution leverages the advance in multi-processor system-on-chip architectures, and it is centered on the parallelization of the ECG computation kernel. Our Hardware-Software (HW/SW) Multi-Processor System-on-Chip (MPSoC) design improves upon state-of-the-art mostly for its capability to perform real-time analysis of input data, leveraging the computation horsepower provided by many concurrent DSPs, more accurate diagnosis of cardiac diseases, and prompter reaction to abnormal heart alterations. The design methodology to go from the 12-lead ECG application specification to the final HW/SW architecture is the focus of this paper. We explore the design space by considering a number of hardware and software architectural variants, and deploy industrial components to build up the system.

  • 2.
    Al Khatib, Iyad
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Nabiev, Rustam
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    ECG-BIONET: A global biomedical network for human heart monitoring and analysis: Performance needs of an electrocardiogram Telemedicine platform for medical aid at the point-of-need. 2006. In: 25TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS: VOLS 1-7, PROCEEDINGS IEEE INFOCOM 2006, New York: IEEE, 2006, 3282-3283 p. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a Tele-medicine application platform as a medical aid for patients suffering from heart malfunction. We focus on heart diseases since they remain by far the major cause of death worldwide. Our solution utilizes the Satellite communication protocol DVB-RCS (Digital Video Broadcast-Return Channel Satellite), Wi-Fi, and the Network-on-Chip (NoC) technology. We utilize the 12-lead ECG biomedical technique to detect heart disorders via the biomedical NoC, which transmits the medical alarm and results via the biomedical network, ECG-BIONET. We do not investigate the DVB-RCS standard or Wi-Fi technology, but rather we try to utilize this technology, and we look at it from a performance point of view for our application by investigating three parameters, namely: delay, packet loss, and reliability. We follow a top down approach by looking at the needs of the application from a performance guarantee for our specific-purpose network.

  • 3.
    Al-Khatib, Iyad
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Bertozzi, Davide
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Benini, Luca
    Performance Analysis and Design Space Exploration for High-End Biomedical Applications: Challenges and Solutions. 2007. In: Proceedings of the International Conference on Hardware - Software Codesign and System Synthesis, 2007, 217-226 p. Conference paper (Refereed)
    Abstract [en]

    High-end biomedical applications are a good target for specific-purpose system-on-chip (SoC) implementations. Human heart electrocardiogram (ECG) real-time monitoring and analysis is an immediate example with a large potential market. Today, the lack of scalable hardware platforms limits real-time analysis capabilities of most portable ECG analyzers, and prevents the upgrade of analysis algorithms for better accuracy. Multiprocessor system-on-chip (MPSoC) technology, which is becoming mainstream in the domain of high-performance microprocessors, is becoming attractive even for power-constrained portable applications, due to the capability to provide scalable computation horsepower at an affordable power cost. This paper illustrates one of the first comprehensive HW/SW exploration frameworks to fully exploit MPSoC technology to improve the quality of real-time ECG analysis.

  • 4.
    Al-Khatib, Iyad
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Poletti, Francesco
    Bertozzi, Davide
    Benini, Luca
    Bechara, Mohamed
    Khalifeh, Hasan
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Nabiev, Rustam
    A Multiprocessor System-on-Chip for Real-Time Biomedical Monitoring and Analysis: Architectural Design Space Exploration. 2006. In: DAC '06: Proceedings of the 43rd annual Design Automation Conference, 2006, 125-130 p. Conference paper (Refereed)
    Abstract [en]

    In this paper we focus on MPSoC architectures for human heart ECG real-time monitoring and analysis. This is a very relevant bio-medical application, with a huge potential market, hence it is an ideal target for an application-specific SoC implementation. We investigate a symmetric multi-processor architecture based on STMicroelectronics VLIW DSPs that process in real-time 12-lead ECG signals. This architecture improves upon state-of-the-art SoC designs for ECG analysis in its ability to analyze the full 12 leads in real-time, even with high sampling frequencies, and ability to detect heart malfunction. We explore the design space by considering a number of hardware and software architectural options.

  • 5. Anagnostopoulos, Iraklis
    et al.
    Xydis, Sotirios
    Bartzas, Alexandros
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Soudris, Dimitrios
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations. 2011. In: IEEE Embedded Systems Letters, ISSN 1943-0663, Vol. 3, no 2, 66-69 p. Article in journal (Refereed)
    Abstract [en]

    Multiprocessor systems-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions regarding the constraints and the opportunities delivered by the physical distribution of multiple memory nodes of the platform. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization. Customization is enabled at application- and platform-level. Results show that customized microcoded DMM can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.

  • 6.
    Badlund, Per
    et al.
    KTH.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    An analytical approach for dimensioning mixed traffic networks. 2007. In: NOCS 2007: First International Symposium on Networks-on-Chip, Proceedings, 2007, 215-215 p. Conference paper (Refereed)
    Abstract [en]

    We present an analytical method for analyzing and dimensioning a network based communication architecture. The method is based on the classic (σ, ρ) network calculus. We use a TDMA approach for creating logically separated networks which makes statistical methods possible for calculations on Best Effort traffic, and supports implementation of Guaranteed Bandwidth services by using Virtual Circuits with Looped Containers.

  • 7. Bjureus, P.
    et al.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Modeling of mixed control and dataflow systems in MASCOT. 2001. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, Vol. 9, no 5, 690-703 p. Article in journal (Refereed)
    Abstract [en]

    The Matlab and SDL Codesign Technique (MASCOT) method integrates modeling of data flow and control dominated parts at the system level. Based on the established languages Specification and Description Language (SDL) and Matlab, MASCOT provides a modeling and simulation technique which realizes the communication and synchronization between the two domains. Moreover, it offers modeling guidelines for a disciplined and efficient way of using the technique. Most of the tedious details of modeling synchronization and communication are handled automatically and are transparent to the user. Consequently, the user can focus on the application and on the important tradeoffs to be made at the system level.

  • 8. Candaele, Bernard
    et al.
    Aguirre, Sylvain
    Sarlotte, Michel
    Anagnostopoulos, Iraklis
    Xydis, Sotirios
    Bartzas, Alexandros
    Bekiaris, Dimitris
    Soudris, Dimitrios
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Chabloz, Jean-Michel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Vanmeerbeeck, Geert
    Kreku, Jari
    Tiensyrja, Kari
    Ieromnimon, Fragkiskos
    Kritharidis, Dimitrios
    Wiefrink, Andreas
    Vanthournout, Bart
    Martin, Philippe
    Mapping Optimisation for Scalable multi-core ARchiTecture: The MOSART approach. 2010. In: Proceedings - IEEE Annual Symposium on VLSI, ISVLSI 2010, 2010, 518-523 p. Conference paper (Refereed)
    Abstract [en]

    The project will address two main challenges of prevailing architectures: 1) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; 2) The difficulties in programming heterogeneous, multi-core platforms, in particular in dynamically managing data structures in distributed memory. MOSART aims to overcome these through a multi-core architecture with distributed memory organisation, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimised and customised together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: A) Providing platform support for management of abstract data structures including middleware services and a run-time data manager for NoC based communication infrastructure; B) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.

  • 9. Candaele, Bernard
    et al.
    Aguirre, Sylvain
    Sarlotte, Michel
    Anagnostopoulos, Iraklis
    Xydis, Sotirios
    Bartzas, Alexandros
    Bekiaris, Dimitris
    Soudris, Dimitrios
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chabloz, Jean-Michel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Vanmeerbeeck, Geert
    Kreku, Jari
    Tiensyrja, Kari
    Ieromnimon, Fragkiskos
    Kritharidis, Dimitrios
    Wiefrink, Andreas
    Vanthournout, Bart
    Martin, Philippe
    The MOSART Mapping Optimization for multi-core Architectures. 2011. In: VLSI 2010 Annual Symposium, Springer Publishing Company, 2011, 181-195 p. Conference paper (Refereed)
    Abstract [en]

    The MOSART project addresses two main challenges of prevailing architectures: (i) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; (ii) The difficulties in programming heterogeneous, multi-core platforms. MOSART aims to overcome these through a multi-core architecture with distributed memory organization, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimized and customized together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: (i) Providing platform support for management of abstract data structures including middleware services and a run-time data manager for NoC based communication infrastructure; (ii) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.

  • 10.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips. 2010. In: 3rd IEEE International Conference on Computer and Electrical Engineering (ICCEE), 2010. Conference paper (Refereed)
    Abstract [en]

    Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in the paper, we propose a fast barrier synchronization mechanism, targeting Multi-core NoCs. The fast barrier synchronization mechanism includes a dedicated hardware module, named Fast Barrier Synchronizer (FBS), integrated with each processor node. It offers a set of barrier counters and can concurrently process synchronization requests issued by the local node and remote nodes via the on-chip network. The salient feature of our fast barrier synchronization mechanism is that, once the barrier condition is reached, the “barrier release” acknowledgement is routed to all processor nodes in a broadcast way in order to save chip area by avoiding storing source node information and to minimize completion time by avoiding serialization of barrier releasing. Synthesis results suggest that the FBS can run over 1 GHz in SMIC® 130nm technology with small area overhead. We implemented an FBS-enhanced multi-core NoC architecture on our FPGA platform using the Xilinx® Virtex 5 as the FPGA chip. FPGA utilization and simulation results show that our fast barrier synchronization demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.
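The broadcast-release idea in this abstract can be pictured with a small software model. This is only a minimal sketch under assumptions, not the paper's hardware design: the class name `BroadcastBarrier`, its fields, and the choice to return the release targets as a list are all hypothetical. It shows why a plain counter suffices when the release goes to every node, since no per-source record has to be kept.

```python
class BroadcastBarrier:
    """Counter-only barrier model (hypothetical illustration).

    Arrivals are only counted, never recorded by source, because the
    release is sent to every node rather than to the nodes that asked.
    """

    def __init__(self, num_nodes):
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        self.num_nodes = num_nodes
        self.count = 0      # arrivals seen in the current barrier episode
        self.releases = 0   # completed barrier episodes

    def arrive(self):
        """Register one arrival and return the node ids to release.

        Returns an empty list until the last node arrives; the last
        arrival resets the counter and returns all node ids (broadcast).
        """
        self.count += 1
        if self.count < self.num_nodes:
            return []
        self.count = 0
        self.releases += 1
        return list(range(self.num_nodes))
```

A unicast variant would instead have to store the id of every arriving node and send the releases one by one, which is the storage and serialization cost the abstract says the broadcast scheme avoids.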

  • 11.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hybrid distributed shared memory space in multi-core processors. 2011. In: Journal of Software, ISSN 1796-217X, Vol. 6, no 12 SPEC. ISSUE, 2369-2378 p. Article in journal (Refereed)
    Abstract [en]

    On multi-core processors, memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart.
    The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.

  • 12.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Xu, Bangjian
    Luo, Heng
    Multi-FPGA Implementation of a Network-on-Chip Based Many-core Architecture with Fast Barrier Synchronization Mechanism. 2010. In: Proceedings of the IEEE Norchip Conference, 2010. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a fast barrier synchronization mechanism, targeting Network-on-Chip based many-core architectures. Its salient feature is that, once the barrier condition is reached, the "barrier release" acknowledgement is routed to all processor nodes in a broadcast way in order to save area by avoiding storing source node information and to minimize completion time by eliminating serialization of barrier releasing. Then, we construct a multi-FPGA platform using Xilinx® Virtex 5 as FPGA chips and implement a NoC based many-core architecture on it. FPGA utilization and simulation results show that our mechanism demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.

  • 13.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips. 2010. In: The 3rd IEEE International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010), 2010, 39-46 p. Conference paper (Refereed)
    Abstract [en]

    On multi-core Network-on-Chips (NoCs), memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core NoC platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart.
    The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.

  • 14.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Handling Shared Variable Synchronization in Multi-core Network-on-Chips with Distributed Memory. 2010. In: Proceedings: IEEE International SOC Conference, SOCC 2010, 2010, 467-472 p. Conference paper (Refereed)
    Abstract [en]

    Parallelized shared variable applications running on multi-core Network-on-Chips (NoCs) require efficient support for synchronization, since communication is on the critical path of system performance and contended synchronization requests may cause large performance penalty. In this paper, we propose a dedicated hardware module for synchronization management. This module is called Synchronization Handler (SH), integrated with each processor-memory node on the multi-core NoCs. It uses two physical buffers to concurrently process synchronization requests issued by the local processor and remote processors via the on-chip network. One salient feature is that the two physical buffers are dynamically allocated to form multiple virtual buffers (a virtual buffer is related to a shared synchronization variable) so as to improve the buffer utilization and alleviate the head-of-line blocking. Synthesis results suggest that the SH can run over 900 MHz in 130nm technology with small area overhead. To justify the SH-enhanced multi-core NoCs, we employ synthetic workloads to evaluate synchronization cost and buffer utilization, and run synchronization-intensive applications to investigate speedup. The results show that our approach is viable.

  • 15.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Chen, Shuming
    Speedup Analysis of Data-parallel Applications on Multi-core NoCs. 2009. In: Proceedings of the IEEE International Conference on ASIC (ASICON), 2009, 105-108 p. Conference paper (Refereed)
    Abstract [en]

    As more computing cores are integrated onto a single chip, the effect of network communication latency is becoming more and more significant on Multi-core Network-on-Chips (NoCs). For data-parallel applications, we study the model of parallel speedup by including network communication latency in Amdahl's law. The speedup analysis considers the effect of network topology, network size, traffic model and computation/communication ratio. We also study the speedup efficiency. In our Multi-core NoC platform, a real data-parallel application, i.e. matrix multiplication, is used to validate the analysis. Our theoretical analysis and the application results show that the speedup improvement is nonlinear and the speedup efficiency decreases as the system size is scaled up. Such analysis can be used to guide architects and programmers to improve parallel processing efficiency by reducing network latency with optimized network design and increasing computation proportion in the program.
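To make the idea of "including network communication latency in Amdahl's law" concrete, here is a minimal sketch. It is not the model from the paper: the function names, the normalization to single-core run time, and treating communication as one additive term are assumptions made for illustration only.

```python
def speedup(n_cores, parallel_fraction, comm_overhead=0.0):
    """Amdahl's law with an added communication term (assumed form).

    All times are normalized to the single-core run time (1.0).
    comm_overhead is the normalized communication time paid when the
    work is spread over n_cores; how it grows with n_cores depends on
    topology and traffic and is left to the caller.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_time = 1.0 - parallel_fraction
    parallel_time = parallel_fraction / n_cores
    return 1.0 / (serial_time + parallel_time + comm_overhead)


def efficiency(n_cores, parallel_fraction, comm_overhead=0.0):
    """Speedup per core; falls as cores or communication cost grow."""
    return speedup(n_cores, parallel_fraction, comm_overhead) / n_cores
```

With `comm_overhead=0.0` this reduces to the classic Amdahl formula. Any positive communication term lowers both speedup and efficiency, which matches the qualitative trend the abstract reports.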

  • 16.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Supporting Distributed Shared Memory on Multi-core Network-on-Chips Using a Dual Microcoded Controller. 2010. In: Proceedings of the conference for Design Automation and Test in Europe, 2010, 39-44 p. Conference paper (Refereed)
    Abstract [en]

    Supporting Distributed Shared Memory (DSM) is essential for multi-core Network-on-Chips for the sake of reusing huge amount of legacy code and easy programmability. We propose a microcoded controller as a hardware module in each node to connect the core, the local memory and the network. The controller is programmable where the DSM functions such as virtual-to-physical address translation, memory access and synchronization etc. are realized using microcode. To enable concurrent processing of memory requests from the local and remote cores, our controller features two mini-processors, one dealing with requests from the local core and the other from remote cores. Synthesis results suggest that the controller consumes 51k gates for the logic and can run up to 455 MHz in 130 nm technology. To evaluate its performance, we use synthetic and application workloads. Results show that, when the system size is scaled up, the delay overhead incurred by the controller may become less significant when compared with the network delay. In this way, the delay efficiency of our DSM solution is close to hardware solutions on average but still has all the flexibility of software solutions.

  • 17.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Supporting Efficient Synchronization in Multi-core NoCs Using Dynamic Buffer Allocation Technique. 2010. In: Proceedings of the IEEE Annual Symposium on VLSI, 2010, 462-463 p. Conference paper (Refereed)
    Abstract [en]

    This paper explores a dynamic buffer allocation technique to guide a distributed synchronization architecture to support efficient synchronization on multi-core Network-on-Chips (NoCs). The synchronization architecture features two physical buffers to be able to concurrently queue and handle synchronization requests issued by the local processor and remote processors via the on-chip network. Using the dynamic buffer allocation technique, the two physical buffers are dynamically allocated to form multiple virtual buffers in order to improve buffers' utilization. Experiments are carried out to evaluate buffers' utilization.

  • 18.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Chen, Shenggang
    Gu, Huitao
    Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property. 2013. In: Computers & electrical engineering, ISSN 0045-7906, E-ISSN 1879-0755, Vol. 39, no 2, 596-612 p. Article in journal (Refereed)
    Abstract [en]

    In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for those data with changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%.
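    The private/shared split described in this abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' design: the class name `HybridLocalMemory`, the page-based table, the page size, and the rule that addresses below the boundary are private are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HybridLocalMemory:
    """Toy model of a local memory split into a private and a shared part.

    Assumption (not from the paper): addresses below `boundary` are private
    and used directly as physical addresses; all others are virtual and go
    through a page-based V2P table.
    """
    size: int                       # total number of addressable words
    boundary: int                   # first address of the shared part
    v2p_table: dict = field(default_factory=dict)  # virtual page -> (node, physical page)
    page_size: int = 256            # hypothetical page size
    translations: int = 0           # number of V2P lookups performed

    def set_boundary(self, new_boundary: int) -> None:
        """Move the partitioning boundary at runtime."""
        if not 0 <= new_boundary <= self.size:
            raise ValueError("boundary outside local memory")
        self.boundary = new_boundary

    def resolve(self, addr: int):
        """Return (node, physical address) for an access to `addr`."""
        if addr < self.boundary:
            # Private data: physical addressing, no translation overhead.
            return ("local", addr)
        # Shared data: virtual addressing, one V2P lookup per access.
        self.translations += 1
        page, offset = divmod(addr, self.page_size)
        node, phys_page = self.v2p_table[page]
        return (node, phys_page * self.page_size + offset)
```

    In this sketch, moving the boundary upward with `set_boundary` turns formerly shared addresses into private ones, so later accesses to them skip the lookup counted in `translations`.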

  • 19.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. National University of Defense Technology, China .
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Guo, Yang
    Liu, Hengzhu
    Cooperative communication for efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. 2014. In: IEICE Electronics Express, ISSN 1349-2543, Vol. 11, no 18, 20140542- p. Article in journal (Refereed)
    Abstract [en]

    On many-core Network-on-Chips (NoCs), communication is on the critical path of system performance and contended synchronization requests may cause large performance penalty. Different from conventional algorithm-based approaches, the paper addresses the barrier synchronization problem from the angle of optimizing its communication performance and proposes cooperative communication as a means to achieve efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. With the cooperative communication, routers collaborate with one another to accomplish a fast barrier synchronization task. The cooperative communication is implemented in our router at low cost. Through comparative experiments, our approach evidently exhibits high efficiency and good scalability.

  • 20.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Liu, Hai
    Cooperative communication based barrier synchronization in on-chip mesh architectures. 2011. In: IEICE ELECTRON EXPR, ISSN 1349-2543, Vol. 8, no 22, 1856-1862 p. Article in journal (Refereed)
    Abstract [en]

    We propose cooperative communication as a means to enable efficient and scalable barrier synchronization on mesh-based many-core architectures. Our approach is different from but orthogonal to conventional algorithm-based optimizations. It relies on collaborating routers to provide efficient gather and multicast communication. In conjunction with a master-slave algorithm, it exploits the mesh regularity to achieve efficiency. The gather and multicast functions have been implemented in our router. Synthesis results suggest marginal area overhead. With synthetic and benchmark experiments, we show that our approach significantly reduces synchronization completion time and increases speedup.

  • 21.
    Chen, Zhipeng
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    A Worst Case Performance Model for TDM Virtual Circuit in NoCs. 2010. In: NETWORK AND PARALLEL COMPUTING / [ed] Ding C, Shao ZY, Zheng R, Berlin: Springer Berlin/Heidelberg, 2010, Vol. 6289, 452-461 p. Conference paper (Refereed)
    Abstract [en]

    In Network-on-Chip (NoC), Time-Division-Multiplexing (TDM) Virtual Circuit (VC) is well recognized as being capable of providing guaranteed services in both latency and bandwidth. We propose a method of modeling TDM based VC by using Network Calculus. We derive a tight upper bound of end-to-end delay and buffer requirement for individual VC. The performance analysis using Latency-Rate server is also presented in comparison with our Performance model for TDM Virtual Circuit in NoCs (Pemvin). We conducted experiments comparing Pemvin to the Latency-Rate server model. Our experiment results show the improvement of Pemvin on tightening the upper bound of end-to-end delay and buffer requirement.
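    For orientation, the Latency-Rate baseline mentioned in this abstract can be sketched with the standard network-calculus bounds for a token-bucket flow. This is the textbook result, not the Pemvin model itself; the function name and the parameter names `sigma`, `rho`, `theta` and `rate` are assumptions introduced here for illustration.

```python
def lr_server_bounds(sigma: float, rho: float, theta: float, rate: float):
    """Worst-case bounds for a (sigma, rho) flow served by a Latency-Rate server.

    sigma: burst size of the flow (e.g. flits)
    rho:   sustained rate of the flow (flits per cycle)
    theta: latency term of the server (cycles)
    rate:  guaranteed service rate of the server (flits per cycle)

    Returns (delay_bound, backlog_bound) from standard network calculus:
    arrival curve sigma + rho*t, service curve rate * max(t - theta, 0).
    """
    if rate <= 0:
        raise ValueError("service rate must be positive")
    if rho > rate:
        raise ValueError("flow rate exceeds service rate; bounds are unbounded")
    delay_bound = theta + sigma / rate       # maximum horizontal deviation
    backlog_bound = sigma + rho * theta      # maximum vertical deviation
    return delay_bound, backlog_bound
```

    How `theta` and `rate` follow from a concrete TDM slot allocation is deliberately left open here, since the abstract does not give that mapping.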

  • 22.
    Deb, Abhijit Kumar
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    System design for DSP applications in transaction level modeling paradigm. 2004. In: 41st Design Automation Conference, Proceedings 2004, 2004, 466-471 p. Conference paper (Refereed)
    Abstract [en]

    In this paper, we systematically define three transaction level models (TLMs), which reside at different levels of abstraction between the functional and the implementation model of a DSP system. We also show a unique language support to build the TLMs. Our results show that the abstract TLMs can be built and simulated much faster than the implementation model at the expense of a reasonable amount of simulation accuracy.

  • 23.
    Deb, Abhijit Kumar
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    System design for DSP applications using the MASIC methodology. 2004. In: DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION, VOLS 1 AND 2, PROCEEDINGS, LOS ALAMITOS: IEEE COMPUTER SOC, 2004, 630-635 p. Conference paper (Refereed)
    Abstract [en]

    Expensive top-down iterations are often required in the design cycle of complex DSP systems. In this paper, we introduce two levels of abstraction in the design flow by systematically categorizing the architectural decisions. As a result, the top-down iteration loop is broken. We also present a technique to capture and inject the architectural decisions such that the system models can be created and simulated efficiently. The concepts are illustrated by a realistic speech processing example, which is implemented using the AMBA on-chip architecture. Our methodology offers a smooth path from the functional modeling phase to the implementation level, facilitates the reuse of HW and SW components, and enjoys existing tool support at the backend.

  • 24. Deivasigamani, M.
    et al.
    Tabatabaei, Shaghayeghsadat
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Mustafa, Naveed Ul
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ijaz, Hamza
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Aslam, Haris Bin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Shaoteng
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Concept and design of exhaustive-parallel search algorithm for Network-on-Chip. 2011. In: Int. Syst. Chip Conf., 2011, 150-155 p. Conference paper (Refereed)
    Abstract [en]

    This paper presents the concept and design of an exhaustive-parallel search algorithm for Network-on-Chip. The proposed parallel algorithm searches for a minimal path between source and destination in a forward-wave-propagation manner. The algorithm guarantees setup latency if the setup path exists. A high performance switch is designed to support the exhaustive-parallel search algorithm. The NoC fabric is designed for an 8×8 mesh architecture and its performance is evaluated.

  • 25.
    Ebrahimi, Masoumeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland .
    Wang, J.
    Huang, L.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland .
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Rescuing healthy cores against disabled routers. 2014. Conference paper (Refereed)
    Abstract [en]

    A router may be temporarily or permanently disabled in NoCs for several reasons such as saving power, occurring faults or testing. Disabling a router, however, may have a severe impact on the performance or functionality of the entire system if it results in disconnecting the core from the network. In this paper, we propose a deadlock-free routing algorithm which allows the core to stay connected to the system and continue its normal operation when its connected router is disabled. Our analysis and experiments show that the proposed technique has 100%, 93.60%, and 87.19% network availability by 100% packet delivery when 1, 2 and 3 routers are defunct or intentionally disabled. The algorithm provides adaptivity and it is lightweight, requiring one and two virtual channels along the X and Y dimension, respectively.

  • 26.
    Ejaz, Ahsen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Costs and benefits of flexibility in spatial division Circuit Switched Networks-on-Chip. 2013. In: NoCArc '13 Proceedings of the Sixth International Workshop on Network on Chip Architectures, Association for Computing Machinery (ACM), 2013, 41-46 p. Conference paper (Refereed)
    Abstract [en]

    Although most Network-on-Chip (NoC) designs are based on Packet Switching (PS), the importance of Circuit Switching (CS) should not be underestimated. Many MPSoCs executing real-time applications require an underlying communication backbone that can relay messages from one node to another with guaranteed throughput. Compared to PS, CS can provide guaranteed throughput with lower area and power overheads. It is also highly suited for applications where nodes transfer long messages. Spatial Division Multiplexing (SDM) can allow more efficient use of available network resources by dividing them among multiple simultaneous transactions. The network developed by Vali [1] has three design variations based on the number of sub-channels, has a predictable connection setup time, and uses CS to provide guaranteed throughput once a connection is established. In this paper we use this network as a basis to study the effect of flexibility based on SDM on the performance of CS networks. A network evaluation platform has been developed to configure and evaluate networks with a maximum of 8 sub-networks, with each sub-network comprising 1, 2 or 4 sub-channels. We show that under a uniform traffic pattern with requests of uniform random bandwidth (BW) requirement, a less flexible network outperforms a network with higher flexibility due to a phenomenon we call 'stray requests'. We conclude this paper by showing that under high network traffic, performance of our flexible networks can be as much as 113% better than HAGAR [2] and Liu's [3] network.

  • 27.
    Ellervee, Peeter
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Exploring ASIC Design Space at System Level with a Neural Network Estimator. 1994. In: Proc. of IEEE ASIC-conference, 1994, 1994. Conference paper (Refereed)
    Abstract [en]

    Estimators are critical tools for architectural level exploration of the design space. We present a novel approach to estimation based on the multilayer perceptron, which builds the estimation function during the learning process and thus allows arbitrarily complex functions to be described. We also describe how the control data flow graph is encoded for the neural network input and we present results of the first experiments made with realistic design examples.

  • 28.
    Ellervee, Peeter
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Area Estimation in the High Level Synthesis Using Neural Networks. 1994. Conference paper (Refereed)
  • 29.
    Eslami Kiasari, Abbas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Bekooij, M.
    Burns, A.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Analytical approaches for performance evaluation of networks-on-chip. 2012. In: CASES'12 - Proceedings of the 2012 ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Co-located with ESWEEK, ACM, 2012, 211-212 p. Conference paper (Refereed)
    Abstract [en]

    This tutorial reviews four popular mathematical formalisms - dataflow analysis, schedulability analysis, network calculus, and queueing theory - and how they have been applied to the analysis of Network-on-Chip (NoC) performance. We review the basic concepts and results of each formalism and provide examples of how they have been used in on-chip communication performance analysis. The tutorial also discusses the respective strengths and weaknesses of each formalism, their suitability for a specific purpose, and the attempts that have been made to bridge these analytical approaches. Finally, we conclude the tutorial by discussing open research issues.

  • 30.
    Eslami Kiasari, Abbas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Framework for Designing Congestion-Aware Deterministic Routing. 2010. In: NoCArc '10 Proceedings of the Third International Workshop on Network on Chip Architectures, 2010, 45-50 p. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a system-level Congestion-Aware Routing (CAR) framework for designing minimal deterministic routing algorithms. CAR exploits the peculiarities of the application workload to spread the load evenly across the network. To this end, we first formulate an optimization problem of minimizing the level of congestion in the network and then use the simulated annealing heuristic to solve this problem. The proposed framework assures deadlock-free routing, even in the networks without virtual channels. Experiments with both synthetic and realistic workloads show the effectiveness of the CAR framework. Results show that maximum sustainable throughput of the network is improved by up to 205% for different applications and architectures.

  • 31.
    Eslami Kiasari, Abbas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Heuristic Framework for Designing and Exploring Deterministic Routing Algorithm for NoCs. 2013. In: Algorithms in Networks-on-Chip, Springer, 2013, 21-39 p. Chapter in book (Refereed)
    Abstract [en]

    In this chapter, we present a system-level framework for designing minimal deterministic routing algorithms for Networks-on-Chip (NoCs) that are customized for a set of applications. To this end, we first formulate an optimization problem of minimizing average packet latency in the network and then use the simulated annealing heuristic to solve this problem. To estimate the average packet latency we use a queueing-based analytical model which can capture the burstiness of the traffic. The proposed framework does not require virtual channels to guarantee deadlock freedom since routes are extracted from an acyclic channel dependency graph. Experiments with both synthetic and realistic workloads show the effectiveness of the approach. Results show that maximum sustainable throughput of the network is improved for different applications and architectures.

  • 32.
    Eslami Kiasari, Abbas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Mathematical formalisms for performance evaluation of networks-on-chip. 2013. In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 0010-4892, Vol. 45, no 3, 38- p. Article in journal (Refereed)
    Abstract [en]

    This article reviews four popular mathematical formalisms - queueing theory, network calculus, schedulability analysis, and dataflow analysis - and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip. The article discusses the basic concepts and results of each formalism and provides examples of how they have been used in Networks-on-Chip (NoCs) performance analysis. Also, the respective strengths and weaknesses of each technique and its suitability for a specific purpose are investigated. An open research issue is a unified analytical model for a comprehensive performance evaluation of NoCs. To this end, this article reviews the attempts that have been made to bridge these formalisms.

  • 33.
    Eslami Kiasari, Abbas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    An Analytical Latency Model for Networks-on-Chip. 2013. In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, Vol. 21, no 1, 113-123 p. Article in journal (Refereed)
    Abstract [en]

    We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topology with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.

  • 34. Feng, C.
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zhang, M.
    Xing, Z.
    Addressing transient and permanent faults in NoC with efficient fault-tolerant deflection router. 2013. In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, Vol. 21, no 6, 1053-1066 p. Article in journal (Refereed)
    Abstract [en]

    Continuing decrease in the feature size of integrated circuits leads to increases in susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request, and forward error correction link-level error control scheme to handle transient faults and a reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router. Synthesized results show that, compared with the FTDR router, the FTDR-H router can reduce the area by 27% in an 8×8 network. Simulation results demonstrate that under synthetic workloads, in the presence of permanent link faults, the throughput of an 8×8 network with FTDR and FTDR-H algorithms are 14% and 23% higher on average than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the cost-based deflection routing algorithm, respectively. Under real application workloads, the FTDR-H algorithm achieves 20% less hop counts on average than that of the FoN algorithm. For transient faults, the performance of the FTDR router can achieve graceful degradation even at a high fault rate. We also implement the fault-tolerant deflection router which can achieve 400 MHz in TSMC 65-nm technology.

  • 35.
    Feng, Chaochao
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Li, Jinwen
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zhang, Minxuan
    Evaluation of Deflection Routing on Various NoC Topologies. 2011. In: Proceedings of the IEEE International Conference on ASIC (ASICON), 2011. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose two novel deflection routing algorithms for de Bruijn and Spidergon NoCs and evaluate the performance of the deflection routing on 5 NoC topologies with different synthetic traffic patterns. We also synthesize the routers in various NoC topologies with TSMC 65nm technology. The evaluation results illustrate that the performance of deflection routing is susceptible to the network topology and traffic pattern. The results can also guide the NoC architect to choose the suitable NoC topology for the specific application.

  • 36.
    Feng, Chaochao
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liao, Z.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zhao, Z.
    Performance analysis of on-chip bufferless router with multi-ejection ports. 2015. In: Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015, IEEE conference proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    In general, the bufferless NoC router has only one local output port for ejection, which may lead to multiple arriving flits competing for that single output port. In this paper, we propose a reconfigurable bufferless router in which the number of ejection ports can be configured as 2, 3 and 4. Simulation results demonstrate that the average packet latency of the routers with multi-ejection ports is 18%, 10%, 6%, 14%, 9% and 7% on average less than that of the router with 1 ejection port under six synthetic workloads respectively. For application workloads, the average packet latency of the router with more than two ejection ports is slightly better than the router with only one ejection port, a difference which can be neglected. Making a compromise between hardware cost and performance, it can be concluded that there is no need to implement bufferless routers with 3 and 4 ejection ports, as the router with 2 ejection ports can achieve almost the same performance as the routers with 3 and 4 ejection ports.

  • 37.
    Feng, Chaochao
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Li, Jinwen
    Zhang, Minxuan
    A Reconfigurable Fault-tolerant Deflection Routing Algorithm Based on Reinforcement Learning for Networks-on-Chip. 2010. In: Proceedings of the International Workshop on Network on Chip Architectures (NoCArc), 2010. Conference paper (Refereed)
  • 38.
    Feng, Chaochao
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Li, Jinwen
    Zhang, Minxuan
    FoN: Fault-on-Neighbor aware Routing Algorithm for Networks-on-Chip. 2010. In: Proceedings - IEEE International SOC Conference, SOCC 2010, 2010, 441-446 p. Conference paper (Refereed)
    Abstract [en]

    Reliability has become a key issue of Networks-on-Chip (NoC) as the CMOS technology scales down to the nanoscale domain. This paper proposes a Fault-on-Neighbor (FoN) aware deflection routing algorithm for NoC which makes routing decisions based on the link status of neighbor switches within 2 hops to avoid faulty links and switches. Simulation results demonstrate that in the presence of faults, the saturated throughput of the FoN switch is 13% higher on average than a cost-based deflection switch for an 8×8 mesh. The average hop counts can be up to 1.7 less than the cost-based switch. The FoN switch is also synthesized using 65nm TSMC technology and it can work at 500MHz with small area overhead.

  • 39.
    Feng, Chaochao
    et al.
    National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zhang, Minxuan
    A 1-Cycle 1.25 GHz Bufferless Router for 3D Network-on-Chip. 2012. In: IEICE transactions on information and systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E95D, no 5, 1519-1522 p. Article in journal (Refereed)
    Abstract [en]

    In this paper, we propose a 1-cycle high-performance 3D bufferless router with a 3-stage permutation network. The proposed router utilizes the 3-stage permutation network instead of the serialized switch allocator and 7 x 7 crossbar to achieve a frequency of 1.25 GHz in TSMC 65 nm technology. Compared with the other two 3D bufferless routers, the proposed router occupies less area and consumes less power. Simulation results under both synthetic and application workloads illustrate that the proposed router achieves lower average packet latency than the other two 3D bufferless routers.

  • 40.
    Feng, Chaochao
    et al.
    National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zhang, Minxuan
    Yang, Xianju
    Support Efficient and Fault-Tolerant Multicast in Bufferless Network-on-Chip. 2012. In: IEICE transactions on information and systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E95D, no 4, 1052-1061 p. Article in journal (Refereed)
    Abstract [en]

    In this paper, we propose three Deflection-Routing-based Multicast (DRM) schemes for a bufferless NoC. The DRM scheme without packet replication (DRM_noPR) sends multicast packets through a non-deterministic path. The DRM schemes with adaptive packet replication (DRM_PR_src and DRM_PR_all) replicate multicast packets at the source or intermediate node according to the destination position and the state of output ports to reduce the average multicast latency. We also provide fault-tolerant support in these schemes through a reinforcement-learning-based method to reconfigure the routing table to tolerate permanent faulty links in the network. Simulation results illustrate that the DRM_PR_all scheme achieves 41%, 43% and 37% less latency on average than that of the DRM_noPR scheme and 27%, 29% and 25% less latency on average than that of the DRM_PR_src scheme under three synthetic traffic patterns respectively. In addition, all three fault-tolerant DRM schemes achieve acceptable performance degradation at various link fault rates without any packet loss.

  • 41. Feng, Chaochao
    et al.
    Zhang, Minxuan
    Li, Jinwen
    Jiang, Jiang
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Low-overhead Fault-aware Deflection Routing Algorithm for 3D Network-on-Chip. 2011. In: Proceedings - 2011 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2011, 2011, 19-24 p. Conference paper (Refereed)
  • 42. Forsell, Martti
    et al.
    Soininen, Juha-Pekka
    Tiensyriä, Kari
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Kronlöf, Klaus
    Hadjiski, Bojidar
    Networks on Chip: Approaches and Challenges. 2004. In: Research and Development Activities in Telecommunication Systems, VTT Electronics, 2004, 55-61 p. Chapter in book (Refereed)
  • 43. Grange, Matt
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Weerasekera, Roshan
    Pamunuwa, Dinesh
    Modeling the Computational Efficiency of 2-D and 3-D Silicon Processors for Early-Chip Planning. 2011. In: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2011, 310-317 p. Conference paper (Refereed)
    Abstract [en]

    Hierarchical models from physical to system level are proposed for architectural exploration of high-performance silicon systems to quantify the performance and cost trade-offs for 2-D and 3-D IC implementations. We show that 3-D systems can reduce interconnect delay and energy by up to an order of magnitude over 2-D, with an increase of 20-30% in performance-per-watt for every doubling of stack height. Contrary to previous analysis, the improved energy efficiency is achievable at a favorable cost. The models are packaged as a standalone tool and can provide fast estimation of coarse-grain performance and cost limitations for a variety of processing systems to be used at the early chip-planning phase of the design cycle.

  • 44. Grange, Matt
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Weerasekera, Roshan
    Pamunuwa, Dinesh
    Modeling the Efficiency of Stacked Silicon Systems: Computational, Thermal and Electrical Performance. 2011. Conference paper (Refereed)
    Abstract [en]

    Technological advances in processor design have typically relied on scaling feature size and frequency. Recently, however, many new design choices have emerged, partly due to the slowing of scaling: many-core architectures are beginning to replace single-core ICs to circumvent 2-D bottlenecks, and the number of I/Os is on the rise, so the cost of off-chip transactions is becoming heftier. Moreover, 3-D integration may provide further performance benefits without investment in lower technology nodes. Understanding these trade-offs can provide guidelines to optimize the architecture of future systems under performance, thermal and cost constraints. We have constructed a model and tool that assesses computational efficiency under these criteria.

  • 45. Grange, Matt
    et al.
    Weerasekera, Roshan
    Pamunuwa, Dinesh
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Weldezion, Awet Yemane
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Optimal Network Architectures for Minimizing Average Distance in k-ary n-dimensional Mesh Networks. 2011. In: NOCS 2011: The 5th ACM/IEEE International Symposium on Networks-on-Chip, ACM Digital Library, 2011, 57-64 p. Conference paper (Refereed)
    Abstract [en]

    A general expression for the average distance for meshes of any dimension and radix, including unequal radices in different dimensions, valid for any traffic pattern under zero-load condition, is formulated rigorously to allow its calculation without network-level simulations. The average distance expression is solved analytically for uniform random traffic and for a set of local random traffic patterns. Hot spot traffic patterns are also considered, and the formula is empirically validated by cycle-true simulations for uniform random, local, and hot spot traffic. Moreover, a methodology to attain closed-form solutions for other traffic patterns is detailed. Furthermore, the model is applied to guide design decisions. Specifically, we show that the model can predict the optimal 3-D topology for uniform and local traffic patterns. It can also predict the optimal placement of hot spots in the network. The fidelity of the approach in suggesting the correct design choices even for loaded and congested networks is surprising. For those cases we studied empirically it is 100%.

  • 46. Grange, Matt
    et al.
    Weldezion, Awet Yemane
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Pamunuwa, Dinesh
    Weerasekera, Roshan
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Shippen, D.
    Physical mapping and performance study of a multi-clock 3-Dimensional Network-on-Chip mesh. 2009. In: 2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION, San Francisco: IEEE conference proceedings, 2009, 345-351 p. Conference paper (Refereed)
    Abstract [en]

    The physical performance of a 3-Dimensional Network-on-Chip (NoC) mesh architecture employing through silicon vias (TSV) for vertical connectivity is investigated with a cycle-accurate RTL simulator. The physical latency and area impact of TSVs, switches, and the on-chip interconnect is evaluated to extract the maximum signaling speeds through the switches, horizontal and vertical network links. The relatively low parasitics of TSVs compared to the on-chip 2-D interconnect allow for higher signaling speeds between chip layers. The system-level impact on overall network performance as a result of clocking vertical packets at a higher rate through the TSV interconnect is simulated and reported.

  • 47. Grecu, Cristian
    et al.
    Ivanov, Andre
    Pande, Partha
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Salminen, Erno
    Ogras, Umit
    Marculescu, Radu
    An Initiative towards Open Network-on-Chip Benchmarks. 2007. Report (Other academic)
  • 48. Grecu, Cristian
    et al.
    Ivanov, Andre
    Pande, Partha
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Salminen, Erno
    Ogras, Umit
    Marculescu, Radu
    Towards open network-on-chip benchmarks. 2007. In: NOCS 2007: FIRST INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP, PROCEEDINGS, 2007, 205-212 p. Conference paper (Refereed)
    Abstract [en]

    Measuring and comparing performance, cost, and other features of advanced communication architectures for complex multi-core/multiprocessor systems on chip is a significant challenge which has hardly been addressed so far. This document outlines the top-level view on a system of benchmarks for Networks on Chip (NoC), which intends to cover a wide spectrum of NoC design aspects, from application modeling to performance evaluation and post-manufacturing test and reliability. For performance benchmarking, requirements and features are described for application programs, synthetic micro-benchmarks, and abstract benchmark applications. Then, it proposes ways to measure and benchmark reliability, fault tolerance and testability of the on-chip communication fabric. This paper introduces the main concepts and ideas for benchmarking NoCs in a systematic and comparable way. It will be followed up by a report that will define a benchmark framework and the syntax of interfaces for benchmark programs that will allow the community to build up a benchmark suite.

  • 49. Grimm, Christoph
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Shukla, Sandeep
    Villar, Eugenio
    C-Based Design of Embedded Systems - Editorial. 2008. In: EURASIP Journal on Embedded Systems, ISSN 1687-3955, no 1, 243890- p. Article in journal (Other academic)
  • 50. Guang, Liang
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Adaptive Power Management for the On-Chip Communication Network. 2006. In: DSD 2006: 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools, Proceedings, 2006, 649-656 p. Conference paper (Refereed)
    Abstract [en]

    An on-chip communication network is most power efficient when it operates just below the saturation point. For any given traffic load the network can be operated in this region by adjusting frequency and voltage. For a deflective routing network we propose the design of a central controller for dynamic frequency and voltage scaling. Given history information including the load and frequency in the network, the controller adjusts the frequency and voltage such that the network operates just below the saturation point. We provide control mechanisms for continuous and discrete frequency ranges. With a discrete frequency range and taking into account voltage switching delays, we evaluate the control mechanism under stochastic, smoothly varying and very bursty traffic. Experiments demonstrate that adaptive control is very effective in minimizing power consumption at reasonable performance. Compared with a fixed high frequency network, the adaptively controlled network is significantly more power efficient. We compare it to fixed frequency networks, which are either too slow exhibiting unbounded delays, or are dimensioned for the worst case with very high frequency and are very power hungry.
