  • 1.
    Badawi, Mohammad
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Quality-of-service-aware adaptation scheme for multi-core protocol processing architecture. 2017. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 54, p. 47-59. Article in journal (Refereed)
    Abstract [en]

    Employing adaptable protocol processing architectures has shown high potential for provisioning Quality-of-Service (QoS) while making efficient use of the available energy budget. Nevertheless, successful QoS provisioning with adaptable protocol processing architectures requires the adaptation to be agile and to have low latency: a long adaptation latency might violate the desired packet processing latency or throughput, or lead to packet loss if the memory cannot accommodate the accumulating packets. This paper presents an elastic management scheme that permits agile, QoS-aware adaptation of the processing elements (PEs) within the protocol processing architecture, such that the desired QoS is maintained. Moreover, the proposed scheme has the potential to reduce energy consumption, since it employs PEs on demand. We quantify the latency required for PE adaptation and the reductions in energy and area that can be achieved with our scheme. We also consider two different real-life use cases to demonstrate the effectiveness of the proposed management scheme in maintaining QoS while conserving available energy.
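
The idea of employing PEs on demand, and of why adaptation latency matters, can be illustrated with a small Python sketch. It assumes identical PEs with a known per-PE packet rate and a fixed headroom margin; the class `PEPool`, its parameters and the sizing rule are hypothetical and are not the authors' management scheme.

```python
import math
from dataclasses import dataclass


@dataclass
class PEPool:
    """Hypothetical pool of identical protocol-processing elements (PEs)."""
    total_pes: int
    pe_rate_pps: float  # packets/s one active PE can process (assumed)
    active: int = 1

    def required_pes(self, arrival_pps: float, headroom: float = 0.1) -> int:
        # Enough PEs to sustain the arrival rate plus a safety margin,
        # clamped to at least one PE and at most the pool size.
        needed = math.ceil(arrival_pps * (1.0 + headroom) / self.pe_rate_pps)
        return max(1, min(self.total_pes, needed))

    def adapt(self, arrival_pps: float) -> int:
        # Switch PEs on or off to match demand; unused PEs stay off to save energy.
        self.active = self.required_pes(arrival_pps)
        return self.active

    def backlog_during_adaptation(self, arrival_pps: float, active_before: int,
                                  latency_s: float) -> float:
        # Packets that pile up in memory while new PEs are still being activated.
        excess = arrival_pps - active_before * self.pe_rate_pps
        return max(0.0, excess) * latency_s
```

In this model the backlog grows linearly with the adaptation latency, which is one way to see why a slow adaptation can overflow the packet memory.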

  • 2. Bakhouya, Mohamed
    et al.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Palesi, Maurizio
    Ghasemzadeh, Hassan
    Many-core System-on-Chip: architectures and applications. 2016. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 43, p. 1-3. Article in journal (Refereed)
  • 3. Daneshtalab, Masoud
    et al.
    Palesi, Maurizio
    Plosila, Juha
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Special issue on many-core embedded systems. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 6, p. 525. Article in journal (Other academic)
  • 4. Fakih, M.
    et al.
    Lenz, A.
    Azkarate-Askasua, M.
    Coronel, J.
    Crespo, A.
    Davidmann, S.
    Diaz Garcia, J. C.
    Romero, N. G.
    Grüttner, K.
    Schreiner, S.
    Seyyedi, R.
    Obermaisser, R.
    Maleki, A.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Mohammadat, Mohamed Tagelsir
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Pérez-Cerrolaza, J.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Söderquist, I.
    SAFEPOWER project: Architecture for safe and power-efficient mixed-criticality systems. 2017. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 52, p. 89-105. Article in journal (Refereed)
    Abstract [en]

    With the ever-increasing industrial demand for bigger, faster and more efficient systems, a growing number of cores is integrated on a single chip. Additionally, their performance is further maximized by simultaneously executing as many processes as possible, regardless of their criticality. Even safety-critical domains such as railway and avionics apply these paradigms under strict certification regulations. As the number of cores keeps expanding, cost-effectiveness becomes more important. One way to increase the cost-efficiency of such a System on Chip (SoC) is to improve the way it handles its power resources. Increasing power efficiency raises the reliability of the SoC because the battery lifetime lengthens. Secondly, consuming less energy reduces the heat emitted by the SoC, which translates into fewer cooling devices. Although energy efficiency has been thoroughly researched, these power-saving methods have not yet been applied in safety-critical domains. The EU project SAFEPOWER1.

  • 5.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jafri, Syed M. A. H.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tajammul, Muhammad Adeel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 8, p. 788-802. Article in journal (Refereed)
    Abstract [en]

    This paper presents a hardware-based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse-grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category is the low-cost AGU, which generates addresses within given address bounds for affine functions of up to two variables. These low-cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category is relatively more complex; it is also distributed, but shared among a few storage units, and it can handle more complex address generation requirements, such as dynamically computing the address bounds that are then used to configure the AGUs, or transforming non-affine functions into affine ones by computing the affine factor outside the loop. The runtime computation of the address constraints incurs a negligibly small overhead in latency, area and energy, while it substantially reduces program storage and energy and improves reconfiguration agility compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction, and compared to the centralized runtime address generation scheme, it achieved a 32.7% average performance improvement.
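
The two resource categories can be sketched in Python: a cheap affine generator for up to two loop variables, and a second tier that computes loop bounds at runtime and reconfigures the cheap generator per outer iteration. The function names and the triangular loop nest are hypothetical illustrations, not the paper's hardware.

```python
from typing import Iterator, List


def affine_agu(base: int, stride_i: int, stride_j: int,
               count_i: int, count_j: int) -> Iterator[int]:
    """Low-cost AGU model: addr = base + i*stride_i + j*stride_j
    for fixed bounds 0 <= i < count_i, 0 <= j < count_j."""
    for i in range(count_i):
        for j in range(count_j):
            yield base + i * stride_i + j * stride_j


def triangular_rows(base: int, row_pitch: int, n: int) -> List[int]:
    """Hypothetical 'complex' resource: the loop nest
    for i in range(n): for j in range(i + 1) has an inner bound that
    depends on i, so a fixed-bound AGU cannot cover it in one go.
    The bound is computed at runtime for each i and used to reconfigure
    a 1-D affine sweep. (A runtime row_pitch likewise turns i*N + j into
    an affine pattern once N is known outside the loop.)"""
    addrs: List[int] = []
    for i in range(n):
        inner_count = i + 1  # runtime bound computation
        addrs.extend(affine_agu(base + i * row_pitch, 0, 1, 1, inner_count))
    return addrs
```

For example, `list(affine_agu(100, 8, 1, 2, 3))` yields two rows of three consecutive addresses starting at 100 and 108.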

  • 6. Farahnakian, Fahimeh
    et al.
    Ebrahimi, Masoumeh
    Daneshtalab, Masoud
    Liljeberg, Pasi
    Plosila, Juha
    Bi-LCQ: A Low-weight Clustering-based Q-learning Approach for NoCs. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 1, p. 64-75. Article in journal (Refereed)
    Abstract [en]

    Network congestion has a negative impact on the performance of on-chip networks due to the increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas in the network. By using the learning method, local and global congestion information of the network is provided for each switch. This information can be dynamically updated when a switch receives a packet. However, the Q-learning approach suffers from high area overhead in NoCs due to the need for a large routing table in each switch. In order to reduce the area overhead, we also present a clustering approach that decreases the number of routing tables by a factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR and Dynamic XY algorithms.
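
A generic Q-routing update, with tables shared by 2x2 clusters of mesh switches, gives a feel for where the factor-of-4 saving comes from. This is a sketch of classic Q-routing under assumed parameters (`lr`, port names, table layout), not the Bi-LCQ algorithm; how Bi-LCQ actually organizes its cluster tables is not described in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (destination node, output port)


class QRouting:
    """Q[t][(d, p)] estimates the time for a packet at a switch using
    table t to reach destination d via output port p. With cluster_dim=2,
    each 2x2 group of switches shares one table (assumed layout)."""

    def __init__(self, mesh_width: int, cluster_dim: int = 1, lr: float = 0.5):
        self.w, self.k, self.lr = mesh_width, cluster_dim, lr
        self.q: Dict[Tuple[int, int], Dict[Key, float]] = defaultdict(
            lambda: defaultdict(float))

    def table_id(self, node: int) -> Tuple[int, int]:
        x, y = node % self.w, node // self.w
        return (x // self.k, y // self.k)

    def best(self, node: int, dest: int, ports: List[str]) -> Tuple[str, float]:
        # Pick the output port with the lowest estimated delivery time.
        table = self.q[self.table_id(node)]
        port = min(ports, key=lambda p: table[(dest, p)])
        return port, table[(dest, port)]

    def update(self, node: int, dest: int, port: str, delay: float,
               next_node: int, next_ports: List[str]) -> None:
        # Q(d, p) <- Q(d, p) + lr * (delay + min_z Q_next(d, z) - Q(d, p))
        _, remaining = self.best(next_node, dest, next_ports)
        table = self.q[self.table_id(node)]
        old = table[(dest, port)]
        table[(dest, port)] = old + self.lr * (delay + remaining - old)
```

On a 4x4 mesh, `cluster_dim=2` yields 4 tables instead of 16; the price is that switches in a cluster share (and blur) their congestion estimates.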

  • 7. Grüttner, K.
    et al.
    Görgen, R.
    Schreiner, S.
    Herrera, F.
    Peñil, P.
    Medina, J.
    Villar, E.
    Palermo, G.
    Fornaciari, W.
    Brandolese, C.
    Gadioli, D.
    Vitali, E.
    Zoni, D.
    Bocchio, S.
    Ceva, L.
    Azzoni, P.
    Poncino, M.
    Vinco, S.
    Macii, E.
    Cusenza, S.
    Favaro, J.
    Valencia, R.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Rosvall, Kathrin
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Moghaddami Khalilzad, Nima
    KTH, School of Information and Communication Technology (ICT).
    Quaglia, D.
    CONTREX: Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties. 2017. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 51, p. 39-55. Article in journal (Refereed)
    Abstract [en]

    The increasing processing power of today's HW/SW platforms leads to the integration of more and more functions in a single device. Additional design challenges arise when these functions share computing resources and belong to different criticality levels. CONTREX complements current activities in the area of predictable computing platforms and segregation mechanisms with techniques to consider the extra-functional properties, i.e., timing constraints, power, and temperature. CONTREX enables energy-efficient and cost-aware design through analysis and optimization of these properties with regard to application demands at different criticality levels. This article presents an overview of the CONTREX European project, its main innovative technology (extension of a model-based design approach, functional and extra-functional analysis with executable models, and run-time management) and the final results of three industrial use cases from different domains (avionics, automotive and telecommunication).

  • 8. Guang, Liang
    et al.
    Nigussie, Ethiopia
    Isoaho, Jouni
    Rantala, Pekka
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Interconnection alternatives for hierarchical monitoring communication in parallel SoCs. 2010. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 34, no 5, p. 118-128. Article in journal (Refereed)
    Abstract [en]

    Interconnection architectures for hierarchical monitoring communication in parallel System-on-Chip (SoC) platforms are explored. The hierarchical agent monitoring design paradigm is an efficient and scalable approach to the design of parallel embedded systems. Between distributed agents on different levels, monitoring communication is required to exchange information, and it forms a traffic class prioritized over data traffic. The paper explains the common monitoring operations in SoCs and categorizes them by type of functionality and granularity. Requirements for on-chip interconnections to support the monitoring communication are outlined. A baseline architecture with best-effort service, time division multiple access (TDMA), and two types of physically separate interconnections are discussed and compared, both theoretically and quantitatively, on a Network-on-Chip (NoC)-based platform. The simulation uses power estimation for 65 nm technology and NoC microbenchmarks as traffic traces. The evaluation points out the benefits and issues of each interconnection alternative. In particular, hierarchical monitoring networks are the most suitable alternative: they decouple the monitoring communication from data traffic, provide the highest energy efficiency with simple switching, and enable flexible reconfiguration to trade off power and performance.
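
For the TDMA alternative, one quantity worth reasoning about is how long a monitoring packet may wait for a slot reserved for monitoring traffic. The sketch below assumes a cyclic slot table and measures the worst-case gap in slots; the function and the slot layouts are illustrative assumptions, not the paper's evaluation setup.

```python
from typing import Iterable


def worst_case_wait(table_len: int, reserved: Iterable[int]) -> int:
    """Longest gap (in slots) between consecutive slots reserved for
    monitoring traffic in a cyclic TDMA table of length table_len."""
    slots = sorted(set(s % table_len for s in reserved))
    if not slots:
        raise ValueError("no slots reserved for monitoring traffic")
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    gaps.append(slots[0] + table_len - slots[-1])  # wrap-around gap
    return max(gaps)
```

Spreading reserved slots evenly minimizes the worst-case wait for a given number of slots, but every reserved slot is bandwidth taken from data traffic, which is one of the trade-offs that physically separate monitoring interconnects avoid.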

  • 9. Huang, L.
    et al.
    Zhang, X.
    Ebrahimi, Masoumeh
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland.
    Li, G.
    Tolerating transient illegal turn faults in NoCs. 2016. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 43, no SI, p. 104-115. Article in journal (Refereed)
    Abstract [en]

    Network-on-Chip (NoC) is becoming a competitive solution to connect hundreds of processing elements in modern computing platforms. As feature sizes shrink, circuits are increasingly likely to suffer from faults that lead to degraded performance and erroneous behaviour. Compared to permanent faults, transient faults occur more frequently and can be more harmful, as they are hidden within complex on-chip behaviours. One serious consequence of transient faults is that packets take illegal turns after the control logic of an on-chip router is damaged, which may lead to deadlock and eventually crash the entire system. To avoid this situation, we propose a comprehensive scheme called ODT, comprising an improved router architecture, an illegal-turn-resilient routing algorithm, online fault-detection units and a fault classification method. With ODT, more turns are supported at the routing level and deadlock situations can be significantly reduced. Experimental results indicate up to a 22% increase in surviving packets in the network when 4% of the routing computation units fail. The area and power overheads of the ODT method are around 9.22% and 9.63%, respectively.
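
An online check for illegal turns can be pictured as a small function that validates each routing-computation result against the turn model. This sketch assumes plain XY routing as the baseline turn model; ODT's own illegal-turn-resilient algorithm permits more turns and is not reproduced here.

```python
# Directions of travel in a 2-D mesh.
EAST, WEST, NORTH, SOUTH = "E", "W", "N", "S"
OPPOSITE = {EAST: WEST, WEST: EAST, NORTH: SOUTH, SOUTH: NORTH}


def turn_is_legal_xy(in_dir: str, out_dir: str) -> bool:
    """True if a packet travelling in in_dir may leave in out_dir under
    XY routing (all X hops are taken before any Y hop)."""
    if out_dir == OPPOSITE[in_dir]:
        return False  # U-turns are never taken
    if in_dir in (NORTH, SOUTH) and out_dir in (EAST, WEST):
        return False  # Y-to-X turn: forbidden by XY routing
    return True


def check_routing_output(in_dir: str, out_dir: str) -> bool:
    """Hypothetical online fault detector: flag a routing-computation result
    that requests an illegal turn, e.g. after a transient fault."""
    return not turn_is_legal_xy(in_dir, out_dir)
```

Because deadlock freedom of XY routing rests on never closing a cycle of turns, a single corrupted output port that produces a Y-to-X turn is exactly the event such a detector has to catch.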

  • 10.
    Jafri, Syed M. A. H.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Abbas, N.
    Awan, M. A.
    Plosila, J.
    TEA: Timing and Energy Aware compression architecture for Efficient Configuration in CGRAs. 2015. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436. Article in journal (Refereed)
    Abstract [en]

    Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA). Recently proposed CGRAs offer time-multiplexing and dynamic application parallelism to enhance device utilization and reduce energy consumption, at the cost of additional memory (up to 50% of the overall platform area). To reduce the memory overheads, novel CGRAs employ statistical compression, intermediate compact representation, or multicasting. Each compaction technique has different properties (i.e. compression ratio, decompression time and decompression energy) and is best suited to a particular class of applications. However, existing research only deals with these methods separately. Moreover, it only analyzes the compaction ratio and does not evaluate the associated energy overheads. To tackle these issues, we propose a polymorphic compression architecture that interleaves these techniques in a single platform. The proposed architecture allows each application to take advantage of a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Simulation results using different applications (FFT, matrix multiplication, and WLAN) reveal that the choice of compression hierarchy has a significant impact on compression ratio (up to 52%), decompression energy (up to 4 orders of magnitude), and configuration time (from 33. n to 1.5. s) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overhead (1%) compared to the overall platform area.
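
Statistical compression of a configuration stream can be illustrated with Huffman coding, used here as an assumed, standard example of a statistical code; the abstract does not say which statistical code TEA uses. The ratio computed below ignores the decoder table and the decompression time and energy, which are exactly the costs TEA also weighs.

```python
import heapq
from collections import Counter
from typing import Dict, List


def huffman_code_lengths(words: List[int]) -> Dict[int, int]:
    """Huffman code length (in bits) for each distinct configuration word."""
    freq = Counter(words)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {w: 0}) for i, (w, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, counter, merged))
        counter += 1
    return heap[0][2]


def compression_ratio(words: List[int], word_bits: int) -> float:
    """Compressed size / uncompressed size, excluding the decoder table."""
    lengths = huffman_code_lengths(words)
    compressed = sum(lengths[w] for w in words)
    return compressed / (len(words) * word_bits)
```

Streams dominated by a few repeated words compress well this way, while streams where identical words go to many PEs at once may favour multicasting instead, which is why a per-application choice can pay off.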

  • 11.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Guang, Liang
    University of Turku, Finland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Indian Institute of Technology, Delhi, India.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes. 2013. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 37, no 8, p. 811-822. Article in journal (Refereed)
    Abstract [en]

    This paper presents an energy-efficient architecture that provides on-demand fault tolerance to multiple traffic classes running simultaneously on a single network-on-chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. These techniques rely on individual packet fields and operating conditions to adjust the intensity, and hence the overhead, of fault tolerance. The presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing the required reliability while significantly reducing the energy overhead. Our solution relies on a hierarchical agent-based control layer and a reconfigurable fault-tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to a 95% decrease in energy consumption compared to traditional worst-case methods. Synthesis results confirmed a negligible additional overhead for providing on-demand protection (up to 5.3% area) compared to the overall fault-tolerance circuitry.
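
The control layer's job of steering each packet to a datapath with the protection its class needs can be sketched as a class-to-encoder lookup. The class names and the three protection levels (none, even parity, triplication with majority vote) are hypothetical choices for illustration, not the paper's fault-tolerance circuitry.

```python
from typing import Callable, Dict, List

Bits = List[int]


def no_protection(bits: Bits) -> Bits:
    return list(bits)


def add_even_parity(bits: Bits) -> Bits:
    # One extra bit: detects any single-bit error.
    return list(bits) + [sum(bits) % 2]


def triplicate(bits: Bits) -> Bits:
    # Three copies: a bitwise majority vote corrects one faulty copy.
    return list(bits) * 3


def majority_decode(coded: Bits) -> Bits:
    n = len(coded) // 3
    a, b, c = coded[:n], coded[n:2 * n], coded[2 * n:]
    return [1 if x + y + z >= 2 else 0 for x, y, z in zip(a, b, c)]


# Hypothetical mapping from traffic class to protection level.
PROTECTION: Dict[str, Callable[[Bits], Bits]] = {
    "best_effort": no_protection,
    "control": add_even_parity,
    "critical": triplicate,
}


def encode_for_class(traffic_class: str, payload: Bits) -> Bits:
    """Steer the payload through the datapath matching its class."""
    return PROTECTION[traffic_class](payload)
```

Only packets that need strong protection pay for the tripled link traffic; applying `triplicate` to every class would correspond to the worst-case approach the abstract argues against.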

  • 12.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Piestrak, S. J.
    Sentieys, O.
    Pillement, S.
    Design of the coarse-grained reconfigurable architecture DART with on-line error detection. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 2, p. 124-136. Article in journal (Refereed)
    Abstract [en]

    This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection, intended to increase fault tolerance. Most parts of the data paths and of the local memory of DART are protected using a residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow temporary faults (including so-called soft errors caused by radiation) to be tolerated, provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology have confirmed significant hardware and power consumption savings of the proposed approach over the commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks has significantly reduced the delay overhead compared to our previous design.
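
The modulo-3 residue check works because residues are preserved by addition and multiplication: the residue of the result can be predicted from the operands' residues by a small independent circuit. The Python sketch below models this behaviourally; `fault_mask` is a hypothetical way to inject bit flips into the result.

```python
from typing import Tuple


def residue3(x: int) -> int:
    """Residue modulo 3 (a small, separate check circuit in hardware)."""
    return x % 3


def checked_add(a: int, b: int, fault_mask: int = 0) -> Tuple[int, bool]:
    """Adder whose full result (including carry-out) is checked against the
    residue predicted from the operands. Returns (result, error_detected)."""
    result = (a + b) ^ fault_mask  # fault_mask emulates a transient fault
    predicted = (residue3(a) + residue3(b)) % 3
    return result, residue3(result) != predicted


def checked_mul(a: int, b: int, fault_mask: int = 0) -> Tuple[int, bool]:
    result = (a * b) ^ fault_mask
    predicted = (residue3(a) * residue3(b)) % 3
    return result, residue3(result) != predicted
```

A single flipped bit changes the value by 2^k, which is never a multiple of 3, so every single-bit error is detected. Errors whose net value change is a multiple of 3 escape: flipping the two low bits of 155 (both 1) gives 152, which has the same residue. Bitwise logic operations do not preserve residues, which is consistent with the logic unit being protected by duplication with comparison instead.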

  • 13.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Area-efficient high-coverage LBIST. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 5, p. 368-374. Article in journal (Refereed)
    Abstract [en]

    Logic Built-In Self Test (LBIST) is a popular technique for applications requiring in-field testing of digital circuits. LBIST incorporates test generation and response capture on-chip, and requires no interaction with a large, expensive tester. LBIST offers test time reduction due to at-speed test pattern application, enables test data reuse at many levels, and enables test-ready IP. However, traditional pseudo-random pattern-based LBIST often has low test coverage. This paper presents a new method for on-chip generation of deterministic test patterns based on registers with non-linear update. Our experimental results on 7 real designs show that the presented approach can achieve higher stuck-at coverage than test point insertion, with less area overhead. We also show that registers with non-linear update are asymptotically smaller than the memories required to store the same test patterns in compressed form.
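
The contrast between a linear pattern source and a register with non-linear update can be modelled in a few lines. The 4-bit LFSR (polynomial x^4 + x^3 + 1) stands in for conventional pseudo-random LBIST; the next-state table is a hypothetical behavioural model of a register that steps through chosen deterministic patterns, not the paper's construction method.

```python
from typing import Dict, List


def lfsr4_sequence(seed: int, steps: int) -> List[int]:
    """4-bit Fibonacci LFSR for x^4 + x^3 + 1 (maximal length, period 15):
    the conventional pseudo-random LBIST pattern source."""
    state, out = seed & 0xF, []
    for _ in range(steps):
        out.append(state)
        bit = (state ^ (state >> 1)) & 1  # XOR of the two tapped bits
        state = (state >> 1) | (bit << 3)
    return out


def nonlinear_next_state(targets: List[int]) -> Dict[int, int]:
    """Next-state function that walks through the given deterministic test
    patterns in order. This table only specifies the feedback function;
    in hardware it would be synthesized as (generally non-linear)
    combinational logic feeding the register, not stored as a memory."""
    if len(set(targets)) != len(targets):
        raise ValueError("this simple model needs distinct patterns")
    return {p: targets[(i + 1) % len(targets)] for i, p in enumerate(targets)}


def run_register(table: Dict[int, int], seed: int, steps: int) -> List[int]:
    state, out = seed, []
    for _ in range(steps):
        out.append(state)
        state = table[state]
    return out
```

The LFSR visits every non-zero state in a fixed order, so hard-to-detect faults may need patterns that appear late or never within the test length; the non-linear register reaches the chosen patterns directly.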

  • 14.
    Ma, Ning
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    System design of full HD MVC decoding on mesh-based multicore NoCs. 2011. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 35, no 2, p. 217-229. Article in journal (Refereed)
    Abstract [en]

    Future multimedia applications such as full HD (1920 x 1080) multiview video coding (MVC) present great challenges for computing architectures. Even with state-of-the-art ASIC technology that can decode a single HD view, handling multiple views would require computation capacity proportional to the number of views, which is difficult to achieve. In this paper, we explore the system-level design space for full HD MVC applications mapped onto mesh-based multicore Network-on-Chip (NoC) architectures. To this end, we establish a simulation framework capable of simulating communication networks combined with computing cores. We investigate two task assignment schemes: picture-level assignment and view-level assignment. With eight-view MVC decoding, we explore the design options with respect to network size, single-core performance and link bandwidth under both task assignment schemes. Our studies show that, to achieve a given decoding performance, computation capability and communication capacity should be balanced in the system. Also, to realize eight-view HD decoding, the system requires at most twice the single-core processing capacity needed for single-view decoding, thanks to the parallel computation and communication enabled by multicore NoC architectures. Our results demonstrate the feasibility and potential of efficiently implementing full HD MVC decoding on multicore NoC architectures.
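
A back-of-the-envelope calculation shows how quickly decoded-output volume grows with the number of views. The 30 frames/s rate and 8-bit YUV 4:2:0 sampling (1.5 bytes per pixel) are assumptions, not figures from the paper, and the result covers only raw reconstructed pixels, not the reference-picture traffic a decoder also generates on the NoC.

```python
def raw_video_rate(width: int = 1920, height: int = 1080, fps: float = 30.0,
                   bytes_per_pixel: float = 1.5, views: int = 1) -> float:
    """Decoded output rate in bytes/s. bytes_per_pixel=1.5 assumes 8-bit
    YUV 4:2:0; fps=30 is likewise an assumption."""
    return width * height * bytes_per_pixel * fps * views
```

Under these assumptions one full HD view produces 93,312,000 bytes/s and eight views 746,496,000 bytes/s, an eight-fold increase that the mapping has to spread across cores and links.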

  • 15. Rahmati, Dara
    et al.
    Sarbazi-Azad, Hamid
    Hessabi, Shaahin
    Eslami Kiasari, Abbas
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Power-efficient deterministic and adaptive routing in torus networks-on-chip. 2012. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 36, no 7, p. 571-585. Article in journal (Refereed)
    Abstract [en]

    Modern SoC architectures use NoCs for high-speed inter-IP communication. For NoC architectures, high-performance, efficient routing algorithms with low power consumption are essential for real-time applications. NoCs with mesh and torus interconnection topologies are now popular due to their simple structures. A torus NoC is very similar to a mesh NoC, but has a smaller diameter. For a routing algorithm to be deadlock-free in a torus, at least two virtual channels per physical channel must be used to avoid cyclic channel dependencies due to the wrap-around links; in a mesh network, however, deadlock freedom can be ensured using only one virtual channel. The number of virtual channels employed is important, since it directly affects the power consumption of NoCs. In this paper, we propose a novel systematic approach for designing deadlock-free routing algorithms for torus NoCs. Using this method, a new deterministic routing algorithm (called TRANC) is proposed that uses only one virtual channel per physical channel in torus NoCs. We also propose an algorithmic mapping that enables extracting TRANC-based routing algorithms from existing routing algorithms, which can be both deterministic and adaptive. The simulation results show power consumption and performance improvements when using the proposed algorithms.
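
The cyclic channel dependency caused by wrap-around links, and the conventional two-virtual-channel ("dateline") fix, can be checked mechanically on one ring of a torus. This sketch models that conventional baseline, which the abstract contrasts with TRANC; it does not implement TRANC. It assumes unidirectional clockwise routing and places the dateline on the link from node n-1 to node 0.

```python
from typing import Dict, Set, Tuple

Channel = Tuple[int, int]  # (link i: node i -> node (i+1) % n, virtual channel)


def ring_dependencies(n: int, dateline: bool) -> Dict[Channel, Set[Channel]]:
    """Channel dependency graph of a unidirectional n-node ring under
    clockwise routing. Without a dateline every hop uses VC 0; with one,
    packets switch to VC 1 from the wrap-around link onward."""
    deps: Dict[Channel, Set[Channel]] = {}
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            vc, node, path = 0, src, []
            while node != dst:
                if dateline and node == n - 1:
                    vc = 1  # crossing the wrap-around link
                path.append((node, vc))
                node = (node + 1) % n
            for a, b in zip(path, path[1:]):
                deps.setdefault(a, set()).add(b)
    return deps


def has_cycle(graph: Dict[Channel, Set[Channel]]) -> bool:
    """Depth-first search for a back edge (a cyclic dependency)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[Channel, int] = {}

    def visit(u: Channel) -> bool:
        color[u] = GREY
        for v in graph.get(u, ()):
            state = color.get(v, WHITE)
            if state == GREY or (state == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))
```

With one virtual channel the dependencies close a cycle around the ring, so deadlock is possible; with the dateline the graph is acyclic, at the cost of the second virtual channel that TRANC sets out to avoid.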

  • 16.
    Redell, Ola
    et al.
    KTH, Superseded Departments, Machine Design.
    El-Khoury, Jad
    KTH, Superseded Departments, Machine Design.
    Törngren, Martin
    KTH, Superseded Departments, Machine Design.
    The AIDA toolset for design and implementation analysis of distributed real-time control systems. 2004. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 28, no 4, p. 163-182. Article in journal (Refereed)
    Abstract [en]

    This article introduces a toolset that integrates the design and performance analysis of control systems with embedded real-time system design. The toolset enables specification and analysis of real-time implementations of control applications. Control system designs are imported into a real-time system-modelling domain in which the functionality is distributed on a target computer system. The control functionality is partitioned into operating system processes, inter-process communications are defined and the triggering of processes is specified. Once the real-time design is complete, the response times and release jitter of the processes and their contained functions can be analysed, and the system information exported back to the control domain. This enables analysis of the resulting control performance, taking into account implementation effects such as delays and release jitter due to resource sharing and scheduling. The usage of the toolset is demonstrated on a dual-leg controller for a walking robot. The case study shows how the toolset is used to describe a system, from the control system specification to the design of its implementation on a distributed network of processors. Different implementation solutions are suggested and evaluated based on simulated control system performance.
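
Response times under release jitter can be obtained with the classic fixed-priority response-time analysis. The sketch below implements that standard fixed-point iteration; whether AIDA uses exactly this formulation is not stated in the abstract, and the assumption that each task's deadline equals its period is ours.

```python
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    wcet: int        # C: worst-case execution time
    period: int      # T: also used as the deadline (assumption)
    jitter: int = 0  # J: release jitter


def response_times(tasks: List[Task]) -> List[Optional[int]]:
    """Fixed-priority response-time analysis with release jitter.
    tasks are ordered from highest to lowest priority. Returns None for a
    task whose worst-case response time exceeds its period."""
    results: List[Optional[int]] = []
    for i, task in enumerate(tasks):
        higher = tasks[:i]
        w = task.wcet
        while True:
            # w = C_i + sum_j ceil((w + J_j) / T_j) * C_j  (integer ceiling)
            nxt = task.wcet + sum(
                -(-(w + hp.jitter) // hp.period) * hp.wcet for hp in higher)
            if nxt == w or nxt + task.jitter > task.period:
                break
            w = nxt
        r = nxt + task.jitter  # R_i = w_i + J_i
        results.append(r if r <= task.period else None)
    return results
```

For tasks (C, T) = (1, 4), (2, 6), (3, 13) in priority order, the iteration converges to response times 1, 3 and 10. The computed response times and jitter are the kind of implementation effects that are fed back to the control domain.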

  • 17. Rezaei, A.
    et al.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics, Electronic and embedded systems.
    Zhao, D.
    CAP-W: Congestion-aware platform for wireless-based network-on-chip in many-core era. 2017. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 52, p. 23-33. Article in journal (Refereed)
    Abstract [en]

    In order to fulfill the ever-increasing demand for high speed and high bandwidth, a wireless-based MCSoC built on a NoC communication infrastructure is presented. Separating communication from computation demands and providing flexible topology configurations make wireless-based NoC a promising future MCSoC architecture. However, congestion in wireless routers reduces the benefit of high-speed wireless links and significantly increases network latency. Therefore, in this paper, a congestion-aware platform named CAP-W is introduced for wireless-based NoC in order to reduce congestion in the network, especially over wireless routers. The triple-layer platform of CAP-W is composed of mapping, migration, and routing layers. To minimize the congestion probability, the mapping layer is responsible for selecting a suitable free core as the first candidate, finding a suitable first task to be mapped onto the selected core, and allocating the other tasks with respect to contiguity. Considering the dynamic variation of application behaviors, the migration layer modifies the primary task mapping to improve the congestion situation. Furthermore, the routing layer balances the utilization of wired and wireless networks by separating short-distance and long-distance communications. Experimental results show a meaningful gain in congestion control of wireless-based NoC compared to state-of-the-art works.
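
The routing layer's separation of short- and long-distance communication can be pictured as a distance threshold on mesh hops. The threshold value and the fallback to the wired mesh when the wireless router is congested are hypothetical assumptions for illustration; the abstract does not give CAP-W's exact rule.

```python
from typing import Tuple

Coord = Tuple[int, int]  # (x, y) position of a core in the mesh


def hops(a: Coord, b: Coord) -> int:
    """Manhattan distance, i.e. minimal hop count on a wired 2-D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def select_network(src: Coord, dst: Coord, threshold: int,
                   wireless_congested: bool = False) -> str:
    """Short-distance traffic stays on the wired mesh; long-distance traffic
    uses the wireless links unless the wireless router is congested
    (hypothetical policy)."""
    if hops(src, dst) <= threshold or wireless_congested:
        return "wired"
    return "wireless"
```

Keeping short flows off the wireless routers reserves the scarce shared wireless channel for traffic that would otherwise cross many wired hops.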

  • 18.
    Törngren, Martin
    et al.
    KTH, Superseded Departments, Machine Design.
    Redell, O.
    A modelling framework to support the design and analysis of distributed real-time control systems. 2000. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 24, no 2, p. 81-93. Article in journal (Refereed)
    Abstract [en]

    Within the automatic control in distributed applications project, a modelling framework has been developed to support design issues related to the implementation of control applications in embedded distributed computer systems. At a relatively high level of abstraction, the models describe the structure and timing behaviour of a control application (in terms of functions and operational models) and its implementation (hardware, operating system threads and resources). The resource description allows the timing behaviour of the implementation to be analysed and fed back into the application models. The models form the basis for a decentralization toolset, for which a first prototype is under development. Examples of the models are given, and the framework is compared to related modelling approaches.

  • 19. Wang, J.
    et al.
    Ebrahimi, Masoumeh
    KTH, School of Information and Communication Technology (ICT), Electronics, Integrated devices and circuits. University of Turku, Finland.
    Huang, L.
    Li, Q.
    Li, G.
    Jantsch, A.
    Minimizing the system impact of router faults by means of reconfiguration and adaptive routing. 2017. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 51, p. 252-263. Article in journal (Refereed)
    Abstract [en]

    To tolerate faults in Networks-on-Chip (NoC), routers are often disconnected from the NoC, which affects system integrity: cores connected to the disabled routers can no longer be accessed from the network, resulting in loss of function and performance. We propose E-Rescuer, a technique offering a reconfigurable router architecture and a fault-tolerant routing algorithm. By taking advantage of bypassing channels, the reconfigurable router architecture maintains the connection between the cores and the network regardless of the router status. The routing algorithm allows the core to access the network when the local router is disabled. Our analysis and experiments show that the proposed technique provides 100% packet delivery in 100%, 92.56%, and 83.25% of patterns when 1, 2 and 3 routers are faulty, respectively. Moreover, throughput increases by up to 80%, 46% and 33% compared with FTLR, HiPFaR, and CoreRescuer, respectively.
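To make the connectivity argument concrete, here is a toy reachability model. It assumes, as a simplification and not as the E-Rescuer design itself, an n×n mesh in which a disabled router is removed from the network and, when bypassing is enabled, its core can inject and eject through any healthy neighbouring router. It measures connectivity only, not delivery under a particular routing algorithm, and all function names are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, Iterator, Set, Tuple

Coord = Tuple[int, int]


def neighbours(r: Coord, n: int) -> Iterator[Coord]:
    """Mesh neighbours of router r in an n x n grid."""
    x, y = r
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n:
            yield (nx, ny)


def components(n: int, faulty: Set[Coord]) -> Dict[Coord, int]:
    """Label each healthy router with the id of its connected component (BFS)."""
    comp: Dict[Coord, int] = {}
    next_id = 0
    for start in product(range(n), repeat=2):
        if start in faulty or start in comp:
            continue
        comp[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in neighbours(cur, n):
                if nb not in faulty and nb not in comp:
                    comp[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return comp


def attach_points(core: Coord, n: int, faulty: Set[Coord], bypass: bool) -> Set[Coord]:
    """Routers a core can use: its own if healthy, else healthy neighbours via bypass."""
    if core not in faulty:
        return {core}
    if not bypass:
        return set()
    return {nb for nb in neighbours(core, n) if nb not in faulty}


def connected_pair_fraction(n: int, faulty: Set[Coord], bypass: bool) -> float:
    """Fraction of ordered core pairs (src != dst) that can reach each other."""
    comp = components(n, faulty)
    cores = list(product(range(n), repeat=2))
    reach = {c: {comp[r] for r in attach_points(c, n, faulty, bypass)} for c in cores}
    ok = total = 0
    for s in cores:
        for d in cores:
            if s != d:
                total += 1
                ok += bool(reach[s] & reach[d])
    return ok / total
```

On a 4×4 mesh with router (1, 1) disabled, 87.5% of core pairs remain connected without bypassing (the stranded core can reach nobody), and all pairs remain connected with it.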

  • 20. Wang, X.
    et al.
    Zhao, B.
    Wang, L.
    Mak, T.
    Yang, M.
    Jiang, Y.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    A Pareto-optimal runtime power budgeting scheme for many-core systems. 2016. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436. Article in journal (Refereed)
    Abstract [en]

    Due to ever-escalating power consumption, a significant proportion of future many-core chips must be switched off to meet power budgets. This trend has brought about a paradigm shift from conventional low-power design to power budgeting design, where performance must be optimized under a tight power budget constraint. Two key issues must be considered when moving this new design paradigm forward. Firstly, with per-core frequency scaling, the number of frequency combinations grows exponentially with the number of cores, so as more cores are integrated onto a chip, achieving optimal performance under a given power budget becomes more challenging. Secondly, the power budget of a many-core system may fluctuate rapidly, so the power budgeting scheme must respond promptly to track such variation. This paper aims to solve the problem of optimizing overall performance under a power budget using frequency scaling. To solve the problem efficiently at runtime, we propose a parallel dynamic programming network in which Pareto-optimal solutions can be obtained in linear time. Experimental results confirm that the proposed approach can reduce execution time by 45% compared with other existing methods. The runtime overhead and hardware cost of the proposed approach are small: the average area and power consumption are less than 1% of the whole network-on-chip. This paper demonstrates an effective formulation for delivering Pareto-optimal solutions for power budgeting in future many-core systems.
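The optimization can be read as a multiple-choice knapsack problem: each core picks one frequency level, each level costs power and yields performance, and the total power must stay within the budget. The sketch below builds the Pareto front of (power, performance) pairs core by core with a sequential dynamic program. It is an assumed formulation for illustration, not the paper's parallel dynamic programming network, and the function names and per-level numbers are hypothetical. A (0, 0) level could model switching a core off.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (power, performance)


def prune(points: List[Point]) -> List[Point]:
    """Keep only Pareto-optimal points: drop any point that another point
    matches or beats on both lower power and higher performance."""
    front: List[Point] = []
    best_perf = float("-inf")
    # Ascending power; for equal power, highest performance first.
    for power, perf in sorted(points, key=lambda p: (p[0], -p[1])):
        if perf > best_perf:
            front.append((power, perf))
            best_perf = perf
    return front


def pareto_budgeting(levels_per_core: List[List[Point]]) -> List[Point]:
    """DP over cores: combine the current front with each level of the next core."""
    front: List[Point] = [(0.0, 0.0)]
    for levels in levels_per_core:
        front = prune([(p0 + p1, q0 + q1) for p0, q0 in front for p1, q1 in levels])
    return front


def best_under_budget(front: List[Point], budget: float) -> Optional[Point]:
    """Highest-performance point whose total power fits the budget."""
    feasible = [pt for pt in front if pt[0] <= budget]
    return max(feasible, key=lambda pt: pt[1]) if feasible else None
```

For two cores that each offer levels (1, 1), (2, 3) and (4, 4), the front is [(2, 2), (3, 4), (4, 6), (6, 7), (8, 8)], so a budget of 5 power units is best spent as (4, 6), with both cores at the middle level. Pruning dominated points is what keeps this tractable: without it the list would hold every one of the exponentially many frequency combinations. Once the front is known, a fluctuating budget only needs a new lookup rather than a new search.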

  • 21.
    Weldezion, Awet Yemane
    et al.
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Grange, Matt
    Jantsch, Axel
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Pamunuwa, Dinesh
    Zero-load Predictive Model for Performance Analysis in Deflection Routing NoCs. 2015. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 39, no 8, p. 634-647. Article in journal (Refereed)
    Abstract [en]

    We study a static model for 2-D and 3-D networks that accurately represents the average distance travelled by packets under deflection routing, which is a specific form of adaptive routing. The model captures static properties of the network topology and the spatial distribution of traffic, but does not take into account traffic loading and congestion. Even though this static model cannot accurately predict packet latency under high load, we contend that it is a perfect predictor of deflection routing networks’ relative performance under any load condition below saturation, and thus always correctly predicts the optimum network configuration. This is verified through cycle-accurate simulations of congested and uncongested networks with fully adaptive, deflection routing for regular traffic patterns such as uniform random, localised, bursty, and others, as well as irregular patterns in both regular and irregular networks. As the networks with minimal average distance perform best even under high traffic load, the average distance model establishes a robust relation between a static network property, average distance, and network performance under load, providing new insight into network behaviour and an opportunity to identify the optimal network configuration without time-consuming simulations.
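As a rough illustration of the kind of static quantity involved, the sketch below computes the mean minimal (Manhattan) hop count between distinct nodes of a 2-D or 3-D mesh under uniform random traffic. This is a simplification assumed here, not the authors' model, which also captures the extra distance packets travel when deflected; the function name is hypothetical.

```python
from itertools import product
from typing import Sequence


def average_distance(dims: Sequence[int]) -> float:
    """Mean minimal hop count over all ordered pairs of distinct nodes
    in a mesh with the given side lengths (uniform random traffic)."""
    nodes = list(product(*(range(k) for k in dims)))
    total = 0
    pairs = 0
    for a in nodes:
        for b in nodes:
            if a != b:
                # Manhattan distance = minimal number of router-to-router hops.
                total += sum(abs(x - y) for x, y in zip(a, b))
                pairs += 1
    return total / pairs
```

For 64 nodes, an 8×8 mesh gives 16/3 ≈ 5.33 hops and a 4×4×4 mesh gives 80/21 ≈ 3.81 hops, so by this metric the 3-D configuration would be ranked ahead without running any simulation.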

  • 22. Xu, T. C.
    et al.
    Yin, A. W.
    Liljeberg, P.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    A study of 3D network-on-chip design for data parallel H.264 coding. 2011. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 35, no 7, p. 603-612. Article in journal (Refereed)
    Abstract [en]

    In this paper, we implement, analyze and compare different Network-on-Chip (NoC) architectures aiming at higher efficiency for MPEG-4/H.264 coding. Two-dimensional (2D) and three-dimensional (3D) NoCs based on Non-Uniform Cache Access (NUCA) are analyzed. We present results using a full system simulator with realistic workloads. Experiments show that the average network latencies of the two 3D NoCs are reduced by 28% and 34%, respectively, compared with the 2D design. It is also shown that heat dissipation is a trade-off in improving the performance of 3D chips. Our analysis and experimental results provide a guideline for designing efficient 3D NoCs for data parallel H.264 coding applications.

  • 23.
    Zhang, Yuang
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS. Institute of VLSI Design, Key Laboratory of Advanced Photonic and Electronic Materials, Nanjing University, China.
    Li, Li
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Gao, Minglun
    Pan, Hongbing
    Han, Feng
    A survey of memory architecture for 3D chip multi-processors. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 5, p. 415-430. Article in journal (Refereed)
    Abstract [en]

    3D chip multi-processors (3D CMPs) combine the advantages of 3D integration with the parallelism of CMPs and are emerging as an active research topic in the VLSI and multi-core computer architecture communities. One significant potential of 3D CMPs is to exploit the diversity of integration processes and the high vertical through-silicon via (TSV) bandwidth to mitigate the well-known "Memory Wall" problem. At the same time, 3D integration techniques are subject to severe thermal, manufacturing-yield and cost constraints. Research on 3D stacked memory hierarchies explores high-performance and power/thermally efficient memory architectures for 3D CMPs. The micro-architectures of memories can be designed in the 3D integrated circuit context and integrated into 3D CMPs. This paper surveys the design of memory architectures for 3D CMPs. We classify current research into two categories: stacked cache-only architectures and stacked main memory architectures for 3D CMPs. Representative works are reviewed, and the remaining opportunities and challenges are discussed to guide future research in this emerging area.
