1 - 20 of 20
  • 1.
    Dubrova, Elena
    et al.
    KTH, Tidigare Institutioner, Mikroelektronik och informationsteknik, IMIT.
    Muzio, J. C.
    Easily testable multiple-valued logic circuits derived from Reed-Muller circuits. 2000. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 49, no. 11, pp. 1285-1289. Journal article (Peer-reviewed).
    Abstract [en]

    In 1972, Reddy showed that the binary circuits realizing Reed-Muller canonical form are easily testable. In this paper, we extend Reddy's result to multiple-valued logic circuits employing more than two discrete levels of signal. The electronic fabrication of such circuits became feasible due to the recent advances in integrated circuit technology. We show that, in the multiple-valued case, several new phenomena occur which allow us to asymptotically reduce the upper bound on the number of tests required for fault detection, but make the generation of tests harder.
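
    As a small illustration of the Reed-Muller canonical form mentioned in this abstract (binary case only), the sketch below computes the positive-polarity Reed-Muller (XOR-of-ANDs) coefficients of a Boolean function from its truth table, using the standard butterfly transform over GF(2). The function name `reed_muller_coefficients` is hypothetical, the code is not taken from the paper, and it does not cover the multiple-valued extension or test generation.

```python
def reed_muller_coefficients(truth_table):
    """Return positive-polarity Reed-Muller coefficients for a truth table.

    truth_table[i] is f(x), where bit j of the index i is the value of x_j.
    coeffs[m] is 1 when the product of the variables whose bits are set in m
    appears in the XOR-of-ANDs expansion of f (m == 0 is the constant term).
    """
    n = len(truth_table)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    coeffs = [bit & 1 for bit in truth_table]
    step = 1
    while step < n:
        # One butterfly pass per variable: fold the half with the bit
        # cleared into the half with the bit set, using XOR.
        for i in range(n):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs


# x0 OR x1 expands to x0 XOR x1 XOR x0*x1.
or_coeffs = reed_muller_coefficients([0, 1, 1, 1])
```

    The transform is its own inverse, so applying it to the coefficients recovers the truth table.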

  • 2.
    Ebrahimi, M.
    et al.
    Turku Centre for Computer Science (TUCS).
    Daneshtalab, M.
    Turku Centre for Computer Science (TUCS).
    Liljeberg, P.
    Turku Centre for Computer Science (TUCS).
    Plosila, J.
    Turku Centre for Computer Science (TUCS).
    Flich, J.
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektroniksystem. KTH, Skolan för informations- och kommunikationsteknik (ICT), Centra, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Path-based Partitioning Methods for 3D Networks-on-Chip with Minimal Adaptive Routing. 2012. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 99, pp. 1-16. Journal article (Peer-reviewed).
    Abstract [en]

    Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain for 3D architectures. Since multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance in these systems can be significantly improved if multicast operations are supported at hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and the multicast traffic is equally distributed among several subsets. The simulation results reveal that the RP method can achieve performance improvement across all workloads, while the performance can be further improved by utilizing MAR, with 19% average and 42% maximum latency reduction on SPLASH-2 and PARSEC benchmarks.

  • 3. Ebrahimi, Masoumeh
    et al.
    Daneshtalab, Masoud
    Liljeberg, Pasi
    Plosila, Juha
    Flich, Jose
    Tenhunen, Hannu
    Path-Based Partitioning Methods for 3D Networks-on-Chip with Minimal Adaptive Routing. 2014. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 63, no. 3, pp. 718-733. Journal article (Peer-reviewed).
    Abstract [en]

    Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessors (CMPs) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.

  • 4. Haghbayan, Mohammad-Hashem
    et al.
    Rahmani, Amir-Mohammad
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Industriell och Medicinsk Elektronik. University of Turku, Finland.
    Miele, Antonio
    Fattah, Mohammad
    Plosila, Juha
    Liljeberg, Pasi
    Tenhunen, Hannu
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Industriell och Medicinsk Elektronik. University of Turku, Finland.
    A Power-Aware Approach for Online Test Scheduling in Many-Core Architectures. 2016. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 65, no. 3, pp. 730-743. Journal article (Peer-reviewed).
    Abstract [en]

    Aggressive technology scaling triggers novel challenges to the design of multi-/many-core systems, such as limited power budget and increased reliability issues. Today's many-core systems employ dynamic power management and runtime mapping strategies trying to offer optimal performance while fulfilling power constraints. On the other hand, due to the reliability challenges, online testing techniques are becoming a necessity in current and near future technologies. However, state-of-the-art techniques are not aware of the other power/performance requirements. This paper proposes a power-aware non-intrusive online testing approach for many-core systems. The approach schedules software based self-test routines on the various cores during their idle periods, while honoring the power budget and limiting delays in the workload execution. A test criticality metric, based on a device aging model, is used to select cores to be tested at a time. Moreover, power and reliability issues related to the testing at different voltage and frequency levels are also handled. Extensive experimental results reveal that the proposed approach can i) efficiently test the cores within the available power budget causing a negligible performance penalty, ii) adapt the test frequency to the current cores' aging status, and iii) cover available voltage and frequency levels during the testing.

  • 5. Hojabr, Reza
    et al.
    Modarressi, Mehdi
    Daneshtalab, Masoud
    Yasoubi, Ali
    Khonsari, Ahmad
    Customizing Clos Network-on-Chip for Neural Networks. 2017. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 66, no. 11, pp. 1865-1877. Journal article (Peer-reviewed).
  • 6. Huang, Letian
    et al.
    Wang, Junshi
    Ebrahimi, Masoumeh
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Industriell och Medicinsk Elektronik.
    Daneshtalab, Masoud
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Zhang, Xiaofan
    Li, Guangjun
    Jantsch, Axel
    Non-Blocking Testing for Network-on-Chip. 2016. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 65, no. 3, pp. 679-692. Journal article (Peer-reviewed).
    Abstract [en]

    To achieve high reliability in on-chip networks, it is necessary to test the network as frequently as possible to detect physical failures before they lead to system-level failures. A main obstacle is that the circuit under test has to be isolated, resulting in network cuts and packet blockage which limit the testing frequency. To address this issue, we propose a comprehensive network-level approach which could test multiple routers simultaneously at high speed without blocking or dropping packets. We first introduce a reconfigurable router architecture allowing the cores to keep their connections with the network while the routers are under test. A deadlock-free and highly adaptive routing algorithm is proposed to support reconfigurations for testing. In addition, a testing sequence is defined to allow testing multiple routers to avoid dropping of packets. A procedure is proposed to control the behavior of the affected packets during the transition of a router from the normal to the testing mode and vice versa. This approach neither interrupts the execution of applications nor has a significant impact on the execution time. Experiments with the PARSEC benchmarks on an 8x8 NoC-based chip multiprocessor show only a 3 percent execution time increase with four routers simultaneously under test.

  • 7.
    Jafri, Syed M. A. H.
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT).
    Daneshtalab, Masoud
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    Abbas, Naeem
    Serrano Leon, Guillermo
    Hemani, Ahmed
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    TransMap: Transformation Based Remapping and Parallelism for High Utilization and Energy Efficiency in CGRAs. 2016. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 65, no. 11, pp. 3456-3469. Journal article (Peer-reviewed).
    Abstract [en]

    In the era of platforms hosting multiple applications with arbitrary inter-application communication and computation patterns, compile time mapping decisions are neither optimal nor desirable. As a solution to this problem, recently proposed architectures offer run-time remapping. The run-time remapping techniques displace or parallelize/serialize an application to optimize different parameters (e.g., utilization and energy). To implement the dynamic remapping, reconfigurable architectures commonly store multiple (compile-time generated) implementations of an application. Each implementation represents a different platform location and/or degree of parallelism. The optimal implementation is selected at run-time. However, the compile-time binding either incurs excessive configuration memory overheads and/or is unable to map/parallelize an application even when sufficient resources are available. As a solution to this problem, we present Transformation based reMapping and parallelism (TransMap). TransMap stores only a single implementation and applies a series of transformations to the stored bitstream for remapping or parallelizing an application. Compared to state of the art, in addition to simple relocation in horizontal/vertical directions, TransMap also allows rotating an application for mapping or parallelizing it in resource constrained scenarios. By storing only a single implementation, TransMap offers significant reductions in configuration memory requirements (up to 73 percent for the tested applications), compared to state of the art compaction techniques. Simulation results reveal that the additional flexibility reduces the energy requirements by 33 percent and enhances the device utilization by 50 percent for the tested applications. Gate level analysis reveals that TransMap incurs negligible silicon (0.2 percent of the platform) and timing (6 additional cycles per application) penalty.

  • 8.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    On the Conversion between Binary Code and Binary Reflected Gray Code. 1995. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 44, no. 1, pp. 47-53. Journal article (Peer-reviewed).
    Abstract [en]

    We present a new algorithm for conversion between binary code and binary-reflected Gray code that requires approximately 2K/3 element transfers in sequence for K elements per node, compared to K element transfers for previously known algorithms. For a binary cube of n = 2 dimensions the new algorithm degenerates to yield a complexity of K/2 + 1 element transfers, which is optimal. The new algorithm is optimal to within a multiplicative factor of 4/3 with respect to the best known lower bound for any routing strategy. We show that the minimum number of element transfers for minimum path length routing is K with concurrent communication on all channels of every node of a binary cube.
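
    For readers unfamiliar with the two encodings named in this abstract, the sketch below shows the standard index mapping between binary code and binary-reflected Gray code. It illustrates only the encodings themselves, not the paper's data-transfer algorithm on a binary cube; the function names are hypothetical.

```python
def binary_to_gray(n):
    """Map a non-negative integer to its binary-reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# Gray codes of 0..7; consecutive codes differ in exactly one bit.
codes = [binary_to_gray(i) for i in range(8)]
```

    On a binary cube, this one-bit difference means consecutive Gray-coded indices sit on neighbouring nodes.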

  • 9.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    Spanning Graphs for Optimum Broadcasting and Personalized Communication in Hypercubes. 1989. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 38, no. 9, pp. 1249-1268. Journal article (Peer-reviewed).
    Abstract [en]

    Four different communication problems are addressed in Boolean n-cube configured multiprocessors: (1) one-to-all broadcasting: distribution of common data from a single source to all other nodes; (2) one-to-all personalized communication: a single node sending unique data to all other nodes; (3) all-to-all broadcasting: distribution of common data from each node to all other nodes; and (4) all-to-all personalized communication: each node sending a unique piece of information to every other node. Three communication graphs (spanning trees) for the Boolean n-cube are proposed for the routing, and scheduling disciplines provably optimum within a small constant factor are proposed. With appropriate scheduling and concurrent communication on all ports of every processor, routings based on these two communication graphs offer a speedup of up to n/2, and O(√n) over the routings based on the spanning binomial tree for cases (2)-(4) respectively. All three spanning trees offer optimal communication times for cases (2)-(4) and concurrent communication on all ports of every processor. Timing models and complexity analysis are verified by experiments on a Boolean-cube-configured multiprocessor.
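
    The spanning binomial tree that this abstract uses as its baseline can be sketched as follows for case (1), one-to-all broadcasting. The sketch assumes one send per node per step (a one-port model), which is a simplification and not the all-port setting the paper analyses; the function name `binomial_tree_broadcast` is hypothetical.

```python
def binomial_tree_broadcast(n, source=0):
    """Return a per-step schedule of (sender, receiver) pairs.

    Nodes of the Boolean n-cube are numbered 0 .. 2**n - 1. In step `dim`,
    every node that already holds the data forwards it to its neighbour
    across dimension `dim`, so the informed set doubles each step.
    """
    informed = {source}
    schedule = []
    for dim in range(n):
        step = [(node, node ^ (1 << dim)) for node in sorted(informed)]
        informed.update(receiver for _, receiver in step)
        schedule.append(step)
    return schedule


# Three steps reach all eight nodes of a 3-cube.
schedule = binomial_tree_broadcast(3)
```

    Under this assumption the broadcast takes n steps, one per cube dimension.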

  • 10.
    Lu, Zhonghai
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik.
    Yao, Yuan
    KTH, Skolan för teknikvetenskap (SCI), Fysik.
    Marginal Performance: Formalizing and Quantifying Power Over/Under Provisioning in NoC DVFS. 2017. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 66, no. 11, pp. 1903-1917. Journal article (Peer-reviewed).
  • 11.
    Lu, Zhonghai
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Elektronik, Elektronik och inbyggda system.
    Yao, Yuan
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Elektronik, Elektronik och inbyggda system.
    Thread Voting DVFS for Manycore NoCs. 2018. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 67, no. 10, pp. 1506-1524, article ID 8338086. Journal article (Peer-reviewed).
    Abstract [en]

    We present a thread-voting DVFS technique for manycore networks-on-chip (NoCs). This technique has two remarkable features which differentiate from conventional NoC DVFS schemes. (1) Not only network-level but also thread-level runtime performance indicatives are used to guide DVFS decisions. (2) To resolve multiple perhaps conflicting performance indicatives from many cores, it allows each thread to 'vote' for a V/F level in its own performance interest, and a region-based V/F controller makes dynamic per-region V/F decision according to the major vote. We evaluate our technique on a 64-core CMP in full-system simulation environment GEM5 with both PARSEC and SPEC OMP2012 benchmarks. Compared to a network metric (router buffer occupancy) based approach, it can improve the network energy efficacy measured in MPPJ (million packets per joule) by up to 22 percent for PARSEC and 20 percent for SPEC OMP2012, and the system energy efficacy measured in MIPJ (million instructions per joule) by up to 35 percent for PARSEC and 33 percent for SPEC OMP2012. 
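
    The per-region majority vote described in this abstract could look roughly like the sketch below. Everything beyond "each thread votes for a V/F level and the region follows the majority" is an assumption: V/F levels are modelled as integer indices, ties go to the higher level, and the function name `region_vf_level` is hypothetical. How threads derive their votes from performance indicators is not modelled.

```python
from collections import Counter


def region_vf_level(votes):
    """Pick the V/F level index with the most votes in one region.

    `votes` holds one integer level per thread. Ties are broken in favour
    of the higher level (an assumption made here, not taken from the paper).
    """
    counts = Counter(votes)
    if not counts:
        raise ValueError("a region needs at least one vote")
    return max(counts, key=lambda level: (counts[level], level))


# Four threads in a region vote; level 2 has the majority.
chosen = region_vf_level([1, 2, 2, 3])
```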

  • 12. Moller, Niels
    et al.
    Granlund, Torbjörn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Improved Division by Invariant Integers. 2011. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 60, no. 2, pp. 165-175. Journal article (Peer-reviewed).
    Abstract [en]

    This paper considers the problem of dividing a two-word integer by a single-word integer, together with a few extensions and applications. Due to lack of efficient division instructions in current processors, the division is performed as a multiplication using a precomputed single-word approximation of the reciprocal of the divisor, followed by a couple of adjustment steps. There are three common types of unsigned multiplication instructions: we define full word multiplication (umul), which produces the two-word product of two single-word integers; low multiplication (umullo), which produces only the least significant word of the product; and high multiplication (umulhi), which produces only the most significant word. We describe an algorithm that produces a quotient and remainder using one umul and one umullo. This is an improvement over earlier methods, since the new method uses cheaper multiplication operations. It turns out that we also get some additional savings from simpler adjustment conditions. The algorithm has been implemented in version 4.3 of the GMP library. When applied to the problem of dividing a large integer by a single word, the new algorithm gives a speedup of roughly 30 percent, benchmarked on AMD and Intel processors in the x86_64 family.
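
    The following is a sketch of a two-word by single-word division of the kind this abstract describes: one full multiplication (umul), one low multiplication (umullo), and two adjustment steps. It is written from the commonly cited form of the method and is not copied from the paper or from GMP. It assumes the divisor is normalized (top bit set) and that `u1 < d`. Python integers are unbounded, so word arithmetic is emulated with masks; the names `reciprocal_word` and `div_2by1` are hypothetical.

```python
def reciprocal_word(d, bits=64):
    """Precompute v = floor((B**2 - 1) / d) - B for a normalized divisor d."""
    base = 1 << bits
    if not (base >> 1) <= d < base:
        raise ValueError("divisor must be normalized (top bit set)")
    return (base * base - 1) // d - base


def div_2by1(u1, u0, d, v, bits=64):
    """Divide the two-word value (u1, u0) by d, given v = reciprocal_word(d).

    Requires u1 < d. Returns (quotient, remainder), both single words.
    """
    mask = (1 << bits) - 1
    q = v * u1 + ((u1 << bits) | u0)   # umul, then a two-word addition
    q1 = ((q >> bits) + 1) & mask      # candidate quotient (high word + 1)
    q0 = q & mask
    r = (u0 - q1 * d) & mask           # umullo: only the low word is needed
    if r > q0:                         # first adjustment: candidate too large
        q1 = (q1 - 1) & mask
        r = (r + d) & mask
    if r >= d:                         # second, rarely taken adjustment
        q1 += 1
        r -= d
    return q1, r


# The reciprocal is computed once and reused for every division by d.
d = (1 << 63) + 12345
v = reciprocal_word(d)
quotient, remainder = div_2by1(7, 99, d, v)
```

    Because `v` depends only on `d`, the cost of the true division in `reciprocal_word` is paid once per divisor, which is the point of the "invariant" in the title.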

  • 13. Salamat, Ronak
    et al.
    Khayambashi, Misagh
    Ebrahimi, Masoumeh
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Industriell och Medicinsk Elektronik.
    Bagherzadeh, Nader
    A Resilient Routing Algorithm with Formal Reliability Analysis for Partially Connected 3D-NoCs. 2016. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 65, no. 11, pp. 3265-3279. Journal article (Peer-reviewed).
    Abstract [en]

    3D ICs can take advantage of a scalable communication platform, commonly referred to as the Networks-on-Chip (NoC). In the basic form of 3D-NoC, all routers are vertically connected. Partially connected 3D-NoC has emerged because of physical limitations of using vertical links. Routing is of great importance in such partially connected architectures. A high-performance, fault-tolerant and adaptive routing strategy with respect to the communication flow among the cores is crucial while freedom from livelock and deadlock has to be guaranteed. In this paper we introduce a new routing algorithm for partially connected 3D-NoCs. The routing algorithm is adaptive and tolerates the faults on vertical links as compared to the predesigned routing algorithms. Our results show a 40 - 50% improvement in the fraction of intact inter-level communications when the fault tolerant algorithm is used. This routing algorithm is lightweight and has only one virtual channel along the Y dimension.

  • 14.
    Salamat, Ronak
    et al.
    Univ Calif Irvine, Dept Elect Engn & Comp Sci, Irvine, CA 92697 USA..
    Khayambashi, Misagh
    Univ Calif Irvine, Dept Elect Engn & Comp Sci, Irvine, CA 92697 USA..
    Ebrahimi, Masoumeh
    KTH. Univ Turku, SF-20500 Turku, Finland..
    Bagherzadeh, Nader
    Univ Calif Irvine, Dept Elect Engn & Comp Sci, Irvine, CA 92697 USA..
    LEAD: An Adaptive 3D-NoC Routing Algorithm with Queuing-Theory Based Analytical Verification. 2018. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 67, no. 8, pp. 1153-1166. Journal article (Peer-reviewed).
    Abstract [en]

    2D-NoCs have been the mainstream approach used to interconnect multi-core systems. 3D-NoCs have emerged to compensate for deficiencies of 2D-NoCs such as long latency and power overhead. A low-latency routing algorithm for 3D-NoC is designed to accommodate high-speed communication between cores. Both simulation and analytical models are applied to estimate the communication latency of NoCs. Generally, simulations are time-consuming and slow down the design process. Analytical models provide, within a fraction of the time, nearly accurate results which can be used by simulation to fine-tune the design. In this paper, a high performance and adaptive routing algorithm has been proposed for partially connected 3D-NoCs. Latency of the routing algorithm under different traffic patterns, different number of elevators and different elevator assignment mechanisms are reported. An analytical model, tailored to the adaptivity of the algorithm and under low traffic scenarios, has been developed and the results have been verified by simulation. According to the results, simulation and analytical results are consistent within a 10 percent margin.

  • 15.
    Teslenko, Maxim
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Mikroelektronik och Informationsteknik, IMIT.
    Martinelli, Andrés
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Mikroelektronik och Informationsteknik, IMIT.
    Dubrova, Elena
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Mikroelektronik och Informationsteknik, IMIT.
    Bound-set preserving ROBDD variable orderings may not be optimum. 2005. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 54, no. 2, pp. 236-237. Journal article (Peer-reviewed).
    Abstract [en]

    This paper reports a result concerning the relation between the best variable orderings of an ROBDD $G(f)$ and the decomposition structure of the Boolean function $f$ represented by $G(f)$. It was stated in [1] that, if $f$ has a decomposition of type $f(X) = g(h_1(Y_1), h_2(Y_2), \ldots, h_k(Y_k))$, where $\{Y_i\}$, $i \in \{1, 2, \ldots, k\}$, is a partition of $X$, then one of the orderings which keeps the variables within the sets $\{Y_i\}$ adjacent is a best ordering for $G(f)$. Using a counterexample, we show that this statement is incorrect.

  • 16. Tran, Kim-Anh
    et al.
    Carlson, Trevor E.
    Koukos, Konstantinos
    KTH.
    Själander, Magnus
    Spiliopoulos, Vasileios
    Kaxiras, Stefanos
    Jimborean, Alexandra
    Static Instruction Scheduling for High Performance on Limited Hardware. 2018. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 67, no. 4, pp. 513-527. Journal article (Peer-reviewed).
    Abstract [en]

    Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In the worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler-based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increase memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high performance.

  • 17.
    Wang, Junshi
    et al.
    Univ Elect Sci & Technol China, Chengdu 610054, Sichuan, Peoples R China.;Beijing Zhaoxin Elect Technol Co Ltd, Beijing 100084, Peoples R China..
    Ebrahimi, Masoumeh
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Elektronik.
    Huang, Letian
    Univ Elect Sci & Technol China, Chengdu 610054, Sichuan, Peoples R China..
    Xie, Xuan
    Univ Elect Sci & Technol China, Chengdu 610054, Sichuan, Peoples R China..
    Li, Qiang
    Univ Elect Sci & Technol China, Chengdu 610054, Sichuan, Peoples R China..
    Li, Guangjun
    Univ Elect Sci & Technol China, Chengdu 610054, Sichuan, Peoples R China..
    Jantsch, Axel
    Tech Univ Wien, A-1040 Vienna, Austria..
    Efficient Design-for-Test Approach for Networks-on-Chip. 2019. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 68, no. 2, pp. 198-213. Journal article (Peer-reviewed).
    Abstract [en]

    To achieve high reliability in on-chip networks, it is necessary to test the network continuously with Built-in Self-Tests (BIST) so that the faults can be detected quickly and the number of affected packets can be minimized. However, BIST causes significant performance loss due to data dependencies. We propose EsyTest, a comprehensive test strategy with minimized influence on system performance. EsyTest tests the data path and the control path separately. The data path test starts periodically, but the actual test is performed in the free time slots to avoid deactivating the router for testing. A reconfigurable router architecture and an adaptive fault-tolerant routing algorithm are proposed to guarantee the access to the processing core when the associated router is under test. During the whole test procedure of the network, all processing cores are accessible, and thus the system performance is maintained during the test. At the same time, EsyTest provides full test coverage for the NoC and better hardware compatibility compared with the existing test strategies. Under the PARSEC benchmark and different test frequencies, the execution time increases less than 5 percent at the cost of 9.9 percent more area and 4.6 percent more power in comparison with the execution where no test procedure is applied.

  • 18. Wang, Xiaohang
    et al.
    Zhao, Baoxin
    Mak, Terrence
    Yang, Mei
    Jiang, Yingtao
    Daneshtalab, Masoud
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik och Inbyggda System.
    On Fine-Grained Runtime Power Budgeting for Networks-on-Chip Systems. 2016. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 65, no. 9, pp. 2780-2793. Journal article (Peer-reviewed).
    Abstract [en]

    Power budgeting is an essential aspect of networks-on-chip (NoC) to meet the power constraint for on-chip communications while assuring the best possible overall system performance. For simplicity and ease of implementation, existing NoC power budgeting schemes treat all the individual routers uniformly when allocating power to them. However, such homogeneous power budgeting schemes ignore the fact that the workloads of different NoC routers may vary significantly, and thus may provide excess power to routers with low workloads and insufficient power to those with high workloads. In this paper, we formulate the NoC power budgeting problem in order to optimize the network performance over a power budget through per-router frequency scaling. We take into account heterogeneous workloads across different routers as imposed by variations in traffic. Correspondingly, we propose a fine-grained solution using an agile algorithm with low time complexity. The frequency of each router is set individually according to its contribution to the average network latency while meeting the power budget. Experimental results have confirmed that with fairly low runtime and hardware overhead, the proposed scheme can help save up to 50 percent application execution time when compared with the latest proposed methods.

  • 19. Xiong, Q.
    et al.
    Wu, F.
    Lu, Zhonghai
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik.
    Xie, C.
    Extending Real-Time Analysis for Wormhole NoCs. 2017. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 66, no. 9, pp. 1532-1546, article ID 7884964. Journal article (Peer-reviewed).
    Abstract [en]

    The delay upper-bound analysis problem is of fundamental importance to real-time applications in Network-on-Chips (NoCs). In the paper, we revisit two state-of-the-art analysis models for real-time communication in wormhole NoCs with priority-based preemptive arbitration and show that the models only support specific router architectures with large buffer sizes. We then propose an extended analysis model to estimate delay upper-bounds for all router architectures and buffer sizes by identifying and analyzing the differences between upstream and downstream indirect interferences according to the relative positions of traffic flows and taking the buffer influence into consideration. Simulated evaluations show that our model supports one more router architecture and applies to small buffer sizes compared to the previous models.

  • 20. Yakovlev, A.
    et al.
    Furber, S.
    Krenz, René
    KTH, Tidigare Institutioner, Elektroniksystemkonstruktion.
    Bystrov, A.
    Design and analysis of a self-timed duplex communication system. 2004. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 53, no. 7, pp. 798-814. Journal article (Peer-reviewed).