Publications (10 of 94)
Jantsch, A. (2018). Models of Computation for Distributed Embedded Systems (2nd ed.). In: Embedded Systems Handbook: Second Edition (pp. 1-3). CRC Press
2018 (English). In: Embedded Systems Handbook: Second Edition, CRC Press, 2018, 2nd ed., p. 1-3. Chapter in book (Other academic)
Place, publisher, year, edition, pages
CRC Press, 2018. Edition: 2nd ed.
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-332342 (URN); 2-s2.0-85148403824 (Scopus ID)
Note

Part of ISBN 9781420074116 9781315218687

QC 20230723

Available from: 2023-07-23. Created: 2023-07-23. Last updated: 2023-07-23. Bibliographically approved
Wang, J., Huang, L., Ebrahimi, M., Li, Q., Li, G. & Jantsch, A. (2017). Non-blocking BIST for continuous reliability monitoring of Networks-on-Chip. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS). Paper presented at 50th IEEE International Symposium on Circuits and Systems, ISCAS 2017, Baltimore, United States, 28 May 2017 through 31 May 2017. Institute of Electrical and Electronics Engineers (IEEE), Article ID 8050828.
2017 (English). In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Institute of Electrical and Electronics Engineers (IEEE), 2017, article id 8050828. Conference paper, Published paper (Refereed)
Abstract [en]

To achieve high reliability in on-chip networks, frequent runs of Built-in Self-Test allow the detection of and recovery from faults before they affect packets and the system functionality. However, to test routers, wrappers isolate cores from the network, which leads to execution blocking and performance loss. In this paper, we propose a design-for-test reconfigurable router with two alternative bypassing channels. The router architecture maintains the connection between cores and the network during the testing procedure by utilizing the bypassing channels. With the help of an adaptive routing algorithm and a testing strategy, networks can be fully tested at a high testing frequency with a <15% increase in execution time.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Series
Proceedings - IEEE International Symposium on Circuits and Systems, ISSN 0271-4310
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-217486 (URN); 10.1109/ISCAS.2017.8050828 (DOI); 000439261800057 (); 2-s2.0-85032686628 (Scopus ID); 9781467368520 (ISBN)
Conference
50th IEEE International Symposium on Circuits and Systems, ISCAS 2017, Baltimore, United States, 28 May 2017 through 31 May 2017
Note

QC 20171113

Available from: 2017-11-13. Created: 2017-11-13. Last updated: 2024-03-15. Bibliographically approved
Zhang, X., Ebrahimi, M., Huang, L., Li, G. & Jantsch, A. (2015). A Network-Level Solution for Fault Detection, Masking, and Tolerance in NoCs. Paper presented at PDP 2015, the 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turku, Finland, March 4-6 2015.
2015 (English). Conference paper, Published paper (Refereed)
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-169446 (URN)
Conference
PDP 2015, the 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turku, Finland, March 4-6 2015
Note

QC 20150615

Available from: 2015-06-15. Created: 2015-06-15. Last updated: 2024-03-15. Bibliographically approved
Zhang, X., Ebrahimi, M., Huang, L., Li, G. & Jantsch, A. (2015). A routing-level solution for fault Detection, masking, and Tolerance in NoCs. In: Proceedings - 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2015. Paper presented at 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2015, 4 March 2015 through 6 March 2015 (pp. 365-369). IEEE conference proceedings
2015 (English). In: Proceedings - 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2015, IEEE conference proceedings, 2015, p. 365-369. Conference paper, Published paper (Refereed)
Abstract [en]

Faults may occur in numerous locations of a router in a NoC platform. Compared with faults in the data path, faults in the control path may cause more severe effects, which may result in crashing the entire system. Most current efforts in the literature focus on disabling a router when a fault is detected. At this coarse level of granularity, the functioning parts of a router are unnecessarily disabled, which may severely affect the performance or functionality of the on-chip network. To cope with this problem, in this paper we propose a mechanism to tolerate faults in the control path which largely avoids disabling a router as long as the fault is not severe. This mechanism is called DMT, standing for the three distinguishing characteristics of the proposed method: fault Detection, fault Masking and fault Tolerance. The proposed mechanism can efficiently detect faults expressed as illegal turns, and it can tolerate faults without prior knowledge of where and why a fault has happened.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Keywords
Fault-tolerance, Network-on-chip, Fault tolerance, Fault tolerant computer systems, Routers, VLSI circuits, Control path, Data paths, Entire system, Fault masking, On-chip networks, Prior knowledge, Tolerate faults, Fault detection
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-187509 (URN); 10.1109/PDP.2015.87 (DOI); 000380471500055 (); 2-s2.0-84962815511 (Scopus ID); 9781479984909 (ISBN)
Conference
23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2015, 4 March 2015 through 6 March 2015
Note

Funding Details: Academy of Finland

QC 20160614

Available from: 2016-06-14. Created: 2016-05-25. Last updated: 2024-03-15. Bibliographically approved
Liu, S., Lu, Z. & Jantsch, A. (2015). Highway in TDM NoCs. In: Proceedings of the Ninth ACM/IEEE International Symposium on Networks-on-Chip (NoCS'15). Paper presented at the 9th International Symposium on Networks-on-Chip, Vancouver, Canada, 2015. ACM Digital Library
2015 (English). In: Proceedings of the Ninth ACM/IEEE International Symposium on Networks-on-Chip (NoCS'15), ACM Digital Library, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

TDM (Time Division Multiplexing) is a well-known technique to provide QoS guarantees in NoCs. However, unused time slots commonly exist in TDM NoCs. In this paper, we propose a TDM highway technique which can enhance the slot utilization of TDM NoCs. A TDM highway is an express TDM connection composed of special buffer queues, called highway channels (HWCs). It can enhance the throughput and reduce the data transfer delay of the connection, while keeping the quality of service (QoS) guarantee on minimum bandwidth and in-order packet delivery. We have developed a dynamic and repetitive highway setup policy which has no dependency on particular TDM NoC techniques and no overhead on traffic flows. As a result, highways can be efficiently established and utilized in various TDM NoCs.

According to our experiments, compared to a traditional TDM NoC, adding one HWC with two buffers to every input port of routers in an 8×8 mesh can reduce data delay by up to 80% and increase the maximum throughput by up to 310%. More improvements can be achieved by adding more HWCs per input per router, or more buffers per HWC. We also use a set of MPSoC application benchmarks to evaluate our highway technique. The experimental results suggest that with highway, we can reduce application run time by up to 51%.

Place, publisher, year, edition, pages
ACM Digital Library, 2015
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-176630 (URN); 10.1145/2786572.2786577 (DOI); 2-s2.0-84984653822 (Scopus ID); 978-1-4503-3396-2 (ISBN)
Conference
the 9th International Symposium on Networks-on-Chip, Vancouver, Canada, 2015
Note

QC 20151109

Available from: 2015-11-09. Created: 2015-11-09. Last updated: 2024-03-15. Bibliographically approved
Haghbayan, M.-H., Kanduri, A., Rahmani, A.-M., Liljeberg, P., Jantsch, A. & Tenhunen, H. (2015). MapPro: Proactive Runtime Mapping for Dynamic Workloads by Quantifying Ripple Effect of Applications on Networks-on-Chip. In: NOCS '15 Proceedings of the 9th International Symposium on Networks-on-Chip. Paper presented at IEEE/ACM International Symposium on Networks-on-Chip (NOCS), September 28-30 2015, Vancouver, Canada. ACM Digital Library
2015 (English). In: NOCS '15 Proceedings of the 9th International Symposium on Networks-on-Chip, ACM Digital Library, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

Increasingly dynamic workloads running on NoC-based many-core systems necessitate efficient runtime mapping strategies. Given the unpredictable nature of application profiles, selecting a rational region to map an incoming application is an NP-hard problem in view of minimizing congestion and maximizing performance. In this paper, we propose a proactive region selection strategy which prioritizes nodes that offer lower congestion and dispersion. Our proposed strategy, MapPro, quantitatively represents the propagated impact of spatial availability and dispersion on the network with every newly mapped application. This allows us to identify a suitable region to accommodate an incoming application that results in minimal congestion and dispersion. We cluster the network into squares of different radii to suit applications of different sizes and proactively select a suitable square for a new application, eliminating the overhead caused by typical reactive mapping approaches. We evaluated our proposed strategy over different traffic patterns and observed gains of up to 41% in energy efficiency, 28% in congestion and 21% in dispersion when compared to the state-of-the-art region selection methods.

Place, publisher, year, edition, pages
ACM Digital Library, 2015
National Category
Computer Systems; Embedded Systems
Identifiers
urn:nbn:se:kth:diva-169509 (URN)
Conference
IEEE/ACM International Symposium on Networks-on-Chip (NOCS), September 28-30 2015, Vancouver, Canada
Note

QC 20160212

Available from: 2015-06-15. Created: 2015-06-15. Last updated: 2024-03-15. Bibliographically approved
Feng, C., Liao, Z., Lu, Z., Jantsch, A. & Zhao, Z. (2015). Performance analysis of on-chip bufferless router with multi-ejection ports. In: Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015. Paper presented at 11th IEEE International Conference on Advanced Semiconductor Integrated Circuits (ASIC), ASICON 2015, 3 November 2015 through 6 November 2015. IEEE conference proceedings
2015 (English). In: Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015, IEEE conference proceedings, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

In general, a bufferless NoC router has only one local output port for ejection, which may lead to multiple arriving flits competing for that single output port. In this paper, we propose a reconfigurable bufferless router in which the number of ejection ports can be configured as 2, 3 or 4. Simulation results demonstrate that the average packet latency of the routers with multi-ejection ports is on average 18%, 10%, 6%, 14%, 9% and 7% less than that of the router with 1 ejection port under six synthetic workloads, respectively. For application workloads, the average packet latency of the router with more than two ejection ports is only slightly better than that of the router with one ejection port, and the difference can be neglected. Balancing hardware cost against performance, we conclude that there is no need to implement bufferless routers with 3 or 4 ejection ports, as the router with 2 ejection ports achieves almost the same performance as the routers with 3 and 4 ejection ports.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Keywords
Network-on-chip, Reconfigurable hardware, Average packet latencies, Bufferless routers, Hardware cost, Local output, Output ports, Performance analysis, Reconfigurable, Synthetic workloads, Routers
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-197128 (URN); 10.1109/ASICON.2015.7517174 (DOI); 000398709000300 (); 2-s2.0-84982242165 (Scopus ID); 9781479984831 (ISBN)
Conference
11th IEEE International Conference on Advanced Semiconductor Integrated Circuits (ASIC), ASICON 2015, 3 November 2015 through 6 November 2015
Note

QC 20161214

Available from: 2016-12-14. Created: 2016-11-30. Last updated: 2024-03-15. Bibliographically approved
Liu, S., Jantsch, A. & Lu, Z. (2014). A Fair and Maximal Allocator for Single-Cycle On-Chip Homogeneous Resource Allocation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(10), 2229-2233
2014 (English). In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 23, no 10, p. 2229-2233. Article in journal (Refereed) Published
Abstract [en]

Traditional allocators for network-on-chip (NoC) routers suffer from either poor matching quality or limited fairness. We propose a waterfall (WTF) allocator targeting homogeneous resource allocation, which provides single-cycle maximal matching while guaranteeing strong fairness based on the round-robin principle. It can be implemented with a loop-free structure. In 90 nm technology, the allocator operates at about 1 GHz clock frequency. We compare WTF with wave-front, separable-input-first, and separable-output-first allocators and find that it is at least 10% smaller, has 50% less delay under high load, and uses 3% less power than any of these alternatives. Also, WTF is at least as fair or clearly fairer. We also find that in a 4 × 4 circuit-switched NoC the use of WTF gives up to 20% higher network performance.

Keywords
Allocator, fairness, maximal matching, network-on-chip (NoC), round-robin
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-155469 (URN); 10.1109/TVLSI.2013.2284563 (DOI); 000343014100020 (); 2-s2.0-84907660701 (Scopus ID)
Note

QC 20141112

Available from: 2014-11-12. Created: 2014-11-06. Last updated: 2024-03-15. Bibliographically approved
Zhang, Y., Li, L., Lu, Z., Jantsch, A., Gao, M., Pan, H. & Han, F. (2014). A survey of memory architecture for 3D chip multi-processors. Microprocessors and microsystems, 38(5), 415-430
2014 (English). In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 5, p. 415-430. Article in journal (Refereed) Published
Abstract [en]

3D chip multi-processors (3D CMPs) combine the advantages of 3D integration and the parallelism of CMPs, and are emerging as active research topics in the VLSI and multi-core computer architecture communities. One significant potential of 3D CMPs is to exploit the diversity of integration processes and the high volume of vertical TSV bandwidth to mitigate the well-known "Memory Wall" problem. Meanwhile, 3D integration techniques are subject to severe thermal, manufacturing yield and cost constraints. Research on 3D stacked memory hierarchies explores high-performance and power/thermal-efficient memory architectures for 3D CMPs. The micro-architectures of memories can be designed in the 3D integrated circuit context and integrated into 3D CMPs. This paper surveys the design of memory architectures for 3D CMPs. We classify current research into two categories: stacking cache-only architectures and stacking main memory architectures for 3D CMPs. The representative works are reviewed and the remaining opportunities and challenges are discussed to guide future research in this emerging area.

Keywords
3D integrated circuit, Chip multi-processor, Memory architecture, Non-uniform cache architecture
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-150538 (URN); 10.1016/j.micpro.2014.03.007 (DOI); 000340300900005 (); 2-s2.0-84903304051 (Scopus ID)
Note

QC 20140908

Available from: 2014-09-08. Created: 2014-09-05. Last updated: 2024-03-15. Bibliographically approved
Chen, X., Lu, Z., Jantsch, A., Chen, S., Guo, Y. & Liu, H. (2014). Cooperative communication for efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. IEICE Electronics Express, 11(18), 20140542
2014 (English). In: IEICE Electronics Express, E-ISSN 1349-2543, Vol. 11, no 18, p. 20140542. Article in journal (Refereed) Published
Abstract [en]

On many-core Network-on-Chips (NoCs), communication is on the critical path of system performance, and contended synchronization requests may cause a large performance penalty. Unlike conventional algorithm-based approaches, this paper addresses the barrier synchronization problem from the angle of optimizing its communication performance and proposes cooperative communication as a means to achieve efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. With cooperative communication, routers collaborate with one another to accomplish a fast barrier synchronization task. Cooperative communication is implemented in our router at low cost. In comparative experiments, our approach exhibits high efficiency and good scalability.

Keywords
cooperative communication, all-to-all barrier synchronization, many-core Network-on-Chips
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-158310 (URN); 10.1587/elex.11.20140542 (DOI); 000344925900002 (); 2-s2.0-84907809112 (Scopus ID)
Note

QC 20150107

Available from: 2015-01-07. Created: 2015-01-07. Last updated: 2024-03-15. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2251-0004