Change search
Refine search result
1 - 28 of 28
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1. Ben Dhaou, I.
    et al.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Efficient library characterization for high-level power estimation2004In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 12, no 6, p. 657-661Article in journal (Refereed)
    Abstract [en]

    This paper describes LP-DSM, which is an algorithm used for efficient library characterization in high-level power estimation. LP-DSM characterizes the power consumption of building blocks using the entropy of primary inputs and primary outputs. The experimental results showed that over a wide range of benchmark circuits implemented using full custom design in 0.35-mum 3.3 V CMOS process the statistical performance (mean and maximum error) of LP-DSM is comparable or sometimes better than most of the published algorithms. Moreover, it was found that LP-DSM has the lowest prediction sum of squares, which makes it an efficient tool for power prediction. Furthermore, the complexity of the LP-DSM is linear in relation to the number of primary inputs (O(NI)), whereas state of the art published library characterization algorithms have a complexity of O(NI2).

  • 2. Bjureus, P.
    et al.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Modeling of mixed control and dataflow systems in MASCOT2001In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 9, no 5, p. 690-703Article in journal (Refereed)
    Abstract [en]

    The Matlab and SDL Codesign Technique (MASCOT) method integrates modeling of data flow and control dominated parts at the system level. Based on the established languages specification and description language (SDL) and Matlab, MASCOT provides a modeling and simulation technique which realizes the communication and synchronization between the two domains. Moreover, it offers modeling guidelines for a disciplined and efficient way of using the technique. Most of the tedious details of modeling synchronization and communication is handled automatically and is transparent to the user. Consequently, the user can focus on the application and on the important tradeoffs to be made at the system level.

  • 3.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Low-Latency Maximal-Throughput Communication Interfaces for Rationally Related Clock Domains2014In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 22, no 3, p. 641-654Article in journal (Refereed)
    Abstract [en]

    In this paper, we introduce a source-synchronous adaptive interface for the globally ratiochronous, locally synchronous design style, a subset of the globally asynchronous, locally synchronous (GALS) design style in which the frequencies of all clocks are not phase-aligned but are constrained to be rationally related, i.e., they are all submultiple of the same physical or virtual frequency. The interface can be designed using only standard cells and guarantees maximal throughput in addition to an average latency four times lower compared with state-of-the-art asynchronous first-input, first-output GALS interfaces. Several properties of the interface are formally stated and proved. We also demonstrate that the interface has a low area overhead, with only four flip-flops per data line, and is robust against nonidealities such as clock jitters and propagation delay misalignments. For a realistic link in 90-nm application-specific integrated circuit technology, we derive a 1-GHz upper bound for the least common multiple among the frequencies.

  • 4.
    Chen, Xiaowen
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS). Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China.
    Lei, Yuanwu
    Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Chen, Shuming
    Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China..
    A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition2018In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 26, no 10, p. 1953-1966Article in journal (Refereed)
    Abstract [en]

    Fast Fourier transform (FFT) is the kernel and the most time-consuming algorithm in the domain of digital signal processing, and the FFT sizes of different applications are very different. Therefore, this paper proposes a variable-size FFT hardware accelerator, which fully supports the IEEE-754 single-precision floating-point standard and the FFT calculation with a wide size range from 2 to 220 points. First, a parallel Cooley-Tukey FFT algorithm based on matrix transposition (MT) is proposed, which can efficiently divide a large size FFT into several small size FFTs that can be executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is designed, and several FFT performance optimization techniques such as hybrid twiddle factor generation, multibank data memory, block MT, and token-based task scheduling are proposed. Third, its VLSI implementation is detailed, showing that it can work at 1 GHz with the area of 2.4 mm(2) and the power consumption of 91.3 mW at 25 degrees C, 0.9 V. Finally, several experiments are carried out to evaluate the proposal's performance in terms of FFT execution time, resource utilization, and power consumption. Comparative experiments show that our FFT hardware accelerator achieves at most 18.89x speedups in comparison to two software-only solutions and two hardware-dedicated solutions.

  • 5. Eles, Petru
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Special Section on Hardware/Software Codesign and System Synthesis I: Guest editorial2006In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 14, no 7, p. 665-666Article in journal (Other academic)
  • 6. Eles, Petru
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Special Section on Hardware/Software Codesign and System Synthesis II: Guest editorial2006In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 14, no 8, p. 789-790Article in journal (Other academic)
  • 7.
    Eslami Kiasari, Abbas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    An Analytical Latency Model for Networks-on-Chip2013In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 21, no 1, p. 113-123Article in journal (Refereed)
    Abstract [en]

    We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topology with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.

  • 8. Feng, C.
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zhang, M.
    Xing, Z.
    Addressing transient and permanent faults in NoC with efficient fault-tolerant deflection router2013In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 21, no 6, p. 1053-1066Article in journal (Refereed)
    Abstract [en]

    Continuing decrease in the feature size of integrated circuits leads to increases in susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request, and forward error correction link-level error control scheme to handle transient faults and a reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router. Synthesized results show that, compared with the FTDR router, the FTDR-H router can reduce the area by 27% in an 8×8 network. Simulation results demonstrate that under synthetic workloads, in the presence of permanent link faults, the throughput of an 8×8 network with FTDR and FTDR-H algorithms are 14% and 23% higher on average than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the cost-based deflection routing algorithm, respectively. Under real application workloads, the FTDR-H algorithm achieves 20% less hop counts on average than that of the FoN algorithm. For transient faults, the performance of the FTDR router can achieve graceful degradation even at a high fault rate. We also implement the fault-tolerant deflection router which can achieve 400 MHz in TSMC 65-nm technology.

  • 9. Jafari, Fahimeh
    et al.
    Jantsch, Axel
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Weighted Round Robin Configuration for Worst-Case Delay Optimization in Network-on-Chip2016In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no 12, p. 3387-3400Article in journal (Refereed)
    Abstract [en]

    We propose an approach for computing the end-to-end delay bound of individual variable bit-rate flows in an First Input First Output multiplexer with aggregate scheduling under weighted round robin (WRR) policy. To this end, we use a network calculus to derive per-flow end-to-end equivalent service curves employed for computing least upper delay bounds (LUDBs) of the individual flows. Since the real-time applications are going to meet guaranteed services with lower delay bounds, we optimize the weights in WRR policy to minimize the LUDBs while satisfying the performance constraints. We formulate two constrained delay optimization problems, namely, minimize-delay and multiobjective optimization. Multiobjective optimization has both the total delay bounds and their variance as the minimization objectives. The proposed optimizations are solved using a genetic algorithm. A video object plane decoder case study exhibits a 15.4% reduction of the total worst case delays and a 40.3% reduction on the variance of delays when compared with round robin policy. The optimization algorithm has low run-time complexity, enabling quick exploration of the large design spaces. We conclude that an appropriate weight allocation can be a valuable instrument for the delay optimization in on-chip network designs.

  • 10.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland.
    Tajammul, Muhammad Adeel
    KTH.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Paul, Kolin
    Plosila, Juha
    Ellervee, Peeter
    Tenuhnen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Polymorphic Configuration Architecture for CGRAs2016In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no 1, p. 403-407Article in journal (Refereed)
    Abstract [en]

    In the era of platforms hosting multiple applications with arbitrary reconfiguration requirements, static configuration architectures are neither optimal nor desirable. The static reconfiguration architectures either incur excessive overheads or cannot support advanced features (like time-sharing and runtime parallelism). As a solution to this problem, we present a polymorphic configuration architecture (PCA) that provides each application with a configuration infrastructure tailored to its needs.

  • 11. Kanduri, Anil
    et al.
    Haghbayan, Mohammad-Hashem
    Rahmani, Amir M.
    Liljeberg, Pasi
    Jantsch, Axel
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronics, Integrated devices and circuits. University of Turku, Finland.
    Dutt, Nikil
    Accuracy-Aware Power Management for Many-Core Systems Running Error-Resilient Applications2017In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no 10, p. 2749-2762Article in journal (Refereed)
    Abstract [en]

    Power capping techniques based on dynamic voltage and frequency scaling (DVFS) and power gating (PG) are oriented toward power actuation, compromising on performance and energy. Inherent error resilience of emerging application domains, such as Internet-of-Things (IoT) and machine learning, provides opportunities for energy and performance gains. Leveraging accuracy-performance tradeoffs in such applications, we propose approximation (APPX) as another knob for close-looped power management, to complement power knobs with performance and energy gains. We design a power management framework, APPEND+, that can switch between accurate and approximate modes of execution subject to system throughput requirements. APPEND+ considers the sensitivity of the application to error to make disciplined alteration between levels of APPX such that performance is maximized while error is minimized. We implement a power management scheme that uses APPX, DVFS, and PG knobs hierarchically. We evaluated our proposed approach over machine learning and signal processing applications along with two case studies on IoT-early warning score system and fall detection. APPEND+ yields 1.9x higher throughput, improved latency up to five times, better performance per energy, and dark silicon mitigation compared with the state-of-the-art power management techniques over a set of applications ranging from high to no error resilience.

  • 12.
    Liu, Lizheng
    et al.
    Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China..
    Jin, Yi
    Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China..
    Liu, Yi
    Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China..
    Ma, Ning
    KTH, School of Information and Communication Technology (ICT).
    Huan, Yuxiang
    KTH, School of Information and Communication Technology (ICT).
    Zou, Zhuo
    Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China..
    Zheng, Lirong
    Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China.;KTH Royal Inst Technol, Sch Informat & Commun Technol, S-16440 Kista, Sweden..
    A Design of Autonomous Error-Tolerant Architectures for Massively Parallel Computing2018In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 26, no 10, p. 2143-2154Article in journal (Refereed)
    Abstract [en]

    The massively parallel computing systems composed of many processors are connected on chips, which will become more and more complex and unreliable. This paper presents an error-tolerant design based on the autonomous error-tolerant (AET) architecture that aims to have a self-repairing capability. A nearby error sensing mechanism is designed to discover faults, and an active evolution scheme is studied to handle unrecoverable errors. A circuit backup switching mechanism is proposed to bypass the failed nodes. The board-level prototype is implemented based on dual-core embedded processors. The analysis shows that the error-tolerant capability of the proposed architecture is better than the conventional multimodular redundant system when the failure rate of a single core is less than 0.7. In the AET test system consisting of 16 processors, the error-tolerant capability is verified. The results show that the relative variation of the overall performance of the AET system will not be changed due to the high reliability requirements of the system. Through experimental comparison, under the premise that the architecture of AET and the triple modular redundancy method are basically consistent in reliability, whether on the logical-level error tolerant or on the physical-level error tolerant, the former has lower power consumption.

  • 13.
    Liu, Shaoteng
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Fair and Maximal Allocator for Single-Cycle On-Chip Homogeneous Resource Allocation2014In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 23, no 10, p. 2229-2233Article in journal (Refereed)
    Abstract [en]

    Traditional allocators for network-on-chip (NoC) routers suffer from either poor-matching quality or limited fairness. We propose a waterfall (WTF) allocator targeting homogeneous resource allocation, which provides single-cycle maximal matching while guaranteeing strong fairness based on the round-robin principle. It can be implemented with a loop-free structure. In 90 nm technology, the allocator operates at about 1 GHz clock frequency. We compare WTF with wave-front, separable-input-first, and separable-output-first allocators and find that it is at least 10% smaller, has 50% less delay under high load, and uses 3% less power than any of these alternatives. Also, WTF is at least as fair or clearly fairer. We also find that in a 4 x 4 circuit switched NoC the use of WTF gives up to 20% higher network performance.

  • 14.
    Lu, Zhonghai
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    TDM virtual-circuit configuration for network-on-chip2008In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 16, no 8, p. 1021-1034Article in journal (Refereed)
    Abstract [en]

    In network-on-chip (NoC), time-division-multiplexing (TDM) virtual circuits (VCs) have been proposed to satisfy the quality-of-service requirements of applications. TDM VC is a connection-oriented communication service by which two or more connections take turns to share buffers and link bandwidth using dedicated time slots. In the paper, we first give a formulation of the multinode VC configuration problem for arbitrary NoC topologies. A multinode VC allows multiple source and destination nodes on it. Then we address the two problems of path selection and slot allocation for TDM VC configuration. For the path selection, we use a backtracking algorithm to explore the path diversity, constructively searching the solution space. In the slot allocation phase, overlapped VCs must be configured such that no conflict occurs and their bandwidth requirements are satisfied. We define the concept of a logical network (LN) as an infinite set of associated (time slot, buffer) pairs with respect to a buffer on a given VC. Based on this concept, we develop and prove theorems that constitute sufficient and necessary conditions to establish conflict-free VCs. They are applicable for networks where all nodes operate with the same clock frequency but allowing different phases. Using these theorems, slot allocation for VCs is a procedure of assigning VCs to different LNs. TDM VC configuration can thus be predictable and correct-by-construction. Our experiments on synthetic and real applications validate the effectiveness and efficiency of our approach.

  • 15.
    Lu, Zhonghai
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Yao, Yuan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Dynamic Traffic Regulation in NoC-Based Systems2017In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no 2, p. 556-569Article in journal (Refereed)
    Abstract [en]

    In network-on-chip (NoC)-based systems, performance enhancement has primarily focused on the network itself, with little attention paid on controlling traffic injection at the network boundary. This is unsatisfactory because traffic may be over injected, aggravating congestion, and lowering performance. Recently, traffic regulation is proposed as an orthogonal means for performance improvement. Rather than as soon as possible admission, traffic regulation may hold back packet injection by admitting packets into the network only when the accumulated traffic volume at any time interval does not exceed a threshold. These regulation techniques are, however, often static, likely causing overregulation and underregulation. We propose dynamic traffic regulation to improve the system performance for NoC-based multi/many-processor systemson- chip (MPSoC) and chip multi/many-core processor (CMP) designs. It can be applied to MPSoCs for intellectual property integration in an open-loop fashion by injecting traffic according to its run-time profiled characteristics. It can also be applied to CMPs in a closed-loop fashion by admitting traffic fully adaptive to the traffic and network states. Through extensive experiments and results, we show that both the open-loop and closed-loop dynamic regulation techniques can significantly improve the network and system performance.

  • 16.
    Malik, Jamshaid Sarwar
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Malik, J. N.
    Silmane, Ben
    KTH, School of Information and Communication Technology (ICT), Communication Systems, CoS, Radio Systems Laboratory (RS Lab).
    Gohar, N. D.
    Revisiting central limit theorem: Accurate Gaussian random number generation in VLSI2015In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 23, no 5, p. 842-855, article id 6834810Article in journal (Refereed)
    Abstract [en]

    Gaussian random numbers (GRNs) generated by central limit theorem (CLT) suffer from errors due to deviation from ideal Gaussian behavior for any finite number of additions. In this paper, we will show that it is possible to compensate the error in CLT, thereby correcting the resultant probability density function, particularly in the tail regions. We will provide a detailed mathematical analysis to quantify the error in CLT. This provides a design space with more than four degrees of freedom to build a variety of GRN generators (GRNGs). A framework utilizes this design space to generate customized hardware architectures. We will demonstrate designs of five different architectures of GRNGs, which vary in terms of consumed memory, logic slices, and multipliers on field-programmable gate array. Similarly, depending upon application, these architectures exhibit statistical accuracy from low (4 σ ) to extremely high (12 σ). A comparison with previously published designs clearly indicate advantages of this methodology in terms of both consumed hardware resources and accuracy. We will also provide synthesis results of same designs in application-specific integrated circuit using 65-nm standard cell library. Finally, we will highlight some shortcomings associated with such architectures followed by their remedies.

  • 17.
    Nigussie, E.
    et al.
    Turku Centre for Computer Science (TUCS).
    Tuuna, S.
    Turku Centre for Computer Science (TUCS).
    Plosila, J.
    Turku Centre for Computer Science (TUCS).
    Isoaho, J.
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Semi-Serial On-Chip Link Implementation for Energy Efficiency and High Throughput2012In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 20, no 12, p. 2265-2277Article in journal (Refereed)
    Abstract [en]

    A high-throughput and low-energy semi-serial on-chip communication link based on novel design techniques and circuit solutions is presented. This self-timed link is designed using high-speed serialization/deserializtion and pulse dual-rail encoding techniques. The link also employs wave-pipelined differential pulse current-mode signaling to maintain the high speed data intake from the serializer. The energy efficiency of the proposed semi-serial link, which consists of bit-serial links in parallel, mainly comes from the sharing of the novel serializer's control circuit among the bit-serial links. In addition, the integration of pulse signaling with wave-pipelining, the use of a new low-complexity data validity detection technique, and the avoidance of data decoding logic also contribute to the power reduction. Furthermore, the formulated pulse dual-rail encoding provides an opportunity to implement pulse signaling at no cost. The ability to detect data validity at bit level allows acknowledgment per word without losing the delay-insensitivity of the transmission. The proposed semi-serial link is analyzed and compared with bit-serial and fully bit-parallel links for 64-bit data and communication distances of 1 to 8 mm. The semi-serial link which consists of eight bit-serial links provides 72.72 Gbps throughput with 286 fJ/bit energy dissipation for 8 mm transmission. It dissipates the lowest energy per bit compared to fully bit-parallel links while achieving the same throughput. The links are designed and simulated in Cadence Analog Spectre using 65-nm technology from STMicroelectronics.

  • 18. Pamunuwa, D.
    et al.
    Zheng, Li-Rong
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Tenhunen, Hannu
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Maximizing throughput over parallel wire structures in the deep submicrometer regime2003In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 11, no 2, p. 224-243Article in journal (Refereed)
    Abstract [en]

    In a parallel multiwire structure, the exact spacing and size of the wires determine both the resistance and the distribution of the capacitance between the ground plane and the adjacent signal carrying conductors, and have a direct effect on the delay. Using closed-form equations that map the geometry to the wire parasitics and empirical switch factor based delay models that show how repeaters can be optimized to compensate for dynamic effects, we devise a method of analysis for optimizing throughput over a given metal area. This analysis is used to show that there is a clear optimum configuration for the wires which maximizes the total bandwidth. Additionally, closed form equations are derived, the roots of which give close to optimal solutions. It is shown that for wide buses, the optimal wire width and spacing are independent of the total width of the bus, allowing easy optimization of on-chip buses. Our analysis and results are valid for lossy interconnects as are typical of wires in sub-micron technologies.

  • 19. Pop, P.
    et al.
    Izosimov, Viacheslav
    Eles, P.
    Peng, Z.
    Design Optimization of Time- and Cost-Constrained Fault-Tolerant Embedded Systems with Checkpointing and Replication2009In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 17, no 3, p. 389-402Article in journal (Refereed)
    Abstract [en]

    We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the timing constraints of the application are satisfied. We present several design optimization approaches which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

  • 20. Rahmani, Amir M.
    et al.
    Haghbayan, Mohammad-Hashem
    Miele, Antonio
    Liljeberg, Pasi
    Jantsch, Axel
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Reliability-Aware Runtime Power Management for Many-Core Systems in the Dark Silicon Era2017In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no 2, p. 427-440Article in journal (Refereed)
    Abstract [en]

    Power management of networked many-core systems with runtime application mapping becomes more challenging in the dark silicon era. It necessitates considering network characteristics at runtime to achieve better performance while honoring the peak power upper bound. On the other hand, power management has a direct effect on chip temperature, which is the main driver of the aging effects. Therefore, alongside performance fulfillment, the controlling mechanism must also consider the current cores' reliability in its actuator manipulation to enhance the overall system lifetime in the long term. In this paper, we propose a multiobjective dynamic power management technique that uses current power consumption and other network characteristics including the reliability of the cores as the feedback while utilizing fine-grained voltage and frequency scaling and per-core power gating as the actuators. In addition, disturbance rejecter and reliability balancer are designed to help the controller to better smooth power consumption in the short term and reliability in the long term, respectively. Simulations of dynamic workloads and mixed criticality application profiles show that our method not only is effective in honoring the power budget while considerably boosting the system throughput, but also increases the overall system lifetime by minimizing aging effects by means of power consumption balancing.

  • 21.
    Sleiman, Sleiman Bou
    et al.
    Ohio State Univ, Analog VLSI Lab.
    Atallah, Jad G.
    Rodriguez Duenas, Saul
    KTH, School of Information and Communication Technology (ICT), Integrated Devices and Circuits.
    Rusu, Ana
    KTH, School of Information and Communication Technology (ICT), Integrated Devices and Circuits.
    Elnaggar, Mohammed Ismail
    KTH, School of Information and Communication Technology (ICT), Integrated Devices and Circuits.
    Optimal Sigma Delta Modulator Architectures for Fractional-N Frequency Synthesis2010In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 18, no 2, p. 194-200Article in journal (Refereed)
    Abstract [en]

    This paper presents a comparative study of Sigma Delta modulators for use in fractional-N phase-locked loops. It proposes favorable modulator architectures while taking into consideration not only the quantization noise of the modulator but also other loop nonidealities such as the charge pump current mismatch that contributes to the degradation in the synthesized tone's phase noise. The proper choice of the modulator architecture is found to be dependent upon the extent of the nonideality, reference frequency, and loop bandwidth. Three modulator architectures are then proposed for low, medium, and high levels of nonidealities.

  • 22.
    Tuuna, S.
    et al.
    Turku Centre for Computer Science (TUCS).
    Nigussie, E.
    Turku Centre for Computer Science (TUCS).
    Isoaho, J.
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Modeling of energy dissipation in RLC current-mode signaling2012In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 20, no 6, p. 1146-1151Article in journal (Refereed)
    Abstract [en]

    In this paper, energy dissipation in resistance-inductance-capacitance (RLC) current-mode signaling is modeled. The energy dissipation is derived separately for driver, wire, and receiver termination. The effects of rise time and clock cycle are included. A realizable Pi-model for the driving-point impedance of an RLC current-mode transmission line is derived. The output current of an RLC current-mode transmission line is also derived. The model is extended to multiple parallel coupled interconnects with inductive and capacitive coupling between them. The model is verified by comparing it to HSPICE in 65-nm technology and applied to differential current-mode signaling.

  • 23. Tuuna, Sampo
    et al.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Isoaho, Jouni
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Modeling of on-chip bus switching current and its impact on noise in power supply grid2008In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 16, no 6, p. 766-770Article in journal (Refereed)
    Abstract [en]

    In this paper, an analytical model for the current draw of an on-chip bus is presented. The model is combined with an on-chip power supply grid model in order to analyze noise caused by switching buses in a power supply grid. The bus is modeled as distributed resistance-inductance-capacitance (RLC) lines that are capacitively and inductively coupled to each other. Different switching patterns and driver skewing times are also included in the model. The power supply grid is modeled as a network of RLC segments. The model is verified by comparing it to HSPICE. The error was below 8%. The model is applied to determine the influence of driver skewing times on maximum power supply noise.

  • 24.
    Wang, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. Univ Elect Sci & Technol China, China.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Li, Yubai
    A New CDMA Encoding/Decoding Method for on-Chip Communication Network2016In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no 4, p. 1607-1611Article in journal (Refereed)
    Abstract [en]

    As a high performance on-chip communication method, the code division multiple access (CDMA) technique has recently been applied to networks on chip (NoCs). We propose a new standard-basis-based encoding/decoding method to leverage the performance and cost of CDMA NoCs in area, power assumption, and network throughput. In the transmitter module, source data from different senders are separately encoded with an orthogonal code of a standard basis and these coded data are mixed together by an XOR operation. Then, the sums of data can be transmitted to their destinations through the onchip communication infrastructure. In the receiver module, a sequence of chips is retrieved by taking an AND operation between the sums of data and the corresponding orthogonal code. After a simple accumulation of these chips, original data can be reconstructed. We implement our encoding/decoding method and apply it to a CDMA NoC with a star topology. Compared with the state-of-the-art Walsh-code-based (WB) encoding/decoding technique, our method achieves up to 67.46% power saving and 81.24% area saving together with decrease of 30%-50% encoding/decoding latency. Moreover, the CDMA NoC with different sizes applying our encoding/decoding method gains power saving, area saving, and maximal throughput improvement up to 20.25%, 22.91%, and 103.26%, respectively, than the WB CDMA NoC.

  • 25. Weerasekera, Roshan
    et al.
    Pamunuwa, Dinesh
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Minimal-power, delay-balanced SMART repeaters for global interconnects in the nanometer regime2008In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 16, no 5, p. 589-593Article in journal (Refereed)
    Abstract [en]

    A SMART repeater is proposed for driving capacitively-coupled, global-length on-chip interconnects that alters its drive strength dynamically to match the relative bit pattern on the wires and thus the effective capacitive load. This is achieved by partitioning the driver into main and assistant drivers; for a higher effective load capacitance both drivers switch, while for a lower effective capacitance the assistant driver is quiet. In a UMC 0.18-mu m technology the potential energy saving is around 10% and the reduction in jitter 20%, in comparison to a traditional repeater for typical global wire lengths. It is also shown that the average energy saving for nanometer technologies is in the range of 20% to 25%. The driver architecture exploits the fact that as feature sizes decrease, the capacitive load per transistor shrinks, whereas global wire loads remain relatively unchanged. Hence, the smaller the technology, the greater the potential saving.

  • 26. Ykman-Couvreur, C.
    et al.
    Lambrecht, J.
    Verkest, D.
    Catthoor, F.
    Svantesson, B.
    Hemani, Ahmed
    Wolf, F.
    Dynamic memory management methodology applied to embedded telecom network systems2002In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 10, no 5, p. 650-667Article in journal (Refereed)
    Abstract [en]

    This paper presents a new methodology for dynamic memory management of embedded telecom network systems. This methodology enables the designer to further raise the abstraction level of the initial system specification and to achieve optimized embedded system designs. This methodology is well suited for systems characterized by a set of concurrent and dynamic processes, very high-bit-rate data streams, and intensive data transfer and storage, as encountered in telecom network applications. Up to now, it has been successfully applied to four telecom network systems. This methodology can be easily integrated into any C++-based system synthesis approach that bridges the gap between a concurrent process-level system specification and an optimized (for area, performance, or power) embedded implementation of communicating hardware/software processors. This is in contrast to current system design practice, where VHDL/C is derived without room for exploration, refinement, and verification, leading to expensive late design iterations. In this paper, the main focus lies on the system-level specification model and the dynamic memory management applied to two real-life telecom network systems.

  • 27.
    Öberg, Johnny
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Kumar, A
    Hemani, Ahmed
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Grammar-based hardware synthesis from port-size independent specifications2000In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 8, no 2, p. 184-194Article in journal (Refereed)
    Abstract [en]

    A protocol defines how systems communicate, There are two ways of specifying the protocol, the language of communication, One way is to specify the automaton that recognizes the language, and this is the approach taken by SDL, etc. The other more abstract way is to specify the grammar of the language and let a tool synthesize the automaton, Directly specifying the automaton makes the specification implementation dependent in two ways: the time behavior is specified in terms of states, and the width of the inputs and outputs is fixed, By specifying the grammar, the specification is potentially independent of both these implementation details and allows design space exploration in these dimensions. This paper presents a grammar-based language, called ProGram, that supports a port-size independent specifications methodology and its application to parts of the Operation and Maintenance protocol, a typical application from the ATM world. The methodology has also been applied to another test set of example designs and compared to standard RTL synthesis and HLS in order to evaluate the quality of the produced designs.

  • 28.
    Öberg, Johnny
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Kumar, Anshul
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Hemani, Ahmed
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Grammar-based hardware synthesis from port-size independent specifications2000In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 8, p. 184-194Article in journal (Refereed)
    Abstract [en]

    A protocol defines how systems communicate. There are two ways of specifying the protocol, the language of communication. One way is to specify the automaton that recognizes the language, and this is the approach taken by SDL, etc. The other more abstract way ss to specify the grammar of the language and let a tool synthesize the automaton. Directly specifying the automaton makes the specification implementation dependent in two ways: the time behavior is specified in terms of states, and the width of the inputs and outputs is fixed. By specifying the grammar, the specification is potentially independent of both these implementation details and allows design space exploration in these dimensions. This paper presents a grammar-based language, called Program, that supports a port-size independent specifications methodology and its application to parts of the Operation and Maintenance protocol, a typical application from the ATM world. The methodology has also been applied to another test set of example designs and compared to standard RTL synthesis and HLS in order to evaluate the quality of the produced designs.

1 - 28 of 28
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf