  • 1.
    Becker, Matthias
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Lu, Zhonghai
    KTH, Superseded Departments (pre-2005), Electronic Systems Design. KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Chen, DeJiu
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Machine Design (Div.). KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Embedded Control Systems. KTH, Superseded Departments (pre-2005), Machine Design. KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics.
    Towards QoS-Aware Service-Oriented Communication in E/E Automotive Architectures. 2018. In: Proceedings of the 44th Annual Conference of the IEEE Industrial Electronics Society (IECON), 2018, p. 4096-4101. Conference paper (Refereed)
    Abstract [en]

    With the rise of increasingly advanced driving assistance systems in modern cars, execution platforms that build on the principle of service-oriented architectures are being proposed. Alongside, service-oriented communication is used to provide the required adaptive communication infrastructure on top of automotive Ethernet networks. A middleware is proposed that enables QoS-aware service-oriented communication between software components, where the prescribed behavior of each software component is defined by Assume/Guarantee (A/G) contracts. To enable the use of COTS components, which are often not sufficiently verified for use in automotive systems, the middleware monitors the communication behavior of components and verifies it against each component's A/G contract. A violation of the allowed communication behavior then triggers adaptation processes in the system while the impact on other communication is minimized. The applicability of the approach is demonstrated by a case study that utilizes a prototype implementation of the proposed approach.

  • 2.
    Becker, Matthias
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Mubeen, Saad
    Mälardalen University.
    Timing Analysis Driven Design-Space Exploration of Cause-Effect Chains in Automotive Systems. 2018. In: IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, 2018. Conference paper (Refereed)
    Abstract [en]

    Model-based development and component-based software engineering have emerged as a promising approach to deal with enormous software complexity in automotive systems. This approach supports the development of software architectures by interconnecting (and reusing) software components (SWCs) at various abstraction levels. Automotive software architectures are often modeled with chains of SWCs, also called cause-effect chains, that are constrained by timing requirements. Based on the variations in activation patterns of SWCs, a single model of a cause-effect chain at a higher abstraction level can conform to several valid refined models of the chain at a lower abstraction level, which is closer to the system implementation. As a consequence, the total number of valid implementation-level models generated by the existing techniques increases exponentially, thereby significantly increasing the runtime of the timing analysis engines and limiting the scalability of the existing techniques. This paper computes an upper bound on the activation pattern combinations that may result from a system of cause-effect chains in a given high-level model of the software architecture. An efficient algorithm is presented that traverses only a reduced number of possible combinations of the cause-effect chains, resulting in the timing analysis of a significantly lower number of implementation-level models of the software architecture. A proof of concept is provided by conducting a case study that shows a significant reduction in the runtime of timing analysis engines, i.e., the timing behavior of the considered system is verified by performing the timing analysis of only 27% of all possible combinations of the cause-effect chains.

  • 3.
    Ben Dhaou, I.
    et al.
    Kondoro, Aron
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics. University of Dar es Salaam, Tanzania.
    Kelati, Amleset
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems. University of Turku, Finland.
    Rwegasira, Diana
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics. University of Turku, Finland.
    Naiman, S.
    Mvungi, N. H.
    Tenhunen, Hannu
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Communication and security technologies for smart grid. 2018. In: Fog Computing: Breakthroughs in Research and Practice, IGI Global, 2018, p. 305-331. Chapter in book (Other academic)
    Abstract [en]

    The smart grid is a new paradigm that aims to modernize the legacy power grid. It is based on the integration of ICT technologies, embedded systems, sensors, renewable energy and advanced algorithms for management and optimization. The smart grid is a system of systems in which communication technology plays a vital role. Safe operation of the smart grid needs a careful design of the communication protocols, cryptographic schemes, and computing technology. In this article, the authors describe current communication technologies, recently proposed algorithms, protocols, and architectures for securing the smart grid communication network. They analyze, in a unifying approach, the three principal pillars of the smart grid: sensors, communication technologies, and security. Finally, the authors elaborate on open issues in the smart grid communication network.

  • 4.
    Chen, DeJiu
    et al.
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Embedded Control Systems. KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Machine Design (Div.). KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics.
    Östberg, Kenneth
    RISE - Research Institutes of Sweden.
    Becker, Matthias
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Sivencrona, Håkan
    Zenuity AB.
    Warg, Fredrik
    RISE - Research Institutes of Sweden.
    Design of a Knowledge-Base Strategy for Capability-Aware Treatment of Uncertainties of Automated Driving Systems. 2018. In: Computer Safety, Reliability, and Security / [ed] Gallina B., Skavhaug A., Schoitsch E., Bitsch F., Cham, 2018, Vol. 11094. Conference paper (Refereed)
    Abstract [en]

    Automated Driving Systems (ADS) represent a key technological advancement in the area of Cyber-physical systems (CPS) and Embedded Control Systems (ECS) with the aim of promoting traffic safety and environmental sustainability. The operation of ADS, however, exhibits several uncertainties that, if improperly treated in development and operation, would lead to safety and performance related problems. This paper presents the design of a knowledge-base (KB) strategy for a systematic treatment of such uncertainties and their system-wide implications on design-space and state-space. In the context of this approach, we use the term Knowledge-Base (KB) to refer to the model that stipulates the fundamental facts of a CPS in regard to the overall system operational states, action sequences, as well as the related costs or constraint factors. The model constitutes a formal basis for describing, communicating and inferring particular operational truths as well as the belief and knowledge representing the awareness or comprehension of such truths. For the reasoning of ADS behaviors and safety risks, each system operational state is explicitly formulated as a conjunction of environmental state and some collective states showing the ADS capabilities for perception, control and actuation. Uncertainty Models (UM) are associated as attributes to such state definitions for describing and quantifying the corresponding belief or knowledge status due to the presence of evidence about system performance and deficiencies, etc. On a broader perspective, the approach is part of our research on bridging the gaps among intelligent functions, system capability and dependability for mission- and safety-critical CPS, through a combination of development- and run-time measures.

  • 5.
    Chen, Xiaowen
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Efficient Memory Access and Synchronization in NoC-based Many-core Processors. 2019. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    In NoC-based many-core processors, the memory subsystem and the synchronization mechanism are two important design aspects, since mining parallelism and pursuing higher performance require not only optimized memory management but also an efficient synchronization mechanism. Therefore, we are motivated to research efficient memory access and synchronization in three topics, namely, efficient on-chip memory organization, fair shared memory access, and efficient many-core synchronization.

    One major way of optimizing the memory performance is constructing a suitable and efficient memory organization. A distributed memory organization is more suitable to NoC-based many-core processors, since it features good scalability. We envision that it is essential to support Distributed Shared Memory (DSM) because of the huge amount of legacy code and easy programming. Therefore, we first adopt the microcoded approach to address DSM issues, aiming for hardware performance but maintaining the flexibility of programs. Second, we further optimize the DSM performance by reducing the virtual-to-physical address translation overhead. In addition to the general-purpose memory organization such as DSM, there exists special-purpose memory organization to optimize the performance of application-specific memory access. We choose Fast Fourier Transform (FFT) as the target application, and propose a multi-bank data memory specialized for FFT computation.

    In 3D NoC-based many-core processors, because processor cores and memories reside in different locations (center, corner, edge, etc.) of different layers, memory accesses behave differently due to their different communication distances. As the network size increases, the communication distance difference of memory accesses becomes larger, resulting in unfair memory access performance among different processor cores. This unfair memory access phenomenon may lead to high latencies of some memory accesses, thus negatively affecting the overall system performance. Therefore, we are motivated to study on-chip memory and DRAM access fairness in 3D NoC-based many-core processors through narrowing the round-trip latency difference of memory accesses as well as reducing the maximum memory access latency.

    Barrier synchronization is used to synchronize the execution of parallel processor cores. Conventional barrier synchronization approaches such as master-slave, all-to-all, tree-based, and butterfly are algorithm oriented. As many processor cores are networked on a single chip, contended synchronization requests may cause a large performance penalty. Motivated by this, and in contrast to the algorithm-based approaches, we choose another direction (i.e., exploiting efficient communication) to address the barrier synchronization problem. We propose cooperative communication as a means and combine it with the master-slave algorithm and the all-to-all algorithm to achieve efficient many-core barrier synchronization. In addition, a multi-FPGA implementation case study of fast many-core barrier synchronization is conducted.

  • 6.
    Chen, Xiaowen
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS). Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China.
    Lei, Yuanwu
    Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Chen, Shuming
    Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China..
    A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition. 2018. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 26, no 10, p. 1953-1966. Article in journal (Refereed)
    Abstract [en]

    Fast Fourier transform (FFT) is the kernel and the most time-consuming algorithm in the domain of digital signal processing, and the FFT sizes of different applications are very different. Therefore, this paper proposes a variable-size FFT hardware accelerator, which fully supports the IEEE-754 single-precision floating-point standard and the FFT calculation with a wide size range from $2$ to $2^{20}$ points. First, a parallel Cooley-Tukey FFT algorithm based on matrix transposition (MT) is proposed, which can efficiently divide a large size FFT into several small size FFTs that can be executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is designed, and several FFT performance optimization techniques such as hybrid twiddle factor generation, multibank data memory, block MT, and token-based task scheduling are proposed. Third, its VLSI implementation is detailed, showing that it can work at 1 GHz with an area of 2.4 mm² and a power consumption of 91.3 mW at 25 °C, 0.9 V. Finally, several experiments are carried out to evaluate the proposal's performance in terms of FFT execution time, resource utilization, and power consumption. Comparative experiments show that our FFT hardware accelerator achieves at most 18.89x speedups in comparison to two software-only solutions and two hardware-dedicated solutions.
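
    The idea of splitting one large FFT into many small, independent FFTs around a matrix transposition can be illustrated with the textbook "four-step" Cooley-Tukey decomposition. The sketch below is only that textbook form, written in plain Python; it is not the accelerator's algorithm, and the names `dft` and `four_step_fft` as well as the factorization `n1 * n2` are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here as the 'small FFT' building block."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 built from small DFTs and one transposition.

    Illustrative sketch only (hypothetical helper, not from the paper).
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n1 independent length-n2 DFTs over strided sub-sequences.
    rows = [dft(x[r::n1]) for r in range(n1)]  # rows[r][k2]

    # Step 2: multiply each element by the twiddle factor W_n^(r * k2).
    rows = [
        [rows[r][k2] * cmath.exp(-2j * cmath.pi * r * k2 / n) for k2 in range(n2)]
        for r in range(n1)
    ]

    # Step 3: matrix transposition, so the second pass also reads contiguous rows.
    cols = [list(c) for c in zip(*rows)]  # cols[k2][r]

    # Step 4: n2 independent length-n1 DFTs.
    cols = [dft(c) for c in cols]  # cols[k2][k1]

    # Step 5: write results back in natural order, X[n2 * k1 + k2].
    out = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            out[n2 * k1 + k2] = cols[k2][k1]
    return out
```

    The small DFTs in steps 1 and 4 do not depend on each other, which is what makes this decomposition attractive for parallel execution.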

  • 7.
    Chen, Yancang
    et al.
    Natl Univ Def Technol, Dept Comp, Changsha, Hunan, Peoples R China..
    Xie, Lunguo
    Natl Univ Def Technol, Dept Comp, Changsha, Hunan, Peoples R China..
    Li, Jinwen
    Natl Univ Def Technol, Dept Comp, Changsha, Hunan, Peoples R China..
    Shi, Zhu
    Natl Univ Def Technol, Dept Comp, Changsha, Hunan, Peoples R China..
    Zhang, Minxuan
    Natl Univ Def Technol, Dept Comp, Changsha, Hunan, Peoples R China..
    Chen, Xiaowen
    Natl Univ Def Technol, Dept Comp, Changsha, Hunan, Peoples R China..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    A Trace-driven Hardware-level Simulator for Design and Verification of Network-on-Chips. 2010. In: 2011 INTERNATIONAL CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND AUTOMATION (CCCA 2011), VOL II / [ed] Thaung, K S, IEEE, 2010, p. 32-35. Conference paper (Refereed)
    Abstract [en]

    Traditional communications of general-purpose multi-core processors and application-specific Systems-on-Chip face challenges in terms of scalability and complexity. Network-on-Chip (NoC) has been the most promising solution for the communications of multi-core and many-core chips. In this paper, we present a trace-driven hardware-level simulator (denoted HS) based on SystemVerilog for the design and verification of NoCs. Different from the state-of-the-art NoC simulators, the HS has three important characteristics in addition to the capability of creating simulation and synthesizable NoC descriptions: 1) hardware-level simulation can be done, which means more implementation details of hardware than flit-level simulation; 2) router debugging and verification can be done at RTL by inserting assertions and coverage; 3) trace-based application simulations can be done besides synthetic workloads. A 4 × 4 2D mesh NoC with output virtual-channel routers verifies the capability of our HS.

  • 8.
    Dubrova, Elena
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    A reconfigurable arbiter PUF with 4 x 4 switch blocks. 2018. In: Proceedings of The International Symposium on Multiple-Valued Logic, IEEE Computer Society, 2018, p. 31-37. Conference paper (Refereed)
    Abstract [en]

    Physical Unclonable Functions (PUFs) exploit manufacturing process variation to create responses that are unique to individual integrated circuits (ICs). Typically, the responses of a PUF cannot be modified once the PUF is fabricated. In applications which use PUFs as a long-term secret key, it would be useful to have a simple mechanism for reconfiguring the PUF in order to update the key periodically. In this paper, we present a new type of arbiter PUF which uses 4 x 4 switch blocks instead of the conventional 2 x 2 ones. Each 4 x 4 switch block can be reconfigured in many different ways during the PUF's lifetime, making regular key updates possible. © 2018 IEEE.

  • 9.
    Dubrova, Elena
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Selander, G.
    Näslund, Mats
    KTH.
    Lindqvist, Fredrik
    KTH.
    Lightweight message authentication for constrained devices. 2018. In: WiSec 2018 - Proceedings of the 11th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Association for Computing Machinery (ACM), 2018, p. 196-201. Conference paper (Refereed)
    Abstract [en]

    Message Authentication Codes (MACs) used in today's wireless communication standards may not be able to satisfy resource limitations of simpler 5G radio types and use cases such as machine type communications. As a possible solution, we present a lightweight message authentication scheme based on the cyclic redundancy check (CRC). It has been previously shown that a CRC with an irreducible generator polynomial as the key is an $\varepsilon$-almost XOR-universal (AXU) hash function with $\varepsilon = (m + n)/2^{n-1}$, where $m$ is the message size and $n$ is the CRC size. While the computation of $n$-bit CRCs can be efficiently implemented in hardware using linear feedback shift registers, generating random degree-$n$ irreducible polynomials is computationally expensive for large $n$. We propose using a product of $k$ irreducible polynomials whose degrees sum up to $n$ as a generator polynomial for an $n$-bit CRC and show that the resulting hash functions are $\varepsilon$-AXU with $\varepsilon = (m + n)^k/2^{n-k}$. The presented message authentication scheme can be seen as providing a trade-off between security and implementation efficiency.
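
    To make the quantities in this abstract concrete, the following sketch computes a CRC remainder over GF(2) for an arbitrary generator polynomial and evaluates the collision bound, read here as ε = (m + n)^k / 2^(n - k), which reduces to (m + n) / 2^(n - 1) for k = 1. That reading of the formula is an assumption, as are the function names `crc_remainder` and `axu_epsilon` and the encoding of polynomials as Python integers. The sketch covers only the hash part and is not the authors' full authentication scheme.

```python
from fractions import Fraction


def crc_remainder(message: int, message_bits: int, generator: int) -> int:
    """Remainder of message(x) * x^n modulo generator(x) over GF(2).

    Polynomials are encoded as integers (bit i = coefficient of x^i);
    n is the degree of the generator, i.e. the CRC size in bits.
    """
    n = generator.bit_length() - 1
    reg = message << n  # multiply the message polynomial by x^n
    for shift in range(message_bits - 1, -1, -1):
        if reg & (1 << (shift + n)):   # leading term still present?
            reg ^= generator << shift  # subtract (XOR) the aligned generator
    return reg


def axu_epsilon(m: int, n: int, k: int = 1) -> Fraction:
    """Collision bound (m + n)^k / 2^(n - k); assumed reading of the abstract."""
    return Fraction((m + n) ** k, 2 ** (n - k))


# Example: generator x^3 + x + 1 (irreducible over GF(2)), 4-bit message.
tag = crc_remainder(0b1101, 4, 0b1011)
bound_single = axu_epsilon(m=32, n=16)        # one irreducible factor
bound_product = axu_epsilon(m=32, n=16, k=2)  # product of two factors
```

    Under this reading, a larger k loosens the bound for a fixed n, which matches the security versus efficiency trade-off the abstract describes.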

  • 10.
    Dubrova, Elena
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Teslenko, Maxim
    An efficient SAT-based algorithm for finding short cycles in cryptographic algorithms. 2018. In: Proceedings of the 2018 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 65-72. Conference paper (Refereed)
    Abstract [en]

    The absence of short cycles is a desirable property for cryptographic algorithms that are iterated. Furthermore, as demonstrated by the cryptanalysis of A5, short cycles can be exploited to reduce the complexity of an attack. We present an algorithm which uses a SAT-based bounded model checking for finding all short cycles of a given length. The existing Boolean Decision Diagram (BDD) based algorithms for finding cycles have limited capacity due to the excessive memory requirements of BDDs. The simulation-based algorithms can be applied to larger problem instances, however, they cannot guarantee the detection of all cycles of a given length. The same holds for general-purpose SAT-based model checkers. The presented algorithm can handle cryptographic algorithms with very large state spaces, including important ciphers such as Trivium and Grain-128. We found that these ciphers contain short cycles whose existence, to our best knowledge, was previously unknown. This potentially opens new possibilities for cryptanalysis.
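
    What "all cycles of a given length" means can be shown on a toy state space by exhaustive enumeration. The sketch below is a brute-force illustration only: it is neither the SAT-based method of the paper nor feasible for real ciphers, and the function name `cycles_of_length` and the example update functions are hypothetical.

```python
def cycles_of_length(step, num_states, length):
    """Return every cycle of exactly `length` states under the map `step`.

    Each cycle is reported once, as a frozenset of the states on it.
    """
    found = set()
    for start in range(num_states):
        state = start
        returned_early = False
        for i in range(1, length + 1):
            state = step(state)
            if state == start and i < length:
                returned_early = True  # start lies on a shorter cycle
                break
        if not returned_early and state == start:
            cycle = []
            s = start
            for _ in range(length):
                cycle.append(s)
                s = step(s)
            found.add(frozenset(cycle))
    return found


# Toy example: flipping the lowest bit of a 2-bit state gives two 2-cycles.
short_cycles = cycles_of_length(lambda s: s ^ 1, num_states=4, length=2)
```

    The loop visits every state, so its cost grows with the size of the state space; this is the scaling problem that motivates symbolic approaches such as the one in the abstract.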

  • 11.
    Kelati, Amleset
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems. KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Integrated devices and circuits. University of Turku.
    BioSignal Monitoring tool Using Wearable IoT. 2018. In: Proceedings of the 22nd IEEE FRUCT conference, Jyvaskyla, 2018, p. 4-8. Conference paper (Refereed)
  • 12.
    Kelati, Amleset
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems. KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Integrated devices and circuits. University of Turku, Finland.
    Nigussie, Ethiopia
    University of Turku, Finland.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Integrated devices and circuits. University of Turku, Finland.
    Biosignal Feature Extraction Techniques for IoT Healthcare Platform. 2016. In: IEEE Conference on Design and Architectures for Signal and Image Processing (DASIP2016), Rennes, France, 2016. Conference paper (Other (popular science, discussion, etc.))
    Abstract [en]

    In an IoT healthcare platform, a variety of biosignals are acquired from its sensors, and appropriate feature extraction techniques are crucial in order to make use of the acquired biosignal data and help the healthcare scientist or bio-engineer to reach optimal decisions. This work reviews the existing biosignal feature extraction and classification methods for different healthcare applications. Due to the enormous number of different biosignals, and since most healthcare applications use electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), and electrogastrogram (EGG), we focus the review on feature extraction and classification methods for these biosignals. The review also includes a summary of Blood Oxygen Saturation determined by Pulse Oximetry (SpO2), Electrooculography and eye movement (EOG), and Respiration (RSP) signals. Its discussion and analysis focus on the advantages, performance and drawbacks of the techniques.

  • 13.
    Kelati, Amleset
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems. KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Integrated devices and circuits.
    Tenhunen, Hannu
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Integrated devices and circuits. KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Wearable in a Cloud. 2019. In: The third IEEE/ACM on Connected Health: Applications, System and Engineering Technologies, CHASE '18, September 26–28, 2018, Washington, DC, USA, © 2018 Association for Computing Machinery. / [ed] I018 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Washington, DC, USA: IEEE, 2019, p. 7-8. Conference paper (Refereed)
    Abstract [en]

    Nowadays, health care at home is becoming more and more important, and there are products that can measure ECG and EMG with wearable devices. However, these devices are not accurate enough for diagnosis because of their low sample rate and the small number of channels connected to the body. In this project, we design a wearable system with an 8-channel AFE and use a Wi-Fi module to transfer the data to the cloud, so that ECG or EMG can be measured more accurately at home, at almost the same sample rate and channel count as at the hospital. The cloud is built to receive the data, and the real-time display can help doctors monitor the patients' condition remotely.

  • 14.
    Klaus, Tobias
    et al.
    Friedrich Alexander Univ Erlangen Nurnberg FAU, Distributed Syst & Operating Syst, Erlangen, Germany..
    Franzmann, Florian
    Friedrich Alexander Univ Erlangen Nurnberg FAU, Distributed Syst & Operating Syst, Erlangen, Germany..
    Becker, Matthias
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Ulbrich, Peter
    Friedrich Alexander Univ Erlangen Nurnberg FAU, Distributed Syst & Operating Syst, Erlangen, Germany..
    Data Propagation Delay Constraints in Multi-Rate Systems - Deadlines vs. Job-Level Dependencies. 2018. In: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON REAL-TIME NETWORKS AND SYSTEMS (RTNS 2018), ASSOC COMPUTING MACHINERY, 2018. Conference paper (Refereed)
    Abstract [en]

    Many industrial areas are faced with a continuous increase in system complexity, while systems need to satisfy stringent timing requirements, which are traditionally based on the tasks' local deadlines. However, correct functionality is subject to high-level timing requirements on data propagation through a set of semantically related tasks. Since distributed concurrent engineering is often used to deal with the complexity of such systems, violations of data propagation delay constraints are only visible at late development stages, where changes in system design become increasingly expensive. In this paper, we leverage job-level dependencies (JLDs) that can be specified at early development stages to guarantee data propagation delay constraints. Therefore, we present an approach that extends the Real-Time Systems Compiler to enforce the JLDs in actual multicore schedules. This strategy enables us to perform extensive evaluations of the effectiveness of JLDs in combination with contemporary allocation and scheduling algorithms, where we observed schedulability improvements of up to 42%. Additionally, we identified the effect of the number of available cores on the data age.

  • 15.
    Li, Pu
    et al.
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China.;Bangor Univ, Sch Elect Engn, Bangor LL57 1UT, Gwynedd, Wales.;Inst Southwestern Commun, Sci & Technol Commun Lab, Chengdu 610041, Sichuan, Peoples R China..
    Guo, Ya
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China..
    Guo, Yanqiang
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China..
    Fan, Yuanlong
    Bangor Univ, Sch Elect Engn, Bangor LL57 1UT, Gwynedd, Wales..
    Guo, Xiaomin
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China..
    Liu, Xianglian
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China..
    Shore, K. Alan
    Bangor Univ, Sch Elect Engn, Bangor LL57 1UT, Gwynedd, Wales..
    Dubrova, Elena
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Xu, Bingjie
    Inst Southwestern Commun, Sci & Technol Commun Lab, Chengdu 610041, Sichuan, Peoples R China..
    Wang, Yuncai
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China..
    Wang, Anbang
    Taiyuan Univ Technol, Minist Educ, Key Lab Adv Transducers & Intelligent Control Sys, Taiyuan 030024, Shanxi, Peoples R China.;Taiyuan Univ Technol, Coll Phys & Optoelect, Inst Optoelect Engn, Taiyuan 030024, Shanxi, Peoples R China..
    Self-balanced real-time photonic scheme for ultrafast random number generation. 2018. In: APL PHOTONICS, ISSN 2378-0967, Vol. 3, no 6, article id 061301. Article in journal (Refereed)
    Abstract [en]

    We propose a real-time self-balanced photonic method for extracting ultrafast random numbers from broadband randomness sources. In place of electronic analog-to-digital converters (ADCs), balanced photo-detection technology is used to directly quantize optically sampled chaotic pulses into a continuous random number stream. Benefitting from ultrafast photo-detection, our method can efficiently eliminate the generation rate bottleneck from electronic ADCs, which are required in nearly all the available fast physical random number generators. A proof-of-principle experiment demonstrates that, using our approach, 10 Gb/s real-time and statistically unbiased random numbers are successfully extracted from a bandwidth-enhanced chaotic source. The generation rate achieved experimentally here is limited by the bandwidth of the chaotic source. The method described has the potential to attain a real-time rate of 100 Gb/s.

  • 16.
    Liu, Shaoteng
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Automatic Control. KTH Royal Inst Technol, Stockholm, Sweden..
    Jantsch, Axel
    KTH, School of Electrical Engineering and Computer Science (EECS), Automatic Control. KTH Royal Inst Technol, Stockholm, Sweden..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems. KTH Royal Inst Technol, Stockholm, Sweden..
    Parallel Probing: Dynamic and Constant Time Setup Procedure in Circuit Switching NoC. 2012. In: DESIGN, AUTOMATION & TEST IN EUROPE (DATE 2012), IEEE, 2012, p. 1289-1294. Conference paper (Refereed)
    Abstract [en]

    We propose a circuit switching Network-on-chip with a parallel probe searching setup method, which can search the entire network in constant time, only dependent on the network size but independent of the network load. Under a specific search policy, the setup procedure is guaranteed to terminate in time 3D+6 cycles, where D is the geometric distance between source and destination. If a path can be found, the method succeeds in 3D+6 cycles; if a path cannot be found, it fails in maximum 3D+6 cycles. Compared to previous work, our method can reduce the setup time and enhance the success rate of setups. Our experiments show that compared with a sequential probe searching method, this method can reduce the search time by up to 20%. Compared with a centralized channel allocator method, this method can enhance the success rate by up to 20%.
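
    The 3D+6 bound stated above can be evaluated directly once D is known. The sketch below assumes a 2D mesh and takes D to be the Manhattan hop distance between the source and destination coordinates; that interpretation of "geometric distance", the coordinate tuples, and both function names are assumptions for illustration, not taken from the paper.

```python
def manhattan_distance(src, dst):
    """Hop distance between two (x, y) nodes of a 2D mesh (assumed metric)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def setup_bound_cycles(src, dst):
    """Upper bound on setup time in cycles, using the abstract's 3D + 6."""
    return 3 * manhattan_distance(src, dst) + 6


# Example: corner-to-corner setup in an 8 x 8 mesh.
worst_case = setup_bound_cycles((0, 0), (7, 7))
```

    Because the bound depends only on the distance between the endpoints, the largest value for a given mesh is fixed by the mesh dimensions and does not change with network load.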

  • 17.
    Lu, Zhonghai
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Yao, Yuan
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Thread Voting DVFS for Manycore NoCs. 2018. In: I.E.E.E. transactions on computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 67, no 10, p. 1506-1524, article id 8338086. Article in journal (Refereed)
    Abstract [en]

    We present a thread-voting DVFS technique for manycore networks-on-chip (NoCs). This technique has two notable features that differentiate it from conventional NoC DVFS schemes. (1) Not only network-level but also thread-level runtime performance indicators are used to guide DVFS decisions. (2) To resolve multiple, possibly conflicting performance indicators from many cores, it allows each thread to 'vote' for a V/F level in its own performance interest, and a region-based V/F controller makes a dynamic per-region V/F decision according to the majority vote. We evaluate our technique on a 64-core CMP in the full-system simulation environment GEM5 with both PARSEC and SPEC OMP2012 benchmarks. Compared to an approach based on a network metric (router buffer occupancy), it can improve the network energy efficacy measured in MPPJ (million packets per joule) by up to 22 percent for PARSEC and 20 percent for SPEC OMP2012, and the system energy efficacy measured in MIPJ (million instructions per joule) by up to 35 percent for PARSEC and 33 percent for SPEC OMP2012.
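
    The per-region voting step described in the abstract might look roughly like the sketch below. It is not the authors' controller: the integer encoding of V/F levels, the function name, and the tie-breaking rule (prefer the higher level) are all assumptions made for illustration.

```python
from collections import Counter


def region_vf_level(votes):
    """Pick a region's V/F level from the votes of its threads.

    Assumptions: V/F levels are integers (larger means faster), and
    ties are broken in favour of the higher level. The abstract does
    not specify either detail.
    """
    counts = Counter(votes)
    top = max(counts.values())
    # Among the levels with the most votes, choose the highest one.
    return max(level for level, n in counts.items() if n == top)


# Four threads in one region vote; level 2 receives the most votes.
print(region_vf_level([1, 2, 2, 3]))
```

    Breaking ties upward favours performance over energy; a controller tuned for energy could just as well break ties downward.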

  • 18.
    Lu, Zhonghai
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Vangal, S.
    Xu, J.
    Bogdan, P.
    Message from the Chairs. 2018. In: 12th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2018; Torino; Italy; 4 October 2018 through 5 October 2018, Institute of Electrical and Electronics Engineers Inc., 2018, article id 8512149. Conference paper (Refereed)
  • 19. Lv, Hao
    et al.
    Zhou, You
    Wu, Fei
    Xiao, Weijun
    He, Xubin
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Xie, Changsheng
    Exploiting Minipage-level Mapping to Improve Write Efficiency of NAND Flash. 2018. In: 2018 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE AND STORAGE (NAS), Huazhong Univ Sci & Technol, Shenzhen Res Inst, Shenzhen 51800, Peoples R China. [Lv, Hao; Zhou, You; Wu, Fei; Xie, Changsheng] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan 430074, Hubei, Peoples R China. [Wu, Fei; Xie, Changsheng] Minist Educ, Key Lab Data Storage Syst, Wuhan 430074, Hubei, Peoples R China. [Xiao, Weijun] Virginia Commonwealth Univ, Dept Elect & Comp Engn, Richmond, VA 23284 USA. [He, Xubin] Temple Univ, Coll Sci & Technol, Philadelphia, PA 19122 USA. [Lu, Zhonghai] KTH Royal Inst Technol, Sch Informat & Commun Technol, S-10044 Stockholm, Sweden.: IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    Pushing NAND flash memory to higher density, manufacturers are aggressively enlarging the flash page size. However, the sizes of I/O requests in a wide range of scenarios do not grow accordingly. Since a page is the unit of flash read/write operations, traditional flash translation layers (FTLs) maintain the page mapping regularity. Hence, small random write requests become common, leading to extensive partial logical page writes. This write inefficiency significantly degrades the performance and increases the write amplification of flash storage. In this paper, we first propose a configurable mapping layer, called minipage, whose size is set to match I/O request sizes. The minipage-level mapping provides better flexibility in handling small writes at the cost of sequential read performance degradation and a larger mapping table. Then, we propose a new FTL, called PM-FTL, that exploits the minipage-level mapping to improve write efficiency and utilizes the page-level mapping to reduce the costs caused by the minipage-level mapping. Finally, trace-driven simulation results show that compared to traditional FTLs, PM-FTL reduces the write amplification and flash storage response time by an average of 33.4% and 19.1%, up to 57.7% and 34%, respectively, under 16KB flash pages and 4KB minipages.

  • 20.
    Qin, Zidi
    et al.
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Zhu, Di
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Zhu, Xingwei
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Chen, Xuan
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Shi, Yinghuan
    Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China..
    Gao, Yang
    Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Shen, Qinghong
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Li, Li
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Pan, Hongbing
    Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Jiangsu, Peoples R China..
    Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights. 2019. In: ELECTRONICS, ISSN 2079-9292, Vol. 8, no 1, article id 78. Article in journal (Refereed)
    Abstract [en]

    As a key ingredient of deep neural networks (DNNs), fully-connected (FC) layers are widely used in various artificial intelligence applications. However, FC layers contain many parameters, so their efficient processing is restricted by memory bandwidth. In this paper, we propose a compression approach combining block-circulant matrix-based weight representation and power-of-two quantization. Applying block-circulant matrices in FC layers can reduce the storage complexity from O(k^2) to O(k). By quantizing the weights into integer powers of two, the multiplications in inference can be replaced by shift and add operations. The memory usage of models for MNIST, CIFAR-10 and ImageNet can be compressed by 171x, 2731x and 128x, respectively, with minimal accuracy loss. A configurable parallel hardware architecture is then proposed for processing the compressed FC layers efficiently. Without multipliers, a block matrix-vector multiplication module (B-MV) is used as the computing kernel. The architecture is flexible enough to support FC layers of various compression ratios with a small footprint. At the same time, memory accesses can be significantly reduced by using the configurable architecture. Measurement results show that the accelerator has a processing power of 409.6 GOPS, and achieves 5.3 TOPS/W energy efficiency at 800 MHz.
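
    The two ideas named in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the paper's algorithm or hardware: the function names are hypothetical, the rounding rule in `quantize_pow2` is an assumption, and activations are assumed to be integers so that shifts apply.

```python
import math


def circulant_matvec(first_col, x):
    """Multiply a k x k circulant block by x, storing only k weights.

    Entry (i, j) of a circulant matrix equals first_col[(i - j) % k],
    so the block needs O(k) storage instead of O(k^2).
    """
    k = len(first_col)
    return [
        sum(first_col[(i - j) % k] * x[j] for j in range(k))
        for i in range(k)
    ]


def quantize_pow2(w):
    """Approximate a nonzero weight as sign * 2**e; return (sign, e).

    Assumption: the exponent is the rounded log2 of |w|.
    """
    sign = -1 if w < 0 else 1
    return sign, round(math.log2(abs(w)))


def shift_multiply(x_int, sign, e):
    """Multiply an integer activation by sign * 2**e using shifts only."""
    shifted = x_int << e if e >= 0 else x_int >> -e
    return sign * shifted


# A weight of -3.9 is approximated by -(2**2), so 5 * w becomes -(5 << 2).
sign, e = quantize_pow2(-3.9)
print(shift_multiply(5, sign, e))
```

    Right shifts truncate low-order bits, so a real design would have to choose a fixed-point format for activations; that choice is outside this sketch.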

  • 21.
    Rosvall, Kathrin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Mohammadat, Tage
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Ungureanu, George
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Öberg, Johnny
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Sander, Ingo
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Exploring Power and Throughput for Dataflow Applications on Predictable NoC Multiprocessors. 2018. Conference paper (Refereed)
    Abstract [en]

    System-level optimization for multiple mixed-criticality applications on shared networked multiprocessor platforms is extremely challenging. Substantial complexity arises from the interdependence between the multiple subproblems of mapping, scheduling and platform configuration under the consideration of several, potentially orthogonal, performance metrics and constraints. Instead of using heuristic algorithms and problem decomposition, novel unified design space exploration (DSE) approaches based on Constraint Programming (CP) have shown promising results in recent years. The work in this paper takes advantage of the modularity of CP models in order to support heterogeneous multiprocessor Network-on-Chip (NoC) platforms with message injection that is aware of Temporally Disjoint Networks (TDNs). The DSE supports a range of design criteria, in particular the optimization and satisfaction of power and throughput. In addition, the DSE now provides a valid configuration for the TDNs that guarantees the performance required to fulfil the design goals. The experiments show the capability of the approach to find low-power and high-throughput designs, and validate a resulting design on a physical TDN-based NoC implementation.

  • 22.
    Shi, Xin
    et al.
    Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Hubei, Peoples R China..
    Wu, Fei
    Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Hubei, Peoples R China..
    Wang, Shunzhuo
    Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Hubei, Peoples R China..
    Xie, Changsheng
    Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Hubei, Peoples R China..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Program Error Rate-based Wear Leveling for NAND Flash Memory. 2018. In: PROCEEDINGS OF THE 2018 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 1241-1246. Conference paper (Refereed)
    Abstract [en]

    Wear leveling has become a fundamental issue in the design of Solid State Disks (SSDs) based on NAND flash memory. Existing schemes aim to equalize the number of program/erase (P/E) cycles and the raw bit error rates (BER) among all the flash blocks. However, due to fabrication process variation, different blocks of the same flash chip usually differ widely in endurance in terms of BER and program error rate (PER). Such conventional designs cannot obtain the wear status of flash blocks precisely. This paper proposes PER WE, an efficient PER-based wear leveling scheme that uses PER statistics to measure the wear-out pace of flash blocks, and performs block data swapping to improve the wear leveling efficiency. In our evaluation with four realistic workloads, the PER-based wear leveling scheme achieves, on average, 17% and 9% reductions in program error rate variance and 8% and 3% reductions in program error rate, with 5% and 2% system performance degradation, when compared to two state-of-the-art wear leveling schemes.

  • 23.
    Törngren, Martin
    et al.
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics.
    Zhang, Xinhai
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Embedded Control Systems.
    Mohan, Naveen
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics.
    Becker, Matthias
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Svensson, Lars
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics.
    Tao, Xin
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Embedded Control Systems.
    Chen, DeJiu
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Machine Design (Div.). KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Embedded Control Systems. KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics.
    Westman, Jonas
    KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Mechatronics. KTH, School of Industrial Engineering and Management (ITM), Machine Design (Dept.), Embedded Control Systems.
    Architecting Safety Supervisors for High Levels of Automated Driving. 2018. In: Proceeding of the 21st IEEE Int. Conf. on Intelligent Transportation Systems, IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    The complexity of automated driving poses challenges for providing safety assurance. Focusing on the architecting of an Autonomous Driving Intelligence (ADI), i.e. the computational intelligence, sensors and communication needed for high levels of automated driving, we investigate so-called safety supervisors that complement the nominal functionality. We present a problem formulation and a functional architecture of a fault-tolerant ADI that encompasses a nominal channel and a safety supervisor channel. We then discuss the sources of hazardous events, the division of responsibilities among the channels, and when the supervisor should take over. We conclude with identified directions for further work.

  • 24.
    Wang, Jian
    et al.
    Univ Elect Sci & Technol China, Chengdu 611731, Sichuan, Peoples R China..
    Guo, Shize
    Univ Elect Sci & Technol China, Chengdu 611731, Sichuan, Peoples R China..
    Chen, Zhe
    Univ Elect Sci & Technol China, Chengdu 611731, Sichuan, Peoples R China..
    Li, Yubai
    Univ Elect Sci & Technol China, Chengdu 611731, Sichuan, Peoples R China..
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    A New Parallel CODEC Technique for CDMA NoCs. 2018. In: IEEE transactions on industrial electronics (1982. Print), ISSN 0278-0046, E-ISSN 1557-9948, Vol. 65, no 8, p. 6527-6537. Article in journal (Refereed)
    Abstract [en]

    Code division multiple access (CDMA) network-on-chip (NoC) has been proposed for many-core systems due to its data transfer parallelism over communication channels. Consequently, the coder-decoder (CODEC) module, which greatly impacts the performance of CDMA NoCs, has attracted growing attention in recent years. In this paper, we propose a new parallel CODEC technique for CDMA NoCs. By using a few simple logic circuits with small penalties in area and power, our new parallel CODEC (NPC) can execute the encoding/decoding process in parallel and thus reduce the data transfer latency. To reveal the benefits of our method for on-chip communication, we apply our NPC to CDMA NoCs and perform extensive experiments. The results show that our method outperforms existing parallel CODECs, such as the Walsh-based parallel CODEC (WPC) and the overloaded parallel CODEC (OPC). Specifically, it improves the critical point of communication latency (7.3% over WPC and 13.5% over OPC), reduces packet latency jitter by about 17.3% (against WPC) and 71.6% (against OPC), and improves energy efficiency by up to 41.2% (against WPC) and 59.2% (against OPC).

  • 25.
    Wang, Zicong
    et al.
    Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China..
    Chen, Xiaowen
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems. Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China.
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Guo, Yang
    Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China..
    Cache Access Fairness in 3D Mesh-Based NUCA. 2018. In: IEEE Access, E-ISSN 2169-3536, Vol. 6, p. 42984-42996. Article in journal (Refereed)
    Abstract [en]

    Given the increase in cache capacity over the past few decades, cache access efficiency has come to play a critical role in determining system performance. To ensure efficient utilization of the cache resources, non-uniform cache architecture (NUCA) has been proposed to allow for a large capacity and a short access latency. With the support of networks-on-chip (NoC), NUCA is often employed to organize the last level cache. However, this method also hurts cache access fairness, which denotes the degree of non-uniformity for cache access latencies. This drop in fairness can result in an increased number of cache accesses with excessively high latency, which leads to a bottleneck in system performance. This paper investigates the cache access fairness in the context of NoC-based 3-D chip architecture, and provides new insights into 3-D architecture design. We propose fair-NUCA (F-NUCA), a co-design scheme intended to optimize cache access fairness. In F-NUCA, we strive to improve fairness by equalizing cache access latencies. To achieve this goal, the memory mapping and the channel width are both redistributed non-uniformly, thereby equalizing the non-contention and contention latencies, respectively. The experimental results reveal that F-NUCA can effectively improve cache access fairness. When F-NUCA is compared with the traditional static NUCA in a simulation with PARSEC benchmarks, the average reductions in average latency and latency standard deviation are 4.64%/9.38% for a 4 x 4 x 2 mesh network, as well as 6.31%/13.51% for a 4 x 4 x 4 mesh network. In addition, a 4.0%/6.4% improvement in system throughput can be achieved for the two scales of mesh networks, respectively.

  • 26. Wang, S.
    et al.
    Wu, F.
    Lu, Zhonghai
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Zhou, J.
    Xie, C.
    WARD: Wear aware RAID design within SSDs. 2018. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 37, no 11, p. 2918-2928, article id 8493504. Article in journal (Refereed)
    Abstract [en]

    Redundant array of independent disks (RAID) is an efficient approach to relieve the reliability sacrifice caused by aggressive scale-out of solid state drives (SSDs). Unfortunately, RAID is unfriendly to SSDs due to redundant parity writes and data rebuilding. This paper proposes a wear aware RAID design for SSDs, called WARD, which: 1) adaptively organizes RAID stripes according to real-time interblock unbalanced wear to relieve the high performance and storage overhead caused by parity data and 2) migrates blocks that are about to break in advance and leaves these blocks unused to reduce data rebuilding overhead. An efficient block wear detection scheme is employed to detect block wear during the whole lifetime of SSDs. Beginning with a large stripe width RAID instead of the redundant worst-case RAID, WARD reorganizes RAID stripes once wear blocks with high bit error rates appear. WARD divides the original stripe into several short width RAID stripes according to the number of wear blocks and separates all wear blocks into different stripes. This not only reduces parity redundancy but also provides high reliability by preventing more error-prone chunks than RAID can recover from remaining in one stripe. For high wear blocks tending to wear out, the data in them are migrated in advance and the blocks are then left unused, which efficiently avoids the performance shock caused by data rebuilding. A reliability model considering interblock unbalanced wear is proposed and reveals that WARD provides high and stable reliability and greatly prolongs the lifetime of SSDs. Comprehensive experiments based on an SSDsim derivative simulator are carried out, and the experimental results show that WARD considerably improves system performance compared to the worst-case RAID.

  • 27.
    Yu, Yang
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Teijeira, Victor Diges
    KTH.
    Marranghello, Felipe
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Dubrova, Elena
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    One-sided countermeasures for side-channel attacks can backfire. 2018. In: WiSec 2018 - Proceedings of the 11th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Association for Computing Machinery, Inc, 2018, p. 299-301. Conference paper (Refereed)
    Abstract [en]

    Side-channel attacks are currently one of the most powerful attacks against implementations of cryptographic algorithms. They exploit the correlation between the secret key and physical measurements (power consumption, electromagnetic emissions, timing) taken at different points during the computation. Some of the existing countermeasures offer protection against one specific type of side channel only. We show that this can be bad practice, as it can make exploitation of other side channels easier. First, we perform a power analysis attack on an FPGA implementation of the Advanced Encryption Standard (AES) which is not protected against side-channel attacks and estimate the number of power traces required to extract its secret key. Then, we repeat the attack on AES implementations which are protected against fault injections by hardware redundancy and show that they can be broken with three times fewer power traces than the unprotected AES. We also demonstrate that the problem cannot be solved by complementing the duplicated module, as previously proposed. Our results show that there is a need for increasing knowledge about side-channel attacks and designing stronger countermeasures.
