kth.se Publications
  • 1.
    Attarzadeh-Niaki, Seyed Hosein
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Integrating Functional Mock-up units into a formal heterogeneous system modeling framework. 2015. In: 18th CSI International Symposium on Computer Architecture and Digital Systems, CADS 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015. Conference paper (Refereed)
    Abstract [en]

    The Functional Mock-up Interface (FMI) standard defines a method for tool- and platform-independent model exchange and co-simulation of dynamic system models. In FMI, the master algorithm, which executes the imported components, is a timed differential equation solver. This is a limitation for heterogeneous embedded and cyber-physical systems, where models with different time abstractions co-exist and interact. This work integrates FMI into a heterogeneous system modeling and simulation framework as process constructors and co-simulation wrappers. Consequently, each external model communicates with the framework without unnecessary semantic adaptation while the framework provides necessary mechanisms for handling heterogeneity. The presented methods are implemented in the ForSyDe-SystemC modeling framework and tested using a case study.

  • 2.
    Attarzadeh-Niaki, Seyed-Hosein
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Altinel, Ekrem
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Koedam, Martijn
    Eindhoven University of Technology.
    Molnos, Anca
    CEA-LETI.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Goossens, Kees
    Eindhoven University of Technology.
    A Composable and Predictable MPSoC Design Flow for Multiple Real-Time Applications. 2015. Conference paper (Refereed)
    Abstract [en]

    Design of real-time MPSoC systems including multiple applications is challenging because the temporal requirements of each application must be respected throughout the entire design flow. Currently the design of different applications is often interdependent, making convergence to a solution for each application difficult. This paper proposes a compositional method to design applications independently, and then to execute them without interference. We define a formal modeling framework as a suitable entry point for application design. The models are executable, which enables early detection of specification errors, and include the formal properties of the applications based on well-defined models of computation. We combine this with a predictable MPSoC platform template that has a supporting design flow but lacks a simulation front-end. The structure and behavior of the application models are exported to an intermediate format via introspection which is iteratively adapted for the backend flow. We identify the problems arising in this adaptation and provide appropriate solutions. The design flow is demonstrated by a system consisting of two streaming applications where less than half of the design time is dedicated to operating on the integrated system model.

  • 3. Azad, S. P.
    et al.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Customization methodology of a Coarse Grained Reconfigurable architecture. 2015. In: NORCHIP 2014 - 32nd NORCHIP Conference: The Nordic Microelectronics Event, 2015. Conference paper (Refereed)
    Abstract [en]

    Mapping algorithms on CGRAs can lead to an inefficient implementation and hardware under-utilization if there is a mismatch between the granularity of the reconfigurable processing unit and the algorithm. In this paper, we introduce a tool that takes the hardware configuration of a set of applications, identifies the unused parts of the CGRA, and lets the user sweep the design space from fully programmable to fully customized by eliminating the unused components. The user can select among multiple design points according to the application specification. This method is very useful for designing multi-mode ASIC accelerators. The fully customized hardware generated using our tool has a negligible area and power overhead compared to the equivalent ASIC but can be generated significantly faster.

  • 4.
    Badawi, Mohammad
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Adaptive Coarse-grain Reconfigurable Protocol Processing Architecture. 2016. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Digital signal processors and their variants have provided significant benefits for the efficient implementation of the Physical Layer (PHY) of the Open Systems Interconnection (OSI) model's seven-layer protocol processing stack compared to general-purpose processors. Protocol processors promise to provide a similar advantage for implementing the higher layers of the OSI seven-layer model. This thesis addresses the problem of designing customizable coarse-grain reconfigurable protocol processing fabrics as a solution for achieving high performance and computational efficiency. A key requirement that this thesis addresses is the ability to adapt not only to varying applications and standards, and to different modes in each standard, but also to time-varying load and performance demands while maintaining quality of service.
    This thesis presents a tile-based multicore protocol processing architecture that can be customized at design time to meet the requirements of the target application. The architecture can then be reconfigured at boot time and tuned to suit the desired use-case. This architecture includes a packet-oriented memory system that has deterministic access time and access energy costs, and hence can be accurately dimensioned to fulfill the requirements of the desired use-case. Moreover, to maintain quality of service as predicted, while minimizing the use of energy and resources, this architecture encompasses an elastic management scheme that controls run-time configuration to deploy processing resources based on use-case and traffic demands.
    To evaluate the architecture presented in this thesis, different case studies were conducted, and quantitative and qualitative metrics were used for assessment. Energy-delay product, energy efficiency, area efficiency and throughput show the improvements that were achieved using the processing cores and the memory of the presented architecture, compared with other solutions. Furthermore, the results show the reduction in latency and power consumption required to evaluate controlling states when using the elastic management scheme. The elasticity of the scheme also resulted in reducing the total area required for the controllers that serve multiple processing cores in comparison with other designs. Finally, the results validate the ability of the presented architecture to support quality of service without misutilizing available energy during a real-life case study of a multi-participant Voice Over Internet Protocol (VOIP) call.

  • 5.
    Badawi, Mohammad
    et al.
    KTH, School of Information and Communication Technology (ICT).
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Elastic Management and QoS Provisioning Scheme for Adaptable Multi-core Protocol Processing Architecture. 2016. In: 19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016), IEEE, 2016, p. 575-583. Conference paper (Refereed)
    Abstract [en]

    Adaptable protocol processing architectures can offer quality-of-service (QoS) while improving energy efficiency and resource utilization. However, a key condition for adaptable architectures to support QoS is that the latency required for processor adaptation does not result in violating the packet processing delay bound. Moreover, adaptation latency must not cause packets to accumulate until memory becomes full and packets are dropped. In this paper, we present an elastic management scheme for an agile adaptable multi-core protocol processing architecture to facilitate processor adaptation when QoS has to be maintained. The proposed management scheme encompasses a set of reconfigurable finite state machines (FSMs), each dimensioned to be associated with a single processing element (PE). During processor adaptation, the needed FSMs can rapidly be clustered to provide the control needed for the newly adapted structure. We use a real-life application to demonstrate how our proposed management scheme supports maintaining QoS during processor adaptation. We also quantify the time needed for processor adaptation as well as the reduction in energy, latency and area achieved when using our scheme.

  • 6.
    Badawi, Mohammad
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Service-Guaranteed Multi-Port Packet Memory for Parallel Protocol Processing Architecture. 2016. In: Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 408-412, article id 7445367. Conference paper (Refereed)
    Abstract [en]

    Parallel processing architectures have been increasingly utilized due to their potential for improving performance and energy efficiency. Unfortunately, the anticipated improvement is often limited by memory access latency and latency variation, which consequently impact Quality of Service (QoS). This paper presents a service-guaranteed multi-port packet memory system to boost parallelism in protocol processing architectures. In the proposed memory system, all arriving packets are guaranteed a memory space, such that a packet memory space can be allocated in a bounded number of cycles and each of its locations is accessible in a single cycle. We consider a real-time Voice Over Internet Protocol (VOIP) call as a case study to evaluate our service-guaranteed memory system.

  • 7. Bakhouya, Mohamed
    et al.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Palesi, Maurizio
    Ghasemzadeh, Hassan
    Many-core System-on-Chip: architectures and applications. 2016. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 43, p. 1-3. Article in journal (Refereed)
  • 8. Chabloz, J. -M
    et al.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Power management architecture in McNoC. 2012. In: Scalable Multi-core Architectures: Design Methodologies and Tools / [ed] Soudris, Dimitrios and Axel Jantsch, Springer Science+Business Media B.V., 2012, p. 55-80. Chapter in book (Other academic)
    Abstract [en]

    In this chapter we present the power management architecture of the McNoC platform. The power management architecture of McNoC offers distributed Dynamic Voltage Frequency Scaling (DVFS) and power down services to the platform at a fine level of granularity, allowing independent setting of frequency and supply voltage for all switch and resource nodes in the platform. The design style enables hierarchical physical design and solves the clock-domain-crossing problem with a solution based on rationally-related frequencies, which avoids the overhead associated with handshaking. The architecture allows arbitrary power management regions to be defined, and region-wide power management commands affecting all nodes in a region can be issued by the software layer that we call Power Management Intelligence (PMINT).

  • 9. Chen, X.
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Li, Y.
    Jantsch, A.
    Zhao, Xueqian
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Chen, S.
    Guo, Y.
    Liu, Z.
    Lu, J.
    Wan, J.
    Sun, S.
    Chen, H.
    Achieving memory access equalization via round-trip routing latency prediction in 3D many-core NoCs. 2015. In: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI, IEEE, 2015, p. 398-403. Conference paper (Refereed)
    Abstract [en]

    3D many-core NoCs are emerging architectures for future high-performance single chips due to their integration of many processor cores and memories by stacking multiple layers. In such an architecture, because processor cores and memories reside in different locations (center, corner, edge, etc.), memory accesses behave differently due to their different communication distances, and the performance (latency) gap between different memory accesses becomes larger as the network size is scaled up. This phenomenon may lead to very high latencies for some memory accesses, thus degrading the system performance. To achieve high performance, it is crucial to reduce the number of memory accesses with very high latencies. However, this should be done with care since shortening the latency of one memory access can worsen the latency of another as a result of shared network resources. Therefore, the goal should be to narrow the latency difference between memory accesses. In the paper, we address this goal by proposing to prioritize the memory access packets based on predicting the round-trip routing latencies of memory accesses. The communication distance and the number of occupied items in the buffers along the remaining routing path are used to predict the round-trip latency of a memory access. The predicted round-trip routing latency is used as the basis for arbitrating the memory access packets so that a memory access with potentially high latency can be transferred as early and fast as possible, thus equalizing the memory access latencies as much as possible. Experiments with varied network sizes and packet injection rates prove that our approach can achieve the goal of memory access equalization and outperforms the classic round-robin arbitration in terms of maximum latency, average latency, and LSD. In the experiments, the maximum improvements of the maximum latency, the average latency and the LSD are 80%, 14%, and 45% respectively.

  • 10.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Jantsch, A.
    Chen, S.
    Guo, Y.
    Chen, H.
    Performance analysis of homogeneous on-chip large-scale parallel computing architectures for data-parallel applications. 2015. In: Journal of Electrical and Computer Engineering, ISSN 2090-0147, E-ISSN 2090-0155, Vol. 2015, article id 902591. Article in journal (Refereed)
    Abstract [en]

    On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in the paper. Homogeneous OLPCs feature strong regularity and scalability due to their identical cores and routers. Data-parallel applications have parallel data subsets that are handled individually by the same program running on different cores. Therefore, data-parallel applications are able to obtain good speedup on homogeneous OLPCs. The paper addresses modeling the speedup performance of homogeneous OLPCs for data-parallel applications. When establishing the speedup performance model, the network communication latency and the ways of storing data of data-parallel applications are modeled and analyzed in detail. Two abstract concepts (equivalent serial packet and equivalent serial communication) are proposed to construct the network communication latency model. The uniform and hotspot traffic models are adopted to reflect the ways of storing data. Some useful suggestions are presented during the performance model's analysis. Finally, three data-parallel applications are run on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications on homogeneous OLPCs.

  • 11.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland.
    Bagherzadeh, Nader
    Sarbazi-Azad, Hamid
    On-chip parallel and network-based systems Preface. 2015. In: Integration, ISSN 0167-9260, E-ISSN 1872-7522, Vol. 50, p. 137-138. Article in journal (Refereed)
  • 12.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Bagherzadeh, Nader
    Sarbazi-Azad, Hamid
    Special issue on on-chip parallel and network-based systems. 2015. In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 97, no 6, p. 539-541. Article in journal (Other academic)
  • 13.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Turku, Finland.
    Ebrahimi, Masoumeh
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Turku, Finland.
    Dytckov, Sergei
    Plosila, Juha
    In-order delivery approach for 2D and 3D NoCs. 2015. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 71, no 8, p. 2877-2899. Article in journal (Refereed)
    Abstract [en]

    In many applications, it is critical to guarantee the in-order delivery of requests from the master cores to the slave cores, so that the requests can be executed in the correct order without requiring buffers. Since packets in NoCs may take different paths and traffic congestion varies across routes, the in-order delivery constraint cannot be met without support. To guarantee in-order delivery, traditional approaches either use dimension-order routing or employ reordering buffers at network interfaces. Dimension-order routing degrades the performance considerably, while the use of reordering buffers imposes a large area overhead. In this paper, we present a mechanism allowing packets to be routed through multiple paths in the network, helping to balance the traffic load while guaranteeing in-order delivery. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. The simple idea is to use different deterministic algorithms for independent flows. This approach neither requires reordering buffers nor limits packets to a single path. The algorithm is simple and practical, with negligible area overhead over dimension-order routing. The concept is investigated in both 2D and 3D mesh networks.
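The idea of using different deterministic algorithms for independent flows can be illustrated with a minimal sketch on a 2D mesh. The abstract does not say how flows are assigned to algorithms, so the even/odd `flow_id` rule, the function names, and the choice of XY and YX routing as the two deterministic algorithms are all assumptions made for illustration only.

```python
def _walk(a, b):
    """Coordinates visited when moving from a to b along one axis (a excluded)."""
    step = 1 if b >= a else -1
    return list(range(a + step, b + step, step))


def xy_route(src, dst):
    """Deterministic XY routing: travel along x first, then along y."""
    path = [src]
    path += [(x, src[1]) for x in _walk(src[0], dst[0])]
    path += [(dst[0], y) for y in _walk(src[1], dst[1])]
    return path


def yx_route(src, dst):
    """Deterministic YX routing: travel along y first, then along x."""
    path = [src]
    path += [(src[0], y) for y in _walk(src[1], dst[1])]
    path += [(x, dst[1]) for x in _walk(src[0], dst[0])]
    return path


def route(flow_id, src, dst):
    """Hypothetical rule: even flows use XY, odd flows use YX.

    Every packet of one flow follows the same fixed path, so packets of that
    flow cannot overtake each other on different routes, while distinct flows
    are spread over different paths.
    """
    algorithm = xy_route if flow_id % 2 == 0 else yx_route
    return algorithm(src, dst)
```

With this rule, two flows between the same pair of nodes take disjoint intermediate hops, which spreads load without any reordering buffer.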

  • 14.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Mehdipour, Farhad
    Yu, Zhiyi
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Special Issue on Emerging Many-Core Systems for Exascale Computing. 2015. In: ACM Journal on Emerging Technologies in Computing Systems, ISSN 1550-4832, E-ISSN 1550-4840, Vol. 11, no 4, article id 39. Article in journal (Other academic)
  • 15.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Palesi, M.
    Message from the chairs. 2016. Conference proceedings (editor) (Refereed)
  • 16.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Palesi, M.
    Sonntag, S.
    Angiolini, F.
    Message from the chairs. 2015. In: ACM International Conference Proceeding, ACM Press, 2015, Vol. 13-17-June-2015. Conference paper (Refereed)
  • 17.
    Daneshtalab, Masoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Palesi, Maurizio
    Mak, Terrence
    Introduction to the special issue on NoC-based many-core architectures. 2015. In: Computers & electrical engineering, ISSN 0045-7906, E-ISSN 1879-0755, Vol. 45, p. 359-361. Article in journal (Other academic)
  • 18. Diallo, P. I.
    et al.
    Attarzadeh-Niaki, Seyed Hosein
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Robino, Francesco
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Champeau, J.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    A formal, model-driven design flow for system simulation and multi-core implementation. 2015. In: 2015 10th IEEE International Symposium on Industrial Embedded Systems, IEEE, 2015, p. 254-263. Conference paper (Refereed)
    Abstract [en]

    With the growing complexity of Real-Time Embedded Systems (RTES), there is a huge interest in using modeling languages such as the Unified Modeling Language (UML), and other Model-Driven Engineering (MDE) techniques targeting RTES system design. These approaches provide language abstractions for system design, allowing designers to focus on the relevant properties of a system. Unfortunately, such approaches still suffer from several shortcomings including the lack of well-defined semantics. Therefore, it remains difficult to connect the MDE specification tools and the design tools that are based on formal grounds and well-defined semantics to perform analysis, validation or system synthesis for RTES. This paper presents a top-down RTES design flow aiming to reduce the gap between MDE and formal design approaches. We present the connection between a framework dedicated to the enrichment of modeling languages such as UML with formal semantics, a framework based on formal models of computation supporting validation by simulation, and a system synthesis tool targeting a flexible platform with well-defined execution services. Our purpose is to cover several system design phases, from specification and simulation down to implementation on a platform. As a case study, a JPEG Encoder application was realized following the different design steps of the tool-chain.

  • 19. Du, Gaoming
    et al.
    Ou, Yanghao
    Li, Xiangyang
    Song, Ping
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Gao, Minglun
    OLITS: An Ohm's Law-like Traffic Splitting Model Based on Congestion Prediction. 2016. In: PROCEEDINGS OF THE 2016 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), IEEE conference proceedings, 2016, p. 1000-1005. Conference paper (Refereed)
    Abstract [en]

    Through traffic splitting, multi-path routing in Network-on-Chip (NoC) outperforms single-path routing in terms of load balance and resource utilization. However, uncontrolled traffic splitting may aggravate network congestion and worsen the communication delay. We propose an Ohm's Law-like traffic splitting model aimed at application-specific NoCs. We first characterize the flow congestion by redefining a contention matrix, which contains flow parameters such as average flow rate and burstiness. We then define flow resistance as the flow congestion factor extracted from the contention matrix, and use the parallel resistance theory to predict the congestion state for every target sub-flow. Finally, the traffic splitting proportions of the parallel sub-flows are assigned according to the equivalent flow resistance. Experiments are conducted on both 2D and 3D multi-path routing NoCs. The results show that the worst-case delay bound of the target flow is significantly improved, and network congestion can be effectively balanced.
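The parallel-resistance analogy in this abstract can be sketched with the ordinary current-divider rule from circuit theory. The abstract does not give the exact formulas, nor how flow resistance is extracted from the contention matrix, so the resistance values are treated here as given inputs and both function names are hypothetical.

```python
def split_proportions(resistances):
    """Share traffic among parallel sub-flows in inverse proportion to resistance.

    This is the current-divider rule: a path with lower (assumed) flow
    resistance, meaning less predicted congestion, receives a larger share.
    """
    if not resistances or any(r <= 0 for r in resistances):
        raise ValueError("resistances must be a non-empty list of positive values")
    conductances = [1.0 / r for r in resistances]
    total = sum(conductances)
    return [g / total for g in conductances]


def equivalent_resistance(resistances):
    """Equivalent resistance of parallel paths: 1 / sum(1 / R_i)."""
    return 1.0 / sum(1.0 / r for r in resistances)
```

For example, three sub-flow paths with assumed resistances 2, 2 and 1 would receive 25%, 25% and 50% of the traffic, and the bundle behaves like a single path of resistance 0.5.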

  • 20.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    A List of Maximum-Period NLFSRs. 2012. Report (Other academic)
    Abstract [en]

    Non-Linear Feedback Shift Registers (NLFSRs) are a generalization of Linear Feedback Shift Registers (LFSRs) in which a current state is a nonlinear function of the previous state. While the theory behind LFSRs is well understood, many fundamental problems related to NLFSRs remain open. Probably the most important one is finding a systematic procedure for constructing NLFSRs with a guaranteed long period. Available algorithms either consider some special cases, or are applicable to small NLFSRs only. In this paper, we present a complete list of n-bit NLFSRs with the period 2^n − 1, n < 25, for three different types of feedback functions with algebraic degree two. We hope that the presented experimental data might help in analysing feedback functions of maximum-period NLFSRs and finding a supporting theory characterizing them.
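What a period of 2^n − 1 means can be checked by brute force for small n. The sketch below simulates a Fibonacci-style shift register and counts steps until the start state recurs. The function name, the bit ordering, and the 4-bit example feedback functions are choices made for this illustration and are not taken from the report's list.

```python
def nlfsr_period(n, feedback, start=None):
    """Number of steps until an n-bit shift register returns to its start state.

    The state is a tuple (x0, ..., x_{n-1}). Each step shifts the bits one
    position towards x0 and places feedback(state) into x_{n-1}. Returns None
    if the start state does not recur within 2^n steps, which can happen when
    the state-transition map is not a bijection.
    """
    if start is None:
        start = (1,) + (0,) * (n - 1)
    state = start
    for steps in range(1, 2 ** n + 1):
        state = state[1:] + (feedback(state) & 1,)
        if state == start:
            return steps
    return None


# Linear feedback x0 XOR x1: a maximum-period 4-bit LFSR.
lfsr = lambda x: x[0] ^ x[1]
# Degree-two feedback x0 XOR x1 XOR x2 XOR x1*x2: also reaches 2^4 - 1 = 15.
nlfsr_max = lambda x: x[0] ^ x[1] ^ x[2] ^ (x[1] & x[2])
# Degree-two feedback x0 XOR x1 XOR x1*x2: only reaches period 8 from this start.
nlfsr_short = lambda x: x[0] ^ x[1] ^ (x[1] & x[2])
```

Exhaustive simulation like this costs up to 2^n steps per candidate, which is why the abstract stresses that available methods only scale to small registers.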

  • 21.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Fault-tolerant design. 2013. Book (Other academic)
    Abstract [en]

    This textbook serves as an introduction to fault-tolerance, intended for upper-division undergraduate students, graduate-level students and practicing engineers in need of an overview of the field. Readers will develop skills in modeling and evaluating fault-tolerant architectures in terms of reliability, availability and safety. They will gain a thorough understanding of fault-tolerant computers, including both the theory of how to design and evaluate them and the practical knowledge of achieving fault-tolerance in electronic, communication and software systems. Coverage includes fault-tolerance techniques through hardware, software, information and time redundancy. The content is designed to be highly accessible, including numerous examples and exercises. Solutions and PowerPoint slides are available for instructors.

  • 22.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Multiple-Valued Logic in VLSI: Challenges and Opportunities. 1999. In: Proceedings of NORCHIP'99, IEEE conference proceedings, 1999. Conference paper (Refereed)
  • 23.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Multiple-Valued Logic Synthesis and Optimization. 2002. In: Logic Synthesis and Verification / [ed] S. Hassoun and T. Sasao, Springer, 2002, 1st, p. 89-114. Chapter in book (Refereed)
  • 24.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    On Constructing Secure and Hardware-Efficient Invertible Mappings. 2016. In: Proceedings of IEEE International Symposium on Multiple-Valued Logic, IEEE Computer Society, 2016. Conference paper (Refereed)
    Abstract [en]

    Our society is becoming increasingly dependent on wireless communications. The tremendous growth in the number and type of wirelessly connected devices, in combination with the dropping cost of performing cyberattacks, creates new challenges for assuring the security of services and applications provided by the next generation of wireless communication networks. The situation is complicated even further by the fact that many end-point Internet of Things (IoT) devices have very limited resources for implementing security functionality. This paper addresses one of the aspects of this important, many-faceted problem - the design of hardware-efficient cryptographic primitives suitable for the protection of resource-constrained IoT devices. We focus on cryptographic primitives based on invertible mappings of type {0,1,…,2^n − 1} → {0,1,…,2^n − 1}. In general, checking whether a given mapping is invertible requires a number of steps exponential in n. In this paper, we derive a sufficient condition for invertibility which can be checked in O(n^2 N) time, where N is the size of the representation of the largest function in the mapping. Our results can be used for constructing cryptographically secure invertible mappings which can be efficiently implemented in hardware.
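The exponential cost of the general check mentioned in this abstract can be seen in a brute-force baseline that enumerates all 2^n inputs. This sketch is not the sufficient condition derived in the paper, which the abstract does not spell out. The function name and the two example mappings are assumptions chosen for illustration.

```python
def is_invertible(n, mapping):
    """Brute-force test that mapping is a bijection on {0, ..., 2^n - 1}.

    Visits every input once, so the cost grows as 2^n. This is the
    exponential baseline, not the paper's polynomial-time condition.
    """
    size = 1 << n
    seen = set()
    for x in range(size):
        y = mapping(x)
        if not 0 <= y < size or y in seen:
            return False  # out of range, or two inputs collide on one output
        seen.add(y)
    return True


# An affine map with an odd multiplier is a bijection modulo 2^n.
affine = lambda x: (5 * x + 3) % 16
# x AND (x >> 1) sends both 0 and 1 to 0, so it is not invertible.
lossy = lambda x: x & (x >> 1)
```

Here `is_invertible(4, affine)` holds while `is_invertible(4, lossy)` does not. For the register sizes used in real ciphers, enumerating 2^n inputs is infeasible, which motivates a condition that can be checked structurally.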

  • 25.
    Dubrova, Elena
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hell, Martin
    Lund University, Sweden.
    Espresso: A stream cipher for 5G wireless communication systems. 2017. In: Cryptography and Communications, ISSN 1936-2447, E-ISSN 1936-2455, Vol. 9, no 2, p. 273-289. Article in journal (Refereed)
    Abstract [en]

    The demand for more efficient ciphers is likely to sharpen with the new generation of products and applications. Previous cipher designs typically focused on optimizing only one of two parameters - hardware size or speed - for a given security level. In this paper, we present a methodology for designing a class of stream ciphers which takes into account both parameters simultaneously. We combine the advantage of the Galois configuration of NLFSRs, short propagation delay, with the advantage of the Fibonacci configuration of NLFSRs, which can be analyzed formally. According to our analysis, the presented stream cipher Espresso is the fastest among the ciphers below 1500 GE, including Grain-128 and Trivium.

  • 26.
    Dubrova, Elena
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Näslund, Mats
    Ericsson AB.
    Carlsson, Gunnar
    Ericsson AB.
    Fornehed, John
    Ericsson AB.
    Smeets, Ben
    Ericsson AB.
    Two Countermeasures Against Hardware Trojans Exploiting Non-Zero Aliasing Probability of BIST2016In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115Article in journal (Refereed)
    Abstract [en]

    The threat of hardware Trojans has been widely recognized by academia, industry, and government agencies. A Trojan can compromise the security of a system in spite of cryptographic protection. The damage caused by a Trojan may not be limited to a business or reputation, but could have a severe impact on public safety, the national economy, or national security. An extremely stealthy way of implementing hardware Trojans was presented by Becker et al. at CHES’2012. Their work has shown that it is possible to inject a Trojan into a random number generator compliant with the FIPS 140-2 and NIST SP800-90 standards by exploiting the non-zero aliasing probability of Logic Built-In Self-Test (LBIST). In this paper, we present two methods for modifying LBIST to prevent such an attack. The first method makes test patterns dependent on a configurable key which is programmed into a chip after the manufacturing stage. The second method uses a remote test management system which can execute LBIST using a different set of test patterns at each test cycle.

  • 27.
    Dubrova, Elena
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Näslund, Mats
    Ericsson, Sweden.
    Carlsson, Gunnar
    Ericsson, Sweden.
    Smeets, Ben
    Ericsson, Sweden.
    Keyed Logic BIST for Trojan Detection in SoC2014In: Proceedings of IEEE International Symposium on System-on-Chip (SOC'2014), IEEE conference proceedings, 2014Conference paper (Refereed)
    Abstract [en]

    As demonstrated by the recent attack on Intel’s Ivy Bridge processor, traditional Logic Built-In Self-Test (LBIST) methods do not provide adequate protection of SoCs against malicious modifications known as hardware Trojans. In this paper, we introduce a simple but efficient countermeasure against hardware Trojans that exploit the non-zero aliasing probability of LBIST. We propose to generate LBIST test patterns based on a configurable key which is decided and programmed into the circuit after the manufacturing stage. Since the key, and hence the expected LBIST signature, is unknown at the manufacturing stage, an attack based on selecting suitable values for the Trojan which result in the same signature as that of a fault-free circuit becomes infeasible.

  • 28.
    Dubrova, Elena
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Näslund, Mats
    Ericsson AB.
    Selander, Göran
    Ericsson AB.
    CRC-Based Message Authentication for 5G Mobile Technology2015In: Proceedings of 2015 IEEE Trustcom/BigDataSE/ISPA, Institute of Electrical and Electronics Engineers (IEEE), 2015, Vol. 1, p. 1186-1191Conference paper (Refereed)
    Abstract [en]

    Our society greatly depends on mobile technologies. As wirelessly connected devices take over the control of the electricity in our homes, the water we drink and the transportation we use, it becomes increasingly important to guarantee the security of interactions of all players involved in a network. Apart from the high security needs, 5G will require utmost efficiency in the use of bandwidth and energy. In this paper, we show how to make the type of CRC checksum used in current LTE standards cryptographically secure with minimum extra resources. We present a new CRC-based message authentication method and provide a quantitative analysis of the achieved security as a function of message and CRC sizes. The presented method retains most of the implementation simplicity of the traditional CRC except that the LFSR implementing the encoding and decoding is required to have re-programmable connections. Similarly to previously proposed cryptographically secure CRCs, the presented CRC enables combining the detection of random and malicious errors without increasing bandwidth. Its main advantage is the ability to detect all double-bit errors in a message, which is of special importance for systems using Turbo codes, including LTE.
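
    As a rough illustration of what re-programmable LFSR connections amount to in software, the sketch below computes a CRC whose generator polynomial is a run-time parameter rather than a fixed constant. It is a generic bitwise CRC over GF(2), not the authentication scheme of the paper; the names `gf2_mod` and `crc_checksum` and the toy polynomial are assumptions.

```python
def gf2_mod(value: int, poly: int) -> int:
    """Remainder of value(x) divided by poly(x), with coefficients in GF(2).

    Bit i of each integer holds the coefficient of x^i. `poly` must include
    its leading term, e.g. 0b1011 stands for x^3 + x + 1.
    """
    degree = poly.bit_length() - 1
    while value.bit_length() > degree:
        # Cancel the current leading term, as an LFSR does one shift at a time.
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def crc_checksum(message: int, poly: int) -> int:
    """CRC of `message` under a caller-chosen generator polynomial.

    Passing `poly` as an argument mirrors an LFSR with programmable feedback
    connections: the same routine serves any generator polynomial.
    """
    degree = poly.bit_length() - 1
    return gf2_mod(message << degree, poly)


message = 0b11010011101100
poly = 0b1011  # toy generator x^3 + x + 1, chosen only for the example
checksum = crc_checksum(message, poly)

# Appending the checksum makes the codeword divisible by the generator.
codeword = (message << 3) ^ checksum
print(gf2_mod(codeword, poly) == 0)
```

    In a keyed setting the polynomial would be kept secret and selected according to the rules of the scheme in use; the toy value above offers no security.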

  • 29.
    Dubrova, Elena
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Näslund, Mats
    Ericsson Research, Sweden .
    Selander, Göran
    Ericsson Research, Sweden .
    Secure and Efficient LBIST for Feedback Shift Register-Based Cryptographic Systems2014In: Proceedings of 19th IEEE European Test Symposium (ETS'2014), IEEE conference proceedings, 2014Conference paper (Refereed)
    Abstract [en]

    Cryptographic methods are used to protect confidential information against unauthorised modification or disclosure. Cryptographic algorithms providing high assurance exist, e.g. AES. However, many open problems related to assuring the security of a hardware implementation of a cryptographic algorithm remain. The security of a hardware implementation can be compromised by a random fault or a deliberate attack. Traditional testing methods are good at detecting random faults, but they do not provide adequate protection against malicious alterations of a circuit known as hardware Trojans. For example, a recent attack on Intel's Ivy Bridge processor demonstrated that traditional Logic Built-In Self-Test (LBIST) may fail to detect even the simple case of stuck-at-fault Trojans. In this paper, we present a novel LBIST method for Feedback Shift Register (FSR)-based cryptographic systems which can detect such Trojans. The specific properties of FSR-based cryptographic systems allow us to reach 100% single stuck-at fault coverage with a small set of deterministic tests. The test execution time of the proposed method is at least two orders of magnitude shorter than that of pseudo-random pattern-based LBIST. Our results enable efficient protection of FSR-based cryptographic systems from random and malicious stuck-at faults.

  • 30.
    Dubrova, Elena
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Näslund, Mats
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Selander, Göran
    Tsiatsis, Vlasios
    Energy-Efficient Message Authentication for IEEE 802.15.4-Based Wireless Sensor Networks2014In: Proceedings of 32nd Nordic Microelectronics Conference NORCHIP , IEEE conference proceedings, 2014Conference paper (Refereed)
    Abstract [en]

    The number of wirelessly connected devices is expected to increase to a few tens of billions by the year 2020. Newer generations of products and applications will sharpen demands for ultra-low energy consuming wireless devices. Various techniques for energy saving based on Discontinuous Reception (DRX) are known. However, DRX is vulnerable to unauthorized or fake trigger requests by malicious adversaries aiming to drain a device's battery. Existing message authentication methods can identify spoofed messages, but they require the reception of a complete message before its authenticity can be verified. In this paper, we present a method which inserts authentication checkpoints at several positions within a message. This enables a device to identify that a message is unauthorized and turn its radio receiver off as soon as the first checkpoint fails. The presented method has a low complexity with respect to the computational and memory resources and does not slow down the receiver. It can maintain the packet format prescribed by the IEEE 802.15.4 specification, which provides for backward compatibility. Finally, it incorporates authentication checkpoints at the MAC layer, which allows nodes that do not employ the presented method to participate in the communication.
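
    The checkpoint idea in the abstract above can be sketched as follows. The construction here is an assumption made for illustration (a truncated HMAC-SHA256 tag over the payload received so far, inserted after every fixed-size chunk); the paper's actual tag computation and frame layout are not given in this listing, and `TAG_LEN`, `build_frame` and `receive_frame` are hypothetical names.

```python
import hashlib
import hmac

TAG_LEN = 2  # bytes per checkpoint tag; an illustrative choice


def _checkpoint_tag(key: bytes, prefix: bytes) -> bytes:
    """Short tag over everything received so far (stand-in construction)."""
    return hmac.new(key, prefix, hashlib.sha256).digest()[:TAG_LEN]


def build_frame(key: bytes, payload: bytes, chunk_len: int) -> bytes:
    """Interleave payload chunks with checkpoint tags.

    Assumes len(payload) is a multiple of chunk_len to keep the sketch short.
    """
    frame = bytearray()
    for start in range(0, len(payload), chunk_len):
        end = start + chunk_len
        frame += payload[start:end]
        frame += _checkpoint_tag(key, payload[:end])
    return bytes(frame)


def receive_frame(key: bytes, frame: bytes, chunk_len: int):
    """Return (accepted, bytes_consumed); stop at the first failed checkpoint."""
    payload = b""
    step = chunk_len + TAG_LEN
    for pos in range(0, len(frame), step):
        block = frame[pos:pos + step]
        payload += block[:chunk_len]
        tag = block[chunk_len:]
        if not hmac.compare_digest(tag, _checkpoint_tag(key, payload)):
            # A real receiver would switch its radio off at this point.
            return False, pos + len(block)
    return True, len(frame)


key = b"demo-key"
frame = build_frame(key, b"0123456789abcdef", chunk_len=4)
print(receive_frame(key, frame, chunk_len=4))

forged = bytearray(frame)
forged[4] ^= 0x01  # corrupt the first checkpoint tag
print(receive_frame(key, bytes(forged), chunk_len=4))
```

    With the first tag corrupted, the receiver gives up after one chunk and one tag instead of listening to the whole frame, which is where the energy saving would come from.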

  • 31. Dytckov, S.
    et al.
    Purohit, S. S.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Exploring NoC jitter effect on simulation of spiking neural networks2014In: Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014, 2014, p. 693-696Conference paper (Refereed)
    Abstract [en]

    The major bottleneck in the simulation of large-scale neural networks is the communication problem caused by one-to-many neuron connectivity. The Network-on-Chip concept has been proposed to address this problem. This work explores a drawback introduced by interconnection networks: delay jitter. A preliminary experiment is carried out in a spiking neural network simulator by introducing a variable communication delay into the simulation. The resulting performance degradation is reported.

  • 32.
    Dytckov, Sergei
    et al.
    University of Turku, Finland.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland.
    Ebrahimi, Masoumeh
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland.
    Anwar, Hassan
    Ecole Polytechnique Montreal, Canada.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland.
    Efficient STDP Micro-Architecture for Silicon Spiking Neural Networks2014In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2014, p. 496-503Conference paper (Refereed)
    Abstract [en]

    Spiking neural networks (SNNs) are the closest approach to biological neurons in comparison with conventional artificial neural networks (ANNs). SNNs are composed of neurons and synapses which are interconnected in a complex pattern. As communication in such massively parallel computational systems is becoming critical, the network-on-chip (NoC) becomes a promising solution for providing a scalable and robust interconnection fabric. However, using a NoC for large-scale SNNs gives rise to a trade-off between scalability, throughput, neuron/router ratio (cluster size), and area overhead. In this paper, we tackle the trade-off using a clustering approach and try to optimize the synaptic resource utilization. An optimal cluster size can provide the lowest area overhead and power consumption. For learning purposes, a phenomenon known as spike-timing-dependent plasticity (STDP) is utilized. The micro-architectures of the network, clusters, and the computational neurons are also described. The presented approach suggests a promising solution for integrating NoCs and STDP-based SNNs for optimal performance based on the underlying application.

  • 33.
    Ebrahimi, Masoumeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics. University of Turku, Finland .
    Wang, J.
    Huang, L.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Finland .
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Rescuing healthy cores against disabled routers2014Conference paper (Refereed)
    Abstract [en]

    A router in a NoC may be temporarily or permanently disabled for several reasons, such as power saving, faults, or testing. Disabling a router, however, may have a severe impact on the performance or functionality of the entire system if it results in disconnecting the core from the network. In this paper, we propose a deadlock-free routing algorithm which allows the core to stay connected to the system and continue its normal operation when its connected router is disabled. Our analysis and experiments show that the proposed technique achieves 100%, 93.60%, and 87.19% network availability with 100% packet delivery when 1, 2, and 3 routers, respectively, are defunct or intentionally disabled. The algorithm provides adaptivity and is lightweight, requiring one and two virtual channels along the X and Y dimensions, respectively.

  • 34.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    SiLago: Enabling System Level Automation Methodology to Design Custom High-Performance Computing Platforms: Toward Next Generation Hardware Synthesis Methodologies2016Doctoral thesis, comprehensive summary (Other academic)
  • 35.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Atomic stream computation unit based on micro-thread level parallelism2015In: IEEE 26th Application-specific Systems, Architectures and Processors (ASAP) 2015, IEEE , 2015, p. 25-29Conference paper (Refereed)
    Abstract [en]

    The increasing demand for higher image resolution and communication bandwidth requires streaming applications to deal with ever-increasing dataset sizes. Further, with technology scaling, the cost of moving data is decreasing at a slower pace than the cost of computing. These trends have motivated the proposed micro-architectural reorganization of stream processors, which divides the stream computation into functional computation, address constraints computation, and address generation, and deploys independent, distributed micro-threads to implement them. This scheme is an alternative to parallelizing them at the instruction level. The proposed scheme has two benefits: more efficient sequencer logic and energy savings in address generation and transportation. These benefits are quantified for a set of streaming applications and show average improvements of 39% in the silicon efficiency of the sequencer logic and 23% in total computational efficiency.

  • 36.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Jafri, S. M. A. H.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    SiLago: A Structured Layout Scheme to Enable Efficient High Level and System Level Synthesis2016Report (Other academic)
  • 37.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    AlgoSil: A High Level Synthesis Tool targeting Micro-architecture Level Physical Design Platform2016Report (Other academic)
  • 38.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Li, Shuo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Physical Design Aware System Level Synthesis of Hardware2015In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015, IEEE , 2015, p. 141-148Conference paper (Refereed)
    Abstract [en]

    In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis because of the large gap between the abstraction levels of standard cells and the algorithmic level. We propose a grid-based regular physical design platform composed of large-grain hardened building blocks called SiLago blocks. This platform is divided into regions which are specialized for different functionalities such as computation, storage, and system control. The characterized micro-architectural operations of the SiLago platform serve as the interface to a meet-in-the-middle high-level and system-level synthesis framework. This framework was used to generate three hardware macro instances, derived from the SiLago platform, for three applications from the signal processing domain. Results show improvements of two orders of magnitude in the efficiency of system-level design space exploration and in synthesis time, with an average loss in design quality of 18% for energy and 54% for area compared to the commercial SoC flow.

  • 39. Firuzan, A.
    et al.
    Modarressi, M.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Reconfigurable communication fabric for efficient implementation of neural networks2015In: 10th International Symposium on Reconfigurable and Communication-centric Systems-on-Chip, Institute of Electrical and Electronics Engineers (IEEE), 2015, article id 7238097Conference paper (Refereed)
    Abstract [en]

    Handling heavy multicast-based inter-neuron communication is the most challenging issue in the parallel implementation of neural networks. To address this problem, a reconfigurable Network-on-Chip (NoC) architecture for neural networks is presented in this paper. The NoC consists of a number of node clusters with a fixed topology connected by a reconfigurable inter-cluster communication fabric that efficiently handles multicast communication. The evaluation results show that the proposed architecture can manage the multicast-based traffic of neural networks better than the mesh-based topologies proposed in prior work. It offers up to 60% and 22% lower average message latency compared to a baseline and a state-of-the-art NoC for neural networks, respectively, which directly translates to faster neural processing.

  • 40.
    Gkalea, Salvator
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Fault-Tolerant Nostrum NoC on FPGA for the ForSyDe/NoC System Generator Tool Suite2014Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Moore’s law is the observation that over the years, the transistor density will increase, allowing billions of transistors to be integrated on a single chip. Over the last two decades, Moore’s law has enabled the implementation of complex systems on a single chip (SoCs). The challenge of the System-on-Chip (SoC) era was the demand for an efficient communication mechanism between the growing number of processing cores on the chip. The outcome established a new interconnection scheme (among others, like crossbars, rings, and buses) based on telecommunication networks, and the Network-on-Chip (NoC) appeared on the scene. The NoC has been developed not only to support systems embedded into a single processor, but also to support a set of processors embedded on a single chip. Therefore, the Multi-Processor System-on-Chip (MPSoC) has arisen, which incorporates processing elements, memories, and I/O with a fixed interconnection infrastructure in a complete integrated system. In such systems, the NoC constitutes the backbone of the communication architecture that targets future SoCs composed of hundreds of processing elements. Besides that, together with the progress of deep sub-micron technology, some drawbacks have arisen. The communication efficiency and the reliability of the systems rely on the proper functionality of the NoC for on-chip data communication. A NoC must deal with the susceptibility of transistors to failure, which indicates the demand for a fault-tolerant communication infrastructure: a mechanism that can deal with the different classes of faults (transient, intermittent, and permanent [11]) which can occur in the communication network. In this thesis, different algorithms that implement fault-tolerant techniques for permanent faults in the NoC are investigated. The outcome is a fault-tolerant mechanism for the NoC System Generator Tool [29], which is a Network-on-Chip research project carried out at the Royal Institute of Technology. The fault-tolerant algorithm that is implemented in the switch in order to achieve packet rerouting around faulty communication links is described explicitly.

  • 41. Gorgen, Ralph
    et al.
    Gruttner, Kim
    Herrera, Fernando
    Panil, Pablo
    Medina, Julio
    Villar, Eugenio
    Palermo, Gianluca
    Fornaciari, William
    Brandolese, Carlo
    Gadioli, Davide
    Bocchio, Sara
    Ceva, Luca
    Azzoni, Paolo
    Poncino, Massimo
    Vinco, Sara
    Macii, Enrico
    Cusenza, Salvatore
    Favaro, John
    Valencia, Raul
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Rosvall, Kathrin
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Quaglia, Davide
    CONTREX: Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties2016In: 19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016), IEEE, 2016, p. 286-293Conference paper (Refereed)
    Abstract [en]

    The increasing processing power of today's HW/SW platforms leads to the integration of more and more functions in a single device. Additional design challenges arise when these functions share computing resources and belong to different criticality levels. The paper presents the CONTREX European project and its preliminary results. CONTREX complements current activities in the area of predictable computing platforms and segregation mechanisms with techniques to consider the extra-functional properties, i.e., timing constraints, power, and temperature. CONTREX enables energy efficient and cost aware design through analysis and optimization of these properties with regard to application demands at different criticality levels.

  • 42.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    The SiLago method: Next generation VLSI architectures and design methods2016Conference proceedings (editor) (Refereed)
  • 43. Herrera, F.
    et al.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Rosvall, Kathrin
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Paone, E.
    Palermo, G.
    An efficient joint analytical and simulation-based design space exploration flow for predictable multi-core systems2015In: ACM International Conference Proceeding Series, ACM Digital Library, 2015Conference paper (Refereed)
    Abstract [en]

    Recent work has proposed two-phase joint analytical and simulation-based design space exploration (JAS-DSE) approaches. In such approaches, a first analytical phase relies on static performance estimation and on either exhaustive or heuristic search to perform a very fast filtering of the design space. Then, a second phase obtains the Pareto solutions after an exhaustive simulation of the solutions found to be compliant by the analytical phase. However, the capability of such approaches to find solutions close to the actual Pareto set at a reasonable time cost is compromised by current system complexities. This limitation is due to the fact that such approaches do not support heuristic exploration in the simulation-based phase. Supporting it is not straightforward because, in the second phase, the heuristic is constrained to consider only the custom set of solutions found in the first phase. This set is in general unconnected and irregularly distributed, which prevents the application of existing heuristics. As a solution, this paper provides a novel search heuristic called ARS (Adaptive Random Sampling). The ARS strategy enables the application of heuristic search in both phases of the JAS-DSE flow by enabling a heuristic in the second phase, regardless of the type of performance estimation done in each phase. Moreover, it enables the definition of N-phase DSE flows. The paper shows, in an experiment focused on predictable multi-core systems, how this enhanced JAS-DSE is capable of finding more efficient solutions and of tuning the trade-off between exploration time and accuracy in finding actual Pareto solutions.

  • 44.
    Herrera, Fernando
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    An extensible infrastructure for modeling and time analysis of predictable embedded systems2015In: Forum on Specification and Design Languages, IEEE Computer Society, 2015Conference paper (Refereed)
    Abstract [en]

    Efficient design of predictable systems on top of multiprocessor-based architectures is challenging. It demands an integration effort to support system models relying on Models-of-Computation (MoC) theory, real-time (RT) analysis, and electronic system-level (ESL) design techniques. This paper presents a SystemC-based framework for modelling and time analysis of predictable embedded systems which aims at such an integration. The framework has features for system-level design and research of predictable systems. Moreover, the framework is extensible, enabling experts from different communities to explore and assess their contributions, e.g. new schedulers, schedulability analyses, and predictable platform components, without having to rely on a physical platform.

  • 45.
    Hjort Blindell, Gabriel
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Menne, Christian
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Synthesizing Code for GPGPUs from abstract formal models2016In: 16th Conference on Languages, Design Methods, and Tools for Electronic System Design, FDL 2014, Springer, 2016, p. 115-134Conference paper (Refereed)
    Abstract [en]

    Today multiple frameworks exist for alleviating the task of writing programs for GPGPUs, which are massively data-parallel execution platforms. These are needed because writing correct and high-performing applications for GPGPUs is notoriously difficult due to the intricacies of the underlying architecture. However, the existing frameworks lack a formal foundation, which makes them difficult to use together with formal verification, testing, and design space exploration. We present in this chapter a novel software synthesis tool, called f2cc, which is capable of generating efficient GPGPU code from abstract formal models based on the synchronous model of computation. These models can be built using high-level modeling methodologies that hide low-level architecture details from the developer. The correctness of the tool has been experimentally validated on models derived from two applications. The experiments also demonstrate that the synthesized GPGPU code yielded a 28× speedup when executed on a graphics card with 96 cores, compared against a sequential version that uses only the CPU.

  • 46. Huang, L. -T
    et al.
    Dong, H.
    Wang, J. -S
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Li, G. -J
    WeNA: Deterministic Run-time Task Mapping for Performance Improvement in Many-core Embedded Systems2015In: IEEE Embedded Systems Letters, ISSN 1943-0663, Vol. 7, no 4, p. 93-96, article id 7097665Article in journal (Refereed)
    Abstract [en]

    Many-core embedded systems will feature an extremely dynamic workload distribution in which massive applications, arranged as an unpredictable sequence, enter and leave the system at run-time. An efficient mapping strategy is required to allocate system resources to each incoming application. Noncontiguous mapping improves system throughput by utilizing disjoint nodes; however, the increased communication distance and external congestion lead to high power consumption and network delay. This paper thus presents an enhanced noncontiguous dynamic mapping algorithm, aiming at decreasing interprocessor communication overhead and improving both network and application performance. Communication volumes are used to arrange the mapping order of tasks belonging to the same application. Moreover, an expanding parameter is developed for each task, which directs the optimized mapping decision with respect to the current neighborhood and occupancy information. Experimental results show that our modified mapping algorithm, Weighted-based Neighborhood Allocation (WeNA), makes considerable improvements in Average Weighted Manhattan Distance (8.06%) and network latency (9.8%) in comparison with the state-of-the-art algorithm.

  • 47. Huang, Letian
    et al.
    Wang, Junshi
    Ebrahimi, Masoumeh
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Zhang, Xiaofan
    Li, Guangjun
    Jantsch, Axel
    Non-Blocking Testing for Network-on-Chip2016In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 65, no 3, p. 679-692Article in journal (Refereed)
    Abstract [en]

    To achieve high reliability in on-chip networks, it is necessary to test the network as frequently as possible to detect physical failures before they lead to system-level failures. A main obstacle is that the circuit under test has to be isolated, resulting in network cuts and packet blockage which limit the testing frequency. To address this issue, we propose a comprehensive network-level approach which can test multiple routers simultaneously at high speed without blocking or dropping packets. We first introduce a reconfigurable router architecture allowing the cores to keep their connections with the network while the routers are under test. A deadlock-free and highly adaptive routing algorithm is proposed to support reconfigurations for testing. In addition, a testing sequence is defined that allows multiple routers to be tested without dropping packets. A procedure is proposed to control the behavior of the affected packets during the transition of a router from the normal to the testing mode and vice versa. This approach neither interrupts the execution of applications nor has a significant impact on the execution time. Experiments with the PARSEC benchmarks on an 8x8 NoC-based chip multiprocessor show only a 3 percent increase in execution time with four routers simultaneously under test.

  • 48.
    Jafari, Fahimeh
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Analysis and Management of Communication in On-Chip Networks2015Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Owing to the needs of low-power, high-performance embedded systems and the growth of computation-intensive applications, the number of computing resources on a single chip has increased enormously. Current VLSI technology is able to support such an integration of transistors and to add many computing resources, such as CPUs, DSPs, and specific IPs, to build a System-on-Chip (SoC). However, interconnection between resources becomes another challenging issue, which can be addressed by using an on-chip interconnection network, or Network-on-Chip (NoC). NoC-based communication, which allows pipelined concurrent transmission of transactions, is becoming a dominant infrastructure for many-core computing platforms.

    This thesis analyzes and manages both Best Effort (BE) and Guaranteed Service (GS) communication using analytical performance approaches. As a first step, the thesis focuses on flow control for BE traffic in NoCs. It models BE source rates as the solution to a utility-based optimization problem that is constrained by link capacities while preserving GS traffic requirements at the desired level. To this end, several utility functions are investigated, including proportionally-fair, rate-sum, and max-min fair scenarios. The thesis also considers a scenario in which BE source rates are chosen to minimize the delay of such traffic. The presented flow control algorithms solve the proposed optimization problems and determine the injection rate at each BE source node.
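
    As a rough illustration of the kind of utility-based rate allocation described above, a generic network utility maximization problem can be written as follows. This is a textbook-style sketch, not the thesis's exact formulation; the symbols (source set S, link set L, sources S(l) crossing link l, link capacity c_l, capacity g_l reserved for GS traffic, and utility U_s) are assumptions introduced here for illustration.

```latex
\begin{align}
\max_{x_s \ge 0} \quad & \sum_{s \in S} U_s(x_s) \\
\text{s.t.} \quad & \sum_{s \in S(l)} x_s \le c_l - g_l, \qquad \forall\, l \in L
\end{align}
% Example utilities (assumed, standard choices):
%   U_s(x_s) = \log x_s   gives a proportionally fair allocation
%   U_s(x_s) = x_s        gives a rate-sum (throughput-maximizing) allocation
```

    Under these assumptions, each x_s is the injection rate of one BE source, and the right-hand side of the constraint is the capacity left on link l after GS requirements are met.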

    In the next step, real-time systems with guaranteed service are considered. Real-time applications require performance guarantees, i.e. Quality of Service (QoS), even under worst-case conditions. Using network calculus, we present and prove the required theorems for deriving performance metrics and then apply them to propose formal approaches for worst-case performance analysis. The proposed analytical model is used to minimize the total network cost in terms of buffer and delay. To this end, we formulate and solve several optimization problems to study the impact of various objective functions. We also develop a tool that derives performance metrics for a given NoC and formulates and solves the considered optimization problems, providing valuable insight for NoC designers.
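
    To give a sense of the worst-case bounds that network calculus provides, the sketch below computes the standard delay and backlog bounds for a single flow with a token-bucket arrival curve (burst sigma, rate rho) crossing one server with a rate-latency service curve (rate R, latency T). It is a minimal textbook example, not the thesis's model or tool; the function and parameter names are hypothetical.

```python
def delay_bound(sigma: float, rho: float, rate: float, latency: float) -> float:
    """Worst-case delay for a (sigma, rho) flow at a rate-latency server.

    Standard network-calculus result: D <= T + sigma / R, valid when rho <= R.
    All names are illustrative assumptions, not taken from the thesis.
    """
    if rho > rate:
        raise ValueError("sustained arrival rate exceeds service rate; no finite bound")
    return latency + sigma / rate


def backlog_bound(sigma: float, rho: float, rate: float, latency: float) -> float:
    """Worst-case backlog (buffer requirement): B <= sigma + rho * T, for rho <= R."""
    if rho > rate:
        raise ValueError("sustained arrival rate exceeds service rate; no finite bound")
    return sigma + rho * latency


if __name__ == "__main__":
    # Hypothetical flow: burst of 4 flits, 0.5 flits/cycle, served at 1 flit/cycle
    # after at most 3 cycles of latency.
    print(delay_bound(4.0, 0.5, 1.0, 3.0))
    print(backlog_bound(4.0, 0.5, 1.0, 3.0))
```

    The two bounds correspond to the delay and buffer costs that the abstract mentions as optimization objectives; a real NoC analysis would compose per-router service curves along each path.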

  • 49.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Weighted Round Robin Configuration for Worst-Case Delay Optimization in Network-on-Chip. Manuscript (preprint) (Other academic)
  • 50.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels. In: ACM Transactions on Design Automation of Electronic Systems, ISSN 1084-4309, E-ISSN 1557-7309. Article in journal (Refereed)