  • 1.
    Collin, Mikael
    et al.
    KTH, School of Information and Communication Technology (ICT), Communication: Services and Infrastructure, Software and Computer Systems, SCS.
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Communication: Services and Infrastructure, Software and Computer Systems, SCS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A performance and energy exploration of dictionary code compression architectures. 2011. In: 2011 International Green Computing Conference and Workshops (IGCC), IEEE conference proceedings, 2011, p. 1-8. Conference paper (Refereed)
    Abstract [en]

    We have made a performance and energy exploration of a previously proposed dictionary code compression mechanism where frequently executed individual instructions and/or sequences are replaced in memory with short code words. Our simulated design shows a dramatically reduced instruction memory access frequency leading to a performance improvement for small instruction cache sizes and to significantly reduced energy consumption in the instruction fetch path. We have evaluated the performance and energy implications of three architectural parameters: branch prediction accuracy, instruction cache size and organization. To assess the complexity of the design we have implemented the critical stages in VHDL.

  • 2.
    Deb, Abhijit Kumar
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Hemani, Ahmed
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Postula, Adam
    Department of CSEE, University of Queensland.
    Lindqvist, Dan
    Ericsson Radio Systems AB.
    Hardware software codesign of DSP system using grammar based approach. 2001. In: VLSI Design, 2001. Fourteenth International Conference on, 2001, p. 42-47. Conference paper (Refereed)
    Abstract [en]

    Embedded cores are gaining widespread use to deal with the complex DSP systems where flexibility is of utmost importance. The design of such a system offers several problems, which are not addressed by the existing methodology. The authors previously presented an integrated grammar based DSP design methodology that separates architectural and functional specification, can create a virtual prototype and has a smooth link to the implementation phase. In this paper we present the extension of the work to handle embedded cores. Here we capture the host peripheral interface (HPI) of the TMS320C6x core at a higher level of abstraction and provide a single simulation environment, which facilitates faster analysis of hardware software components. Our results reveal that the proposed methodology offers a simulation time speed-up of 5 times and a design time speed-up of 8 times, while keeping the architectural specification separated from functionality.

  • 3.
    Deb, Abhijit Kumar
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    System design for DSP applications in transaction level modeling paradigm. 2004. In: 41st Design Automation Conference, Proceedings 2004, 2004, p. 466-471. Conference paper (Refereed)
    Abstract [en]

    In this paper, we systematically define three transaction level models (TLMs), which reside at different levels of abstraction between the functional and the implementation model of a DSP system. We also show a unique language support to build the TLMs. Our results show that the abstract TLMs can be built and simulated much faster than the implementation model at the expense of a reasonable amount of simulation accuracy.

  • 4.
    Deb, Abhijit Kumar
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    System design for DSP applications using the MASIC methodology. 2004. In: DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION, VOLS 1 AND 2, PROCEEDINGS, LOS ALAMITOS: IEEE COMPUTER SOC, 2004, p. 630-635. Conference paper (Refereed)
    Abstract [en]

    Expensive top-down iterations are often required in the design cycle of complex DSP systems. In this paper, we introduce two levels of abstraction in the design flow by systematically categorizing the architectural decisions. As a result, the top-down iteration loop is broken. We also present a technique to capture and inject the architectural decisions such that the system models can be created and simulated efficiently. The concepts are illustrated by a realistic speech processing example, which is implemented using the AMBA on-chip architecture. Our methodology offers a smooth path from the functional modeling phase to the implementation level, facilitates the reuse of HW and SW components, and enjoys existing tool support at the backend.

  • 5.
    Deb, Abhijit Kumar
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    Jantsch, Axel
    Performance Analysis and Architectural Refinement of Embedded DSP Systems in the MASIC Methodology. 2002. In: Proceedings of Swedish System-on-Chip Conference, 2002. Conference paper (Refereed)
  • 6.
    Diallo, P. I.
    et al.
    Attarzadeh-Niaki, Seyed Hosein
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Robino, Francesco
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Champeau, J.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    A formal, model-driven design flow for system simulation and multi-core implementation. 2015. In: 2015 10th IEEE International Symposium on Industrial Embedded Systems, IEEE, 2015, p. 254-263. Conference paper (Refereed)
    Abstract [en]

    With the growing complexity of Real-Time Embedded Systems (RTES), there is a huge interest in using modeling languages such as the Unified Modeling Language (UML), and other Model-Driven Engineering (MDE) techniques targeting RTES system design. These approaches provide language abstractions for system design, allowing to focus on their relevant properties. Unfortunately, such approaches still suffer from several shortcomings including the lack of well-defined semantics. Therefore, it remains difficult to connect the MDE specification tools and the design tools that are based on formal grounds and well-defined semantics to perform analysis, validation or system synthesis for RTES. This paper presents a top-down RTES design flow aiming to reduce the gap between MDE and formal design approaches. We present the connection between a framework dedicated to the enrichment of modeling languages such as UML with formal semantics, a framework based on formal models of computation supporting validation by simulation, and a system synthesis tool targeting a flexible platform with well-defined execution services. Our purpose is to cover several system design phases from specification, simulation down to implementation on a platform. As a case study, a JPEG Encoder application was realized following the different design steps of the tool-chain.

  • 7.
    Ellervee, Peeter
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Kumar, Anshul
    KTH, Superseded Departments, Electronic Systems Design.
    Svantesson, Bengt
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Controller Synthesis in Control and Memory Centric High-level Synthesis System. 1996. In: Proceedings of the Baltic Electronics Conference, 1996, p. 393-396. Conference paper (Refereed)
  • 8.
    Ellervee, Peeter
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Exploring ASIC Design Space at System Level with a Neural Network Estimator. 1994. In: Proc. of IEEE ASIC-conference, 1994, 1994. Conference paper (Refereed)
    Abstract [en]

    Estimators are critical tools for architectural-level exploration of the design space. We present a novel approach to estimation based on the multilayer perceptron, which builds the estimation function during the learning process and thus allows arbitrarily complex functions to be described. We also describe how the control data flow graph is encoded for the neural network input and we present results of the first experiments made with realistic design examples.

  • 9.
    Ellervee, Peeter
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Area Estimation in the High Level Synthesis Using Neural Networks. 1994. Conference paper (Refereed)
  • 10.
    Ezzeddine, Hussein
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Robino, Francesco
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Validation of Pipelined Double-precision Floating Point operations in a multi-core environment implemented on FPGA using the ForSyDe/NoC system generator tool suite. 2015. In: NORCHIP 2014 - 32nd NORCHIP Conference: The Nordic Microelectronics Event, 2015. Conference paper (Refereed)
    Abstract [en]

    Testing HW IP Blocks in multi-core environments is difficult. This paper presents a case study where a SINE/COSINE implementation using Pipelined Double-precision operations is implemented in one node, and results are sent through the NoC to a target node for inspection. The purpose of the experiments is two-fold: a) to study how debugging in a multi-core environment can be done and b) to examine why the original SINE/COSINE implementation is generating wrong results. During the experiments, several test methods are applied to validate the implementations until the Floating Point implementation is generating correct values. After eliminating all faults in the operations, the SINE/COSINE function still generates some residual algorithmic errors, coming from the way the function was implemented. However, the experiments show that these errors can be eliminated with the help of some simple trigonometric rules.

  • 11.
    Fakih, M.
    et al.
    Grüttner, K.
    Schreiner, S.
    Seyyedi, R.
    Azkarate-Askasua, M.
    Onaindia, P.
    Poggi, T.
    Romero, N. G.
    Gonzalez, E. Q.
    Sundström, T.
    Frasquet, S. P.
    Balbastre, P.
    Mohammadat, T.
    Öberg, Johnny
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Bebawy, Y.
    Obermaisser, R.
    Maleki, A.
    Lenz, A.
    Graham, D.
    Experimental evaluation of SAFEPOWER architecture for safe and power-efficient mixed-criticality systems. 2019. In: Journal of Low Power Electronics and Applications, Vol. 9, no 1, article id 12. Article in journal (Refereed)
    Abstract [en]

    With the ever-increasing industrial demand for bigger, faster and more efficient systems, a growing number of cores is integrated on a single chip. Additionally, their performance is further maximized by simultaneously executing as many processes as possible. Even in safety-critical domains like railway and avionics, multicore processors are introduced, but under strict certification regulations. As the number of cores is continuously expanding, the importance of cost-effectiveness grows. One way to increase the cost-efficiency of such a System on Chip (SoC) is to enhance the way the SoC handles its power consumption. By increasing the power efficiency, the reliability of the SoC is raised because the lifetime of the battery lengthens. Secondly, by having less energy consumed, the emitted heat is reduced in the SoC, which translates into fewer cooling devices. Though energy efficiency has been thoroughly researched, there is no application of those power-saving methods in safety-critical domains yet. The EU project SAFEPOWER (Safe and secure mixed-criticality systems with low power requirements) targets this research gap and aims to introduce certifiable methods to improve the power efficiency of mixed-criticality systems. This article provides an overview of the SAFEPOWER reference architecture for low-power mixed-criticality systems, which is the most important outcome of the project. Furthermore, the application of this reference architecture in novel railway interlocking and flight controller avionic systems was demonstrated, showing the capability to achieve power savings up to 37%, while still guaranteeing time-triggered task execution and time-triggered NoC-based communication. 

  • 12.
    Hemani, Ahmed
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Svantesson, Bengt
    KTH, Superseded Departments, Electronic Systems Design.
    Ellervee, Peeter
    KTH, Superseded Departments, Electronic Systems Design.
    Postula, Adam
    Dept. of Electrical and Computer Engineering, University of Queensland.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    High-level synthesis of control and memory intensive communication systems. 1995. In: , 1995, p. 185-191. Conference paper (Refereed)
    Abstract [en]

    Communication sub-systems that deal with switching, routing and protocol implementation often have their functionality dominated by control logic and interaction with memory. Synthesis of such Control and Memory Intensive Systems (hereafter abbreviated to CMISTs) poses demands that in the past have not been met satisfactorily by general purpose high-level synthesis (HLS) tools and have led to several research efforts to address these demands. In this paper we: characterise CMISTs from the synthesis viewpoint; present a synthesis methodology adapted for CMISTs; present the Operation and Maintenance (OAM) Protocol of the ATM, its modelling in VHDL and synthesis aspects of the VHDL model; present the results of applying the synthesis methodology to the OAM as a test case (the results are compared to those obtained using the unadapted general purpose high-level synthesis tool); prove the efficacy of the proposed synthesis methodology by applying it to an industrial design and comparing our results to the results from two commercial HLS tools and to the results obtained by designing manually at register-transfer level.

  • 13.
    Hemani, Ahmed
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Svantesson, Bengt
    KTH, Superseded Departments, Electronic Systems Design.
    Ellervee, Peeter
    KTH, Superseded Departments, Electronic Systems Design.
    Postula, Adam
    Dept. of Electrical and Computer Engineering, University of Queensland.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Trade-offs in High-level Synthesis of Telecommunication Circuits. 1995. Conference paper (Refereed)
  • 14.
    Isoaho, Jouni
    et al.
    Tampere University of Technology, Signal Processing Laboratory.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    High level synthesis in DSP ASIC optimization. 1994. In: Proc. of 7th IEEE ASIC Conference and Exhibit, 1994, p. 75-78. Conference paper (Refereed)
    Abstract [en]

    In this paper Digital Signal Processing (DSP) system optimization with a High Level Synthesis (HLS) environment is presented. To optimize a behavioural VHDL description, commercial SYNT and Synopsys synthesis tools are utilized. The optimization results are improved with a simple rule based preallocator. The coefficient optimization is done in Matlab to provide an efficient implementation of power-of-two and multiply-accumulate based FIR filters. The optimization results are presented using practical filter examples.

  • 15.
    Isoaho, Jouni
    et al.
    Tampere University of Technology, Signal Processing Laboratory.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    HLS based DSP optimization with ASIC RTL libraries. 1994. In: , 1994, p. 218-225. Conference paper (Refereed)
    Abstract [en]

    In this paper we show how a High Level Synthesis (HLS) tool can efficiently be used for DSP ASIC development. The performance of a general HLS tool is improved with simple transformations and code optimizations, and a direct mapping to a technology-optimized, parameterizable ASIC Register Transfer Level (RTL) library. The library mapping contains three phases: structure recognition, architecture selection and parameter optimization. As an optimization framework, the SYNT, Synopsys and Matlab design environments are integrated. Lsi10k and Xilinx 4000 series are used as target technologies to demonstrate the performance of the approach.

  • 16.
    Jantsch, Axel
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Ellervee, Peeter
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Hardware/software partitioning and minimizing memory interface traffic. 1994. In: Proceedings of the conference on European design automation 1994, 1994, p. 226-231. Conference paper (Refereed)
  • 17.
    Jantsch, Axel
    et al.
    KTH, Superseded Departments (pre-2005), Electronic Systems Design.
    Ellervee, Peeter
    Öberg, Johnny
    KTH, Superseded Departments (pre-2005), Electronic Systems Design.
    Hemani, Ahmed
    A Case Study on Hardware/Software Partitioning. 1994. In: Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, IEEE conference proceedings, 1994, p. 111-118. Conference paper (Refereed)
    Abstract [en]

    We present an analysis of a fully automatic method to accelerate standard software in C or C++ by use of field programmable gate arrays. Traditional compiler techniques are applied to the hardware/software partitioning problem and a compiler is linked to state of the art hardware synthesis tools. Time critical regions are identified by means of profiling and are automatically implemented in user programmable logic with high level and logic synthesis design tools. The underlying architecture is an add-on board with user programmable logic connected to a SPARC-based workstation via the system bus. We present an analysis and case study of this method. Eight programs are used as test cases and the data collected by applying this method to programs is used to discuss potentials and limitations of this and similar methods. We discuss architectural parameters, programming language properties, and analysis techniques.

  • 18.
    Jantsch, Axel
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Kumar, Shashi
    Indian Institute of Technology.
    Sander, Ingo
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Svantesson, Bengt
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Hemani, Ahmed
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Ellervee, Peeter
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    O’Nils, Mattias
    A comparison of six languages for system level description of telecom applications. 2001. In: Electronic Chips & System Design Languages, Kluwer Academic Publishers, 2001, p. 181-192. Chapter in book (Other academic)
  • 19.
    Jantsch, Axel
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Ellervee, Peeter
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    A software oriented approach to hardware-software co-design. 1994. In: International conference on Compiler Construction, 1994, p. 93-102. Conference paper (Refereed)
  • 20.
    Jantsch, Axel
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Special Issue on Networks on Chip - guest editor’s introduction. 2004. In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 50, no 2-3, p. 61-63. Article in journal (Other academic)
  • 21.
    Kumar, Shashi
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Soininen, Juha-Pekka
    Forsell, Martti
    Millberg, Mikael
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Tiensyrja, Kari
    Hemani, Ahmed
    A network on chip architecture and design methodology. 2002. In: VLSI 2002: IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI - NEW PARADIGMS FOR VLSI SYSTEMS DESIGN, IEEE conference proceedings, 2002, p. 105-112. Conference paper (Refereed)
    Abstract [en]

    We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is an m x n mesh of switches, and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources providing physical-architectural level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block, which fits into the available slot and complies with the interface of the NOC. The NOC architecture essentially is the on-chip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special purpose computation resources such as high performance multiprocessors. The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.

  • 22.
    Kyriakakis, Eleftherios
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Ngo, Kalle
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Mitigating Single-Event Upsets in COTS SDRAM using an EDAC SDRAM Controller. 2017. In: 2017 IEEE NORDIC CIRCUITS AND SYSTEMS CONFERENCE (NORCAS): NORCHIP AND INTERNATIONAL SYMPOSIUM OF SYSTEM-ON-CHIP (SOC) / [ed] Nurmi, J; Vesterbacka, M; Wikner, JJ; Alvandpour, A; NielsenLonn, M; Nielsen, IR, IEEE, 2017. Conference paper (Refereed)
    Abstract [en]

    From deep space missions to low-earth orbit satellites, the natural radiation of space proves to be a hostile environment for electronics. Memory elements in particular are highly susceptible to radiation charge that, if latched, can cause single-event upsets (SEUs, bit-flips) which lead to data corruption and even mission critical failures. On Earth, SDRAM devices are widely used as cost-effective, high performance storage elements in almost every computer system. However, their physical design makes them highly susceptible to SEUs. Thus, their usage in space applications is limited and usually avoided, requiring the use of radiation hardened components which are generally a few generations older and often much more expensive than COTS. In this paper, an off-chip SEU/MBU mitigation mechanism is presented that aims to drastically reduce the probability of data corruption inside a commercial-off-the-shelf (COTS) synchronous dynamic random access memory (SDRAM) using a triple modular redundant (TMR) scheme for data and periodic scrubbing. The proposed mitigation technique is implemented in a novel controller that will be used by the single-event upset detector (SEUD) experiment aboard the KTH MIniature STudent (MIST) satellite project.

  • 23.
    Lenz, Alina
    et al.
    Blazquez, Mikel Azkarate-Askasua
    Coronel, Javier
    Crespo, Alfons
    Davidmann, Simon
    Diaz Garcia, Juan Carlos
    Gonzalez Romero, Nera
    Gruettner, Kim
    Obermaisser, Roman
    Öberg, Johnny
    KTH.
    Perez, Jon
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Soederquist, Ingemar
    SAFEPOWER project: Architecture for Safe and Power-Efficient Mixed-Criticality Systems. 2016. In: 19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016), IEEE, 2016, p. 294-300. Conference paper (Refereed)
    Abstract [en]

    With the ever increasing industrial demand for bigger, faster and more efficient systems, a growing number of cores is integrated on a single chip. Additionally, their performance is further maximized by simultaneously executing as many processes as possible, regardless of their criticality. Even safety critical domains like railway and avionics apply these paradigms under strict certification regulations. As the number of cores is continuously expanding, the importance of cost-effectiveness grows. One way to increase the cost-efficiency of such a System on Chip (SoC) is to enhance the way the SoC handles its power resources. By increasing the power efficiency, the reliability of the SoC is raised, because the lifetime of the battery lengthens. Secondly, by having less energy consumed, the emitted heat is reduced in the SoC, which translates into fewer cooling devices. Though energy efficiency has been thoroughly researched, there is no application of those power saving methods in safety critical domains yet. The EU project SAFEPOWER targets this research gap and aims to introduce certifiable methods to improve the power efficiency of mixed-criticality real-time systems (MCRTES). This paper will introduce the requirements that a power efficient SoC has to meet and the challenges such a SoC has to overcome.

  • 24.
    Mand, N. P.
    et al.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Going for brain-scale integration - Using FPGAS, TSVs and NOC based artificial neural networks: A case study. 2014. In: 11th FPGAworld Conference - Academic Proceedings 2014, FPGAWorld 2014, ACM Digital Library, 2014. Conference paper (Refereed)
    Abstract [en]

    With better understanding of the brain's massively parallel processing, brain-scale integration has been announced as one of the key research areas in modern times, and numerous efforts have been made to mimic such models. Multicore architectures, Network-On-Chip, 3D stacked ICs with TSVs, FPGAs' growth beyond Moore's law and new design methodologies like high level synthesis will ultimately lead us toward single- and multi-chip solutions of Artificial Neural Net models comprising millions or even more neurons per chip. Historically, ANNs have been emulated as either software models, ASICs or a hybrid of both. Software models are very slow, while ASIC-based designs lack plasticity. FPGAs consume a little more power but offer the flexibility of software and the performance of ASICs, along with the basic requirement of plasticity in the form of reconfigurability. However, the traditional bottom-up approach for building large ANN models is no longer feasible, and wiring along with memory become major bottlenecks when considering networks comprised of a large number of neurons. The aim of this paper is to present a design space exploration of large-scale ANN models using a scalable NOC based architecture together with high level synthesis tools to explore the feasibility of implementing brain-scale ANNs on FPGAs using 3D stacked memory structures.

  • 25.
    Mand, Nowshad Painda
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Robino, Francesco
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Artificial neural network emulation on NOC based multi-core FPGA platform. 2012. In: NORCHIP, 2012, IEEE, 2012, p. 6403122-. Conference paper (Refereed)
    Abstract [en]

    With the emergence of Multi-Core platforms, brain emulation in the form of Artificial Neural Nets has been announced as one of the key research areas. However, because inter-neuron connectivity grows non-linearly, direct mapping of ANNs to silicon structures is very difficult owing to the communication bottleneck.

  • 26.
    Minhass, Wajid Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Design and implementation of a plesiochronous multi-core 4x4 network-on-chip FPGA platform with MPI HAL support. 2009. In: 6th FPGAworld Conference, Academic Proceedings 2009, ACM, 2009, p. 52-57. Conference paper (Refereed)
    Abstract [en]

    The Multi-Core NoC is a 4 by 4 Mesh NoC targeted for Altera FPGAs. It implements a deflective routing policy and is used to connect sixteen NIOS II processors. Each NIOS II is connected to the NoC via an address-mapped Resource Network Interface (RNI). The Multi-Core NoC is implemented on four separate Altera Stratix II FPGA boards, each hosting a Quad-Core NoC, which operates on a local 50 MHz clock. It has an onboard throughput of 650 Mbps (12.5 MFlit/s), and uses 28% of the LUs, 18% of the ALUTs, 22% of the dedicated registers and 31% of the total memory blocks of a Stratix II FPGA. Asynchronous clock bridges, with a throughput of 50 Mbps (~1 MFlit/s), are used for the inter-board communication. Application programs use an MPI-compatible Hardware Abstraction Layer (HAL) to communicate with the RNI of the NoC. The RNI sets up message transfers, with a maximum length of 512 bytes, and sends flits with the size of 32-bit data plus 20-bit headers through the network. The MPI is the bottleneck of the system; it takes 46 µs (43.4 kPackets/s) to send a minimum-sized packet through the protocol stack to a near neighbour and bounce it back to the original application. The bounce-back time for a far neighbour is 56 µs.

  • 27.
    Minhass, Wajid Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Implementation of a scalable, globally plesiochronous locally synchronous, off-chip NoC communication protocol. 2009. In: 2009 NORCHIP, 2009, p. 1-5. Conference paper (Refereed)
    Abstract [en]

    Multiprocessor system-on-chip (MPSoC) design is becoming a regular feature of embedded systems. Shared-bus systems hold many advantages, but they do not scale. Network on chip (NoC) offers a promising solution to the scalability problem by enhancing the topology design. However, standard NoCs are only scalable within a chip. To be able to build infinitely scalable structures, a method to enhance the NoC-grid off-chip is needed. In this paper, we present such a method. As a proof of concept, the protocol is implemented on a 4 by 4 Mesh NoC, with NIOS II CPU cores as nodes, partitioned across four separate Altera FPGA boards, each board hosting a Quad-Core (2x2) NoC, operating on a local 50 MHz clock. The inter-chip communication protocol uses asynchronous clock bridges, with a throughput of 50 Mbps (~1 MFlit/s), and is completely scalable. The NoC has an onboard throughput of 650 Mbps (12.5 MFlit/s). Each Quad-Core uses 28% of the LUs, 18% of the ALUTs, 22% of the dedicated registers and 31% of the total memory blocks of the Stratix II FPGAs. Application programs use an MPI-compatible Hardware Abstraction Layer (HAL) to communicate with each other over the NoC.

  • 28.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Camera and LCM IP-Cores for NIOS SOPC System. 2009. In: 6th FPGAworld Conference, Academic Proceedings 2009, New York: ACM, 2009, p. 18-23. Conference paper (Refereed)
    Abstract [en]

    This paper presents the development of IP-Cores to integrate the Terasic DC2 Camera and LCM (LCD Module) daughter boards into an Altera Nios System, so that the image can be further processed by embedded software or custom hardware instructions. Among the challenges overcome during this work are clock-domain crossing, synchronizing FIFO design, variable and pipelined burst control, multi-master contention for system memory and image frame buffer switching. In addition, we designed software device drivers and API functions intended for graphics, image processing and video control, which are part of the IP deliverables. In brief, this work describes some concepts and methodologies involved in the creation of IP-Cores for an Altera SOPC; it also presents the results of the designed CAM-IP and LCM-IP Cores working in an application demo, which constitutes a real solution and a reference design.

  • 29.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Reinforcement Learning Based Self-Optimization of Dynamic Fault-Tolerant Schemes in Performance-Aware RecoBlock SoCs. 2015. Report (Other academic)
    Abstract [en]

    Partial and run-time reconfiguration (RTR) technology has increased the range of opportunities and applications in the design of systems-on-chip (SoCs) based on Field-Programmable Gate Arrays (FPGAs). Nevertheless, RTR adds further complexity to the design process, particularly when embedded FPGAs have to deal with power and performance constraints in uncertain environments. Embedded systems will need to make autonomous decisions, develop cognitive properties such as self-awareness and finally become self-adaptive to be deployed in the real world. Classic off-line modeling and programming methods are inadequate to cope with unpredictable environments. Reinforcement learning (RL) methods have been successfully explored to solve these complex optimization problems mainly in workstation computers, yet they are rarely implemented in embedded systems. Disruptive integration technologies reaching atomic scales will increase the probability of fabrication errors and the sensitivity to electromagnetic radiation that can generate single-event upsets (SEUs) in the configuration memory of FPGAs. Dynamic FT schemes are promising RTR hardware redundancy structures that improve dependability, but on the other hand, they increase memory system traffic. This article presents an FPGA-based SoC that is self-aware of its monitored hardware and utilizes an online RL method to self-optimize the decisions that maintain the desired system performance, particularly when triggering hardware acceleration and dynamic FT schemes on RTR IP-cores. Moreover, this article describes the main features of the RecoBlock SoC concept, overviews the RL theory, shows the Q-learning algorithm adapted for the dynamic fault-tolerance optimization problem, and presents its simulation in Matlab. Based on this investigation, the Q-learning algorithm will be implemented and verified in the RecoBlock SoC platform.

  • 30.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    The RecoBlock SoC Platform: A Flexible Array of Reusable Run-Time-Reconfigurable IP-Blocks. 2013. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, 2013, p. 833-838. Conference paper (Refereed)
    Abstract [en]

    Run-time reconfigurable (RTR) FPGAs combine the flexibility of software with the high efficiency of hardware. Still, their potential cannot be fully exploited due to the increased complexity of the design process. Consequently, to enable an efficient design flow, we devise a set of prerequisites to increase the flexibility and reusability of current FPGA-based RTR architectures. We apply these principles to design and implement the RecoBlock SoC platform, whose main characteristics are (1) an RTR plug-and-play IP-Core whose functionality is configured at run-time; (2) flexible inter-block communication configured via software; and (3) built-in buffers to support data-driven streams and inter-process communications. We illustrate the potential of our platform by a tutorial case study using an adaptive streaming application to investigate different combinations of reconfigurable arrays and schedules. The experiments underline the benefits of the platform and show its resource utilization.

  • 31.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Towards cognitive reconfigurable hardware: Self-aware learning in RTR fault-tolerant SoCs. 2015. In: Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015, article id 7238103. Conference paper (Refereed)
    Abstract [en]

    Traditional embedded systems are evolving into power-and-performance-domain self-aware intelligent systems in order to overcome complexity and uncertainty. Without human control, they need to keep operative states in applications such as drone-based delivery or robotic space landing. Nowadays, the partial and run-time reconfiguration (RTR) of FPGA-based Systems-on-chip (SoC) can enable dynamic hardware acceleration or self-healing structures, but this conversely increases system-memory traffic. This paper introduces the basis of cognitive reconfigurable hardware and presents the design of an FPGA-based RTR SoC that becomes conscious of its monitored hardware and learns to make decisions that maintain a desired system performance, particularly when triggering hardware acceleration and dynamic fault-tolerant (FT) schemes on RTR cores. Self-awareness is achieved by evaluating monitored metrics in critical AXI-cores, supported by hardware performance counters. We suggest a reinforcement-learning algorithm that helps the system to search out when and which reconfigurable FT-scheme can be triggered. Executing random sequences of an embedded benchmark suite simulates unpredictability and bus traffic. The evaluation shows the effectiveness and implications of our approach.

  • 32.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. ESPE Universidad de Las Fuerzas Armadas, Ecuador.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    On providing scalable self-healing adaptive fault-tolerance to RTR SoCs. 2014. In: Proceedings of ReConFigurable Computing and FPGAs (ReConFig), 2014 International Conference on, 2014, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    The dependability of heterogeneous many-core FPGA-based systems is threatened by higher failure rates caused by disruptive scales of integration, increased design complexity, and radiation sensitivity. Triple-modular redundancy (TMR) and run-time reconfiguration (RTR) are traditional fault-tolerant (FT) techniques used to increase dependability. However, hardware redundancy is expensive and most approaches have poor scalability, flexibility, and programmability. Therefore, innovative solutions are needed to reduce the redundancy cost but still preserve acceptable levels of dependability. In this context, this paper presents the implementation of a self-healing adaptive fault-tolerant SoC that reuses RTR IP-cores in order to self-assemble different TMR schemes during run-time. The presented system demonstrates the feasibility of the Upset-Fault-Observer concept, which provides a run-time self-test and recovery strategy that delivers fault-tolerance over functions accelerated in RTR cores, at the same time reducing the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles. In addition, this paper experimentally evaluates the trade-off of the implemented reconfigurable TMR schemes by characterizing important fault-tolerance metrics, i.e., recovery time (self-repair and self-replicate), detection latency, self-assembly latency, throughput reduction, and increase of physical resources.

  • 33.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    The Upset-Fault-Observer: A Concept for Self-healing Adaptive Fault Tolerance. 2014. In: Proceedings of the 2014 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2014, IEEE Computer Society, 2014, p. 89-96. Conference paper (Refereed)
    Abstract [en]

    Advancing integration reaching atomic scales makes components highly defective and unstable during their lifetime. This demands paradigm shifts in electronic systems design. FPGAs are particularly sensitive to cosmic and other kinds of radiation that produce single-event-upsets (SEU) in configuration and internal memories. Typical fault-tolerance (FT) techniques combine triple-modular-redundancy (TMR) schemes with run-time-reconfiguration (RTR). However, even the most successful approaches disregard the low suitability of fine-grain redundancy in nano-scale design, poor scalability and programmability of application-specific architectures, small performance-consumption ratio of board-level designs, or scarce optimization capability of rigid redundancy structures. In that context, we introduce an innovative solution that exploits the flexibility, reusability, and scalability of a modular RTR SoC approach and reuses existing RTR IP-cores in order to assemble different TMR schemes during run-time. Thus, the system can adaptively trigger the adequate self-healing strategy according to execution environment metrics and user-defined goals. Specifically, the paper presents: (a) the upset-fault-observer (UFO), an innovative run-time self-test and recovery strategy that delivers FT on request over several function cores but saves the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles, (b) run-time reconfigurable TMR schemes and self-repair mechanisms, and (c) an adaptive software organization model to manage the proposed FT strategies.

  • 34.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Towards the generic reconfigurable accelerator: Algorithm development, core design, and performance analysis. 2013. Conference paper (Refereed)
    Abstract [en]

    Adoption of reconfigurable computing is limited in part by the lack of simplified, economic, and reusable solutions. The significant speedup and energy saving can increase performance but also design complexity; in particular for heterogeneous SoCs blending several CPUs, GPUs, and FPGA-Accelerator Cores. On the other hand, implementing complex algorithms in hardware requires modeling and verification, not only HDL generation. Most approaches are too specific without looking for reusability. Therefore, we present a solution based on: (1) a design methodology to develop algorithms accelerated in reconfigurable/non-reconfigurable IP-Cores, using common access tools, and contemplating verification from model to embedded software stages; (2) a generic accelerator core design that enables relocation and reuse almost independently of the algorithm, and data-flow driven execution models; and (3) a performance analysis of the acceleration mechanisms included in our system (i.e., accelerator core, burst I/O transfers, and reconfiguration pre-fetch). In consequence, the implemented system accelerates algorithms (e.g., FIR and Kalman filters) with speedups up to 3 orders of magnitude, compared to processor implementations.

  • 35.
    Ngo, Kalle
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Mohammadat, Tage
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Towards a Single Event Upset Detector Based on COTS FPGA. 2017. In: 2017 IEEE NORDIC CIRCUITS AND SYSTEMS CONFERENCE (NORCAS): NORCHIP AND INTERNATIONAL SYMPOSIUM OF SYSTEM-ON-CHIP (SOC) / [ed] Nurmi, J; Vesterbacka, M; Wikner, JJ; Alvandpour, A; NielsenLonn, M; Nielsen, IR, IEEE, 2017. Conference paper (Refereed)
    Abstract [en]

    The Single Event Upset Detector (SEUD) is a 3U CubeSat payload experiment that aims to achieve radiation-tolerant computing through detection and correction of SEU bit flips on COTS SRAM FPGAs. Our proposed self-healing architecture applies selective TMR, internal configuration memory scrubbing, and partial reconfiguration, and intends to demonstrate a cost-effective alternative to space-grade radiation-hardened SRAM FPGAs. This paper presents an overview of the ongoing development of the SEUD architecture. When complete, the SEUD will be tested on board the KTH MIST student CubeSat, which is targeted for launch in late 2020.

  • 36.
    Nilsson, Erland
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Millberg, Mikael
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Load Distribution with the Proximity Congestion Awareness in a Network on Chip. 2003. In: Design, Automation And Test In Europe Conference And Exhibition, Proceedings, LOS ALAMITOS, USA: IEEE COMPUTER SOC, 2003, p. 1126-1127. Conference paper (Refereed)
    Abstract [en]

    In Networks on Chip, NoC, very low cost and high performance switches will be of critical importance. For a regular two-dimensional NoC we propose a very simple, memoryless switch. In case of congestion, packets are emitted in a non-ideal direction, also called deflective routing. To increase the maximum tolerable load of the network, we propose a Proximity Congestion Awareness, PCA, technique, where switches use load information of neighbouring switches, called stress values, for their own switching decisions, thus avoiding congested areas. We present simulation results with random traffic which show that the PCA technique can increase the maximum traffic load by a factor of over 20.

  • 37.
    Nilsson, Erland
    et al.
    KTH, School of Information and Communication Technology (ICT).
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT).
    PANACEA- A case study on the PANACEA NoC- a Nostrum Network on Chip prototype. 2006. Report (Other academic)
  • 38.
    Nilsson, Erland
    et al.
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Reducing Power and Latency in 2-D Mesh NoCs using Globally Pseudochronous Locally Synchronous Clocking. 2004. In: International Conference On Hardware/Software Codesign And System Synthesis, New York, USA: ASSOC COMPUTING MACHINERY, 2004, p. 176-181. Conference paper (Refereed)
    Abstract [en]

    One of the main problems when designing large ASICs today is to distribute a low-power synchronous clock over the whole chip, and many remedies to this problem have been proposed over the years. For Networks-on-Chip (NoC), where computational Resources are organised in a 2-D mesh connected together through Switches in an on-chip interconnection network, another possibility exists: Globally Pseudochronous Locally Synchronous clock distribution.

    In this paper, we present a clocking scheme for NoCs that we call Globally Pseudochronous Locally Synchronous, in which we distribute a clock with a constant phase difference between the switches. As a consequence of the phase difference, some paths along the NoC switch network become faster than the others. We call these paths Data Motorways. By adapting the switching policy in the switches to prefer data to use the motorways, we show that the latency within the network is reduced by up to 40% compared to a synchronous reference case.

    The phase difference between the resources also makes the circuit more tolerant to clock skew. It also distributes the current peaks more evenly across the clock period, which leads to a reduction in peak power, which in turn further reduces the clock skew and the jitter in the clock network.

  • 39.
    Nilsson, Erland
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Trading off Power versus Latency using GPLS Clocking in 2D-Mesh NoCs. 2005. In: Isscs 2005: International Symposium On Signals, Circuits And Systems, Vols 1 And 2, Proceedings, New York, USA: IEEE, 2005, p. 51-54. Conference paper (Refereed)
    Abstract [en]

    To handle the design complexity when the number of transistors on-chip reaches one billion, new ways of organizing chips will be needed. One solution to this problem is to organize computational resources in a grid, where all communication between the resources is performed using an interconnection network. These networks are commonly referred to as Networks-on-Chip, or NoCs.

    This paper focuses on the trade-off between power and latency while keeping the required interconnection bandwidth constant. The clock frequency can be lowered to reduce the power, with reduced bandwidth as a consequence, which in a synchronous system will increase the latency linearly. In a 2D-Mesh NoC structure, it is possible to choose the regions with different clock phase and arrange them in such ways that the latency from sender to receiver along certain paths is nearly constant, and the total average latency is reduced by 50%. The reduction can then be exploited to trade off latency vs. power; the GPLS solution consumes 50% of the power compared to the fully synchronous solution, at the same latency and constant throughput.

  • 40.
    Pamunuwa, Dinesh
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Öberg, Johnny
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Zheng, Li-Rong
    KTH, Superseded Departments, Electronic Systems Design.
    Millberg, Mikael
    KTH, Superseded Departments, Microelectronics and Information Technology, IMIT.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    A study on the implementation of 2-D mesh-based networks-on-chip in the nanometre regime. 2004. In: Integration, ISSN 0167-9260, E-ISSN 1872-7522, Vol. 38, no 1, p. 3-17. Article in journal (Refereed)
    Abstract [en]

    On-chip packet-switched networks have been proposed for future giga-scale integration in the nanometre regime. This paper examines likely architectures for such networks and considers trade-offs in the layout, performance, and power consumption based on full-swing, voltage-mode CMOS signalling. A study is carried out for a future technology with parameters as predicted by the International Technology Roadmap for Semiconductors to yield a quantitative comparison of the performance and power trade-off for the network. Important physical level issues are discussed.

  • 41.
    Petersen, Kim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Toward a scalable test methodology for 2D-mesh network-on-chips. 2007. In: 2007 Design, Automation & Test In Europe Conference & Exhibition: Vols 1-3, 2007, p. 367-372. Conference paper (Refereed)
    Abstract [en]

    This paper presents a BIST strategy for testing the NoC interconnect network, and investigates if the strategy is a suitable approach for the task. All switches and links in the NoC are tested with BIST running at full clock-speed, and in a functional-like mode. The BIST is carried out as a go/no-go BIST operation at start up, or on command. It is shown that the proposed methodology can be applied for different implementations of deflecting switches, and that the test time is limited to a few thousand clock cycles with fault coverage close to 100%.

  • 42.
    Petersén, Kim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Magnhagen, Bengt
    Jönköpings Tekniska Högskola.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Towards an almost c-testable NoC test strategy. 2007. In: Proceedings of the IEEE East-West Design and Test Symposium, 2007. Conference paper (Refereed)
  • 43.
    Petersén, Kim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    An (almost) c-testable BIST strategy for NoCs. 2009. In: Proceedings of Nordic Test Forum 2009, 2009. Conference paper (Refereed)
  • 44.
    Petersén, Kim
    et al.
    HDC AB, Sweden.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Towards a test vector independent test response analyser for NoCs. 2007. Conference paper (Refereed)
  • 45.
    Petersén, Kim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Utilizing NoC Switches as BIST Structures in 2D-Mesh Network-on-Chips. 2006. In: Proceedings of NoC Workshop, DATE 2006, 2006. Conference paper (Refereed)
    Abstract [en]

    This poster proposes a test methodology and presents a method for carrying out an automatic go/no-go BIST operation at start up of a 2D-mesh NoC Network. It executes in functional mode at full clock speed. Only a minor area penalty is introduced in the NoC-network itself; the BIST is placed in the Network-Interface inside the computational resources.

  • 46.
    Robino, Francesco
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    From Simulink to NoC-based MPSoC on FPGA. 2014. In: Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014, IEEE, 2014. Conference paper (Refereed)
    Abstract [en]

    Network-on-chip (NoC) based multi-processor systems are promising candidates for future embedded system platforms. However, because of their complexity, new high-level modeling techniques are needed to design, simulate and synthesize embedded systems targeting NoC-based MPSoC. Simulink is a popular modeling environment suitable for modeling at system level. However, there is no clear standard to synthesize Simulink models into SW and HW towards a NoC-based MPSoC implementation. In addition, many of the proposed solutions require large overhead in terms of SW components and memory requirements, resulting in complex and customized multi-processor platforms. In this paper we present a novel design flow to synthesize Simulink models onto a NoC-based MPSoC running on low-cost FPGAs. Our design flow constrains the MPSoC and the Simulink model to share a common semantics domain. This reduces the need for resource-consuming SW components, lowering the memory requirements on the platform. At the same time, the performance (throughput) of dataflow applications can increase when the number of processors of the target platform is increased. This is shown through a case study on FPGA.

  • 47.
    Robino, Francesco
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    The HeartBeat model: A platform abstraction enabling fast prototyping of real-time applications on NoC-based MPSoC on FPGA. 2013. In: 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC 2013, IEEE, 2013, p. 6581536-. Conference paper (Refereed)
    Abstract [en]

    Future embedded systems will make use of many hundreds of configurable or re-configurable processing elements communicating through a network on chip (NoC), but there is a lack of rapid automated design flows bridging the abstraction gap between the models of such systems and their implementation.

  • 48.
    Rosvall, Kathrin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Mohammadat, Tage
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Ungureanu, George
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Öberg, Johnny
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Electronic and embedded systems.
    Sander, Ingo
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Exploring Power and Throughput for Dataflow Applications on Predictable NoC Multiprocessors. 2018. Conference paper (Refereed)
    Abstract [en]

    System-level optimization for multiple mixed-criticality applications on shared networked multiprocessor platforms is extremely challenging. Substantial complexity arises from the interdependence between the multiple subproblems of mapping, scheduling and platform configuration under the consideration of several, potentially orthogonal, performance metrics and constraints. Instead of using heuristic algorithms and problem decomposition, novel unified design space exploration (DSE) approaches based on Constraint Programming (CP) have in recent years shown promising results. The work in this paper takes advantage of the modularity of CP models in order to support heterogeneous multiprocessor Network-on-Chip (NoC) with Temporally Disjoint Networks (TDNs) aware message injection. The DSE supports a range of design criteria, in particular the optimization and satisfaction of power and throughput. In addition, the DSE now provides a valid configuration for the TDNs that guarantees the performance required to fulfil the design goals. The experiments show the capability of the approach to find low-power and high-throughput designs, and validate a resulting design on a physical TDN-based NoC implementation.

  • 49.
    Seyyedi, Razi
    et al.
    OFFIS Inst Informat Technol, Oldenburg, Germany.
    Mohammadat, M. T.
    KTH.
    Fakih, Maher
    OFFIS Inst Informat Technol, Oldenburg, Germany.
    Gruettner, Kim
    OFFIS Inst Informat Technol, Oldenburg, Germany.
    Öberg, Johnny
    KTH.
    Graham, Duncan
    Imperas, London, England.
    Towards Virtual Prototyping of Synchronous Real-time Systems on NoC-based MPSoCs. 2017. In: 2017 12TH IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL EMBEDDED SYSTEMS (SIES), IEEE, 2017, p. 99-102. Conference paper (Refereed)
    Abstract [en]

    NoC-based designs provide a scalable and flexible communication solution for the rising number of processing cores on a single chip. To master the complexity of software design in such a NoC-based multi-core architecture, advanced incremental integration testing solutions are required. This work presents a virtual-platform-based software testing and debugging approach for a synchronous application model on a 2 x 2 NoC-based MPSoC. We propose a development approach and a test environment that exploit the time approximation within the Imperas OVP instruction-accurate simulator and a functional model of the Nostrum NoC, for both software instructions and hardware clock cycles, at larger time stamps called Quantum, without sacrificing functional correctness. The functional testing environment runs the target software without requiring the real hardware platform. With the help of the Nostrum NoC we can support a synchronous system execution that is reasonably fast and precise with respect to a global synchronization signal, called HeartBeat. As work in progress, this paper also discusses several possible timing refinements, their possible implications for the simulation semantics and performance, and how they will be tackled in future work.

  • 50.
    Svantesson, Bengt
    et al.
    KTH, Superseded Departments, Electronic Systems Design.
    Hemani, Ahmed
    KTH, Superseded Departments, Electronic Systems Design.
    Ellervee, Peeter
    KTH, Superseded Departments, Electronic Systems Design.
    Postula, Adam
    Department of CSEE, University of Queensland.
    Öberg, Johnny
    KTH, Superseded Departments, Electronic Systems Design.
    Jantsch, Axel
    KTH, Superseded Departments, Electronic Systems Design.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    A Novel Allocation Strategy for Control and Memory Intensive Telecommunication Circuits. 1996. In: VLSI in Mobile Communication, 1996, p. 23-28. Conference paper (Refereed)
    Abstract [en]

    Communication sub-systems that deal with switching, routing and protocol implementation often have their functionality dominated by control logic and interaction with memory. Synthesis of such Control and Memory Intensive Systems (hereafter abbreviated to CMISTs) poses demands that in the past have not been met satisfactorily by general-purpose high-level synthesis (HLS) tools, and this has led to several research efforts to address these demands. In this paper we: characterise CMISTs from the synthesis viewpoint; contend that the synthesis demands of CMISTs can be met within the framework of a general-purpose HLS tool by making parts of it adaptive to the input, rather than developing a complete tool for a particular type of application; present an allocation strategy that automatically adapts for CMISTs; present the Operation and Maintenance (OAM) protocol of ATM, its modelling in VHDL and the synthesis aspects of the VHDL model; present the results of applying the synthesis methodology to the OAM as a test case, compared with the results from two commercial HLS tools; and prove the efficacy of the proposed synthesis methodology by applying it to an industrial design and comparing our results with those obtained by designing manually at register-transfer level and with the results from two commercial HLS tools.
