  • 301.
    Krenz, R.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Formal Verification Using Probabilistic Techniques. 2001. In: Proceedings of NORCHIP’01, 2001, p. 258-264. Conference paper (Refereed)
  • 302. Kumar, Shashi
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ellervee, Peeter
    Hemani, Ahmed
    Kumar, Anshul
    Internal Representation for Specification and Design of Heterogenous Systems. 1997. In: Third Workshop on Systems Design Languages, Italy, 1997. Conference paper (Refereed)
  • 303. Laaksolahti, J.
    et al.
    Tholander, J.
    Lundén, M.
    Solsona Belenguer, Jordi
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Karlsson, A.
    Jaensson, T.
    The lega: A device for leaving and finding tactile traces. 2011. In: Proceedings of the 5th International Conference on Tangible Embedded and Embodied Interaction, 2011, p. 193-196. Conference paper (Refereed)
    Abstract [en]

    This paper describes experiences from the development and deployment of the Lega, a handheld device for physical sharing of experiences during an art exhibition. Touching and moving the device in different ways creates a tactile trace that can be experienced by others through their own device. The system was successfully deployed at an art exhibition for two months, during which user studies were performed. Here we present some general observations regarding the system's performance and discuss issues that we encountered.

  • 304.
    Lansner, Anders
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Spiking brain models: Computation, memory and communication constraints for custom hardware implementation. 2014. In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE, 2014, p. 556-562. Conference paper (Refereed)
    Abstract [en]

    We estimate the computational capacity required to simulate in real time the neural information processing in the human brain. We show that the computational demands of a detailed implementation are beyond reach of current technology, but that some biologically plausible reductions of problem complexity can give performance gains between two and six orders of magnitude, which put implementations within reach of tomorrow's technology.

  • 305. Latif, K.
    et al.
    Seceleanu, T.
    Seceleanu, C.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Resource-aware task allocation and scheduling for SegBus platform. 2010. In: 2010 IEEE International Conference on Electronics, Circuits, and Systems, ICECS 2010 - Proceedings, 2010, p. 523-526. Conference paper (Refereed)
    Abstract [en]

    In this work, we propose an integrated task allocation and scheduling mechanism to minimize resource contention and processing latency for applications running on the SegBus platform. The transactions are classified as local and cross-border SPLIT transactions. The hybrid scheduling approach, implemented by a hierarchical arbiter code structure, shows significant improvement in system performance. Interrupt scheduling has been implemented to further enhance system performance. An H.264 video encoder application has been used to verify the proposed technique, showing a large improvement in system throughput.

  • 306. Latif, K.
    et al.
    Seceleanu, T.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Power and Area Efficient Design of Network-on-Chip Router through Utilization of Idle Buffers. 2010. In: Proceedings of the 17th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, ECBS 2010, IEEE, 2010, p. 131-138. Conference paper (Refereed)
    Abstract [en]

    Network-on-Chip (NoC) is the interconnection platform that answers the requirements of modern on-chip design. Small optimizations in the NoC router architecture can yield a significant improvement in the overall performance of NoC-based systems. Power consumption, area overhead and overall NoC performance are influenced by the router buffers. Resource sharing for on-chip networks is critical to reduce chip area and power consumption. Virtual channel buffer sharing by other router ports has been proposed to enhance the performance of on-chip communication. We approach the router architecture optimization by utilizing the idle buffers instead of increasing the number and size of buffers for the desired throughput.

  • 307. Latif, K.
    et al.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Seceleanu, T.
    Application specific IP placement for on-chip distributed architectures. 2009. In: 2009 NORCHIP, 2009, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    In this paper we approach the performance aspects of MPSoC platforms from the point of view of IP placement, with a focus on Network-on-Chip (NoC). Proper IP placement is important for time-dependent applications such as video and voice, where traffic must be delivered on time in order to operate properly. Proper placement of IPs can lower traffic congestion and improve overall execution time and power consumption. We suggest a new criterion for the prioritization of IPs regarding placement. Based on that criterion, we implemented an algorithm for IP placement. The running example is the mapping of an H.264 encoder application onto a NoC mesh. Allocation of processing elements on the platform, topology and communication mechanism are the main topics described here.

  • 308. Latif, K.
    et al.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Seceleanu, T.
    Multicast protocol for SegBus platform. 2009. In: 2009 NORCHIP, 2009, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    The task is to analyze how different services can be designed for the SegBus multiprocessor platform and to observe the improvement in system performance. In this paper, we utilize the concept of broadcasting and multicasting services from standard data buses for multiprocessor systems to enhance the performance of the SegBus platform. The running example is the H.264 encoder. The SegBus platform architecture, the communication mechanism, the arbitration scheme, the allocation of processing elements on the platform, and the broadcasting services and their implementation are the main topics analyzed here.

  • 309.
    Latif, Khalid
    et al.
    Turku Centre for Computer Science (TUCS).
    Rahmani, Amir-Mohammad
    Turku Centre for Computer Science (TUCS).
    Vaddina, Kameswar Rao
    Turku Centre for Computer Science (TUCS).
    Seceleanu, Tiberiu
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Processing Element Core Protection Using PVS-NoC Architecture. 2012. In: Work in Progress Session of the Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (WiP-PDP'12), 2012, p. 21-22. Conference paper (Refereed)
  • 310. Lazraq, T.
    et al.
    Svantesson, B.
    Jantsch, A.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, A.
    Modelling of Operation and Maintenance Functions in the ATM Network. 1995. In: Proc. of the 9th European Simulation Multiconference, 1995. Conference paper (Refereed)
  • 311.
    Leung, Simon
    et al.
    Department of CSEE, University of Queensland.
    Postula, Adam
    Department of CSEE, University of Queensland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Development of programmable architecture for base-band processing. 2000. In: Euromicro Conference, 2000. Proceedings of the 26th, 2000, Vol. 1, p. 362-367. Conference paper (Refereed)
    Abstract [en]

    Field Programmable Gate Arrays (FPGAs) are the most flexible solutions for reconfigurable platforms. However, they do not always deliver the required performance, e.g. in base-band processing, where the trend is to use DSP processors for flexibility and very specialised ASICs for performance. We propose a reconfigurable architecture based on parameterised functional blocks that correspond to the core functionality of base-band systems. This architecture competes with DSPs and ASICs in both flexibility and performance. Our methodology starts with a systematic analysis of various base-band standards. We derived functional specifications of the required blocks and identified the data flows to be reflected in the interconnect fabric. The parameterised blocks are of higher complexity and performance than the cells in FPGAs. Based on our analysis, we can take advantage of the sequential data flow in base-band applications and provide dedicated circuitry to support signal multiplexing between blocks, in order to fully utilise the bandwidth on the data path, conserve chip area and simplify inter-chip connections with route-through resources.

  • 312.
    Li, Jiantong
    et al.
    KTH, School of Information and Communication Technology (ICT), Integrated Devices and Circuits.
    Unander, Tomas
    López Cabezas, Ana
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Shao, Botao
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Zhiying
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. Uppsala University, Sweden.
    Feng, Yi
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Forsberg, Esteban Bernales
    KTH, School of Information and Communication Technology (ICT), Integrated Devices and Circuits.
    Zhang, Zhibin
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Jögi, Indrek
    Gao, Xindong
    Boman, Mats
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Östling, Mikael
    KTH, School of Information and Communication Technology (ICT), Integrated Devices and Circuits.
    Nilsson, Hans-Erik
    Zhang, Shi-Li
    KTH, School of Information and Communication Technology (ICT). Uppsala University, Sweden.
    Ink-jet printed thin-film transistors with carbon nanotube channels shaped in long strips. 2011. In: Journal of Applied Physics, ISSN 0021-8979, E-ISSN 1089-7550, Vol. 109, no 8, article id 084915. Article in journal (Refereed)
    Abstract [en]

    The present work reports on the development of a class of sophisticated thin-film transistors (TFTs) based on ink-jet printing of pristine single-walled carbon nanotubes (SWCNTs) for the channel formation. The transistors are manufactured on oxidized silicon wafer and flexible plastic substrates at ambient conditions. For this purpose, ink-jet printing techniques are developed aiming at high-throughput production of SWCNT thin-film channels shaped in long strips. Stable SWCNT inks with proper fluidic characteristics are formulated by polymer addition. The present work unveils, through Monte Carlo simulation and in the light of heterogeneous percolation, the underlying physics of the superiority of long-strip channels for SWCNT TFTs. It further predicts the compatibility of such a channel structure with ink-jet printing taking into account the minimum dimensions achievable by commercially available printers. The printed devices exhibit improved electrical performance and scalability, compared to previously reported ink-jet printed SWCNT TFTs. The present work demonstrates that ink-jet printed SWCNT TFTs of long-strip channels are promising building blocks for flexible electronics.

  • 313.
    Li, Molan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Xu, Shaohui
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Qiang
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Thermoelectric-Generator-Based DC-DC Conversion Networks for Automotive Applications. 2011. In: Journal of Electronic Materials, ISSN 0361-5235, E-ISSN 1543-186X, Vol. 40, no 5, p. 1136-1143. Article in journal (Refereed)
    Abstract [en]

    Maximizing electrical energy generation through waste heat recovery is one of the modern research questions within automotive applications of thermoelectric (TE) technologies. This paper proposes a novel concept of distributed multisection multilevel DC-DC conversion networks based on thermoelectric generators (TEGs) for automotive applications. The concept incorporates a bottom-up design approach to collect, convert, and manage vehicle waste heat efficiently. Several state-of-the-art thermoelectric materials are analyzed for the purpose of power generation at each waste heat harvesting location on a vehicle. Optimal materials and TE couple configurations are suggested. Moreover, a comparison of prevailing DC-DC conversion techniques was made with respect to applications at each conversion level within the network. Furthermore, higher-level design considerations are discussed according to system specifications. Finally, a case study is performed to compare the performance of the proposed network and a traditional single-stage system. The results show that the proposed network enhances the system conversion efficiency by up to 400%.

  • 314.
    Li, Nan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Improvements in High-Coverage and Low-Power LBIST. 2015. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Testing cost is one of the major contributors to the manufacturing cost of integrated circuits. Logic Built-In Self Test (LBIST) reduces test cost in three ways: it allows the use of smaller and cheaper ATE, it reduces test data volume through on-chip test pattern generation, and it reduces test time through at-speed test pattern application. However, it is difficult to reach sufficient test coverage with affordable area overhead using LBIST. Also, excessive power dissipation during test, due to the random nature of LBIST patterns, causes yield-decreasing problems such as IR-drop and overheating.

    In this dissertation, we present techniques and algorithms addressing these problems.

    In order to increase test coverage of LBIST, we propose to use on-chip circuitry to store and generate the "top-off" deterministic test patterns. First, we study the synthesis of Registers with Non-Linear Update (RNLUs) as on-chip sequence generators. We present algorithms constructing RNLUs which generate completely and incompletely specified sequences. Then, we evaluate the effectiveness of RNLUs generating deterministic test patterns on-chip. Our experimental results show that we are able to achieve higher test coverage with less area overhead compared to test point insertion. Finally, we investigate the possibilities of integrating the presented on-chip deterministic test pattern generator with existing Design-For-Testability (DFT) techniques with a case study.

    The problem of excessive test power dissipation is addressed with a scan partitioning algorithm which reduces capture power for delay-fault LBIST. The traditional S-graph model for scan partitioning does not quantify the dependency between scan cells. We present an algorithm using a novel weighted S-graph model in which the weights are scan cell dependencies determined by signal probability analysis. Our experimental results show that, on average, the presented method reduces average capture power by 50% and peak capture power by 39% with less than 2% drop in the transition fault coverage. By comparing the proposed algorithm to the original scan partitioning, we show that the proposed method is able to achieve higher capture power reduction with less fault coverage drop.

  • 315.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    AIG Rewriting Using 5-Input Cuts. 2011. In: Computer Design (ICCD), 2011 IEEE 29th International Conference on, IEEE conference proceedings, 2011, p. 429-430. Conference paper (Refereed)
    Abstract [en]

    Rewriting is a common approach to logic optimization based on local transformations. Most commercially available logic synthesis tools include a rewriting engine that may be used multiple times on the same netlist during optimization. This paper presents an And-Inverter graph (AIG) based rewriting algorithm using 5-input cuts. The best circuits are pre-computed for a subset of NPN classes of 5-variable functions. Cut enumeration and Boolean matching are used to identify replacement candidates. The presented approach is expected to complement existing rewriting approaches, which are usually based on 4-input cuts. The experimental results show that, by adding the new rewriting algorithm to the ABC synthesis tool, we can further reduce the area of heavily optimized large circuits by 5.57% on average.

  • 316.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    An Algorithm for Constructing a Minimal Register with Non-Linear Update Generating a Given Sequence. 2014. In: Proceedings of 2014 IEEE 44th International Symposium on Multiple-Valued Logic (ISMVL), 2014, p. 254-259. Conference paper (Refereed)
    Abstract [en]

    Registers with Non-Linear Update (RNLUs) are a generalization of Non-Linear Feedback Shift Registers (NLFSRs) in which both feedback and feedforward connections are allowed and no chain connection between the stages is required. An RNLU can be used to generate any given 2^p-ary sequence, p ≥ 1. In this paper, a new algorithm for constructing RNLUs is presented. The expected size of RNLUs constructed by the presented algorithm is proved to be asymptotically smaller than the expected size of RNLUs constructed by previous algorithms generating the same sequence. The presented algorithm can potentially be useful for applications such as testing, wireless communications, and cryptography.

  • 317.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Area-efficient high-coverage LBIST. 2014. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 5, p. 368-374. Article in journal (Refereed)
    Abstract [en]

    Logic Built-In Self Test (LBIST) is a popular technique for applications requiring in-field testing of digital circuits. LBIST incorporates test generation and response capture on-chip. It requires no interaction with a large, expensive tester. LBIST offers test time reduction due to at-speed test pattern application, makes test data re-usability possible at many levels, and enables test-ready IP. However, traditional pseudo-random pattern-based LBIST often has low test coverage. This paper presents a new method for on-chip generation of deterministic test patterns based on registers with non-linear update. Our experimental results on 7 real designs show that the presented approach can achieve a higher stuck-at coverage than test point insertion with less area overhead. We also show that registers with non-linear update are asymptotically smaller than the memories required to store the same test patterns in a compressed form.

  • 318.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    On-Chip Area-Efficient Binary Sequence Storage. 2013. In: Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI, 2013, p. 325-326. Conference paper (Refereed)
    Abstract [en]

    On-chip storage of binary sequences normally requires the use of Read-Only Memories (ROMs). However, ROMs do not exploit the fact that the stored information is accessed sequentially. This paper presents an area-efficient sequence storage technique based on state machines. Experimental results show that the presented method significantly outperforms previous approaches. The resulting state machines are on average 54% smaller than ROMs storing the same sequence.

  • 319.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Synthesis of Power- and Area-Efficient Binary Machines for Incompletely Specified Sequences. 2014. In: Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC, 2014, p. 634-639. Conference paper (Refereed)
    Abstract [en]

    Binary Machines (BMs) are a generalization of Linear Feedback Shift Registers (LFSRs) in which the current state is a nonlinear function of the previous state. It is known how to construct a BM generating a given completely specified binary sequence. In this paper, we present an algorithm which can efficiently handle the case of incompletely specified sequences. Our experimental results show that it significantly outperforms the approaches based on all-0 or random fill in both area and power dissipation. On average, it reduces dynamic power dissipation by a factor of two compared to the all-0 fill approach and by a factor of six compared to the random fill approach. The presented algorithm can potentially be useful for many applications, including Logic Built-In Self Test (LBIST).
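
As a minimal illustration of the distinction drawn in the abstract above, the Python sketch below steps a 4-bit LFSR (linear next-state function) and a register whose feedback also contains an AND term (nonlinear next-state function). The 4-bit width, the tap positions, and all function names are assumptions chosen for illustration; this is not the synthesis algorithm of the paper.

```python
WIDTH = 4  # register width in bits (illustrative assumption)


def lfsr_update(state):
    """Linear update: the feedback bit is the XOR of bits 0 and 1."""
    bit = (state ^ (state >> 1)) & 1
    return (state >> 1) | (bit << (WIDTH - 1))


def nonlinear_update(state):
    """Nonlinear update: the feedback also includes the AND of bits 2 and 3.

    Hypothetical example of a nonlinear next-state function, not taken
    from the paper.
    """
    bit = (state ^ (state >> 1) ^ ((state >> 2) & (state >> 3))) & 1
    return (state >> 1) | (bit << (WIDTH - 1))


def run(update, state, steps):
    """Return the list of states visited, starting from `state`."""
    states = [state]
    for _ in range(steps):
        state = update(state)
        states.append(state)
    return states


def period(update, state):
    """Steps until the start state recurs, or None if it never recurs."""
    start = state
    for n in range(1, 2 ** WIDTH + 1):
        state = update(state)
        if state == start:
            return n
    return None
```

With the taps chosen here, the LFSR started from state 1 visits all 15 nonzero states before returning to 1.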

  • 320.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Carlsson, Gunnar
    Development Unit Radio, Ericsson AB, Sweden.
    A Scan Partitioning Algorithm for Reducing Capture Power of Delay-Fault LBIST. 2015. In: Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), 2015, 2015, p. 842-847. Conference paper (Refereed)
  • 321.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Carlsson, Gunnar
    Development Unit Radio, Ericsson AB, Sweden.
    Evaluation of Alternative LBIST Flows: A Case Study. 2014. In: Proceedings of 32nd Nordic Microelectronics Conference (NORCHIP'2014), 2014. Conference paper (Refereed)
    Abstract [en]

    The cost of manufacturing test has been growing dramatically over the years. The traditional pseudo-random pattern based Logic Built-in Self Test (LBIST) can potentially reduce the test cost by minimizing the need for the automatic test equipment. However, LBIST test coverage can be unacceptably low for some designs. Various methods for complementing pseudo-random patterns to increase test coverage exist, but the combined effect of these methods has not been studied. In this paper, we evaluate the effectiveness of alternative LBIST flows by a case study on a real industrial design. Our results can guide the selection of the best LBIST flow for a given set of design constraints such as test coverage, area overhead, and test time.

  • 322.
    Li, Nan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sharif Mansouri, Shohreh
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Secure Key Storage Using State Machines. 2013. In: 2013 IEEE 43rd International Symposium On Multiple-Valued Logic (ISMVL 2013), IEEE Computer Society, 2013, p. 290-295. Conference paper (Refereed)
    Abstract [en]

    In hardware implementations of cryptographic systems, secret keys are commonly stored in an on-chip memory. This makes them prone to physical attacks, since the location of a memory on a chip is usually easy to spot. We propose to encode secret keys using a state machine which can be concealed in the rest of the logic on a chip. We present a heuristic algorithm which constructs a minimal state machine for a given set of secret keys. We show that, by using m-ary encoding, we are able to construct state machines which are smaller than the ones constructed using binary encoding. The presented algorithm is feasible for storing up to 1 Mbit of random data.

  • 323.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Guo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A code reuse method for many-core coarse-grained reconfigurable architecture function library development. 2011. In: 2011 International Symposium on Integrated Circuits, ISIC 2011, 2011, p. 512-515. Conference paper (Refereed)
    Abstract [en]

    In this paper, a code reuse method is proposed to enhance the efficiency of function library development for many-core coarse-grained reconfigurable architectures. The method focuses on developing and using the precompiled ReConfigurable Functions (RCFs) in the function library. By applying this method to RCF development, functions are objectified like classes in an object-oriented programming language. Using a function means instantiating a selected RCF. Similar functions can be instantiated from the same RCF. Thus, the total number of RCFs to be compiled is reduced, the global programming efficiency is increased, and the labor requirement for application development is reduced.

  • 324.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Global control and storage synthesis for a system level synthesis approach. 2013. In: Proceedings - 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013, IEEE, 2013, p. 6546036. Conference paper (Refereed)
    Abstract [en]

    SYLVA is a System Level Architectural Synthesis Framework that translates Synchronous Data Flow (SDF) models of DSP sub-systems like modems and codecs into hardware implementation in ASIC/Standard Cells, FPGAs or CGRAs (Coarse Grain Reconfigurable Fabric).

  • 325.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Rosvall, Kathrin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    System level synthesis of hardware for DSP applications using pre-characterized function implementations. 2013. In: 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), IEEE, 2013. Conference paper (Refereed)
    Abstract [en]

    SYLVA is a system level synthesis framework that transforms DSP sub-systems modeled as synchronous data flow into hardware implementations in ASIC, FPGAs or CGRAs. SYLVA synthesizes in terms of pre-characterized function implementations (FIMPs). It explores the design space in three dimensions: number of FIMPs, type of FIMPs, and pipeline parallelism between the producing and consuming FIMPs. We introduce a timing and interface model of FIMPs to enable reuse and automatic generation of the Global Interconnect and Control (GLIC) that glues the FIMPs together into a working system. SYLVA has been evaluated by applying it to five realistic DSP applications. The results are analyzed for design space exploration, for efficacy in generating GLIC (by comparison with manually generated GLIC), and for accuracy of design space exploration (by comparing the area and energy costs considered during exploration, based on pre-characterized FIMPs, with the final results).

  • 326.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Case Study: Constraint Programming in a System Level Synthesis Framework. 2014. In: Principles and Practice of Constraint Programming, CP 2014, 2014, p. 846-861. Conference paper (Refereed)
    Abstract [en]

    This article presents a case study of using a constraint programming solver in a system level synthesis framework called SYLVA. The solver is used to find the repetition vector of a synchronous data flow graph and serves as the design space exploration engine, which rapidly finds qualified system implementations by solving a constraint satisfaction optimization problem. Each system implementation is a combination of a number of function implementation instances and their cycle-accurate execution schedules. The problem to be solved is automatically generated based on the user inputs: 1) a system model to be synthesized, 2) a library containing all the usable function implementations, 3) the performance/cost constraints, and 4) the optimization objectives. The use of constraint programming enabled low-cost development of the design space exploration engine in addition to gaining ease of use.
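
The repetition vector mentioned in the abstract above is the smallest positive integer solution of the balance equations of a synchronous data flow graph: on every edge, the producer's firing count times the tokens it produces must equal the consumer's firing count times the tokens it consumes. The Python sketch below computes it by propagating rate ratios across a connected graph. This is a simpler technique than the constraint programming formulation the paper describes, and the function name and edge tuple layout are assumptions made for illustration.

```python
from collections import defaultdict, deque
from fractions import Fraction
from math import gcd


def repetition_vector(edges):
    """Solve the SDF balance equations for a connected graph.

    edges: list of (producer, consumer, tokens_produced, tokens_consumed).
    Returns a dict mapping each actor to its firing count per iteration.
    """
    adjacency = defaultdict(list)
    for src, dst, produced, consumed in edges:
        # Balance equation: r[src] * produced == r[dst] * consumed
        adjacency[src].append((dst, Fraction(produced, consumed)))
        adjacency[dst].append((src, Fraction(consumed, produced)))

    start = edges[0][0]
    rate = {start: Fraction(1)}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour, ratio in adjacency[node]:
            value = rate[node] * ratio
            if neighbour not in rate:
                rate[neighbour] = value
                queue.append(neighbour)
            elif rate[neighbour] != value:
                raise ValueError("inconsistent rates: no repetition vector")

    # Scale the rational rates to the smallest positive integers.
    scale = 1
    for value in rate.values():
        scale = scale * value.denominator // gcd(scale, value.denominator)
    counts = {node: int(value * scale) for node, value in rate.items()}
    common = 0
    for value in counts.values():
        common = gcd(common, value)
    return {node: value // common for node, value in counts.items()}
```

For a chain A -> B -> C in which A produces 2 tokens per firing and B consumes 3, while B produces 1 and C consumes 2, `repetition_vector([("A", "B", 2, 3), ("B", "C", 1, 2)])` returns `{"A": 3, "B": 2, "C": 1}`.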

  • 327.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Global interconnect and control synthesis in system level architectural synthesis framework. 2013. In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, New York: IEEE, 2013, p. 11-17. Conference paper (Refereed)
    Abstract [en]

    In this paper, we describe the Global Interconnect and Control (GLIC) synthesis step in a system-level synthesis framework, which automatically generates GLIC logic from a scheduled SDF. The generated GLIC logic consists of control FSMs, interconnect and data buffers that glue existing function implementations together to construct the system modeled by the scheduled SDF. The experimental results show that GLIC synthesis is able to generate compact control, interconnect and data buffers (5.7%, 0.6% and 0.9% of area usage for three examples implemented in 65nm ASIC) while saving a large amount of manual effort and time (0.5s, 2.4s and 4.3s run time on a 2.8GHz x86 microprocessor for the three examples).

  • 328.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Memory allocation and optimization in system-level architectural synthesis. 2013. In: 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC 2013, New York: IEEE, 2013, p. 6581537-. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a novel approach to optimally allocate memory resources in a system-level synthesis flow, which converts a dataflow-style system description (synchronous data flow) into a register-transfer level description in the specified implementation style (ASIC, FPGA or CGRA). The first problem encountered by the synthesis flow is that, since it covers different implementation styles, a generic model is required to support resource allocation and optimization. The second problem is how to optimally allocate memory resources in the RTL model. The contribution of this paper has two parts: 1) a generic memory model for different memory architectures in ASIC, FPGA and CGRA, and 2) a memory allocation and optimization method for optimally allocating storage elements in the intermediate representation with actual implementations (e.g. on-chip SRAM for ASIC, memory controller and off-chip SDRAM for FPGA). The memory allocation method is an implementation-style-dependent procedure with three steps: architecture-independent optimization, resource allocation and architecture-dependent optimization. The experimental results show that the proposed method is efficient and effective. The automatically generated implementation uses only approximately 4% more resources than a manual implementation. The fast and automatic memory allocation method enables fast design space exploration that requires little effort from the system designer.

  • 329.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jafari, Fahimeh
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, Shashi
    Department of Electronics and Computer Engineering, School of Engineering, Jönköping University.
    Layered Spiral Algorithm for memory-aware mapping and scheduling on Network-on-Chip. 2010. In: 28th Norchip Conference, NORCHIP 2010, 2010. Conference paper (Refereed)
    Abstract [en]

    In this paper, the Layered Spiral Algorithm (LSA) is proposed for memory-aware application mapping and scheduling onto Network-on-Chip (NoC) based Multi-Processor System-on-Chip (MPSoC). The energy consumption is optimized while keeping high task-level parallelism. The experimental evaluation indicates that if memory-awareness is not considered during mapping and scheduling, memory overflows may occur. The underlying problem is also modeled as a Mixed Integer Linear Programming (MILP) problem and solved using an efficient branch-and-bound algorithm to compare optimal solutions with the results achieved by LSA. Compared to the MILP solutions, the LSA results show only about 20% and 12% increases in total communication cost for a small and a medium-sized synthetic problem, respectively, while LSA is an order of magnitude faster than the MILP solver. Therefore, the LSA can find an acceptable total communication cost with low run-time complexity, enabling quick exploration of large design spaces, which is infeasible for exhaustive search.

  • 330.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Malik, Jamshaid Sarwar
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Shaoteng
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A code generation method for system-level synthesis on ASIC, FPGA and manycore CGRA. 2013. In: MES '13 Proceedings of the First International Workshop on Many-core Embedded Systems, ACM, 2013, p. 25-32. Conference paper (Refereed)
    Abstract [en]

    This paper presents a code generation method that translates an intermediate Register-Transfer Level (RTL) model of a system into its corresponding VHDL code for ASICs and FPGAs, and into MATLAB functions for manycore CGRAs. The intermediate representation consists of Function Implementations (FIMPs) and the glue logic. FIMPs are VHDL design units for the ASIC and FPGA implementation styles and MATLAB function templates for the CGRA implementation style, while the glue logic is a compact data structure storing Global Interconnect and Control (GLIC) information. The automatically generated implementation code increases the resource usage by 1.5% on average while reducing total design effort by two orders of magnitude.

  • 331.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Malik, Omer
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Automatic test program generation framework for NoC-based MPSoC compiler validation. 2011. In: 2011 International Conference on Instrumentation, Measurement, Circuits and Systems (ICIMCS 2011), vol 1: Instrumentation, Measurement, Circuits and Systems, New York: Amer Soc Mechanical Engineers, 2011, p. 99-103. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a systematic method (a framework) for automatic test program generation for Network-on-Chip (NoC) based Multi-Processor System-on-Chip (MPSoC) compiler validation. The framework consists of three parts: a specification reader, a program generator and a platform simulator. By applying this framework, the specified test programs for compiler validation are generated automatically, together with their corresponding run-time results. Validation productivity is enhanced and the expertise required is reduced. We also present an example tool called Automatic VESYLA Generator (AVG) implementing this framework. This tool is used in the Dynamic Reconfigurable Resource Array (DRRA) assembler development in our research group. The experiment shows that, on a personal computer, the AVG tool generates bug-free test programs more than 100 times faster than a human programmer.

  • 332. Li, Xiaopeng
    et al.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Olsson, Håkan
    Ismail, Mohammed
    A High-Speed Low-Power architecture for GHz CMOS Dual-Modulus Prescaler. 2000. In: Proc. International Analog VLSI Workshop, IEEJ, 2000, p. 6-9. Conference paper (Refereed)
  • 333. Li, Yang
    et al.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. College of Computer, National University of Defense Technology, China.
    Zhao, Xiaohui
    Yang, Yong
    Liu, Hengzhu
    Round-trip latency prediction for memory access fairness in mesh-based many-core architectures. 2014. In: IEICE Electronics Express, ISSN 1349-2543, E-ISSN 1349-2543, Vol. 11, no 24, p. 20141027-. Article in journal (Refereed)
    Abstract [en]

    In mesh-based many-core architectures, processor cores and memories reside in different locations (center, corner, edge, etc.), so memory accesses behave differently due to their different communication distances. The latency difference leads to unfair memory access, with some memory accesses experiencing very high latencies, which degrades system performance. However, improving one memory access's latency can worsen the latency of another, since memory accesses contend in the network. Therefore, the goal should be memory access fairness, achieved by balancing the latencies of memory accesses while ensuring a low average latency. In this paper, we address the goal by predicting the round-trip latencies of memory-access-related packets and using the predicted round-trip latencies to prioritize the packets. A router supporting fair memory access is designed and its hardware cost is given. Experiments carried out with a variety of network sizes and packet injection rates show that our approach outperforms classic round-robin arbitration in terms of average latency and LSD. In the experiments, the maximum improvements in average latency and LSD are 16% and 48%, respectively.

  • 334.
    Liang, Lei
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Design and Implementation of an Extendable SoC Virtual Platform in SystemC-TLM 2.0. 2012. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    With the increasing design complexity of SoC development, the workload for hardware designers and verification engineers keeps growing. In addition, software and hardware development cannot be carried out in parallel, which creates a bottleneck in the current design flow. It is also very difficult to deal with hardware problems that are found during the software development process. To overcome these problems, design at a higher level needs to be applied. SystemC is a language that enables design at the system level, and TLM-2.0 contains standardized SystemC interface classes that ensure the portability and interoperability of different IPs.

    In this thesis, an extendable SoC virtual platform is implemented in SystemC. It provides exactly the functions required by the design specification. A standardized SystemC module template is designed that includes all the different interfaces of the virtual platform, which simplifies future module development. One method for wrapping a C/C++ module into SystemC is given, and a basic framework structure is implemented so that the existing C++/Simics modules can work in the designed SystemC virtual platform. Finally, simulation time and workload are compared between RTL modules and SystemC modules, which demonstrates that considerable development time can be saved by using this virtual platform for software development.

  • 335.
    Liu, Jia
    et al.
    KTH, School of Industrial Engineering and Management (ITM), Industrial Economics and Management (Dept.), Industrial Management.
    Li, Z.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Design of evaluation platform of machine vision for portable wireless terminals. 2011. Conference paper (Refereed)
    Abstract [en]

    An evaluation platform for machine vision algorithms is designed in this paper. The platform is constructed with a DM6437 DSP processor and image input-output circuit models. An image processing algorithm used for machine vision can be executed on the platform. With a DFG model of the algorithm, the algorithm architecture can be built conveniently for programming and analysis. As an example, an image segmentation algorithm has been modeled and executed on the platform. The result shows that the platform is useful for algorithm analysis and can be compared with other implementation systems as a design reference.

  • 336. Liu, M.
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, W.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A survey of FPGA dynamic reconfiguration design methodology and applications. 2012. In: International Journal of Embedded and Real-Time Communication Systems, ISSN 1947-3176, Vol. 3, no 2, p. 23-39. Article, review/survey (Refereed)
    Abstract [en]

    FPGA Dynamic Partial Reconfiguration (DPR or PR) technology has emerged and gradually matured in recent years. It provides Time-Division Multiplexing (TDM) capability in utilizing on-chip resources and leads to significant benefits in comparison with conventional static designs. However, the partially reconfigurable design process imposes additional complexity and technical requirements on FPGA developers. Hence, PR design approaches are being widely explored and investigated to systematize the development methodology and ease the designers' work. In this paper, the authors collect several research and engineering projects in this area and present a survey of the design methodology and applications of PR. Research aspects are discussed in various hardware/software layers.

  • 337.
    Liu, Ming
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Adaptive Computing based on FPGA Run-time Reconfigurability. 2011. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    In the past two decades, the FPGA has evolved from its restricted use as glue logic towards real System-on-Chip (SoC) platforms. Profiting from the great progress in semiconductor and IC technologies, the programmability of FPGAs has led to their wide adoption in all aspects of embedded design. Modern FPGAs provide the additional capability of being dynamically and partially reconfigured during system run-time. Run-time reconfigurability extends FPGA designs from purely spatial parallelism to both spatial and temporal parallelism, providing more design flexibility for advanced system features.

    Adaptive computing denotes an advanced computing paradigm in which computation tasks and resources are intelligently managed in correspondence with conditional requirements. In this thesis, we investigate adaptive designs on FPGA platforms: we present a comprehensive and practical design framework for adaptive computing based on FPGA run-time reconfigurability. It addresses several key design issues in different hardware/software layers, specifically hardware architecture, run-time reconfiguration technical support, OS and device drivers, the hardware process scheduler, context switching and Inter-Process Communications (IPC). Targeting the specific application of data acquisition (DAQ) and trigger systems in nuclear and particle physics experiments, we set up the data streaming model and conduct a theoretical analysis of the adaptive system. Three application studies are employed to verify the proposed adaptive design framework. The first application demonstrates a peripheral-controller adaptable system aimed at general embedded designs. Through dynamically loading and unloading a NOR flash memory controller and an SRAM controller, both flash memory and SRAM accesses can be accomplished with less resource consumption than in traditional static designs. In the second case, two real algorithm processing engines are adaptively time-multiplexed in the same reconfigurable slot for particle recognition computation. Experimental results reveal reduced on-chip resource requirements, as well as a processing capability close to that of the peer static design. Taking advantage of FPGA dynamic reconfigurability, we present in the third application a novel on-FPGA interconnection microarchitecture named RouterLess NoC (RL-NoC). RL-NoC employs the novel design concept of Move Logic Not Data (MLND) and differs significantly from existing interconnection architectures such as buses, crossbars or NoCs. It does not rely on routers to deliver packets hop by hop as canonical NoCs do, but buffers data packets in virtual channels and uses run-time reconfiguration to bring in the various nodes that produce or consume them. In comparison with canonical packet-switching NoCs, the routerless architecture features lower design complexity, less resource consumption, a higher operating frequency, more efficient power dissipation and comparable or even higher packet delivery efficiency. It is regarded as a promising interconnection approach in some design scenarios on FPGAs, especially for light-weight applications.

  • 338.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    The robustness of balanced boolean networks. 2013. In: Complex Networks / [ed] Menezes, Ronaldo; Evsukoff, Alexandre; González, Marta C, Springer Berlin/Heidelberg, 2013, p. 19-30. Conference paper (Refereed)
    Abstract [en]

    One of the characteristic features of genetic regulatory networks is their inherent robustness, that is, their ability to retain functionality in spite of the introduction of random errors. In this paper, we focus on the robustness of Balanced Boolean Networks (BBNs), which are a special kind of Boolean network model of genetic regulatory networks. Our goal is to formalize and analyse the robustness of BBNs. Based on these results, applications using the Boolean network model can be improved and optimized to be more robust. We formalize BBNs and introduce a method to construct BBNs for 2-singleton-attractor Boolean networks. The experimental results show that BBNs tolerate single stuck-at faults on every edge well. Our method improves the robustness of Boolean networks by at least 13% on average and, in some special cases, by up to 61%.

  • 339. Liu, Ming
    et al.
    Kuehn, Wolfgang
    Lange, Soeren
    Yang, Shuo
    Roskoss, Johannes
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Wang, Qiang
    Xu, Hao
    Jin, Dapeng
    Liu, Zhen'an
    A High-End Reconfigurable Computation Platform for Nuclear and Particle Physics Experiments. 2011. In: Computing in science & engineering (Print), ISSN 1521-9615, E-ISSN 1558-366X, Vol. 13, no 2, p. 52-63. Article in journal (Refereed)
    Abstract [en]

    A high-performance computation platform based on field-programmable gate arrays targets nuclear and particle physics experiment applications. The system can be constructed or scaled into a supercomputer-equivalent size for detector data processing by inserting compute nodes into advanced telecommunications computing architecture (ATCA) crates. Among the case study results are that one ATCA crate can provide a computation capability equivalent to hundreds of commodity PCs for Hades online particle track reconstruction and Cherenkov ring recognition.

  • 340.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, Wolfgang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Adaptively Reconfigurable Controller for the Flash Memory. 2011. In: Flash Memories, InTech, 2011. Chapter in book (Refereed)
  • 341.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, Wolfgang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    FPGA-based adaptive computing for correlated multi-stream processing. 2010. In: Proceedings - Design, Automation and Test in Europe, DATE, IEEE Computer Society, 2010, p. 973-976. Conference paper (Refereed)
    Abstract [en]

    In conventional static implementations for correlated streaming applications, computing resources may be inefficiently utilized since multiple stream processors may supply their sub-results at asynchronous rates for result correlation or synchronization. To enhance the resource utilization efficiency, we analyze multi-streaming models and implement an adaptive architecture based on FPGA Partial Reconfiguration (PR) technology. The adaptive system can intelligently schedule and manage various processing modules during run-time. Experimental results demonstrate up to 78.2% improvement in throughput-per-unit-area on unbalanced processing of correlated streams, as well as only 0.3% context switching overhead in the overall processing time in the worst case.

  • 342.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, Wolfgang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    FPGA-based Cherenkov Ring Recognition in Nuclear and Particle Physics Experiments. 2011. In: Reconfigurable Computing: Architectures, Tools And Applications / [ed] Koch, A; Krishnamurthy, R; McAllister, J; Woods, R; ElGhazawi, T, Springer, 2011, p. 169-180. Conference paper (Refereed)
    Abstract [en]

    Cherenkov rings are often used to identify particles flying through the detector systems in nuclear and particle physics experiments. In this paper, we introduce an improved ring recognition algorithm and present its FPGA implementation. Compared to the previous implementation based on VMEBus and FPGAs, our design is evaluated to perform better by a factor of several tens up to a hundred, with acceptable resource utilization on a Xilinx Virtex-4 FX60 FPGA. The design module will reside in the online data acquisition (DAQ) and trigger facilities and will help to significantly reduce the data rate of storage for offline analysis by retaining only interesting events and dropping the noise. Our customized FPGA cluster in one ATCA [1] shelf is foreseen to achieve a computation capability equivalent to up to thousands of commodity PCs for particle recognition.

  • 343.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, Wolfgang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    FPGA-Based Particle Recognition in the HADES Experiment. 2011. In: IEEE Design & Test of Computers, ISSN 0740-7475, E-ISSN 1558-1918, Vol. 28, no 4, p. 48-57. Article in journal (Refereed)
  • 344.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, Wolfgang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Inter-process communication using pipes in FPGA-based adaptive computing. 2010. In: Proceedings - IEEE Annual Symposium on VLSI, ISVLSI 2010, 2010, p. 80-85. Conference paper (Refereed)
    Abstract [en]

    In FPGA-based adaptive computing, Inter-Process Communications (IPC) are required to exchange information among hardware processes that time-multiplex the resources in the same reconfigurable region. In this paper, we use pipes for IPC and analyze the performance in terms of throughput, throughput efficiency and latency in switching contexts. We also present two practical implementations using FPGA BRAM and external DDR memory. Experimental results expose the key role that context switching plays in determining the IPC performance at various pipe sizes and data rates.

  • 345.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kuehn, Wolfgang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Reducing FPGA Reconfiguration Time Overhead using Virtual Configurations. 2010. In: Proceedings of the 5th International Workshop on Reconfigurable Communication Centric Systems-on-Chip, 2010, p. 149-152. Conference paper (Refereed)
    Abstract [en]

    Reconfiguration time overhead is a critical factor in determining the system performance of FPGA dynamically reconfigurable designs. The most straightforward way to reduce the reconfiguration overhead is to increase the reconfiguration throughput, as many previous contributions did. In addition to shortening FPGA reconfiguration time, we introduce in this paper a new concept of Virtual ConFigurations (VCF), which hides dynamic reconfiguration time in the background to reduce the overhead. Experimental results demonstrate up to 29.9% throughput enhancement by adopting two VCFs in a consumer-reconfigurable design. The packet latency performance is also largely improved by extending the channel saturation to a higher packet injection rate.

  • 346.
    Liu, Ming
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sharif Mansouri, Shohreh
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Faster Shift Register Alternative to Filter Generators. 2013. In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, IEEE, 2013, p. 713-718. Conference paper (Refereed)
    Abstract [en]

    LFSR-based filter generators are used as a basic building block in many stream ciphers. Filter generators are popular because their well-defined mathematical description enables a detailed formal security analysis. In this paper, we show how to modify a filter generator into a nonlinear feedback shift register which is faster, but slightly larger, than the original filter generator. For example, the propagation delay can be reduced 1.54 times at the expense of 1.27% extra area. The presented method might be important for applications which require very high data rates, e.g. 4G mobile communication technology.
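
    For readers unfamiliar with the starting point of this work, a filter generator is an LFSR whose state is fed through a nonlinear Boolean function to produce each keystream bit. The sketch below shows only that baseline structure, not the paper's transformation into a nonlinear feedback shift register. The 4-bit register, the taps (recurrence s[n+4] = s[n+1] XOR s[n], i.e. the primitive polynomial x^4 + x + 1) and the filter function are toy assumptions chosen for illustration and are far too small for real use.

```python
def lfsr_filter_generator(state, taps, filter_fn, n_bits):
    """Produce n_bits of keystream from a Fibonacci LFSR plus a filter.

    state:     list of 0/1 bits, oldest bit first
    taps:      state indices XORed together to form the feedback bit
    filter_fn: Boolean function of the whole state giving one output bit
    All concrete parameter values used below are illustrative assumptions.
    """
    state = list(state)
    keystream = []
    for _ in range(n_bits):
        keystream.append(filter_fn(state))
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        # Shift: drop the oldest bit, append the feedback bit.
        state = state[1:] + [feedback]
    return keystream


def toy_filter(s):
    # Hypothetical nonlinear filter: one linear term plus one AND term.
    return s[0] ^ (s[1] & s[3])


bits = lfsr_filter_generator([1, 0, 0, 0], [0, 1], toy_filter, 30)
print(bits)
```

    Because the chosen feedback polynomial is primitive, the register cycles through all 15 nonzero states, so the keystream of this toy instance repeats every 15 bits.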

  • 347.
    Liu, Pei
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ebrahim, Fatemeh Ostad
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Department of Computer Science and engineering, Indian Institute of Technology.
    A Coarse-Grained Reconfigurable Processor for Sequencing and Phylogenetic Algorithms in Bioinformatics. 2011. In: Proceedings: 2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, 2011, p. 190-197. Conference paper (Refereed)
    Abstract [en]

    A coarse-grained reconfigurable processor tailored for accelerating multiple bioinformatics algorithms is proposed. In this paper, a programmable and scalable architectural platform instantiates an array of coarse-grained, lightweight processing elements. The platform allows arbitrary partitioning and scheduling schemes and is capable of solving four popular bioinformatics algorithms in full: Needleman-Wunsch, Smith-Waterman and HMMER for sequencing, and Maximum Likelihood for phylogenetics. The key difference of the proposed CGRA-based solution compared to FPGA- and GPU-based solutions is a much better match between architecture and algorithms for the core computational needs, as well as the system-level architectural needs. For the same degree of parallelism, we provide 5X to 14X speed-ups compared to FPGA solutions and 15X to 78X compared to GPU acceleration on the three sequencing algorithms. We also provide a 2.8X speed-up compared to FPGA with the same amount of core logic and 70X compared to GPU with the same silicon area for Maximum Likelihood.

  • 348.
    Liu, Pei
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Coarse Grain Reconfigurable Architecture for sequence alignment problems in bio-informatics. 2010. In: Proceedings of the 2010 IEEE 8th Symposium on Application Specific Processors, SASP'10, 2010, p. 50-57. Conference paper (Refereed)
    Abstract [en]

    A Coarse Grain Reconfigurable Architecture (CGRA) tailored for accelerating bio-informatics algorithms is proposed. The key innovation is a lightweight bio-informatics processor that can be reconfigured to perform the different Add Compare and Select operations of the popular sequencing algorithms. A programmable and scalable architectural platform instantiates an array of such processing elements, allows arbitrary partitioning and scheduling schemes, and is capable of solving complete sequencing algorithms, including the sequential phases, and of dealing with arbitrarily large sequences. The key difference of the proposed CGRA-based solution compared to FPGA- and GPU-based solutions is a much better match of the architecture and algorithm for the core computational need as well as the system-level architectural need. This claim is quantified for three popular sequencing algorithms: Needleman-Wunsch, Smith-Waterman and HMMER. For the same degree of parallelism, we provide 5X and 15X speed-ups compared to FPGA and GPU, respectively. For the same size of silicon, the advantage grows by a factor of another 10X.
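
    The "Add Compare and Select" operation mentioned above is the inner step of dynamic-programming alignment: add a score to each of three neighbouring cells, compare the three sums, and select the maximum. A minimal software sketch for the Needleman-Wunsch global alignment score follows. It says nothing about the proposed hardware, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions rather than figures from the paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming matrix. The default
    scoring parameters are assumptions chosen for the example.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Add, compare, select: the best of three candidate moves.
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

    Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed in parallel, which is what makes the recurrence attractive for arrays of processing elements.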

  • 349.
    Liu, Pei
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Paul, Kolin
    Indian Institute of Technology, Delhi, India.
    A many-core hardware acceleration platform for short read mapping problem using distributed memory interface with 3D-stacked architecture. 2014. In: 2014 International Symposium on System-on-Chip, SoC 2014, 2014, p. 1-8. Conference paper (Refereed)
    Abstract [en]

    Next Generation Sequencing technologies produce huge amounts of short reads consisting of randomly fragmented DNA base pair strings, and assembly poses a challenge for mapping short reads to a reference genome in terms of both sensitivity and execution time. In this paper, we propose a many-core hardware acceleration platform for short read mapping based on the hash-index method, which benefits from a distributed memory interface with a 3D-stacked architecture for local memory access. Our design provides a 45012-times speedup over the software approach for single-end short reads and 21102 times for paired-end reads, and it also outperforms a similar single-FPGA solution by 1466 times for single-end reads.

  • 350.
    Liu, Pei
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Department of Computer Science and engineering, Indian Institute of Technology.
    A reconfigurable processor for phylogenetic inference. 2011. In: VLSI Design (VLSI Design), 2011 24th International Conference on, IEEE, 2011, p. 226-231. Conference paper (Refereed)
    Abstract [en]

    A reconfigurable processor tailored for accelerating Phylogenetic Inference is proposed. In this paper, a programmable and scalable architectural platform instantiates an array of coarse-grained, lightweight processing elements, which allows arbitrary partitioning and scheduling schemes and is capable of solving the complete Maximum Likelihood algorithm with arbitrarily large sequences. The key difference of the proposed CGRA-based solution compared to FPGA- and GPU-based solutions is a much better match of the architecture and algorithm for the core computational need as well as the system-level architectural need. For the same degree of parallelism, we provide a 2.27X speed-up compared to FPGA with the same amount of logic, and an 81.87X speed-up compared to GPU with the same silicon area.
