Purpose – The paper addresses three main problems resulting from uncertainty in information securitymanagement: i) dynamically changing security requirements of an organization ii) externalities caused by a securitysystem and iii) obsolete evaluation of security concerns.
Design/methodology/approach – In order to address these critical concerns, a framework based on optionsreasoning borrowed from corporate finance is proposed and adapted to evaluation of security architecture anddecision-making for handling these issues at organizational level. The adaptation as a methodology is demonstrated by a large case study validating its efficacy.
Findings – The paper shows through three examples that it is possible to have a coherent methodology, buildingon options theory to deal with uncertainty issues in information security at an organizational level.
Practical implications – To validate the efficacy of the methodology proposed in this paper, it was applied tothe SHS (Spridnings- och Hämtningssystem: Dissemination and Retrieval System) system. The paper introduces themethodology, presents its application to the SHS system in detail and compares it to the current practice.
Originality/value – This research is relevant to information security management in organizations, particularlyissues on changing requirements and evaluation in uncertain circumstances created by progress in technology.
Organizations relying on Information Technology for their business processes have to employ various Security Mechanisms (Authentication, Authorization, Hashing, Encryption etc) to achieve their organizational security objectives of data confidentiality, integrity and availability. These security mechanisms except from their intended role of increased security level for this organization may also affect other systems outside the organization in a positive or negative manner called externalities. Externalities emerge in several ways i.e. direct cost, direct benefit, indirect cost and indirect benefit. Organizations barely consider positive externalities although they can be beneficial and the negative externalities that could create vulnerabilities are simply ignored. In this paper, we will present an infrastructure to streamline information security externalities that appear dynamically for an organization
The constantly rising threats in IT infrastructure raise many concerns for an organization, altering security requirements according to dynamically changing environment, need of midcourse decision management and deliberate evaluation of security measures are most striking. Common Criteria for IT security evaluation has long been considered to be victimized by uncertain IT infrastructure and considered resource hungry, complex and time consuming process. Considering this aspect we have continued our research quest for analyzing the opportunities to empower IT security evaluation process using Real Options thinking. The focus of our research is not only the applicability of real options analysis in IT security evaluation but also observing its implications in various domains including IT security investments and risk management. We find it motivating and worth doing to use an established method from corporate finance i.e. real options and utilize its rule of thumb technique as a road map to counter uncertainty issues for evaluation of IT products. We believe employing options theory in security evaluation will provide the intended benefits. i.e. i) manage dynamically changing security requirements ii) accelerating evaluation process iii) midcourse decision management. Having all the capabilities of effective uncertainty management, options theory follows work procedures based on mathematical calculations quite different from information security work processes. In this paper, we will address the diversities between the work processes of security evaluation and real options analysis. We present an adaptability infrastructure to bridge the gap and make them coherent with each other. This liaison will transform real options concepts into a compatible mode that provides grounds to target IT security evaluation and common criteria issues. We will address ESAM system as an example for illustrations and applicability of the concepts.
Reliability of IT systems and infrastructure is a critical need for organizations to trust their business processes. This makes security evaluation of IT systems a prime concern for these organizations. Common Criteria is an elaborate, globally accepted security evaluation process that fulfills this need. However CC rigidly follows the initial specification and security threats and takes too long to evaluate and as such is also very expensive. Rapid development in technology and with it the new security threats further aggravates the long evaluation time problem of CC to the extent that by the time a CC evaluation is done, it may no longer be valid because new security threats have emerged that have not been factored in. To address these problems, we propose a novel Option Based Evaluation methodology for security of IT systems that can also be considered as an enhancement to the CC process. The objective is to address uncertainty issues in IT environment and speed up the slow CC based evaluation processes. OBE will follow incremental evaluation model and address the following main concerns based on options theory i.e. i) managing dynamic security requirement with mid-course decision management ii) devising evaluation as an improvement process iii) reducing cost and time for evaluation of an IT product.
Information security has long been considered as a key concern for organizations benefiting from the electronic era. Rapid technological developments have been observed in the last decade which has given rise to novel security threats, making IT, an uncertain infrastructure. For this reason, the business organizations have an acute need to evaluate the security aspects of their IT infrastructure. Since many years, CC (Common Criteria) has been widely used and accepted for evaluating the security of IT products. It does not impose predefined security rules that a product should exhibit but a language for security evaluation. CC has certain advantages over ITSEC1, CTCPEC2 and TCSEC3 due to its ability to address all the three dimensions: a) it provides opportunity for users to specify their security requirements, b) an implementation guide for the developers and c) provides comprehensive criteria to evaluate the security requirements. Among the few notable shortcomings of CC is the amount of resources and a lot of time consumption. Another drawback of CC is that the security requirements in this uncertain IT environment must be defined before the project starts. ROA is a well known modern methodology used to make investment decisions for the projects under uncertainty. It is based on options theory that provides not only strategic flexibility but also helps to consider hidden options during uncertainty. ROA comes in two flavors: first for the financial option pricing and second for the more uncertain real world problems where the end results are not deterministic. Information security is one of the core areas under consideration where researchers are employing ROA to take security investment decisions. In this paper, we give a brief introduction of ROA and its use in various domains. We will evaluate the use of Real options based methods to enhance the Common Criteria evaluation methodology to manage the dynamic security requirement specification and reducing required time and resources. We will analyze the possibilities to overcome CC limitations from the perspective of the end user, developer and evaluator. We believe that with the ROA enhanced capabilities will potentially be able to stop and possibly reverse this trend and strengthen the CC usage with a more effective and responsive evaluation methodology.
Today multicore platforms are already prevalent solutions for modern embedded systems. In the future, embedded platforms will have an even more increased processor core count, composing many-core platforms. In addition, applications are becoming more complex and dynamic and try to efficiently utilize the amount of available resources on the embedded platforms. Efficient memory utilization is a key challenge for application developers, especially since memory is a scarce resource and often becomes the system's bottleneck. To cope with this dynamism and achieve better memory footprint utilization (lowmemory fragmentation) application developers resort to the usage of dynamic memory (heap) management techniques, by allocating and deallocating data at runtime. Moreover, overall power consumption is another key challenge that needs to be taken into consideration. Towards this, designers employ the usage of Dynamic Voltage and Frequency Scaling (DVFS) mechanisms, adapting to the application's computational demands at runtime. In this article, we propose the combination of dynamic memory management techniques with DVFS ones. This is performed by integrating, within thememorymanager, runtimemonitoringmechanisms that steer the DVFSmechanisms to adjust clock frequency and voltage supply based on heap performance. The proposed approach has been evaluated on a distributed shared-memory many-core platform composed of multiple LEON3 processors interconnected by a Network-on-Chip infrastructure, supporting DVFS. Experimental results show that by using the proposed method for monitoring and applying DVFS mechanisms the power consumption concerning dynamic memory management was reduced by approximately 37%. In addition we present the trade-offs the proposed approach. Last, by combining the developed method with heap fragmentation-aware dynamic memory managers, we achieve low heap fragmentation values combined with low power consumption.
Today, reconfigurable architectures are becoming increas- ingly popular as the candidate platforms for neural net- works. Existing works, that map neural networks on re- configurable architectures, only address either FPGAs or Networks-on-chip, without any reference to the Coarse-Grain Reconfigurable Architectures (CGRAs). In this paper we investigate the overheads imposed by implementing spiking neural networks on a Coarse Grained Reconfigurable Ar- chitecture (CGRAs). Experimental results (using point to point connectivity) reveal that up to 1000 neurons can be connected, with an average response time of 4.4 msec.
Mapping algorithms on CGRAs can lead to an inefficient implementation and hardware under-utilization if there is a mismatch between the granularity of reconfigurable processing unit and the algorithm. In this paper, we introduce a tool that takes the hardware configuration of a set of applications, identifies the unused parts of the CGRA, and let the user sweep the design space from fully programmable to fully customized by eliminating the unused components. User can select among multiple design points according to the application specification. This method is very useful to design multi-mode ASIC accelerators. The fully customized hardware generated using our tool has a negligible area and power overhead compared to the equivalent ASIC but can be generated significantly faster.
Trade-off between flexibility and performance became an important factor for characterizing modern protocol processing architectures. While some solutions tend to be more flexible and less computational efficient like GPPs, other solutions like custom ASIC devices provide high computational efficiency while loosing the ability to cope with the diversity of current and evolving protocols. We propose a reconfigurable protocol processor that is flexible and highly adaptable to the needs of the required protocol with the ability to operate individually or as a multi-core integrating processors. We show how a common protocol processing task that consumes one third of RISC CPU time can be performed on our processor at high speed and low energy cost.
In this paper, we present a highly customizable and rapidly reconfigurable multi-core packet processing architecture that provides energy and area efficiency while retaining flexibility. Presented architecture with its agile reconfigurability permits time-critical adaptability where resources can be re-clustered at run time in few cycles, hence, maintaining efficiency if requirements of the use-case change. We elaborate the flexibility and adaptability of our architecture and we report its evaluation results. For evaluation, we performed the widely-used UDP/IP and we compared our proposed architecture to low-power 32-bit general purpose processors, a custom ASIC implementation and a programmable protocol processor. Compared to GPP-based solutions, our architecture is 20-34 times more energy efficient while providing 2.4-4.1 times higher throughput. While retaining the programmability, the proposed solution achieved 78% of the energy efficiency of hardwired ASIC implementation. Compared to a programmable protocol processor, our solution has 2.6 times more throughput and requires only a third of the gate count. lastly, we quantified the worst-case time and average-case time required for time-critical adaptability when reconfiguration occurs during a real-life Voice-Over IP traffic.
Adaptable protocol processing architectures can offer quality-of-service (QoS) while improving energy efficiency and resource utilization. However, a key condition for adaptable architectures to support QoS is that, the latency required for processor adaptation does not result in violating packet processing delay bound. Moreover, adaptation latency must not cause packets to accumulate until memory becomes full and packets are dropped. In this paper, we present an elastic management scheme for agile adaptable multi-core protocol processing architecture to facilitate processor adaptation when QoS has to be maintained. The proposed management scheme encompasses a set of reconfigurable finite state machines (FSMs) and each is dimensioned to associate single processing element (PE). During processor adaptation, the needed FSMs can rapidly be clustered to provide the control needed for the newly adapted structure. We use a real-life application to demonstrate how our proposed management scheme supports maintaining QoS during processor adaptation. We also quantify the time needed for processor adaptation as well as the reduction in energy, latency and area achieved when using our scheme.
Parallel processing architectures have been increasingly utilized due to their potential for improving performance and energy efficiency. Unfortunately, the anticipated improvement often suffers from a limitation caused by memory access latency and latency variation, which consequently impact Quality of Service (QoS). This paper presents a service-guaranteed multi-port packet memory system to boost parallelism in protocol processing architectures. In this proposed memory system, all arriving packets are guaranteed a memory space, such that, a packet memory space can be allocated in a bounded number of cycles and each of its locations is accessible in a single cycle. We consider a real-time Voice Over Internet Protocol (VOIP) call as a case-study to evaluate our service-guaranteed memory system.
The project will address two main challenges of prevailing architectures: 1) The global Interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; 2) The difficulties in programming heterogeneous, multi-core platforms, in particular in dynamically managing data structures in distributed memory. MOSART aims to overcome these through a multi-core architecture with distributed memory organisation, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimised and customised together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: A) Providing platform support for management of abstract data structures Including middleware services and a run-time data manager for NoC based communication infrastructure; 2) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.
MOSART project addresses two main challenges of prevailing architectures: (i) Theglobal interconnect and memory bottleneck due to a single, globally shared memorywith high access times and power consumption; (ii) The difficulties in programmingheterogeneous, multi-core platforms MOSART aims to overcome these through amulti-core architecture with distributed memory organization, a Network-on-Chip(NoC) communication backbone and configurable processing cores that are scaled,optimized and customized together to achieve diverse energy, performance, cost andsize requirements of different classes of applications. MOSART achieves this by:(i) Providing platform support for management of abstract data structures includingmiddleware services and a run-time data manager for NoC based communicationinfrastructure; (ii) Developing tool support for parallelizing and mapping applicationson the multi-core target platform and customizing the processing cores for theapplication.
In this chapter we present the power management architecture of the McNoC platform. The power management architecture of McNoC offers distributed Dynamic Voltage Frequency Scaling (DVFS) and power down services to the platform at a fine level of granularity, allowing independent setting of frequency and supply voltage to all switch and resource nodes in the platform. The design style enables hierarchical physical design and solves the clock-domain-crossing problem with a solution based on rationally-related frequencies, which avoids the overhead associated with handshake. The architecture allows arbitrary power management regions to be defined and region-wide power management commands affecting all nodes in a region can be issued by the software layer that we call as Power Management Intelligence (PMINT).
As a replacement for the fast-fading Globally-Synchronous model, we have defined a flexible design style for SoCs, called GRLS, for Globally-Ratiochronous, Locally-Synchronous, which does not rely on global synchronization and is based on using rationally-related clock frequencies derived from the same source. In this paper, using the special periodical properties of rationally-related systems, we build a latency-insensitive, maximal-throughput, low-overhead communication method, based on the idea of using both clock edges to sample data at the Receiver. The validity of the method and its resistance to non-idealities such as jitter, misalignments and clock drifts are formally proven while experimental results including overhead are presented for 90 nm technology. Despite allowing much greater flexibility, the overhead of our method is comparable to that of state-of-the-art mesochronous communication techniques. We also show performances, complexity and overhead improvements over all other approaches that have so far been proposed for rationally-related clock frequencies.
GALS Networks-on-Chip (NoCs) in which the frequency of every switch can be set independently would enable per-node DVFS without requiring asynchronous switch design. However, traditional GALS interfaces introduce high latency penalties and are therefore ill-suited for inter-switch links in a NoC. In this paper we introduce and study a GALS Network-on-Chip based on the Globally-Ratiochronous, Locally-Synchronous (GRLS) paradigm. GRLS constrains all switch frequencies to be rationally-related but enables the use of efficient interfaces which reduce the latency of the network 60% compared to GALS solutions and obtains better throughput-per-power ratios compared to synchronous and mesochronous solutions.
We have defined a flexible latency-insensitive design style called Globally Ratiochronous Locally Synchronous (GRLS), based on quantized voltage levels and rationally-related clock frequencies. In this paper we present the infrastructure necessary to enable Distributed DVFS in such a system and analyze its overheads, quantitatively showing how, with minimal overheads, we obtain energy benefits that are close to those of a totally ideal GALS approach. The benefits that we show, coupled with the complexity and performance benefits of GRLS, which we briefly analyze, show how this approach is a strong competitor to GALS.
We have introduced the Globally-Ratiochronous, Locally-Synchronous (GRLS) design paradigm, a design style based on rationally-related frequencies, with the objective to overcome the limitations of traditional multi-frequency systems by providing a flexibility close that of Globally-Asynchronous, Locally-Synchronous (GALS) systems but introducing performance penalties and overheads close to those of mesochronous systems. In this paper we focus on performances and improve the latency figures of our original GRLS interfaces by introducing two new interfaces, called GRLS-F and GRLS-noF, the first suitable for blocks with long computation time and the second for blocks with short computation time. The latency figures of the original GRLS interfaces are improved up to 50% without increasing complexity. The average latency figures of the resulting interfaces are lower than 1 Receiver clock cycle, the latency of a synchronous interface.
In this paper we present efficient Mesochronous and Plesiochronous interfaces targeting low-latency and low-overhead links. Our source-synchronous scheme can easily be integrated in traditional design flows, supports maximal throughput, has low latency and has an overhead of only three flipflops per data line. With one additional flipflop per data line, the Plesiochronous interface allows the synchronizer to cope with clock drifts. The simple synchronization scheme is validated through formal analysis and simulation.
In this paper, we introduce a source-synchronous adaptive interface for the globally ratiochronous, locally synchronous design style, a subset of the globally asynchronous, locally synchronous (GALS) design style in which the frequencies of all clocks are not phase-aligned but are constrained to be rationally related, i.e., they are all submultiple of the same physical or virtual frequency. The interface can be designed using only standard cells and guarantees maximal throughput in addition to an average latency four times lower compared with state-of-the-art asynchronous first-input, first-output GALS interfaces. Several properties of the interface are formally stated and proved. We also demonstrate that the interface has a low area overhead, with only four flip-flops per data line, and is robust against nonidealities such as clock jitters and propagation delay misalignments. For a realistic link in 90-nm application-specific integrated circuit technology, we derive a 1-GHz upper bound for the least common multiple among the frequencies.
In this paper we introduce a novel interface for Globally-Asynchronous, Locally-Synchronous systems which does not use any form of handshake to cross the gap between the clock domains. In particular, links in which the Receiver runs faster than the Transmitter are targeted. The interface works by finding an approximate ratio between the clock frequencies. Then, ratiochronous synchronizers that can tolerate clock drifts are employed to transmit data from the Transmitter to the Receiver clock domain. Thanks to the periodic properties of rationally-related systems, no handshake is employed and the average latency of the interface is decreased ∌ 75% compared to state-of-the-art GALS interfaces. Additionally, the interface uses only standard cells and, save for a delay line, can be designed at Register Transfer Level.
Embedded cores are gaining widespread use to deal with the complex DSP systems where flexibility is of utmost importance. The design of such a system offers several problems, which are not addressed by the existing methodology. The authors previously presented an integrated grammar based DSP design methodology that separates architectural and functional specification, can create a virtual prototype and has a smooth link to the implementation phase. In this paper we present the extension of the work to handle embedded cores. Here we the capture the host peripheral interface (HPI) of TMS320C6x core at higher level of abstraction and provide a single simulation environment, which facilitates faster analysis of hardware software components. Our results reveal that the proposed methodology offers simulation time speed-up of 5 times and design time speed-up of 8 times, while keeping the architectural specification separated from functionality
System-level exploration of memory organizations is a key issue in successful implementation of data dominated applications based on dynamically allocated data structures involving records and access keys. This paper presents a formalized technique for exploring different memory data-format alternatives when only the system level functional behavior of the application has been defined. Our data-format exploration approach allows to substantially minimize the number of accessed bits by rearranging the format of the data records. The technique exploits parallelism in the data transfer by analyzing the dependencies between data-record accesses. As a result, significant reduction in memory size, bandwidth, and power are obtained. We have validated our techniques using several real-life asynchronous transfer mode cell processing applications, where we have obtained reductions in memory size (up to 20%), power (up to a 60%), and bandwidth.
Estimators are critical tools in doing architectural level exploration of the design space. We present a novel approach to estimation based on the multilayer perceptron which builds the estimation function during the learning process and thus allows to describe arbitrary complex functions. We also describe how the control data flow graph is encoded for the neural network input and we present results of the first experiments made with realistic design examples.
Not available
In this paper, we introduce BRIC, a novel custom multi-chip digital computer architecture for simulating in realtime a model of human brain in form of a spiking Bayesian Confidence Propagation Neural Network (BCPNN). The design is conceptually dimensioned for available technology in 2015-2020 with the estimated size of a pizza box, consuming less than 3 kWs of power, delivering 800 Teraflops/sec (single precision multiply operation) and 30 TBs of memory. To the best of our knowledge, this will be the smallest and lowest power real-time brain simulation engine if manufactured. The silicon and computational efficiencies come from use of 3D memory stacking, innovation in algorithm and architectural customization. The chip will be programmable allowing experimentation with variants of the BCPNN brain model.
The increasing demand for higher resolution of images and communication bandwidth requires the streaming applications to deal with ever increasing size of datasets. Further, with technology scaling the cost of moving data is reducing at a slower pace compared to the cost of computing. These trends have motivated the proposed micro-architectural reorganization of stream processors by dividing the stream computation into functional computation, address constraints computation and address generation and deploying independent, distributed micro-threads to implement them. This scheme is an alternative to parallelizing them at instruction level. The proposed scheme has two benefits: a more efficient sequencer logic and energy savings in address generation and transportation. These benefits are quantified for a set of streaming applications and show average percentage improvement of 39 in silicon efficiency of the sequencer logic and 23 in total computational efficiency.
A multi-chip custom digital super-computer called eBrain for simulating Bayesian Confidence Propagation Neural Network (BCPNN) model of the human brain has been proposed. It uses Hybrid Memory Cube (HMC), the 3D stacked DRAM memories for storing synaptic weights that are integrated with a custom designed logic chip that implements the BCPNN model. In 22nm node, eBrain executes BCPNN in real time with 740 TFlops/s while accessing 30 TBs synaptic weights with a bandwidth of 112 TBs/s while consuming less than 6 kWs power for the typical case. This efficiency is three orders better than general purpose supercomputers in the same technology node.
This paper presents hardware solution for runtime computation of loop constraints and synchronizing delays for multiple inner loops in parallel distributed implementation of digital signal processing sub-systems. Methods to map and generate the runtime computation code for loop constraints and synchronizing delays are also presented. Compared to the traditional methods, the proposed solution achieves 55% average code compaction and 32.7% average performance improvement. The solution has modest hardware cost that increases linearly with the dimension of the architecture and has no performance penalty. Results from multiple realistic examples are presented, analyzed and compared to the traditional methods.
This paper presents a hardware based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low cost AGU that generates addresses for given address bounds for affine functions of up to two variables. Such low cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex but is also distributed but shared among a few storage units and is capable of handling more complex address generation requirements like dynamic computation of address bounds that are then used to configure the AGUs, transformation of non-affine functions to affine function by computing the affine factor outside the loop, etc. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction and compared to the centralized runtime address generation scheme, the proposed solution achieved 32.7% average performance improvement.
In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis because of the large gap between the abstraction levels of standard cells and algorithmic level. We propose a grid-based regular physical design platform composed of large grain hardened building blocks called SiLago blocks. This platform is divided into regions which are specialized for different functionalities like computation, storage, system control, etc. The characterized micro-architectural operations of the SiLago platform serve as the interface to meet-in-the-middle high-level and system-level syntheses framework. This framework was used to generate three hardware macro instances, derived from SiLago platform for three applications from signal processing domain. Results show two orders of magnitude improvements in efficiency of the system-level design space exploration and synthesis time, with average loss in design quality of 18% for energy and 54% for area compared to the commercial SOC flow.
This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels: FFT for the LTE standard and the Correlation Pool for the UMTS standard to be executed in a mutually exclusive manner. The CGRA multi-mode accelerator achieved computational efficiency of 39.94 GOPS/watt (OP is multiply-add) and silicon efficiency of 56.20 GOPS/mm2. By analyzing the code and inferring the unused features of the fully programmable solution, an in-house developed tool was used to automatically customize the design to run just the two kernels and the two efficiency metrics improved to 49.05 GOPS/watt and 107.57 GOPS/mm2. Corresponding numbers for the ASIC implementation are 63.84 GOPS/watt and 90.91 GOPS/mm2. Though the ASIC’s silicon and computational efficiency numbers are slightly better, the engineering efficiency of the pre-verified/characterized CGRA solution is at least 10X better than the ASIC solution.
This paper describes the restructuring of VLSI education at the Royal Institute of Technology (KTH). Changing needs of industry, advances in technology and design methodology has required a significant reorganization of VLSI education with combined emphasis on system issues and associated physical constraints. We present here a course structure which will address, in parallel fashion, the key design issues for future system products
Moore’s law has been the guiding principle for defining the semiconductor roadmap ever since it was proposed in 1965. This article examines and proposes a law that plays the role of Moore’s law in the very large-scale integration (VLSI) computer-aided design (CAD) field. This article also examines the difference in nature of semiconductor productivity and design productivity.