Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on how automatic asynchronous MPI communication in memory-bound parallel programs on multicore clusters can be facilitated. For instance, slowing down MPI processes by the deliberate injection of delays can improve performance if certain conditions are met. This leads to the counter-intuitive conclusion that noise, independent of its source, is not always detrimental but can be leveraged for performance improvements. We employ phase-space graphs as a new tool to visualize parallel program dynamics. They are useful for spotting patterns in parallel execution that would easily go unnoticed with traditional tracing tools. We investigate five different microbenchmarks and applications on different supercomputer platforms: an MPI-augmented STREAM Triad, two implementations of Lattice-Boltzmann fluid solvers (D3Q19 and SPEChpc D2Q37), and the LULESH and HPCG proxy applications.
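The Triad kernel at the center of the MPI-augmented benchmark is simple enough to state directly. The following Python sketch is only an illustration of the kernel's arithmetic (the benchmark itself is an MPI-parallel compiled code); it shows why the kernel is memory-bound:

```python
def stream_triad(a, b, c, s):
    # STREAM Triad: a[i] = b[i] + s * c[i].
    # One fused multiply-add per three memory accesses gives very low
    # arithmetic intensity, so performance is limited by memory
    # bandwidth -- the property that lets MPI communication either hide
    # behind the kernel or contend with it for bandwidth.
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]
    return a
```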
Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large world-wide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck at larger process counts. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and enabling it to run on a much larger number of cores.
This paper summarises ongoing research and recent results on the development of a flexible access control infrastructure for complex resource provisioning in Grid-based collaborative applications and on-demand network services provisioning. The paper analyses the general access control model for Grid-based applications and discusses what mechanisms can be used for expressing and handling dynamic domain or process/workflow-related security context. Suggestions are given on what specific functionality should be added to Grid-oriented authorization frameworks to handle such dynamic security context. As an example, the paper explains how such functionality can be achieved in the GAAA Authorization framework (GAAA-AuthZ) and GAAA toolkit. Additionally, the paper describes an AuthZ ticket format for extended AuthZ session management. The paper is based on experiences gained from major Grid-based and Grid-oriented projects such as EGEE, Phosphorus, NextGRID, and GigaPort Research on Network.
The popularity of virtual worlds and their increasing economic impact have created a situation where the value of trusted identification has risen substantially. We propose an identity management solution that provides the user with secure credentials and reduces the trust that the user must place in the server running the virtual world. Additionally, the identity management system allows the virtual world to incorporate reputation information. This allows the “wisdom of the crowd” to provide more input to users about the reliability of a certain identity. We describe how to use these identities to provide secure services in the virtual world. These include secure communications, digital signatures and secure bindings to external services.
Analytical modelling is the most cost-effective method to evaluate the performance of a system. Several analytical models have been proposed in the literature for different interconnection network systems. This paper proposes an accurate analytical model to predict message latency in wormhole-switched star graphs with fully adaptive routing. Although the focus of this research is on the star graph, the modelling approach can also be applied to other regular and irregular interconnection networks. The results obtained from simulation experiments confirm that the proposed model exhibits good accuracy for various network sizes and under different operating conditions.
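For readers unfamiliar with wormhole switching, the standard no-contention latency formula (a textbook baseline, not the adaptive-routing model proposed in the paper) can be sketched as follows:

```python
def wormhole_latency(msg_flits, hops, t_router=1.0, t_link=1.0):
    # Under wormhole switching with no contention, the header flit is
    # pipelined through `hops` routers while the remaining body flits
    # stream behind it one link-cycle apart:
    #   latency = hops * (t_router + t_link) + (msg_flits - 1) * t_link
    # A full analytical model adds a blocking/waiting term derived from
    # channel utilization, which is where most of the modelling effort lies.
    return hops * (t_router + t_link) + (msg_flits - 1) * t_link
```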
The design of a molecular dynamics trajectory database is presented as an example of the organization of large-scale dynamic distributed repositories for scientific data. Large scientific datasets are usually interpreted through reduced data calculated by analysis functions. This allows a database architecture in which the analyzed datasets, which are kept in addition to the raw datasets, are transferred to a database user. A flexible user interface with a well-defined Application Program Interface (API) allows for a wide array of analysis functions; the incorporation of user-defined functions is a critical part of the database design. An analysis function is executed only when the requested analysis result is not available from an earlier request. A prototype implementation used to gain initial practical experiences with performance and scalability is presented.
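The execute-only-on-miss rule described above is essentially memoization keyed by analysis function and dataset. A minimal sketch, with illustrative names that are not taken from the paper's API:

```python
class AnalysisStore:
    """Caches analysis results so that an analysis function runs only
    when its result is not already stored from an earlier request."""

    def __init__(self):
        self._cache = {}
        self.executions = 0  # exposed for the sketch, to show cache hits

    def request(self, func, dataset_id, data):
        # Key by (analysis function, dataset): a repeated request for the
        # same reduced data is served from the store, not recomputed.
        key = (func.__name__, dataset_id)
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = func(data)
        return self._cache[key]

def mean(trajectory):
    # Example user-defined analysis function reducing a raw dataset.
    return sum(trajectory) / len(trajectory)
```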
Implementing artificial intelligence (AI) in the Internet of Things (IoT) involves a move from the cloud to the heterogeneous and low-power edge, following an urgent demand for deploying complex training tasks in a distributed and reliable manner. This work proposes a self-aware distributed deep learning (DDL) framework for IoT applications, which is applicable to heterogeneous edge devices and aims to improve adaptivity and amortize the training cost. The self-aware design, including the dynamic self-organizing approach and the self-healing method, enhances the system's reliability and resilience. Three typical edge devices are adopted with cross-platform Docker deployment: Personal Computers (PC) for general computing devices, Raspberry Pi 4Bs (Rpi) for resource-constrained edge devices, and Jetson Nanos (Jts) for AI-enabled edge devices. Benchmarked with ResNet-32 on CIFAR-10, the training efficiency of the tested distributed clusters is increased by 8.44x compared to a standalone Rpi. The cluster with 11 heterogeneous edge devices achieves a training efficiency of 200.4 images/s and an accuracy of 92.45%. Results show that the self-organizing approach functions well under dynamic changes such as devices being removed or added. The self-healing method is evaluated with various stabilities, cluster scales, and breakdown cases, demonstrating that reliability can be greatly enhanced for extensively distributed deployments. The proposed DDL framework shows excellent performance for training with heterogeneous edge devices in IoT applications, with a high degree of scalability and reliability.
Many authors have proposed criteria to assess the “environmental friendliness” or “sustainability” of software products. However, a causal model that links observable properties of a software product to conditions of it being green or (more generally) sustainable is still missing. Such a causal model is necessary because software products are intangible goods and, as such, only have indirect effects on the physical world. In particular, software products are not subject to any wear and tear, can be copied without great effort, and generate no waste or emissions when being disposed of. Viewed in isolation, software seems to be a perfectly sustainable type of product. In real life, however, software products with the same or similar functionality can differ substantially in the burden they place on natural resources, especially if the sequence of released versions and resulting hardware obsolescence is taken into account. In this article, we present a model describing the causal chains from software products to their impacts on natural resources, including energy sources, from a life-cycle perspective. We focus on (i) the demands of software for hardware capacities (local, remote, and in the connecting network) and the resulting hardware energy demand, (ii) the expectations of users regarding such demands and how these affect hardware operating life, and (iii) the autonomy of users in managing their software use with regard to resource efficiency. We propose a hierarchical set of criteria and indicators to assess these impacts. We demonstrate the application of this set of criteria, including the definition of standard usage scenarios for chosen categories of software products. We further discuss the practicability of this type of assessment, its acceptability for several stakeholders and potential consequences for the eco-labeling of software products and sustainable software design.
Internet of Things (IoT) devices and technology are increasingly being integrated into smart grids. These devices come with many security vulnerabilities. To combat this, IoT protocols have been extended with security mechanisms. However, these mechanisms introduce extra processing that can lead to additional delay. This additional delay can affect the reliable operation of a smart power system, which depends on prompt communication. This paper investigates the real-time properties of IoT communication security protocols. We determine the impact of IoT protocols on the real-time requirements of smart grid functions (protection, control, and monitoring). We measure how much communication traffic size and delay are added and how they scale. We determine the requirements that must be met by the optimized security protocols. Three IoT communication protocols were considered: CoAP/DTLS, MQTT/TLS, and XMPP/TLS. Results show that both latency and overhead increased by a minimum of 3x for each protocol. The increase in delay was below the recommended maximum standard for microgrid monitoring but exceeded the recommended standard for control operations.
Data replication is one of the best known strategies to achieve high levels of availability and fault tolerance, as well as minimal access times for large, distributed user communities using a world-wide Data Grid. In certain scientific application domains, the data volume can reach the order of several petabytes; in these domains, data replication and access optimization play an important role in the manageability and usability of the Grid.
The past few years have dramatically changed the view of high performance applications and computing. While traditionally such applications have been targeted towards dedicated parallel machines, we see the emerging trend of building "meta-applications" composed of several modules that exploit heterogeneous platforms and employ hybrid forms of parallelism. In particular, Java has been recognized as a modern programming language for heterogeneous distributed computing. In this paper we present OpusJava, a Java-based framework for distributed high performance computing (DHPC) that provides a high-level component infrastructure and facilitates a seamless integration of high performance Opus (i.e., HPF) modules into larger distributed environments. OpusJava offers a comprehensive programming model that supports the exploitation of hybrid parallelism and provides high-level coordination means.
We propose an end-to-end security scheme for mobility-enabled healthcare Internet of Things (IoT). The proposed scheme consists of (i) a secure and efficient end-user authentication and authorization architecture based on the certificate-based DTLS handshake, (ii) secure end-to-end communication based on session resumption, and (iii) robust mobility based on interconnected smart gateways. The smart gateways act as an intermediate processing layer (called the fog layer) between IoT devices and sensors (device layer) and cloud services (cloud layer). In our scheme, the fog layer facilitates ubiquitous mobility without requiring any reconfiguration at the device layer. The scheme is demonstrated by simulation and a full hardware/software prototype. Based on our analysis, our scheme has the most extensive set of security features in comparison to related approaches found in the literature. Energy-performance evaluation results show that, compared to existing approaches, our scheme reduces the communication overhead by 26% and the communication latency between smart gateways and end users by 16%. In addition, our scheme is approximately 97% faster than certificate-based and 10% faster than symmetric-key-based DTLS. Compared to our scheme, certificate-based DTLS consumes about 2.2 times more RAM and 2.9 times more ROM resources. On the other hand, the RAM and ROM requirements of our scheme are almost as low as in symmetric-key-based DTLS. Analysis of our implementation revealed that the handover latency caused by mobility is low and the handover process does not incur any processing or communication overhead on the sensors.
A wide range of Internet of Things devices, platforms and applications have been implemented in the past decade. The variation in platforms, communication protocols and data formats of these systems creates islands of applications. Many organizations are working towards standardizing the technologies used at different layers of communication in these systems. However, interoperability still remains one of the main challenges towards realizing the grand vision of IoT. Integration approaches proven in the existing Internet or enterprise applications are not suitable for the IoT, mainly due to the nature of the devices involved; the majority of the devices are resource-constrained. To address this problem of interoperability, our work considers various types of IoT application domains, the architecture of the IoT and the work of standards organizations to give a holistic abstract model of IoT. According to this model, there are three computing layers, each with a different level of interoperability needs: technical, syntactic or semantic. This work presents a Web of Virtual Things (WoVT) server that can be deployed at the middle layer of IoT (fog layer) and the cloud to address the problem of interoperability. It exposes a REST-like uniform interface for syntactic integration of devices at the bottom layer of IoT (perception layer). An additional RESTful API is used for integration with other similar WoVT servers at the fog or the cloud layer. The server uses a state-of-the-art architecture to enable this integration pattern and provides means towards semantic interoperability. The analysis and evaluation of the implementation, such as performance, resource utilization and security perspectives, are presented. The simulation results demonstrate that an integrated and scalable IoT through the web of virtual things can be realized.
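The REST-like uniform interface for virtual things can be illustrated with a toy in-memory handler. All names here (the class name, the /things/<id> resource layout) are hypothetical stand-ins for the paper's actual interface:

```python
class WoVTServer:
    """Toy sketch of a REST-like uniform interface for virtual things:
    every device is addressed as a /things/<id> resource, so clients
    need only one interaction pattern regardless of the device below."""

    def __init__(self):
        self._things = {}  # thing_id -> last reported representation

    def handle(self, method, path, body=None):
        # Return (status_code, representation), mimicking HTTP semantics.
        _, sep, thing_id = path.partition("/things/")
        if not sep or not thing_id:
            return 404, None
        if method == "PUT":      # register or update a virtual thing
            self._things[thing_id] = body
            return 201, body
        if method == "GET":      # read its current representation
            if thing_id in self._things:
                return 200, self._things[thing_id]
            return 404, None
        return 405, None          # uniform interface: nothing else allowed
```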
We present an efficient and incremental (un)marshaling framework designed for distributed applications. A marshaler/unmarshaler pair converts arbitrary structured data between its host and network representations. This technology can also be used for persistent storage. Our framework simplifies the design of efficient and flexible marshalers. Network latency is reduced by concurrent execution of (un)marshaling and network operations. The framework is used in Mozart, a distributed programming system that implements Oz, a multi-paradigm concurrent language.
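The chunk-wise style of (un)marshaling described above can be sketched with Python generators. The actual framework operates on Oz data and network buffers; this length-prefixed string codec is only an illustration of the incremental idea:

```python
import struct

def marshal(items):
    # Incremental marshaler: yield one wire-format chunk per item
    # (4-byte big-endian length prefix + UTF-8 payload), so the sender
    # can overlap network writes with encoding of later items.
    for s in items:
        data = s.encode("utf-8")
        yield struct.pack("!I", len(data)) + data

def unmarshal(chunks):
    # Incremental unmarshaler: consume the byte stream chunk by chunk,
    # reconstructing items as soon as their bytes have arrived.
    buf = b"".join(chunks)
    out, off = [], 0
    while off < len(buf):
        (n,) = struct.unpack_from("!I", buf, off)
        off += 4
        out.append(buf[off:off + n].decode("utf-8"))
        off += n
    return out
```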
Compute-intensive applications have gradually changed focus from massively parallel supercomputers to capacity as a resource obtained on-demand. This is particularly true for the large-scale adoption of cloud computing and MapReduce in industry, while it has been difficult for traditional high-performance computing (HPC) usage in scientific and engineering computing to exploit this type of resources. However, with the strong trend of increasing parallelism rather than faster processors, a growing number of applications target parallelism already on the algorithm level with loosely coupled approaches based on sampling and ensembles. While these cannot trivially be formulated as MapReduce, they are highly amenable to throughput computing. There are many general and powerful frameworks, but in particular for sampling-based algorithms in scientific computing there are some clear advantages from having a platform and scheduler that are highly aware of the underlying physical problem. Here, we present how these challenges are addressed with combinations of dataflow programming, peer-to-peer techniques and peer-to-peer networks in the Copernicus platform. This allows automation of sampling-focused workflows, task generation, dependency tracking, and not least distributing these to a diverse set of compute resources ranging from supercomputers to clouds and distributed computing (across firewalls and fragile networks). Workflows are defined from modules using existing programs, which makes them reusable without programming requirements. The system achieves resiliency by handling node failures transparently with minimal loss of computing time due to checkpointing, and a single server can manage hundreds of thousands of cores e.g. for computational chemistry applications.
Constrained Application Protocol (CoAP) has become the de facto web standard for the IoT. Unlike traditional wireless sensor networks, Internet-connected smart thing deployments require security. CoAP mandates the use of the Datagram TLS (DTLS) protocol as the underlying secure communication protocol. In this paper we implement DTLS-protected secure CoAP for both resource-constrained IoT devices and a cloud backend and evaluate all three security modes (pre-shared key, raw public key, and certificate-based) of CoAP in a real cloud-connected IoT setup. We extend SicsthSense, a cloud platform for the IoT, with secure CoAP capabilities, and complement a DTLS implementation for resource-constrained IoT devices with raw public key and certificate-based asymmetric cryptography. To the best of our knowledge, this is the first effort toward providing end-to-end secure communication between resource-constrained smart things and cloud back-ends which supports all three security modes of CoAP both on the client side and the server side. SecureSense, our End-to-End (E2E) secure communication architecture for the IoT, consists of all standards-based protocols, and the implementations of these protocols are open source and BSD-licensed. The SecureSense evaluation benchmarks and open-source, open-license implementation make it possible for future IoT product and service providers to account for security overhead while using all standardized protocols and while ensuring interoperability among different vendors. The core contributions of this paper are: (i) a complete implementation of the CoAP security modes for E2E IoT security, (ii) IoT security and communication protocols for a cloud platform for the IoT, and (iii) detailed experimental evaluation and benchmarking of E2E security between a network of smart things and a cloud platform.
Grid computing has been the subject of many large national and international IT projects. However, not all goals of these projects have been achieved. In particular, the number of users lags behind the initial forecasts laid out by proponents of grid technologies. This underachievement may have led to claims that the grid concept as a whole is on its way to being replaced by Cloud computing and various X-as-a-Service approaches. In this paper, we try to analyze the current situation and to identify promising directions for future grid development. Although there are shortcomings in current grid systems, we are convinced that the concept as a whole remains valid and can benefit from new developments, including Cloud computing. Furthermore, we strongly believe that some future applications will require the grid approach and that, as a result, further research is required in order to turn this concept into reliable, efficient and user-friendly computing platforms.
In this paper, we present a novel fault injection system called ChaosOrca for system calls in containerized applications. ChaosOrca aims at evaluating a given application's self-protection capability with respect to system call errors. The unique feature of ChaosOrca is that it conducts experiments under production-like workloads without instrumenting the application. We exhaustively analyze all kinds of system calls and utilize different levels of monitoring techniques to reason about behaviour under perturbation. We evaluate ChaosOrca on three real-world applications: a file transfer client, a reverse proxy server and a microservice-oriented web application. Our results show that ChaosOrca is promising for detecting weaknesses in resilience mechanisms related to system-call issues.
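The experiment logic, injecting an error into a system call and observing whether the application degrades gracefully, can be sketched in-process. ChaosOrca itself perturbs real system calls without instrumenting the application; this mock-based sketch only illustrates the shape of such an experiment:

```python
from unittest import mock

def resilient_read(path):
    # Toy application under test: its self-protection mechanism is to
    # degrade gracefully (return an empty string) on a failed read.
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""

def perturb_open(fn, *args):
    # Fault injection: make every open() call fail with EIO, mimicking
    # a system-call error observed in production, then run the
    # application code and return its behaviour under perturbation.
    with mock.patch("builtins.open", side_effect=OSError(5, "I/O error")):
        return fn(*args)
```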
Resource location or discovery is a key issue for Grid systems, in which applications are composed of hardware and software resources that need to be located. Classical approaches to Grid resource location are either centralized or hierarchical and will prove inefficient as the scale of Grid systems rapidly increases. On the other hand, the Peer-to-Peer (P2P) paradigm has emerged as a successful model that achieves scalability in distributed systems. One possibility would be to borrow existing methods from the P2P paradigm and to adapt them to Grid systems, taking into consideration the existing differences. Several such attempts have been made during the last couple of years. This paper aims to serve as a review of the most promising Grid systems that use P2P techniques to facilitate resource discovery, in order to perform a qualitative comparison of the existing approaches and to draw conclusions about their advantages and weaknesses. Future research directions are also discussed.
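The P2P techniques these Grid systems borrow typically reduce resource discovery to a key lookup over a structured overlay. A minimal consistent-hashing sketch, illustrative of the DHT style in general and not of any specific surveyed system:

```python
import hashlib
from bisect import bisect_right

def _h(key):
    # Map an arbitrary string onto the hash ring.
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)

class DHTRing:
    """Minimal DHT-style lookup: each resource key is owned by the first
    node clockwise from its position on the hash ring, so discovery is a
    deterministic O(log n) search with no central index."""

    def __init__(self, nodes):
        self._ring = sorted((_h(n), n) for n in nodes)

    def locate(self, resource_key):
        hashes = [h for h, _ in self._ring]
        # Wrap around the ring if the key hashes past the last node.
        i = bisect_right(hashes, _h(resource_key)) % len(self._ring)
        return self._ring[i][1]
```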