Machine Learning (ML) is the discipline that studies methods for automatically inferring models from data. Machine learning has been successfully applied in many areas of software engineering, including behaviour extraction, testing and bug fixing. Many more applications are yet to be defined. Therefore, a better fundamental understanding of ML methods, their assumptions and guarantees can help to identify and adopt appropriate ML technology for new applications. In this chapter, we present an introductory survey of ML applications in software engineering, classified in terms of the models they produce and the learning methods they use. We argue that the optimal choice of an ML method for a particular application should be guided by the type of models one seeks to infer. We describe some important principles of ML, give an overview of some key methods, and present examples of areas of software engineering benefiting from ML. We also discuss the open challenges for reaching the full potential of ML for software engineering and how ML can benefit from software engineering methods.
In this paper, we present a general methodology to estimate safety-related parameter values of cooperative cyber-physical systems-of-systems. As a case study, we consider a vehicle platoon model equipped with a novel distributed protocol for coordinated emergency braking. The estimation methodology is based on learning-based testing, which is an approach to automated requirements testing that combines machine learning with model checking.
Our methodology takes into account vehicle dynamics, control algorithm design, inter-vehicle communication protocols and environmental factors such as message packet loss rates. Empirical measurements from road testing of vehicle-to-vehicle communication in a platoon are modeled and used in our case study. We demonstrate that the minimum global time headway for our platoon model equipped with the coordinated emergency braking protocol (CEBP) scales well with respect to platoon size.
A system-of-systems (SoS) is inherently open in configuration and evolutionary in lifecycle. For the next generation of cooperative cyber-physical systems-of-systems, safety and security constitute two key issues of public concern that affect deployment and acceptance. In engineering, this openness and evolutionary nature also entail radical paradigm shifts. This paper presents a novel approach to the development of qualified cyber-physical systems-of-systems, with Cooperative Intelligent Transport Systems (C-ITS) as one target. The approach, referred to as knowledge-in-the-loop, aims to allow a synergy of well-managed lifecycles, formal quality assurance, and smart system features. One research goal is to enable an evolutionary development with continuous and traceable flows of system rationale from design time to post-deployment time and back, supporting automated knowledge inference and enrichment. Another research goal is to develop a formal approach to risk-aware dynamic treatment of safety and security as a whole in the context of systems-of-systems. Key base technologies include: (1) EAST-ADL for the consolidation of system-wide concerns and for the creation of an ontology for advanced run-time decisions, (2) Learning-Based Testing for run-time and post-deployment model inference, safety monitoring and testing, (3) Provable Isolation for run-time attack detection and enforcement of security in real-time operating systems.
We present case studies which show how the paradigm of learning-based testing (LBT) can be successfully applied to black-box requirements testing of industrial reactive systems. For this, we apply a new testing tool LBTest, which combines algorithms for incremental black-box learning of Kripke structures with model checking technology. We show how test requirements can be modeled in propositional linear temporal logic extended by finite data types. We then provide benchmark performance results for LBTest applied to three industrial case studies.
We present case studies which show how the paradigm of learning-based testing (LBT) can be successfully applied to black-box requirements testing of reactive systems. For this we apply a new testing tool LBTest, which combines algorithms for incremental black-box learning of Kripke structures with model checking technology. We show how test requirements can be modeled in propositional linear temporal logic extended by finite abstract data types. We provide benchmark performance results for LBTest applied to two industrial case studies. Finally we present a first coverage study for the tool.
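The LBT loop that these case studies rely on (learn a hypothesis model from test observations, model check it against a requirement, and feed counterexamples back as new test cases) can be sketched in miniature. The sketch below is illustrative only: the toy SUT, the table-based "learner" and the search-based "checker" are invented stand-ins, not LBTest's Kripke-structure learning or its model checker.

```python
def sut(trace):
    """Toy system under test: a 2-bit counter over inputs 'i'.  The safety
    requirement is that the output never reaches (1, 1), but the counter
    wraps at 4, so three inputs reach the forbidden state."""
    count = len(trace) % 4
    return (count >> 1 & 1, count & 1)

def learn_hypothesis(traces):
    """Trivial 'learner': tabulate the observed input/output behaviour.
    A real LBT tool would infer a Kripke structure here."""
    return {t: sut(t) for t in traces}

def check_safety(model, bad=(1, 1)):
    """Stand-in 'model checker': search the learned model for a trace
    reaching the forbidden output.  That trace is the next test case."""
    for trace, out in sorted(model.items()):
        if out == bad:
            return trace
    return None

# The LBT loop: learn, model check, feed counterexamples back as tests.
traces = {""}
for depth in range(1, 5):
    traces.add("i" * depth)
    cex = check_safety(learn_hypothesis(traces))
    if cex is not None:
        print("failing test found:", repr(cex))   # found at depth 3
        break
```

The essential point is that test cases are not hand-written: they fall out of the disagreement between the learned model and the formalised requirement.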
Identification and classification of cell-graph features using graph neural networks (GNNs) has been shown to be useful in digital pathology. In this work, we consider the role of edge labels in cell-graph modeling, including histological modeling techniques, edge aggregation in GNN architectures, and edge label prediction. We propose EAGNN (Edge Aggregated GNN), a new GNN model that aggregates both node and edge label information to take advantage of topological information about cellular data and facilitate edge label prediction. We introduce new edge label features that improve histological modeling and prediction. We evaluate our EAGNN model for the task of detecting the presence and location of the basement membrane in oral mucosal tissue, as a proof-of-concept application.
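A single edge-aware message-passing step, in which each node aggregates neighbour features scaled by an edge label, can be sketched in a few lines. This illustrates only the general idea of aggregating node and edge information; it is not the EAGNN architecture, and all names and the weighting scheme are invented for the example.

```python
def message_pass(node_feats, edges):
    """One edge-aware aggregation step.
    node_feats: {node: [float, ...]}; edges: {(u, v): edge_label_weight}.
    Each node adds in its neighbours' features scaled by the edge label."""
    new = {n: list(f) for n, f in node_feats.items()}
    for (u, v), w in edges.items():
        for k, x in enumerate(node_feats[u]):
            new[v][k] += w * x      # edge label modulates the message u -> v
        for k, x in enumerate(node_feats[v]):
            new[u][k] += w * x      # undirected graph: symmetric update
    return new

# Tiny 'cell-graph': two cells joined by an edge with label weight 0.5.
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
updated = message_pass(feats, {("a", "b"): 0.5})
print(updated)   # each cell picks up half of the other's features
```

In a trained GNN the scalar edge weight would be replaced by learned edge-label embeddings and the sum by a parameterised aggregation, but the information flow is the same.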
Modeling and analysis techniques are among the most important means of processing a spaceflight mission. In this paper, a graph-based description is given, and a series of dynamic-measurement system models is built for the attitude motion of an N-body spacecraft. These include the state evolution equation of the main body, the state evolution equations of the accessories, the inertia-based measurement equations, and the celestial-body-based and GPS-based measurement equations for the attitudes of the N bodies. These models are derived with typical methods of theoretical analysis and dynamic description, and are simplified into linearized discrete equations. Some valuable applications are pointed out at the end of the paper.
The modeling analysis technique is one of the most important approaches in space-flight missions. Based on dynamical analysis, the coupled models of attitudes and angular velocities were successfully decoupled into two series of independent differential equations. These two series of models were then integrated into an extended dynamic-measurement system with joint relationship equations. This new integrated model can be widely used in attitude determination, attitude control and attitude monitoring of a spinning spacecraft. Based on this integrated model, a series of practical algorithms is described to estimate attitude as well as spin angular velocities with high precision.
Computer cyclic check is widely used in many different fields. Some limitations, as well as bugs, of classical algorithms in the computer cyclic check system are analyzed in detail. In order to overcome these limitations, two series of new algorithms are set up in this paper: one built for the computer cyclic check system with a fixed reference value, and the other for computer cyclic check under operation steering with a variant directive trajectory. Theoretical analysis and simulation results show that these new algorithms are fault-tolerant and safe, and can be widely used in engineering to improve the safety of computer cyclic check systems.
Signal reconstruction is an important step in communication between a computer-controlled system and a computer system. Based on Shannon's well-known reconstruction theory, some classical algorithms (such as the zero-order hold (ZOH) and first-order reconstruction algorithms) were analyzed and their shortcomings pointed out. A new, practical one-step fault-tolerant prediction algorithm was then set up. To realize this fault-tolerant algorithm, a three-point interpolation algorithm was combined into the reconstruction process. The new algorithm overcomes the bad influence of outliers and has very low computational complexity, so it can be safely used in on-line control.
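The contrast between zero-order hold and a fault-tolerant one-step predictor can be sketched as follows. The guarded predictor extrapolates linearly from the two previous reconstructed values and rejects samples that jump implausibly far from the prediction; the threshold and extrapolation scheme here are illustrative choices, not the paper's three-point interpolation algorithm.

```python
def zoh(samples):
    """Zero-order hold: reconstruct each point as the previous sample."""
    return [samples[0]] + samples[:-1]

def guarded_predict(samples, limit=10.0):
    """One-step fault-tolerant reconstruction: linearly extrapolate from
    the two previous reconstructed values; a sample deviating from the
    prediction by more than `limit` is treated as an outlier and replaced
    by the prediction itself."""
    out = [samples[0]]
    for i in range(1, len(samples)):
        pred = out[-1] if i < 2 else 2 * out[-1] - out[-2]  # linear extrapolation
        out.append(pred if abs(samples[i] - pred) > limit else samples[i])
    return out

sig = [0.0, 1.0, 2.0, 99.0, 4.0, 5.0]   # 99.0 is an injected outlier
print(guarded_predict(sig))              # the outlier is replaced by 3.0
```

Unlike plain ZOH, the guarded predictor follows a ramp exactly while suppressing the spike, which is the fault-tolerance property claimed above.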
A new kind of linear model with partially variant coefficients is proposed, and a series of iterative algorithms is introduced and verified. The new generalized linear model includes the ordinary linear regression model as a special case. The iterative algorithms efficiently overcome some difficulties in computation with multidimensional inputs and continually appended parameters. An important application described at the end of this article shows that the new model is reasonable and applicable in practical fields.
Outliers, as well as outlier patches, widely emerge in the sampled data series of dynamic processes and severely degrade signal processing. In this paper, a series of recursive outlier-tolerant fitting algorithms is built to reliably fit the trajectories of a non-stationary sampling process when outliers arise in the output components of the process. Based on these recursive outlier-tolerant fitting algorithms, a series of practical programs is given to detect outliers online in a dynamic process and to identify the magnitudes of these outliers as well as outlier patches. Simulation results show that these new methods are efficient.
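One simple form of recursive outlier-tolerant processing is to track the trajectory with a recursive filter and flag samples that deviate too far from the fitted value, reporting their magnitudes. The exponential smoothing, smoothing factor and threshold below are hypothetical tuning choices for illustration, not the paper's algorithms.

```python
def detect_outliers(samples, alpha=0.5, thresh=5.0):
    """Recursive (exponential) fit of the trajectory; samples deviating
    from the fit by more than `thresh` are flagged with their estimated
    magnitude and excluded from the fit update, so an outlier cannot
    corrupt the fitted trajectory."""
    fit = samples[0]
    flagged = []                           # (index, estimated magnitude)
    for i, x in enumerate(samples[1:], start=1):
        if abs(x - fit) > thresh:
            flagged.append((i, x - fit))   # outlier: skip the fit update
        else:
            fit = alpha * x + (1 - alpha) * fit
    return flagged

data = [1.0, 1.2, 0.9, 20.0, 1.1, 1.0, -15.0]
print(detect_outliers(data))   # flags the spikes at indices 3 and 6
```

Excluding flagged samples from the update is the key outlier-tolerance step; a plain recursive fit would be dragged toward each spike and then misclassify the following good samples.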
Cooperating cyber-physical systems-of-systems (CO-CPS) such as vehicle platoons, robot teams or drone swarms usually have strict safety requirements on both spatial and temporal behavior. Learning-based testing is a combination of machine learning and model checking that has been successfully used for black-box requirements testing of cyber-physical systems-of-systems. We present an overview of research in progress to apply learning-based testing to evaluate spatio-temporal requirements on autonomous systems-of-systems through modeling and simulation.
Learning-based testing (LBT) is an emerging paradigm for fully automated requirements testing. This approach combines machine learning and model-checking techniques for test case generation and verdict construction. LBT is well suited to requirements testing of low-latency safety critical embedded systems, such as can be found in the automotive sector. We evaluate the feasibility and effectiveness of applying LBT to two safety critical industrial automotive applications. We also benchmark our LBT tool against an existing industrial test tool that executes manually written test cases.
We introduce a new methodology for virtualized fault injection testing of safety critical embedded systems. This approach fully automates the key steps of test case generation, fault injection and verdict construction. We use machine learning to reverse engineer models of the system under test. We use model checking to generate test verdicts with respect to safety requirements formalised in temporal logic. We exemplify our approach by implementing a tool chain based on integrating the QEMU hardware emulator, the GNU debugger GDB and the LBTest requirements testing tool. This tool chain is then evaluated on two industrial safety critical applications from the automotive sector.
We explore the application of graph database technology to spatio-temporal model checking of cooperating cyber-physical systems-of-systems such as vehicle platoons. We present a translation of spatio-temporal automata (STA) and the spatio-temporal logic STAL to semantically equivalent property graphs and graph queries respectively. We prove a sound reduction of the spatio-temporal verification problem to graph database query solving. The practicability and efficiency of this approach are evaluated by introducing NeoMC, a prototype implementation of our explicit model checking approach based on Neo4j. To evaluate NeoMC we consider case studies of verifying vehicle platooning models. Our evaluation demonstrates the effectiveness of our approach in terms of execution time and counterexample detection.
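The reduction from verification to graph queries can be illustrated in miniature: encode model states as property-graph nodes and ask a reachability query whose answer path is a counterexample. The sketch below uses a plain Python graph rather than Neo4j/Cypher, and the platoon "gap" property and state names are invented for the example.

```python
from collections import deque

def reachable_matching(graph, props, start, predicate):
    """Graph-query stand-in: return a path from start to some node whose
    properties satisfy predicate, or None if no such node is reachable.
    graph: {node: [successors]}; props: {node: property dict}."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if predicate(props[node]):
            return path                 # the path is the counterexample
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Platoon snapshot states labelled with the minimum inter-vehicle gap (m).
graph = {"s0": ["s1"], "s1": ["s2", "s0"], "s2": []}
props = {"s0": {"gap": 10.0}, "s1": {"gap": 6.0}, "s2": {"gap": 1.5}}
# Query: is a state with gap below 2 m reachable from the initial state?
print(reachable_matching(graph, props, "s0", lambda p: p["gap"] < 2.0))
```

In the Neo4j setting the same question becomes a declarative path-pattern query, and the database's index structures do the search.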
Autonomous driving represents a significant challenge to all software quality assurance techniques, including testing. Generative machine learning (ML) techniques including active ML have considerable potential to generate high quality synthetic test data that can complement and improve on existing techniques such as hardware-in-the-loop and road testing.
We consider black-box testing of functional correctness as a special case of a satisfiability or constraint solving problem. We introduce a general method for solving this problem based on function approximation. We then describe some practical results obtained for an automated testing algorithm using approximation by piecewise polynomial functions.
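The testing-as-constraint-solving view can be sketched as: learn a cheap approximation of the SUT, solve the approximation for a spec-violating input, then confirm the candidate on the real SUT. In the toy below, plain grid sampling stands in for piecewise polynomial approximation, and the SUT, spec and names are all invented for the example.

```python
def sut(x):
    """Black-box numerical SUT (here a known quadratic, for illustration)."""
    return x * x - 2.0

def spec_violated(y):
    """Requirement: the output must satisfy y >= -1."""
    return y < -1.0

def find_candidate(model):
    """Constraint solving over the learned model: search the approximation
    (grid point -> predicted output) for a spec-violating input."""
    for x, y in model.items():
        if spec_violated(y):
            return x
    return None

grid = [i / 2.0 for i in range(-6, 7)]   # -3.0 .. 3.0 in steps of 0.5
model = {x: sut(x) for x in grid}        # sampling stands in for a fitted model
cand = find_candidate(model)
if cand is not None and spec_violated(sut(cand)):
    print("confirmed failing input:", cand)
```

The confirmation step on the real SUT is what makes the method sound even when the approximation is inaccurate: the model only proposes candidates, it never issues verdicts.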
We introduce a new algorithm for sequential learning of Mealy automata by congruence generator extension (CGE). Our approach makes use of techniques from term rewriting theory and universal algebra for compactly representing and manipulating automata using finite congruence generator sets represented as string rewriting systems (SRS). We prove that the CGE algorithm correctly learns in the limit.
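The role of a string rewriting system in this setting, namely giving canonical representatives for automaton states, can be illustrated with a tiny hand-written SRS: a confluent, terminating rule set is applied repeatedly until a normal form is reached, and words with the same normal form reach the same state. The rules below are a made-up example, not a CGE-generated generator set.

```python
def normalise(word, rules):
    """Apply rewrite rules leftmost-first until no rule matches; for a
    confluent terminating SRS the result is a unique normal form."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                changed = True
                break
    return word

# Hand-written rules: 'aa' and 'bb' cancel, as in a machine where each
# input symbol is its own inverse.
rules = [("aa", ""), ("bb", "")]
print(normalise("abba", rules))   # '' : equivalent to the empty word
print(normalise("aab", rules))    # 'b'
```

The compactness claim in the abstract comes from exactly this mechanism: a few rewrite rules can stand for a large (even infinite) set of equivalent input words, instead of storing states explicitly.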
Learning-based testing (LBT) is a paradigm for fully automated requirements testing that combines machine learning with model-checking techniques. LBT has been shown to be effective for unit and integration testing of safety critical components in cyber-physical systems, e.g. automotive ECU software. We consider the challenges faced, and some initial results obtained in an effort to scale up LBT to testing co-operative open cyber-physical systems-of-systems (CO-CPS). For this we focus on a case study of testing safety and performance properties of multi-vehicle platoons.
We present a survey of recent progress in the area of learning-based testing (LBT). The emphasis is primarily on fundamental concepts and theoretical principles, rather than applications and case studies. After surveying the basic principles and a concrete implementation of the approach, we describe recent directions in research such as: quantifying the hardness of learning problems, over-approximation methods for learning, and quantifying the power of model checker generated test cases. The common theme underlying these research directions is seen to be metrics for model convergence. Such metrics enable a precise, general and quantitative approach to both speed of learning and test coverage. Moreover, quantitative approaches to black-box test coverage serve to distinguish LBT from alternative approaches such as random and search-based testing. We conclude by outlining some prospects for future research.
We introduce a necessary and sufficient condition for the omega-extensionality rule of higher-order equational logic to be conservative over first-order many-sorted equational logic for ground first-order equations. This gives a precise condition under which computation in the higher-order initial model by term rewriting is possible. The condition is then generalised to characterise a normal form for higher-order equational proofs in which extensionality inferences occur only as the final proof inferences. The main result is based on a notion of observational equivalence between higher-order elements induced by a topology of finite information on such elements. Applied to extensional higher-order algebras with countable first-order carrier sets, the finite information topology is metric and second countable in every type.
Machine Learning (ML) is the discipline that studies methods for automatically inferring models from data. Machine learning has been successfully applied in many areas of software engineering ranging from behaviour extraction, to testing, to bug fixing. Many more applications are yet to be defined. However, a better understanding of ML methods, their assumptions and guarantees would help software engineers identify and adopt the appropriate methods for their desired applications. We argue that this choice can be guided by the models one seeks to infer. In this technical briefing, we review and reflect on the applications of ML for software engineering organised according to the models they produce and the methods they use. We introduce the principles of ML, give an overview of some key methods, and present examples of areas of software engineering benefiting from ML. We also discuss the open challenges for reaching the full potential of ML for software engineering and how ML can benefit from software engineering methods.
As a methodology for system design and testing, use cases are well-known and widely used. While current active machine learning (ML) algorithms can effectively automate unit testing, they do not scale up to use case testing of complex systems in an efficient way. We present a new parallel distributed processing (PDP) architecture for a constrained active machine learning (CAML) approach to use case testing. To exploit CAML we introduce a use case modeling language with: (i) compile-time constraints on query generation, and (ii) run-time constraints using dynamic constraint checking. We evaluate this approach by applying a prototype implementation of CAML to use case testing of simulated multi-vehicle autonomous driving scenarios.
We present an application of learning-based testing to the problem of automated test case generation (ATCG) for numerical software. Our approach uses n-dimensional polynomial models as an algorithmically learned abstraction of the SUT which supports n-wise testing. Test cases are iteratively generated by applying a satisfiability algorithm to first-order program specifications over real closed fields and iteratively refined piecewise polynomial models. We benchmark the performance of our iterative ATCG algorithm against iterative random testing, and empirically analyse its performance in finding injected errors in numerical codes. Our results show that for software with small errors, or long mean time to failure, learning-based testing is increasingly more efficient than iterative random testing.
We present a new algorithm ICGE for incremental learning of extended Mealy automata computing over abstract data types. Our approach extends and refines our previous research on congruence generator extension (CGE) as an algebraic approach to automaton learning. In the congruence generator approach, confluent terminating string rewriting systems (SRS) are used to represent hypothesis automata. We show how an approximating sequence R0, R1, ... of confluent terminating SRS can be directly and incrementally generated from observations about the loop structure of an unknown automaton A. Such an approximating sequence converges finitely if A is finite state, and converges in the limit if A is an infinite state automaton.
We show how the paradigm of learning-based testing (LBT) can be applied to automate specification-based black-box testing of reactive systems using term rewriting technology. A general model for a reactive system can be given by an extended Mealy automaton (EMA) over an abstract data type (ADT). A finite state EMA over an ADT can be efficiently learned in polynomial time using the CGE regular inference algorithm, which builds a compact representation as a complete term rewriting system. We show how this rewriting system can be used to model check the learned automaton against a temporal logic specification by means of narrowing. Combining CGE learning with a narrowing model checker we obtain a new and general architecture for learning-based testing of reactive systems. We compare the performance of this LBT architecture against random testing using a case study.
We present an overview of the paradigm of learning-based testing (LBT) for software systems. LBT is a fully automated method for specification-based black-box testing using computational learning principles. It applies the principle of tests as queries, where queries are either generated by a learning algorithm or by a model checker through use of a formal requirements specification. LBT can be applied to automate black-box testing of a variety of different software architectures including procedural and reactive systems. We describe some different testing platforms which have been designed using this paradigm and some representative evaluation results. We also compare LBT with related testing methods.
We report on early results in a long term project to apply learning-based testing (LBT) to evaluate the functional correctness of distributed systems and their robustness to injected faults. We present a case study of a commercial product for counter-party credit risk implemented as a distributed microservice architecture. We describe the experimental set-up, as well as test results. From this experiment, we draw some conclusions about prospects for future research in learning-based testing.
Learning-based testing (LBT) is an emerging methodology to automate iterative black-box requirements testing of software systems. The methodology involves combining model inference with model checking techniques. However, a variety of optimisations on model inference are necessary in order to achieve scalable testing for large systems.
In this paper we describe the IKL learning algorithm which is an active incremental learning algorithm for deterministic Kripke structures. We formally prove the correctness of IKL. We discuss the optimisations it incorporates to achieve scalability of testing. We also evaluate a black box heuristic for test termination based on convergence of IKL learning.
We present a new algorithm IDS for incremental learning of deterministic finite automata (DFA). This algorithm is based on the concept of distinguishing sequences introduced in (Angluin, 1981). We give a rigorous proof that two versions of this learning algorithm correctly learn in the limit. Finally we present an empirical performance analysis that compares these two algorithms, focussing on learning times and different types of learning queries. We conclude that IDS is an efficient algorithm for software engineering applications of automata learning, such as testing and model inference.
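The central notion of a distinguishing sequence, an input word on which two states disagree, can be computed for a known DFA by breadth-first search over state pairs. The toy mod-3 counter below is illustrative only; this search is not the IDS learning algorithm itself, which constructs such sequences incrementally from queries against an unknown automaton.

```python
from collections import deque

def distinguishing_sequence(delta, accepting, p, q, alphabet):
    """Shortest input word on which DFA states p and q disagree
    (one accepts, the other rejects), or None if they are equivalent."""
    seen = {(p, q)}
    queue = deque([(p, q, "")])
    while queue:
        a, b, w = queue.popleft()
        if (a in accepting) != (b in accepting):
            return w
        for sym in alphabet:
            na, nb = delta[(a, sym)], delta[(b, sym)]
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append((na, nb, w + sym))
    return None

# Toy DFA over {'0','1'}: state = number of 1s seen, mod 3; accept iff 0.
delta = {(s, c): (s + int(c)) % 3 for s in range(3) for c in "01"}
print(distinguishing_sequence(delta, {0}, 1, 2, "01"))   # '1'
```

In automaton learning, such sequences serve as experiments that certify two observed behaviours really belong to different states, which is what prevents the learner from over-merging its hypothesis.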
We show how the paradigm of learning-based testing (LBT) can be applied to automate specification-based black-box testing of reactive systems. Since reactive systems can be modeled as Kripke structures, we introduce an efficient incremental learning algorithm IKL for such structures. We show how an implementation of this algorithm combined with an efficient model checker such as NuSMV yields an effective learning-based testing architecture for automated test case generation (ATCG), execution and evaluation, starting from temporal logic requirements.
We give an introduction to the LBTest tool which implements learning-based testing for reactive systems. It makes use of incremental learning and model checking algorithms to automate: i) test case generation, ii) test execution and iii) test verdict construction. The paper illustrates the tool by means of a pedagogical case study, to enable the user to set up and learn the tool quickly. We provide a usability exercise to support tool evaluation.
We present two case studies which illustrate the use of second-order algebra as a formalism for specification and verification of hardware algorithms. In the first case study we specify a systolic algorithm for convolution and formally verify its correctness using second-order equational logic. The second case study demonstrates the expressive power of second-order algebraic specifications by presenting a non-constructive specification of the Hamming stream problem. A dataflow algorithm for computing the Hamming stream is then specified and the correctness of this algorithm is verified by semantical methods. Both case studies illustrate aspects of the metatheory of second-order equational logic.
Model-based software testing is well established, and can be traced back to Moore's "Gedanken experiments" on finite state machines from 1956 [10]. The best known approaches involve the use of models (such as UML interaction diagrams or state machines) as the basis for selecting test inputs that seek to explore the core functionality of the system. Outputs from the test executions can subsequently be checked against the model.
An oracle is used in software testing to derive the verdict (pass/fail) for a test case. Lack of precise test oracles is one of the major problems in software testing which can hinder judgements about quality. Metamorphic testing is an emerging technique which solves both the oracle problem and the test case generation problem by testing special forms of software requirements known as metamorphic requirements. However, manually deriving the metamorphic requirements for a given program requires a high level of domain expertise, is labor intensive and error prone. As an alternative, we consider the problem of automatic detection of metamorphic requirements using machine learning (ML). For this problem we can apply graph kernels and support vector machines (SVM). A significant problem for any ML approach is to obtain a large labeled training set of data (in this case programs) that generalises well. The main contribution of this paper is a general method to generate large volumes of synthetic training data which can improve ML assisted detection of metamorphic requirements. For training data synthesis we adopt mutation testing techniques. This research is the first to explore the area of data augmentation techniques for ML-based analysis of software code. We also have the goal to enhance black-box testing using white-box methodologies. Our results show that the mutants incorporated into the source code corpus not only efficiently scale the dataset size, but they can also improve the accuracy of classification models.
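The idea of a metamorphic requirement can be illustrated with a textbook example that is independent of this paper's ML pipeline: a correct sorting routine must return the same result on any permutation of its input, so the relation can be tested without a point-wise oracle. The function names and parameters below are illustrative.

```python
import random

def metamorphic_sort_test(sort_fn, trials=100):
    """Check the metamorphic relation sort(permute(xs)) == sort(xs) on
    random inputs; no expected-output oracle is needed, only the relation
    between the source test case and its follow-up."""
    rng = random.Random(0)            # fixed seed: deterministic test data
    for _ in range(trials):
        xs = [rng.randint(0, 50) for _ in range(10)]
        ys = xs[:]
        rng.shuffle(ys)               # follow-up input: a permutation of xs
        if sort_fn(ys) != sort_fn(xs):
            return False              # relation violated: fault detected
    return True

print(metamorphic_sort_test(sorted))  # True: builtin sorted satisfies the relation
```

The hard part addressed by the paper is discovering which relations of this kind hold for a given program; the ML classifier predicts candidate relations, and mutation-generated programs enlarge its training set.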
Background: Histological feature representation is advantageous for computer aided diagnosis (CAD) and disease classification when using predictive techniques based on machine learning. Explicit feature representations in computer tissue models can assist explainability of machine learning predictions. Different approaches to feature representation within digital tissue images have been proposed. Cell-graphs have been demonstrated to provide precise and general constructs that can model both low- and high-level features. The basement membrane is high-level tissue architecture, and interactions across the basement membrane are involved in multiple disease processes. Thus, the basement membrane is an important histological feature to study from a cell-graph and machine learning perspective. Results: We present a two stage machine learning pipeline for generating a cell-graph from a digital H&E-stained tissue image. Using a combination of convolutional neural networks for visual analysis and graph neural networks exploiting node and edge labels for topological analysis, the pipeline is shown to predict both low- and high-level histological features in oral mucosal tissue with good accuracy. Conclusions: Convolutional and graph neural networks are complementary technologies for learning, representing and predicting local and global histological features employing node and edge labels. Their combination is potentially widely applicable in histopathology image analysis and can enhance explainability in CAD tools for disease prediction.
Background: Program similarity is a fundamental concept, central to the solution of software engineering tasks such as software plagiarism, clone identification, code refactoring and code search. Accurate similarity estimation between programs requires an in-depth understanding of their structure, semantics and flow. A control flow graph (CFG) is a graphical representation of a program which captures its logical control flow and hence its semantics. A common approach is to estimate program similarity by analysing CFGs using graph similarity measures, e.g. graph edit distance (GED). However, graph edit distance is an NP-hard problem and computationally expensive, making the application of graph similarity techniques to complex software programs impractical. Aim: This study intends to examine the effectiveness of graph neural networks to estimate program similarity, by analysing the associated control flow graphs. Method: We introduce funcGNN, which is a graph neural network trained on labeled CFG pairs to predict the GED between unseen program pairs by utilizing an effective embedding vector. To our knowledge, this is the first time graph neural networks have been applied on labeled CFGs for estimating the similarity between high-level language programs. Results: We demonstrate the effectiveness of funcGNN to estimate the GED between programs, and our experimental analysis demonstrates how it achieves a lower error rate (1.94 × 10⁻³), with faster runtime (23 times faster than the quickest traditional GED approximation method) and better scalability compared with state of the art methods. Conclusion: funcGNN possesses the inductive learning ability to infer program structure and generalise to unseen programs. The graph embedding of a program proposed by our methodology could be applied to several related software engineering problems (such as code plagiarism and clone identification) thus opening multiple research directions.
To test the boundary values of input variables is very useful but difficult when designing and debugging a software system for scientific computations. In this paper, the model-based method and the simulation-based method are integrated to form a practical approach to this troublesome problem of boundary testing. After selectively building a model library, the testing algorithms are designed in detail and are used in designing and evaluating systems for exterior tracking and post-processing of measurement data.
This thesis concerns the design, implementation and evaluation of a specification based testing architecture for reactive systems using the paradigm of learning-based testing. As part of this work we have designed, verified and implemented new incremental learning algorithms for DFA and Kripke structures. These have been integrated with the NuSMV model checker to give a new learning-based testing architecture. We have evaluated our architecture on case studies and shown that the method is effective.
We present a range of testing techniques for the Abstract Behavioral Specification (ABS) language and apply them to an industrial case study. ABS is a formal modeling language for highly variable, concurrent, component-based systems. The nature of these systems makes them susceptible to the introduction of subtle bugs that are hard to detect in the presence of steady adaptation. While static analysis techniques are available for an abstract language such as ABS, testing is still indispensable and complements analytic methods. We focus on fully automated testing techniques including black-box and glass-box test generation as well as runtime assertion checking, which are shown to be effective in an industrial setting.