Morris, K.; Snook, C.; Hoang, T.S.; Hulette, G.; Armstrong, R.C.; Butler, M.
State chart notations with ‘run to completion’ semantics are popular with engineers for designing controllers that react to environmental events with a sequence of state transitions, but they lack formal refinement and rigorous verification methods. State chart models are typically used to design complex control systems that respond to environmental triggers with a sequential process. The model is usually constructed at a concrete level and verified and validated using animation techniques relying on human judgement. Event-B, on the other hand, is based on refinement from an initial abstraction and is designed to make formal verification by automatic theorem provers feasible. Abstraction and formal verification provide greater assurance that critical (e.g. safety or security) properties are not violated by the control system. In this paper, we introduce a notion of refinement into a ‘run to completion’ state chart modelling notation and leverage Event-B’s tool support for theorem proving. We describe the difficulties in translating ‘run to completion’ semantics into Event-B refinements and suggest a solution. We illustrate our approach and show how models can be validated at different refinement levels using our scenario checker animation tools. We show how critical invariant properties can be verified by proof despite the reactive nature of the system and how behavioural aspects of the system can be verified by testing the expected reactions using a temporal logic model-checking approach. To verify liveness, we outline a proof that the run to completion is deadlock-free and converges to complete the run.
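As an illustrative aside (not part of the paper's Event-B development), the Python sketch below mimics ‘run to completion’ semantics: an environment trigger starts a run of internal transitions that fire until quiescence, with a safety invariant checked after every step. The states, trigger and invariant are hypothetical.

```python
# Minimal sketch of 'run to completion' semantics with an invariant check.
# The controller states, trigger, and invariant are hypothetical and only
# illustrate the idea of a triggered run of internal transitions.

def invariant(state, interlock_closed):
    # Safety property (illustrative): never 'firing' while the interlock is open.
    return not (state == "firing" and not interlock_closed)

def step(state, interlock_closed):
    # One internal transition; returns None when no transition is enabled
    # (i.e. the run has completed).
    if state == "arming":
        return "armed"
    if state == "armed" and interlock_closed:
        return "firing"
    return None

def run_to_completion(state, trigger, interlock_closed):
    # React to an environment trigger, then take internal transitions
    # until quiescence, checking the invariant after every step.
    if trigger == "arm" and state == "idle":
        state = "arming"
    assert invariant(state, interlock_closed)
    while (nxt := step(state, interlock_closed)) is not None:
        state = nxt
        assert invariant(state, interlock_closed)
    return state

print(run_to_completion("idle", "arm", interlock_closed=True))   # -> firing
print(run_to_completion("idle", "arm", interlock_closed=False))  # -> armed
```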
Proceedings of FTXS 2020: Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis
Benefits of local recovery (restarting only a failed process or task) have been previously demonstrated in parallel solvers. Local recovery has a reduced impact on application performance due to masking of failure delays (for message-passing codes) or dynamic load balancing (for asynchronous many-task codes). In this paper, we implement MPI-process-local checkpointing and recovery of data (as an extension of the Fenix library) in combination with an existing method for local detection of silent errors in partial-differential-equation solvers, to show a path for incorporating lightweight silent-error resilience. In addition, we demonstrate how asynchrony introduced by maximizing computation-communication overlap can halt the propagation of delays. For a prototype stencil solver (including an iterative-solver-like variant) with injected memory bit flips, results show greatly reduced overhead under weak scaling compared to global recovery, and high failure-masking efficiency. The approach is expected to be generalizable to other MPI-based solvers.
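For orientation, the sketch below illustrates the process-local checkpoint/rollback pattern in plain Python; it is not the Fenix API, and the subdomain data, detector and injected fault are hypothetical stand-ins. Each rank keeps an in-memory copy of its own data and, on detecting an error, restores only that copy while the rest of the computation continues.

```python
import copy

# Hedged sketch of process-local (per-rank) checkpoint and rollback: each
# rank stores an in-memory copy of its own subdomain and restores only that
# copy when an error is detected locally, instead of a global restart.

class LocalCheckpointer:
    def __init__(self, subdomain):
        self.subdomain = subdomain
        self.snapshot = copy.deepcopy(subdomain)

    def checkpoint(self):
        # Cheap local copy; no global coordination or parallel file system.
        self.snapshot = copy.deepcopy(self.subdomain)

    def rollback(self):
        # Only this rank's data is restored; other ranks keep computing.
        self.subdomain[:] = copy.deepcopy(self.snapshot)

def detect_silent_error(subdomain):
    # Placeholder detector, e.g. a bounds or conservation check.
    return any(abs(x) > 1e6 for x in subdomain)

def advance(subdomain):
    # Placeholder for one step of the local stencil update.
    for i in range(len(subdomain)):
        subdomain[i] = 0.5 * subdomain[i] + 1.0

subdomain = [1.0] * 8
ckpt = LocalCheckpointer(subdomain)
for step in range(10):
    advance(subdomain)
    if step == 4:                     # injected fault (illustrative)
        subdomain[3] = 1e9
    if detect_silent_error(subdomain):
        ckpt.rollback()               # local recovery; no global restart
    else:
        ckpt.checkpoint()
```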
Although popular in industry, statechart notations with ‘run to completion’ semantics lack formal refinement and rigorous verification methods. Statechart models are typically used to design complex control systems that respond to environmental triggers with a sequential process. The model is usually constructed at a concrete level and verified and validated using animation techniques relying on human judgement. Event-B, on the other hand, is based on refinement from an initial abstraction and is designed to make formal verification by automatic theorem provers feasible. We introduce a notion of refinement into a ‘run to completion’ statechart modelling notation, and leverage Event-B’s tool support for theorem proving. We describe the difficulties in translating ‘run to completion’ semantics into Event-B refinements and suggest a solution. We illustrate our approach and show how critical (e.g. safety) invariant properties can be verified by proof despite the reactive nature of the system. We also show how behavioural aspects of the system can be verified by testing the expected reactions using a temporal logic model-checking approach.
Statechart notations with ‘run to completion’ semantics are popular with engineers for designing controllers that respond to events in the environment with a sequence of state transitions. However, they lack formal refinement and rigorous verification methods. Event-B, on the other hand, is based on refinement from an initial abstraction and is designed to make formal verification by automatic theorem provers feasible. We introduce a notion of refinement into a ‘run to completion’ statechart modelling notation, and leverage Event-B’s tool support for theorem proving. We describe the difficulties in translating ‘run to completion’ semantics into Event-B refinements and suggest a solution. We outline how safety and liveness properties could be verified.
We discuss techniques for efficient local detection of silent data corruption in parallel scientific computations, leveraging physical quantities such as momentum and energy that may be conserved by discretized PDEs. The conserved quantities are analogous to “algorithm-based fault tolerance” checksums for linear algebra but, due to their physical foundation, are applicable to both linear and nonlinear equations and have efficient local updates based on fluxes between subdomains. These physics-based checksums enable precise intermittent detection of errors and recovery by rollback to a checkpoint, with very low overhead when errors are rare. We present applications to both explicit hyperbolic and iterative elliptic (unstructured finite-element) solvers with injected memory bit flips.
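A minimal sketch of the idea, under simplifying assumptions: for a conservative 1D update, the discrete "mass" can change only by the fluxes through the subdomain faces, so a cheap local check compares the updated sum against a flux-corrected prediction. The first-order upwind scheme and tolerance below are illustrative, not the solvers used in the paper.

```python
# Hedged sketch of a physics-based checksum for a conservative 1D update:
# the discrete "mass" sum(u)*dx may only change by the boundary fluxes, so a
# cheap local check compares the new sum with the flux-corrected prediction.

def upwind_step(u, a, dt, dx, u_left_ghost):
    # Face fluxes for advection speed a > 0 (upwind value is the left cell).
    faces = [u_left_ghost] + u            # values feeding each face
    flux = [a * v for v in faces]         # flux[i] is through face i
    new_u = [u[i] - (dt / dx) * (flux[i + 1] - flux[i]) for i in range(len(u))]
    return new_u, flux[0], flux[-1]       # state, inflow flux, outflow flux

def conservation_check(u_old, u_new, flux_in, flux_out, dt, dx, tol=1e-12):
    expected = sum(u_old) * dx + dt * (flux_in - flux_out)
    return abs(sum(u_new) * dx - expected) <= tol

u = [1.0, 2.0, 3.0, 2.0, 1.0]
u_new, f_in, f_out = upwind_step(u, a=1.0, dt=0.1, dx=1.0, u_left_ghost=0.0)
assert conservation_check(u, u_new, f_in, f_out, dt=0.1, dx=1.0)

u_new[2] += 1e-3                          # injected bit-flip-like corruption
assert not conservation_check(u, u_new, f_in, f_out, dt=0.1, dx=1.0)
```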
While most software development for control systems is directed at what the system is supposed to do (i.e., function), high-consequence controls must also account for what the system is not supposed to do (i.e., safety, security and reliability requirements). A Domain Specific Language (DSL) for high-consequence digital controls is proposed. As with similar tools for the design of controls, the DSL will have plug-in modules for common controller functions. However, the DSL will also augment these modules with attendant "templates" that aid in the proof of safety, security and reliability requirements, a capability not available in current tools. The objective is to create a development methodology that makes the construction of high-assurance control systems as easy as that of controls designed for function alone.
In this article, we describe a prototype cosimulation framework using Xyce, GHDL and CocoTB that can be used to analyze digital hardware designs in out-of-nominal environments. We demonstrate current software methods and inspire future work via analysis of an open-source encryption core design. Note that this article is meant as a proof-of-concept to motivate integration of general cosimulation techniques with Xyce, an open-source circuit simulator.
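For orientation only, a minimal cocotb-style testbench is sketched below; the DUT port names (clk, rst, data_in, data_out) are hypothetical, the test assumes the design is run under an HDL simulator via cocotb's usual flow, and the Xyce analog coupling that is the article's focus is not shown.

```python
# Hedged sketch of a minimal cocotb testbench (digital side only; the Xyce
# analog coupling described in the article is not shown).  The DUT port
# names are hypothetical placeholders.

import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge, Timer


@cocotb.test()
async def smoke_test(dut):
    # Drive a 10 ns clock and apply a short reset.
    cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())
    dut.rst.value = 1
    await Timer(25, units="ns")
    dut.rst.value = 0

    # Apply a stimulus word and sample the response a few cycles later.
    dut.data_in.value = 0xA5
    for _ in range(4):
        await RisingEdge(dut.clk)
    assert dut.data_out.value.integer is not None
```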
Verifying that hardware design implementations adhere to specifications is a time-intensive and sometimes intractable problem due to the massive size of the system's state space. Formal methods techniques can be used to prove certain tractable specification properties; however, they are expensive and often require subject matter experts to develop and solve. Nonetheless, hardware verification is a critical process to ensure security and safety properties are met, and it encapsulates problems associated with trust and reliability. For complex designs where coverage of the entire state space is unattainable, prioritizing the regions most vulnerable to security or reliability threats would allow efficient allocation of valuable verification resources. Stackelberg security games model interactions between a defender, whose goal is to assign resources to protect a set of targets, and an attacker, who aims to inflict maximum damage on the targets after first observing the defender's strategy. In equilibrium, the defender has an optimal security deployment strategy, given the attacker's best response. We apply this Stackelberg security framework to synthesized hardware implementations, using the design's network structure and logic to inform defender valuations and verification costs. The defender's strategy in equilibrium is thus interpreted as a prioritization of the allocation of verification resources in the presence of an adversary. We demonstrate this technique on several open-source synthesized hardware designs.
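To make the game-theoretic framing concrete, here is a hedged toy example: a defender with a limited verification budget chooses which design regions to cover, and the attacker best-responds by attacking the most valuable uncovered region. This pure-coverage enumeration is a simplification of the usual mixed-strategy Stackelberg formulation, and the region names, values and costs are hypothetical.

```python
from itertools import combinations

# Toy sketch of a Stackelberg-style security game for prioritizing
# verification effort.  Region values and verification costs below are
# hypothetical; in the paper they are informed by the design's structure.

targets = {             # region: (value, verification cost) -- illustrative
    "alu":      (5.0, 2.0),
    "keysched": (9.0, 3.0),
    "sbox":     (7.0, 2.0),
    "io_regs":  (3.0, 1.0),
}
budget = 4.0

def attacker_damage(covered):
    # Attacker observes the defender's (pure) coverage and hits the most
    # valuable uncovered region; zero damage if everything is covered.
    uncovered = [v for t, (v, _) in targets.items() if t not in covered]
    return max(uncovered, default=0.0)

best = None
for k in range(len(targets) + 1):
    for covered in combinations(targets, k):
        cost = sum(targets[t][1] for t in covered)
        if cost <= budget:
            dmg = attacker_damage(set(covered))
            if best is None or dmg < best[0]:
                best = (dmg, covered)

print("verify first:", best[1], "worst-case damage:", best[0])
```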
We present a characterization of the short-term stability of Kauffman's NK (random) Boolean networks under arbitrary distributions of transfer functions. Given such a Boolean network in which each transfer function is drawn from the same distribution, we present a formula that determines whether short-term chaos (damage spreading) will occur. Our main technical tool, which enables the formal proof of this formula, is the Fourier analysis of Boolean functions, which describes such functions as multilinear polynomials over their inputs. Numerical simulations on mixtures of threshold functions and nested canalyzing functions demonstrate the formula's correctness.
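As a hedged numerical illustration of this kind of criterion (the paper's Fourier-analytic formula is more general), the standard annealed test compares the expected average sensitivity of the transfer-function distribution with 1: values above 1 indicate damage spreading. The threshold-function distribution below is only an example.

```python
import random

# Hedged sketch of the average-sensitivity test for damage spreading in an
# NK Boolean network: if the expected average sensitivity of the transfer-
# function distribution exceeds 1, a single flipped node tends to spread
# (short-term chaos).  The threshold-function distribution is illustrative.

K = 4  # in-degree

def random_threshold_function(k):
    # f(x) = 1 iff sum_i w_i * (2*x_i - 1) >= theta, with random +/-1 weights
    # and a random odd threshold (avoids ties for even k).
    w = [random.choice((-1, 1)) for _ in range(k)]
    theta = random.choice(range(-k + 1, k, 2))
    return lambda x: int(sum(wi * (2 * xi - 1) for wi, xi in zip(w, x)) >= theta)

def average_sensitivity(f, k, samples=200):
    # Expected number of input bits whose flip changes the output.
    total = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(k)]
        y = f(x)
        total += sum(f(x[:i] + [1 - x[i]] + x[i + 1:]) != y for i in range(k))
    return total / samples

estimates = [average_sensitivity(random_threshold_function(K), K)
             for _ in range(500)]
expected_s = sum(estimates) / len(estimates)
print(f"expected average sensitivity ~ {expected_s:.2f}",
      "-> damage spreads" if expected_s > 1 else "-> damage dies out")
```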