Synthesis of Opacity-Enforcing Supervisory Strategies Using Reinforcement Learning
2024 (English). In: IEEE Transactions on Automation Science and Engineering, ISSN 1545-5955, E-ISSN 1558-3783, p. 1-11. Article in journal (Refereed). Epub ahead of print.
Abstract [en]
In the control of discrete-event systems for current-state opacity enforcement, it is difficult to synthesize a supervisor with supervisory control theory (SCT) when no explicit formal model of the system is available. This study uses reinforcement learning (RL) to obtain supervisory policies for opacity enforcement when the automaton model of the system is unavailable. The state space of the RL environment is generated dynamically through system simulation. Actions are defined according to the control patterns of SCT. A reward function is proposed to evaluate whether the secret is exposed. A sequence of state-action-reward chains is then obtained as the system simulation proceeds. The Q-learning and State-Action-Reward-State-Action (SARSA) frameworks are adopted to implement the proposed approach. The goal of training is to maximize the total cumulative reward by optimizing action selection during learning, and an optimal supervisory policy is obtained when training converges. Experiments are performed to illustrate the effectiveness of the proposed approach. The contributions are twofold. First, a supervisor for opacity enforcement is learned by RL training without an explicit formal model of the system. Second, the ability of the proposed method to compute supervisory policies without formal models addresses a significant gap in the literature and opens a new direction for research on opacity enforcement in discrete-event systems.

Note to Practitioners — Supervisory Control Theory (SCT) provides an effective way to synthesize supervisors for current-state opacity enforcement by restricting system behavior, but it traditionally requires explicit system models. However, formal models of systems are often confidential or otherwise unavailable. This paper presents a method for supervisor synthesis via reinforcement learning when a formal model of the system is lacking. The proposed method leverages the characteristics of control patterns in SCT and optimizes action selection during training through a reward mechanism that evaluates the secrecy of states. The approach can be applied in model-free RL frameworks such as the Q-learning and State-Action-Reward-State-Action (SARSA) algorithms. Training is performed as the system simulation proceeds, and once training converges, the optimal policy can be used to enforce opacity for the system. However, both Q-learning and SARSA store Q-values in a Q-table, whose size in the worst case grows exponentially with the number of states and controllable events. This can lead to memory exhaustion for large-scale systems. To make the approach scalable, we plan to use deep reinforcement learning (DRL) to train control policies for opacity enforcement in future work.
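The abstract describes the training setup without pseudocode. The following is a minimal, hypothetical sketch of the stated idea under my own assumptions: a Q-table indexed by simulated states, actions drawn from the set of SCT control patterns (subsets of controllable events), and a reward that penalizes reaching a state where the secret is exposed. The ToySystem plant, its transition table, the event names, and all numeric parameters are illustrative placeholders, not taken from the paper.

```python
# Hedged sketch of Q-learning for opacity enforcement, not the authors' code.
# Actions are control patterns (subsets of controllable events); the reward
# penalizes exposing the secret, as the abstract describes.
import random
from itertools import chain, combinations

CONTROLLABLE = ["a", "b"]    # events the supervisor may disable (assumed)
UNCONTROLLABLE = ["u"]       # always enabled (assumed)

def control_patterns(events):
    """All subsets of controllable events; each subset is one RL action."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(events, r) for r in range(len(events) + 1))]

class ToySystem:
    """Hypothetical stand-in for the model-free plant simulator."""
    TRANS = {0: {"a": 1, "b": 2, "u": 0}, 1: {"u": 3}, 2: {"u": 0}, 3: {}}
    SECRET = {3}             # states whose exposure violates opacity (assumed)

    def reset(self):
        self.state = 0
        return self.state

    def step(self, pattern):
        # Only uncontrollable events and events in the chosen pattern may fire.
        enabled = [e for e in self.TRANS[self.state]
                   if e in UNCONTROLLABLE or e in pattern]
        if not enabled:                          # no feasible event: stop episode
            return self.state, 0.0, True
        event = random.choice(enabled)           # plant picks any enabled event
        self.state = self.TRANS[self.state][event]
        exposed = self.state in self.SECRET      # simplified opacity check
        reward = -10.0 if exposed else 1.0       # penalize revealing the secret
        return self.state, reward, exposed

ACTIONS = control_patterns(CONTROLLABLE)
Q = {}                                           # Q-table built on the fly

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, max_steps=50):
    env = ToySystem()
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            Q.setdefault(s, [0.0] * len(ACTIONS))
            if random.random() < eps:            # epsilon-greedy exploration
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
            s2, r, done = env.step(ACTIONS[a])
            Q.setdefault(s2, [0.0] * len(ACTIONS))
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    # Greedy policy: for each visited state, the best control pattern found.
    return {s: ACTIONS[max(range(len(ACTIONS)), key=lambda i: Q[s][i])] for s in Q}

if __name__ == "__main__":
    random.seed(0)
    policy = q_learning()
    for state, pattern in sorted(policy.items()):
        print(f"state {state}: enable {sorted(pattern)}")
```

In this toy plant, enabling event "a" at state 0 risks an uncontrollable move into the secret state, so the learned policy tends to disable "a" there; a SARSA variant would differ only in bootstrapping from the action actually taken at the next state rather than the greedy one.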
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 1-11
Keywords [en]
Discrete event systems; Current-state opacity; Supervisory control theory; Reinforcement learning
National Category
Computer Engineering; Control Engineering
Research subject
Computer Science; Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-357837. DOI: 10.1109/tase.2024.3456239. ISI: 001317678100001. Scopus ID: 2-s2.0-105001068382. OAI: oai:DiVA.org:kth-357837. DiVA, id: diva2:1922172
Funder
XPRES - Initiative for excellence in production research
Note
QC 20241219
Available from: 2024-12-18. Created: 2024-12-18. Last updated: 2025-05-27. Bibliographically approved.