Synthesis of Opacity-Enforcing Supervisory Strategies Using Reinforcement Learning
Guangxi Normal University. ORCID iD: 0000-0002-5945-9161
Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, and Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China.
Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, and Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China.
KTH, School of Industrial Engineering and Management (ITM), Engineering Design, Mechatronics and Embedded Control Systems. ORCID iD: 0000-0001-5703-5923
2024 (English). In: IEEE Transactions on Automation Science and Engineering, ISSN 1545-5955, E-ISSN 1558-3783, p. 1-11. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

In the control of discrete-event systems for current-state opacity enforcement, it is difficult to synthesize a supervisor by supervisory control theory (SCT) without explicit formal models of the systems. This study uses reinforcement learning (RL) to obtain supervisory policies for opacity enforcement when the automaton model of the system is unavailable. The state space of the RL environment is generated dynamically through system simulation. Actions are defined according to the control patterns of SCT. A reward function is proposed to evaluate whether the secret is exposed. A sequence of state-action-reward chains is then obtained as the system simulation proceeds. The Q-learning and State-Action-Reward-State-Action (SARSA) frameworks are adopted to implement the proposed approach. The goal of training is to maximize the total accumulated reward by optimizing action selection during learning, and an optimal supervisory policy is obtained when training converges. Experiments are performed to illustrate the effectiveness of the proposed approach. The contributions are twofold. First, a supervisor for opacity enforcement is learned by RL training without an explicit formal model of the system. Second, the ability of the proposed method to compute supervisory policies without formal models addresses a significant gap in the literature and offers a new direction for research on opacity enforcement in discrete-event systems.

Note to Practitioners: Supervisory control theory (SCT) supplies an effective way to synthesize supervisors and traditionally handles current-state opacity enforcement tasks with explicit system models by restricting the behavior of systems. However, formal models of systems are often confidential or otherwise unavailable. This paper presents a method for supervisor synthesis via reinforcement learning when formal models of the systems are lacking. The proposed method leverages the characteristics of control patterns in SCT and optimizes action selection in the training process through a reward mechanism that evaluates the secrecy of states. The approach can be applied to model-free RL frameworks such as the Q-learning and SARSA algorithms. Training is performed as the system simulation proceeds, and when training converges, the optimal policy can be used to enforce opacity for the system. However, both Q-learning and SARSA store Q-values in a Q-table whose size, in the worst case, grows exponentially with the number of states and controllable events. This can lead to memory exhaustion for large-scale systems. To make the approach scalable, we plan to use deep reinforcement learning (DRL) to train control policies for opacity enforcement in future work.
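To make the training loop concrete, below is a minimal Python sketch of the tabular Q-learning variant described above; it is not the authors' code. The toy plant, the event names, the reward values, and the simplification of penalizing any entry into a secret state (rather than evaluating what an outside observer can infer from observable events) are illustrative assumptions. Actions are SCT control patterns: subsets of the controllable events, with uncontrollable events always enabled.

```python
import random
from itertools import chain, combinations

# Toy plant (an assumption for illustration): states 0-3, state 2 is secret.
# Events 'a' and 'b' are controllable; 'u' is uncontrollable and never disabled.
TRANSITIONS = {
    (0, 'a'): 1, (0, 'b'): 3,
    (1, 'u'): 2,               # uncontrollable step into the secret state
    (2, 'a'): 3,
    (3, 'b'): 0,
}
SECRET = {2}
CONTROLLABLE = ('a', 'b')
UNCONTROLLABLE = ('u',)

# Actions are SCT control patterns: each pattern enables a subset of the
# controllable events plus all uncontrollable events.
PATTERNS = [set(c) | set(UNCONTROLLABLE)
            for c in chain.from_iterable(
                combinations(CONTROLLABLE, r) for r in range(len(CONTROLLABLE) + 1))]

def step(state, pattern):
    """Simulate one plant step under a control pattern: (next_state, reward, done)."""
    enabled = [e for (s, e) in TRANSITIONS if s == state and e in pattern]
    if not enabled:                        # supervisor blocked everything: penalize deadlock
        return state, -5.0, True
    nxt = TRANSITIONS[(state, random.choice(enabled))]
    if nxt in SECRET:                      # secret reached: large negative reward
        return nxt, -10.0, True
    return nxt, 1.0, False                 # safe progress earns a small reward

Q = {}                                     # Q-table: (state, pattern_index) -> value
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(2000):
    s, done = 0, False
    for _ in range(20):                    # bounded episode length
        if random.random() < eps:          # epsilon-greedy action selection
            a = random.randrange(len(PATTERNS))
        else:
            a = max(range(len(PATTERNS)), key=lambda i: Q.get((s, i), 0.0))
        s2, r, done = step(s, PATTERNS[a])
        best_next = max(Q.get((s2, i), 0.0) for i in range(len(PATTERNS)))
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + alpha * (r + gamma * best_next - q)   # Q-learning update
        s = s2
        if done:
            break

# Greedy policy after training: the control pattern to apply in each visited state.
policy = {st: PATTERNS[max(range(len(PATTERNS)), key=lambda i: Q.get((st, i), 0.0))]
          for st in {key[0] for key in Q}}
print(policy)
```

With this toy plant, the converged greedy policy is expected to disable event 'a' in state 0, because enabling it lets the uncontrollable event 'u' drive the plant into the secret state.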

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 1-11
Keywords [en]
Discrete event systems; Current-state opacity; Supervisory control theory; Reinforcement learning
National Category
Computer Engineering; Control Engineering
Research subject
Computer Science; Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-357837
DOI: 10.1109/tase.2024.3456239
ISI: 001317678100001
Scopus ID: 2-s2.0-105001068382
OAI: oai:DiVA.org:kth-357837
DiVA, id: diva2:1922172
Funder
XPRES - Initiative for excellence in production research
Note

QC 20241219

Available from: 2024-12-18 Created: 2024-12-18 Last updated: 2025-05-27 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Feng, Lei

Search in DiVA

By author/editor
Zhang, Huimin; Feng, Lei
By organisation
Mechatronics and Embedded Control Systems
In the same journal
IEEE Transactions on Automation Science and Engineering
Computer Engineering; Control Engineering
