In this ongoing project, we use reinforcement learning (RL) in a simulation environment to learn policies for cyber defense. The environment is based on attack graphs produced using the Meta Attack Language, a modeling language used to assess the security of systems. Two RL algorithms are used to prevent a simulated attacker agent from reaching a series of targets within the attack graphs. The defensive agent must weigh the value of keeping assets enabled against the consequence of the attacker reaching its goal. The initial results are promising and show that both algorithms are able to find distinct strategies for defense. However, further analysis is needed to evaluate policy quality, including the implementation of sensible baseline policies for comparison.
Part of proceedings: ISBN 978-1-6654-0601-7
QC 20220926