Intrusion Prevention Through Optimal Stopping
Hammar, Kim. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security (CDIS). ORCID iD: 0000-0003-1773-8354
Stadler, Rolf. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security (CDIS). ORCID iD: 0000-0001-6039-8493
2022 (English). In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 19, no. 3, p. 2333-2348. Article in journal (Refereed). Published.
Abstract [en]

We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal threshold policy. We introduce T-SPSA, an efficient reinforcement learning algorithm that learns threshold policies through stochastic approximation. We show that T-SPSA outperforms state-of-the-art algorithms for our use case. Our overall method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned policies are evaluated. We show that this approach can produce effective defender policies for a practical IT infrastructure.
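The record does not detail T-SPSA beyond its learning "threshold policies through stochastic approximation." As background, the following is a minimal sketch of the underlying SPSA gradient estimator (Spall's simultaneous perturbation scheme) maximizing a toy objective; the function names, the toy objective, and the gain schedule are illustrative assumptions, not the paper's actual method, where the objective would instead be a simulation-based estimate of the defender's expected return under a given threshold vector.

```python
import random

def spsa_step(theta, objective, a, c):
    """One SPSA iteration: perturb every coordinate simultaneously
    with Rademacher (+/-1) noise, estimate the gradient from just two
    objective evaluations, and take a gradient-ascent step."""
    delta = [random.choice([-1.0, 1.0]) for _ in theta]
    j_plus = objective([t + c * d for t, d in zip(theta, delta)])
    j_minus = objective([t - c * d for t, d in zip(theta, delta)])
    grad = [(j_plus - j_minus) / (2.0 * c * d) for d in delta]
    return [t + a * g for t, g in zip(theta, grad)]

# Toy concave objective with maximum at (0.5, 0.7); a stand-in for an
# estimated expected return as a function of two stopping thresholds.
def toy_objective(theta):
    return -((theta[0] - 0.5) ** 2 + (theta[1] - 0.7) ** 2)

random.seed(0)
theta = [0.0, 0.0]
for k in range(5000):
    theta = spsa_step(theta, toy_objective, a=1.0 / (k + 50), c=0.05)
# theta is now close to the maximizer (0.5, 0.7)
```

The appeal of SPSA here is that each iteration needs only two evaluations of the objective regardless of the dimension of theta, which matters when every evaluation is a costly simulation episode; the decaying gain sequence a_k is a standard stochastic-approximation choice.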

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022. Vol. 19, no. 3, p. 2333-2348
Keywords [en]
Reinforcement learning, Markov processes, Security, Logic gates, Emulation, Dynamic programming, Computational modeling, Network security, automation, optimal stopping, Markov decision process, MDP, POMDP
National Category
Computer Systems Reliability and Maintenance
Identifiers
URN: urn:nbn:se:kth:diva-320997
DOI: 10.1109/TNSM.2022.3176781
ISI: 000866556800029
Scopus ID: 2-s2.0-85130427555
OAI: oai:DiVA.org:kth-320997
DiVA id: diva2:1708506
Note

QC 20221104

Available from: 2022-11-04. Created: 2022-11-04. Last updated: 2024-07-04. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Hammar, Kim; Stadler, Rolf

