Active Learning for Improvement of Classification of Cyberthreat Actors in Text Fragments
KTH, School of Industrial Engineering and Management (ITM), Learning, Digital Learning. ORCID iD: 0009-0001-1204-6124
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. FOI Swedish Defence Research Agency, Stockholm, Sweden, SE-164 90. ORCID iD: 0000-0002-2677-9759
FOI Swedish Defence Research Agency, Stockholm, Sweden, SE-164 90. ORCID iD: 0000-0002-3155-8408
2023 (English). In: Proceedings - 22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 1279-1286.
Conference paper, Published paper (Refereed)
Abstract [en]

In the domain of cybersecurity, machine learning can offer advanced threat detection. However, the volume of unlabeled data poses challenges for efficient data management. This study investigates the potential for active learning to reduce the effort required for manual data labeling. Through different query strategies, the most informative unlabeled data points were selected for labeling. The performance of the query strategies was assessed by testing a transformer model's ability to accurately distinguish tweets mentioning names of advanced persistent threats. The findings suggest that the K-means diversity-based query strategy outperformed both the uncertainty-based approach and random data point selection when the amount of labeled training data was limited. This study also evaluated the cost-effective active learning approach, which incorporates high-confidence data points into the training dataset. However, this was shown to be the least effective strategy.
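
The query strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the least-confidence criterion, the plain Lloyd's-algorithm clustering, and the confidence threshold are all assumptions chosen for illustration. It assumes each unlabeled text has already been turned into a list of class probabilities (for the uncertainty and high-confidence steps) or into an embedding vector (for the diversity step).

```python
import math
import random


def least_confidence_query(probs, n):
    """Uncertainty-based query (assumed criterion: least confidence).

    Returns the indices of the n points whose highest predicted class
    probability is lowest, i.e. where the model is least sure.
    """
    order = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return order[:n]


def high_confidence_pseudo_labels(probs, threshold):
    """Cost-effective-style step (hypothetical threshold rule).

    Returns (index, predicted_class) pairs for points whose highest
    predicted probability is at least `threshold`; these would be added
    to the training set with the model's own prediction as the label.
    """
    return [(i, p.index(max(p))) for i, p in enumerate(probs)
            if max(p) >= threshold]


def kmeans_diversity_query(embeddings, n, iters=20, seed=0):
    """Diversity-based query: cluster into n groups, pick one point each.

    Runs a plain Lloyd's-algorithm k-means and returns, for each
    centroid, the index of the nearest not-yet-picked point.
    """
    rng = random.Random(seed)
    start = rng.sample(range(len(embeddings)), n)
    centroids = [list(embeddings[i]) for i in start]
    for _ in range(iters):
        clusters = [[] for _ in range(n)]
        for x in embeddings:
            nearest = min(range(n), key=lambda c: math.dist(x, centroids[c]))
            clusters[nearest].append(x)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    picked = []
    for centroid in centroids:
        candidates = [i for i in range(len(embeddings)) if i not in picked]
        picked.append(min(candidates,
                          key=lambda i: math.dist(embeddings[i], centroid)))
    return picked
```

In such a setup, the indices returned by a query function would be sent to a human annotator, the model retrained on the enlarged labeled set, and the cycle repeated. Random selection, the baseline mentioned in the abstract, corresponds to drawing the indices with `random.sample`.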

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. p. 1279-1286
Keywords [en]
Active learning, advanced persistent threat, cybersecurity, natural language processing
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-350002
DOI: 10.1109/ICMLA58977.2023.00193
Scopus ID: 2-s2.0-85190143463
OAI: oai:DiVA.org:kth-350002
DiVA, id: diva2:1882369
Conference
22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023, Jacksonville, United States of America, Dec 15 2023 - Dec 17 2023
Note

Part of ISBN 9798350345346

QC 20240705

Available from: 2024-07-05 Created: 2024-07-05 Last updated: 2024-08-01 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Carp, Amanda; Brynielsson, Joel; Tegen, Agnes
