Publications (4 of 4)
Wang, X. & Stadler, R. (2024). IT Intrusion Detection Using Statistical Learning and Testbed Measurements. In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024, NOMS 2024. Paper presented at 2024 IEEE/IFIP Network Operations and Management Symposium, NOMS 2024, Seoul, Korea, May 6 2024 - May 10 2024. Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024, NOMS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

We study automated intrusion detection in an IT infrastructure, specifically the problem of identifying the start of an attack, the type of attack, and the sequence of actions an attacker takes, based on continuous measurements from the infrastructure. We apply statistical learning methods, including Hidden Markov Model (HMM), Long Short-Term Memory (LSTM), and Random Forest Classifier (RFC), to map sequences of observations to sequences of predicted attack actions. In contrast to most related research, we have abundant data to train the models and evaluate their predictive power. The data comes from traces we generate on an in-house testbed where we run attacks against an emulated IT infrastructure. Central to our work is a machine-learning pipeline that maps measurements from a high-dimensional observation space to a space of low dimensionality or to a small set of observation symbols. Investigating intrusions in offline as well as online scenarios, we find that both HMM and LSTM can be effective in predicting attack start time, attack type, and attack actions. If sufficient training data is available, LSTM achieves higher prediction accuracy than HMM. HMM, on the other hand, requires fewer computational resources and less training data for effective prediction. Also, we find that the methods we study benefit from data produced by traditional intrusion detection systems like SNORT.
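
As an illustration of how an HMM can map a sequence of observation symbols to a sequence of predicted attack actions, the following is a minimal sketch of Viterbi decoding. It is not taken from the paper: the state names, the symbol alphabet ("low", "mid", "high"), all probabilities, and the helper `attack_start` are hypothetical, and the abstract does not say which decoding procedure the authors use. The sketch assumes the measurements have already been reduced to a small set of symbols.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for a list of symbols."""

    def log(p):
        # Log space avoids underflow on long traces; log(0) is treated as -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def attack_start(path, benign="idle"):
    """Index of the first non-benign state, or None if there is none."""
    for t, state in enumerate(path):
        if state != benign:
            return t
    return None


# Hypothetical toy model: three attacker states and three observation symbols.
STATES = ["idle", "scan", "exploit"]
START = {"idle": 0.9, "scan": 0.05, "exploit": 0.05}
TRANS = {
    "idle": {"idle": 0.8, "scan": 0.2, "exploit": 0.0},
    "scan": {"idle": 0.0, "scan": 0.6, "exploit": 0.4},
    "exploit": {"idle": 0.0, "scan": 0.0, "exploit": 1.0},
}
EMIT = {
    "idle": {"low": 0.9, "mid": 0.1, "high": 0.0},
    "scan": {"low": 0.1, "mid": 0.8, "high": 0.1},
    "exploit": {"low": 0.0, "mid": 0.2, "high": 0.8},
}

if __name__ == "__main__":
    decoded = viterbi(["low", "low", "mid", "mid", "high"], STATES, START, TRANS, EMIT)
    print(decoded, attack_start(decoded))
```

In this toy setting the decoded state sequence gives the predicted attack actions, and the first non-idle state gives an estimate of the attack start time.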

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
automated security, forensics, Hidden Markov Model, intrusion detection, Long Short-Term Memory, SNORT
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-351006 (URN); 10.1109/NOMS59830.2024.10575087 (DOI); 001270140300036 (); 2-s2.0-85198353664 (Scopus ID)
Conference
2024 IEEE/IFIP Network Operations and Management Symposium, NOMS 2024, Seoul, Korea, May 6 2024 - May 10 2024
Note

QC 20241007

Part of ISBN 9798350327939

Available from: 2024-07-24 Created: 2024-07-24 Last updated: 2024-10-07 Bibliographically approved
Wang, X. & Stadler, R. (2022). Online Feature Selection for Efficient Learning in Networked Systems. IEEE Transactions on Network and Service Management, 19(3), 2885-2898
2022 (English). In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 19, no 3, p. 2885-2898. Article in journal (Refereed) Published
Abstract [en]

Current AI/ML methods for data-driven engineering use models that are mostly trained offline. Such models can be expensive to build in terms of communication and computing costs, and they rely on data that is collected over extended periods of time. Further, they become out-of-date when changes in the system occur. To address these challenges, we investigate online learning techniques that automatically reduce the number of available data sources for model training. We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources after receiving a small number of measurements. The algorithm is initialized with a feature ranking algorithm, a feature set stability metric, and a search policy. We perform an extensive experimental evaluation of this algorithm using traces from an in-house testbed and from two external datasets. We find that OSFS achieves a massive reduction in the size of the feature set by 1-3 orders of magnitude on all investigated datasets. Most importantly, we find that the accuracy of a predictor trained on an OSFS-produced feature set is somewhat better than when the predictor is trained on a feature set obtained through offline feature selection. OSFS is thus shown to be effective as an online feature selection algorithm and robust regarding the sample interval used for feature selection. We also find that, when concept drift in the data underlying the model occurs, its effect can be mitigated by recomputing the feature set and retraining the prediction model.
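
The three ingredients named in the abstract (a feature ranking algorithm, a feature set stability metric, and a search policy) can be illustrated with a short sketch. It is not the published OSFS algorithm: variance ranking, Jaccard similarity, and a sample-doubling policy are stand-ins chosen for brevity, and the function names, the threshold, and the synthetic data in `demo_rows` are all hypothetical.

```python
from statistics import pvariance


def rank_by_variance(rows):
    """Rank feature indices by variance, highest first (stand-in ranking algorithm)."""
    columns = list(zip(*rows))
    return sorted(range(len(columns)), key=lambda j: pvariance(columns[j]), reverse=True)


def jaccard(a, b):
    """Set similarity in [0, 1], used here as the stability metric."""
    return len(a & b) / len(a | b)


def select_stable_features(rows, k, start=8, threshold=0.9):
    """Grow the sample prefix (doubling policy) until the top-k set is stable.

    Returns (feature_indices, samples_used, stable_flag).
    """
    n, used, previous = start, 0, None
    while n <= len(rows):
        current = frozenset(rank_by_variance(rows[:n])[:k])
        used = n
        if previous is not None and jaccard(previous, current) >= threshold:
            return sorted(current), used, True
        previous = current
        n *= 2
    # No two consecutive sets were similar enough within the available samples.
    return (sorted(previous) if previous is not None else []), used, False


def demo_rows(n_rows=64, n_features=10, informative=(3, 7)):
    """Deterministic synthetic measurements: two high-variance features, the rest nearly flat."""
    rows = []
    for i in range(n_rows):
        row = []
        for j in range(n_features):
            if j in informative:
                row.append(((2 * i + j) % 7 - 3) * 5.0)
            else:
                row.append(((i + j) % 3 - 1) * 0.1)
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(select_stable_features(demo_rows(), k=2))
```

The point of the sketch is the control flow: the feature set is recomputed on a growing prefix of the measurement stream, and the search stops as soon as two consecutive sets agree closely enough, so only a small number of measurements is consumed before a feature set is fixed.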

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Feature extraction, Data models, Computational modeling, Predictive models, Task analysis, Soft sensors, Monitoring, Data-driven engineering, machine learning (ML), dimensionality reduction, online learning, online feature selection
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-321009 (URN); 10.1109/TNSM.2022.3180936 (DOI); 000866556800068 (); 2-s2.0-85131743339 (Scopus ID)
Note

QC 20251002

Available from: 2022-11-04 Created: 2022-11-04 Last updated: 2025-10-02 Bibliographically approved
Wang, X., Samani, F. S., Johnsson, A. & Stadler, R. (2021). Online Feature Selection for Low-overhead Learning in Networked Systems. In: Chemouil, P., Ulema, M., Clayman, S., Sayit, M., Cetinkaya, C. & Secci, S. (Ed.), Proceedings of the 2021 17th International Conference on Network and Service Management: Smart Management for Future Networks and Services, CNSM 2021. Paper presented at 17th International Conference on Network and Service Management, CNSM 2021, Online/Virtual, 25-29 October 2021 (pp. 527-529). Institute of Electrical and Electronics Engineers Inc.
2021 (English). In: Proceedings of the 2021 17th International Conference on Network and Service Management: Smart Management for Future Networks and Services, CNSM 2021 / [ed] Chemouil, P., Ulema, M., Clayman, S., Sayit, M., Cetinkaya, C. & Secci, S., Institute of Electrical and Electronics Engineers Inc., 2021, p. 527-529. Conference paper, Published paper (Refereed)
Abstract [en]

Data-driven functions for operation and management require measurements and readings from distributed data sources for model training and prediction. While the number of candidate data sources can be very large, research has shown that it is often possible to reduce the number of data sources significantly while still allowing for accurate prediction. Consequently, there is potential to lower communication and computing resources needed to continuously extract, collect, and process this data. We demonstrate the operation of a novel online algorithm called OSFS, which sequentially processes the collected data and reduces the number of data sources for training prediction models. OSFS builds on two main ideas, namely (1) ranking the available data sources using (unsupervised) feature selection algorithms and (2) identifying stable feature sets that include only the top features. The demonstration shows the search space exploration, the iterative selection of feature sets, and the evaluation of the stability of these sets. The demonstration uses measurements collected from a KTH testbed, and the predictions relate to end-to-end KPIs for network services. 

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2021
Series
International Conference on Network and Service Management, ISSN 2165-9605
Keywords
Data-driven Engineering, Feature Selection, Machine Learning, Network Management, Forecasting, Information management, Iterative methods, Online systems, Space research, Data driven, Data-source, Features selection, Features sets, Low overhead, Machine-learning, Networks management, Number of datum, Online feature selection, Feature extraction
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-316390 (URN); 10.23919/CNSM52442.2021.9615548 (DOI); 000836226700084 (); 2-s2.0-85123422408 (Scopus ID)
Conference
17th International Conference on Network and Service Management, CNSM 2021, Online/Virtual, 25-29 October 2021
Note

Part of proceedings: ISBN 978-3-903176-36-2

QC 20220816

Available from: 2022-08-16 Created: 2022-08-16 Last updated: 2024-06-10 Bibliographically approved
Wang, X., Samani, F. S. & Stadler, R. (2020). Online feature selection for rapid, low-overhead learning in networked systems. In: Zincir-Heywood, N., Ulema, M., Sayit, M., Clayman, S., Kim, M. S. & Cetinkaya, C. (Ed.), 2020 16th international conference on network and service management (CNSM). Paper presented at 16th International Conference on Network and Service Management (CNSM) / 2nd International Workshop on Analytics for Service and Application Management (AnServApp) / 1st International Workshop on the Future Evolution of Internet Protocols (IPFuture), NOV 02-06, 2020, ELECTR NETWORK. IEEE
2020 (English). In: 2020 16th international conference on network and service management (CNSM) / [ed] Zincir-Heywood, N., Ulema, M., Sayit, M., Clayman, S., Kim, M. S. & Cetinkaya, C., IEEE, 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Data-driven functions for operation and management often require measurements collected through monitoring for model training and prediction. The number of data sources can be very large, which requires a significant communication and computing overhead to continuously extract and collect this data, as well as to train and update the machine-learning models. We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources, which allows for rapid, low-overhead, and effective learning and prediction. OSFS is instantiated with a feature ranking algorithm and applies the concept of a stable feature set, which we introduce in the paper. We perform an extensive experimental evaluation of our method on data from an in-house testbed. We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude, from which models are trained with acceptable prediction accuracy. While our method is heuristic and can be improved in many ways, the results clearly suggest that many learning tasks do not require a lengthy monitoring phase and expensive offline training.

Place, publisher, year, edition, pages
IEEE, 2020
Series
International Conference on Network and Service Management, ISSN 2165-9605
Keywords
Data-driven engineering, Machine learning (ML), Dimensionality reduction
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-291038 (URN); 10.23919/CNSM50824.2020.9269066 (DOI); 000612229200029 (); 2-s2.0-85098668191 (Scopus ID)
Conference
16th International Conference on Network and Service Management (CNSM) / 2nd International Workshop on Analytics for Service and Application Management (AnServApp) / 1st International Workshop on the Future Evolution of Internet Protocols (IPFuture), NOV 02-06, 2020, ELECTR NETWORK
Note

QC 20210303

Available from: 2021-03-03 Created: 2021-03-03 Last updated: 2024-06-10 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-2414-3108
