kth.se Publications
1 - 13 of 13
  • 1.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Demonstrating a System for Dynamically Meeting Management Objectives on a Service Mesh. 2023. In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2023, NOMS 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper (Refereed)
    Abstract [en]

    We demonstrate a management system that lets a service provider achieve end-to-end management objectives under varying load for applications on a service mesh based on the Istio and Kubernetes platforms. The management objectives for the demonstration include end-to-end delay bounds on service requests, throughput objectives, and service differentiation. Our method for finding effective control policies includes a simulator and a control module. The simulator is instantiated with traces from a testbed, and the control module trains a reinforcement learning (RL) agent to efficiently learn effective control policies on the simulator. The learned policies are then transferred to the testbed to perform dynamic control actions based on monitored system metrics. We show that the learned policies dynamically meet management objectives on the testbed and can be changed on the fly.

  • 2.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Online Policy Adaptation for Networked Systems using Rollout. 2024. In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024, NOMS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper (Refereed)
    Abstract [en]

    Dynamic resource allocation in networked systems is needed to continuously achieve end-to-end management objectives. Recent research has shown that reinforcement learning can achieve near-optimal resource allocation policies for realistic system configurations. However, most current solutions require expensive retraining when changes in the system occur. We address this problem and introduce an efficient method to adapt a given base policy to system changes, e.g., to a change in the service offering. In our approach, we adapt a base control policy using a rollout mechanism, which transforms the base policy into an improved rollout policy. We perform extensive evaluations on a testbed where we run applications on a service mesh based on the Istio and Kubernetes platforms. The experiments provide insights into the performance of different rollout algorithms. We find that our approach produces policies that are as effective as those obtained by offline retraining. On our testbed, effective policy adaptation takes seconds when using rollout, compared to minutes or hours when using retraining. Our work demonstrates that rollout, which has been applied successfully in other domains, is an effective approach for policy adaptation in networked systems.
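
    The rollout idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the queueing model (`step`), the constants `ARRIVALS` and `ACTIONS`, the cost function, and the fixed `base_policy` are all hypothetical and chosen only to keep the example deterministic. The sketch shows one-step lookahead: for each candidate action, the model is simulated for that action followed by the base policy, and the action with the lowest accumulated cost is chosen.

```python
from typing import Callable, Tuple

ARRIVALS = 3          # hypothetical: requests arriving per time step
ACTIONS = (1, 2, 3)   # hypothetical: number of servers that can be allocated


def step(queue: int, servers: int) -> Tuple[int, float]:
    """Toy deterministic system model: each server handles 2 requests per step."""
    next_queue = max(0, queue + ARRIVALS - 2 * servers)
    cost = next_queue + 0.5 * servers  # backlog cost plus resource cost
    return next_queue, cost


def base_policy(queue: int) -> int:
    """A deliberately weak base policy: always allocate one server."""
    return 1


def rollout_policy(queue: int, base: Callable[[int], int], horizon: int = 5) -> int:
    """One-step lookahead: try each action, then follow the base policy."""
    def lookahead_cost(action: int) -> float:
        state, total = step(queue, action)
        for _ in range(horizon):
            state, cost = step(state, base(state))
            total += cost
        return total

    return min(ACTIONS, key=lookahead_cost)


def total_cost(policy: Callable[[int], int], queue: int, steps: int) -> float:
    """Accumulated cost of running a policy on the toy model."""
    total = 0.0
    for _ in range(steps):
        queue, cost = step(queue, policy(queue))
        total += cost
    return total
```

    In this toy setting the rollout policy needs only the model and the base policy, with no retraining, which is the property the abstract highlights. Comparing `total_cost(base_policy, 4, 10)` with `total_cost(lambda q: rollout_policy(q, base_policy), 4, 10)` shows the improvement over the base policy.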

  • 3.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Larsson, Hannes
    Ericsson Research, Sweden.
    Damberg, Simon
    Ericsson Research, Sweden.
    Johnsson, Andreas
    Ericsson Research, Sweden; Uppsala University, Department of Information Technology, Sweden.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Comparing Transfer Learning and Rollout for Policy Adaptation in a Changing Network Environment. 2024. In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024, NOMS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper (Refereed)
    Abstract [en]

    Dynamic resource allocation for network services is pivotal for achieving end-to-end management objectives. Previous research has demonstrated that Reinforcement Learning (RL) is a promising approach to resource allocation in networks, making it possible to obtain near-optimal control policies for non-trivial system configurations. Current RL approaches, however, have the drawback that a change in the system or the management objective necessitates expensive retraining of the RL agent. To tackle this challenge, practical solutions including offline retraining, transfer learning, and model-based rollout have been proposed. In this work, we study these methods and present comparative results that shed light on their respective performance and benefits. Our study finds that rollout achieves faster adaptation than transfer learning, yet its effectiveness highly depends on the accuracy of the system model.

  • 4.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    A Framework for dynamically meeting performance objectives on a service mesh. 2024. Manuscript (preprint) (Other academic)
    Abstract [en]

    We present a framework for achieving end-to-end management objectives for multiple services that concurrently execute on a service mesh. We apply reinforcement learning (RL) techniques to train an agent that periodically performs control actions to reallocate resources. We develop and evaluate the framework using a laboratory testbed where we run information and computing services on a service mesh, supported by the Istio and Kubernetes platforms. We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, cost-related objectives, and service differentiation. Our framework supports the design of a control agent for a given management objective. It is novel in three respects. First, it advocates a top-down approach whereby the management objective is defined first and then mapped onto the available control actions. Several types of control actions can be executed simultaneously, which allows for efficient resource utilization. Second, the framework separates learning of the system model and the operating region from learning of the control policy. By first learning the system model and the operating region from testbed traces, we can instantiate a simulator and train the agent for different management objectives in parallel. Third, the use of a simulator shortens the training time by orders of magnitude compared with training the agent on the testbed. We evaluate the learned policies on the testbed and show the effectiveness of our approach in several scenarios. In one scenario, we design a controller that achieves the management objectives with 50% less system resources than Kubernetes HPA autoscaling.

  • 5.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Dynamically meeting performance objectives for multiple services on a service mesh. 2022. In: 2022 18TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM 2022): INTELLIGENT MANAGEMENT OF DISRUPTIVE NETWORK TECHNOLOGIES AND SERVICES / [ed] Charalambides, M; Papadimitriou, P; Cerroni, W; Kanhere, S; Mamatas, L, IEEE, 2022, p. 219-225. Conference paper (Refereed)
    Abstract [en]

    We present a framework that lets a service provider achieve end-to-end management objectives under varying load. Dynamic control actions are performed by a reinforcement learning (RL) agent. Our work includes experimentation and evaluation on a laboratory testbed where we have implemented basic information services on a service mesh supported by the Istio and Kubernetes platforms. We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, and service differentiation. These objectives are mapped onto reward functions that an RL agent learns to optimize, by executing control actions, namely, request routing and request blocking. We compute the control policies not on the testbed, but in a simulator, which speeds up the learning process by orders of magnitude. In our approach, the system model is learned on the testbed; it is then used to instantiate the simulator, which produces near-optimal control policies for various management objectives. The learned policies are then evaluated on the testbed using unseen load patterns.

  • 6.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Predicting Distributions of Service Metrics using Neural Networks. 2018. In: 2018 14TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM) / [ed] Salsano, S; Riggio, R; Ahmed, T; Samak, T; DosSantos, CRP, IEEE, 2018, p. 45-53. Conference paper (Refereed)
    Abstract [en]

    We predict the conditional distributions of service metrics, such as response time or frame rate, from infrastructure measurements in a cloud environment. From such distributions, key statistics of the service metrics, including mean, variance, or percentiles, can be computed, which are essential for predicting SLA conformance or enabling service assurance. We model the distributions as Gaussian mixtures, whose parameters we predict using mixture density networks, a class of neural networks. We apply the method to a VoD service and a KV store running on our lab testbed. The results validate the effectiveness of the method when applied to operational data. In the case of predicting the mean of the frame rate or response time, the accuracy matches that of random forest, a baseline model.

  • 7.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE Res Inst Sweden, Dept Digital Syst, S-50115 Borås, Sweden..
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE Res Inst Sweden, Dept Digital Syst, S-50115 Borås, Sweden..
    Flinta, Christofer
    Ericsson Res, RA Machine Intelligence, S-16480 Stockholm, Sweden..
    Johnsson, Andreas
    Ericsson Res, Res Area Artificial Intelligence Dept, S-16483 Stockholm, Sweden..
    Conditional Density Estimation of Service Metrics for Networked Services. 2021. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 18, no 2, p. 2350-2364. Article in journal (Refereed)
    Abstract [en]

    We predict the conditional distributions of service metrics, such as response time or frame rate, from infrastructure measurements in a networked environment. From such distributions, key statistics of the service metrics, including mean, variance, or quantiles can be computed, which are essential for predicting SLA conformance and enabling service assurance. We present and assess two methods for prediction: (1) mixture models with Gaussian or Lognormal kernels, whose parameters are estimated using mixture density networks, a class of neural networks, and (2) histogram models, which require the target space to be discretized. We apply these methods to a VoD service and a KV store service running on our lab testbed. A comparative evaluation shows the relative effectiveness of the methods when applied to operational data. We find that both methods allow for accurate prediction. While mixture models provide a general and elegant solution, they incur a very high overhead related to hyper-parameter search and neural network training. Histogram models, on the other hand, allow for efficient training, but require adjustment to the specific use case.
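
    As an illustration of how key statistics follow from a predicted Gaussian mixture, the sketch below computes the mean, variance, and a quantile from mixture parameters. It assumes the weights, means, and standard deviations are already available (for example as the output of a mixture density network); the function names and the bisection-based quantile search are choices made for this example and are not taken from the paper.

```python
import math
from typing import Sequence


def mixture_mean(w: Sequence[float], mu: Sequence[float]) -> float:
    """Mean of a Gaussian mixture: weighted sum of component means."""
    return sum(wi * mi for wi, mi in zip(w, mu))


def mixture_variance(w: Sequence[float], mu: Sequence[float],
                     sigma: Sequence[float]) -> float:
    """Variance via E[X^2] - E[X]^2, with E[X^2] = sum w_i (sigma_i^2 + mu_i^2)."""
    second_moment = sum(wi * (si ** 2 + mi ** 2) for wi, mi, si in zip(w, mu, sigma))
    return second_moment - mixture_mean(w, mu) ** 2


def mixture_cdf(x: float, w: Sequence[float], mu: Sequence[float],
                sigma: Sequence[float]) -> float:
    """CDF of the mixture: weighted sum of component normal CDFs."""
    return sum(
        wi * 0.5 * (1.0 + math.erf((x - mi) / (si * math.sqrt(2.0))))
        for wi, mi, si in zip(w, mu, sigma)
    )


def mixture_quantile(p: float, w: Sequence[float], mu: Sequence[float],
                     sigma: Sequence[float], iterations: int = 100) -> float:
    """Quantile by bisection on the (monotone) mixture CDF."""
    lo = min(mi - 10.0 * si for mi, si in zip(mu, sigma))
    hi = max(mi + 10.0 * si for mi, si in zip(mu, sigma))
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mixture_cdf(mid, w, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

    A quantile such as `mixture_quantile(0.95, w, mu, sigma)` could then be compared against an SLA threshold, which is the kind of use the abstract describes for SLA conformance.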

  • 8.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE SICS, Luleå, Sweden..
    Johnsson, Andreas
    Ericsson Res, Gothenburg, Sweden..
    Flinta, Christofer
    Ericsson Res, Gothenburg, Sweden..
    Demonstration: Predicting Distributions of Service Metrics. 2019. In: 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 745-746, article id 8717915. Conference paper (Refereed)
    Abstract [en]

    The ability to predict conditional distributions of service metrics is key to understanding end-to-end service behavior. From conditional distributions, other metrics can be derived, such as expected values and quantiles, which are essential for assessing SLA conformance. Our demonstrator predicts conditional distributions and estimates derived metrics in real time, using infrastructure measurements. The distributions are modeled as Gaussian mixtures whose parameters are estimated using a mixture density network. The predictions are produced for a Video-on-Demand service that runs on a testbed at KTH.

  • 9.
    Samani, Forough Shahab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE AI, Gothenburg, Sweden..
    Zhang, Hongyi
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE AI, Gothenburg, Sweden..
    Efficient Learning on High-dimensional Operational Data. 2019. In: 2019 15th International conference on network and service management (CNSM) / [ed] Lutfiyya, H; Diao, YX; Zincir-Heywood, N; Badonnel, R; Madeira, E, IEEE, 2019. Conference paper (Refereed)
    Abstract [en]

    In networked systems engineering, operational data gathered from sensors or logs can be used to build data-driven functions for performance prediction, anomaly detection, and other operational tasks. The number of data sources used for this purpose determines the dimensionality of the feature space for learning and can reach millions for medium-sized systems. Learning on a space with high dimensionality generally incurs high communication and computational costs for the learning process. In this work, we apply and compare a range of methods, including feature selection, Principal Component Analysis (PCA), and autoencoders, with the objective of reducing the dimensionality of the feature space while maintaining the prediction accuracy when compared with learning on the full space. We conduct the study using traces gathered from a testbed at KTH that runs a video-on-demand service and a key-value store under dynamic load. Our results suggest the feasibility of reducing the dimensionality of the feature space of operational data significantly, by one to two orders of magnitude in our scenarios, while maintaining prediction accuracy. The findings confirm the Manifold Hypothesis in machine learning, which states that real-world data sets tend to occupy a small subspace of the full feature space. In addition, we investigate the tradeoff between prediction accuracy and prediction overhead, which is crucial for applying the results to operational systems.

  • 10.
    Shahabsamani, Forough
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    End-to-end performance prediction and automated resource management of cloud services. 2024. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Cloud-based services are integral to modern life. Cloud systems aim to provide customers with uninterrupted services of high quality while enabling cost-effective fulfillment by providers. The key to meeting quality requirements and end-to-end performance objectives is to devise effective strategies to allocate resources to the services. This in turn requires automation of resource allocation. Recently, researchers have studied learning-based approaches, especially reinforcement learning (RL), for automated resource allocation. These approaches are particularly promising for resource allocation in cloud systems as they can deal with the architectural complexity of a cloud environment. Previous research shows that reinforcement learning is effective for specific types of controls, such as horizontal or vertical scaling of compute resources. However, major obstacles to operational deployment remain. Chief among them is the fact that reinforcement learning methods require long times for training and retraining after system changes.

    With this thesis, we aim to overcome these obstacles and demonstrate dynamic resource allocation using reinforcement learning on a testbed. On the conceptual level, we address two interconnected problems: predicting end-to-end service performance and automated resource allocation for cloud services. First, we study methods to predict the conditional density of service metrics and demonstrate the effectiveness of employing dimensionality reduction methods to reduce monitoring, communication, and model-training overhead. For automated resource allocation, we develop a framework for RL-based control. Our approach involves learning a system model from measurements, using a simulator to learn resource allocation policies, and adapting these policies online using a rollout mechanism. Experimental results from our testbed show that using our framework, we can effectively achieve end-to-end performance objectives by dynamically allocating resources to the services using different types of control actions simultaneously.

  • 11.
    Shahabsamani, Forough
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Online Policy Adaptation for Networked Systems using Rollout. 2024. Conference paper (Refereed)
    Abstract [en]

    Dynamic resource allocation in networked systems is needed to continuously achieve end-to-end management objectives. Recent research has shown that reinforcement learning can achieve near-optimal resource allocation policies for realistic system configurations. However, most current solutions require expensive retraining when changes in the system occur. We address this problem and introduce an efficient method to adapt a given base policy to system changes, e.g., to a change in the service offering. In our approach, we adapt a base control policy using a rollout mechanism, which transforms the base policy into an improved rollout policy. We perform extensive evaluations on a testbed where we run applications on a service mesh based on the Istio and Kubernetes platforms. The experiments provide insights into the performance of different rollout algorithms. We find that our approach produces policies that are as effective as those obtained by offline retraining. On our testbed, effective policy adaptation takes seconds when using rollout, compared to minutes or hours when using retraining. Our work demonstrates that rollout, which has been applied successfully in other domains, is an effective approach for policy adaptation in networked systems.

  • 12.
    Wang, Xiaoxuan
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
    Samani, Forough Shahab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE Res Inst Sweden, Gothenburg, Sweden.
    Johnsson, Andreas
    KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems. Ericsson Res, Stockholm, Sweden.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE Res Inst Sweden, Gothenburg, Sweden.
    Online Feature Selection for Low-overhead Learning in Networked Systems. 2021. In: Proceedings of the 2021 17th International Conference on Network and Service Management: Smart Management for Future Networks and Services, CNSM 2021 / [ed] Chemouil, P; Ulema, M; Clayman, S; Sayit, M; Cetinkaya, C; Secci, S, Institute of Electrical and Electronics Engineers Inc., 2021, p. 527-529. Conference paper (Refereed)
    Abstract [en]

    Data-driven functions for operation and management require measurements and readings from distributed data sources for model training and prediction. While the number of candidate data sources can be very large, research has shown that it is often possible to reduce the number of data sources significantly while still allowing for accurate prediction. Consequently, there is potential to lower communication and computing resources needed to continuously extract, collect, and process this data. We demonstrate the operation of a novel online algorithm called OSFS, which sequentially processes the collected data and reduces the number of data sources for training prediction models. OSFS builds on two main ideas, namely (1) ranking the available data sources using (unsupervised) feature selection algorithms and (2) identifying stable feature sets that include only the top features. The demonstration shows the search space exploration, the iterative selection of feature sets, and the evaluation of the stability of these sets. The demonstration uses measurements collected from a KTH testbed, and the predictions relate to end-to-end KPIs for network services. 

  • 13.
    Wang, Xiaoxuan
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
    Samani, Forough Shahab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE Res Inst Sweden, Gothenburg, Sweden..
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. RISE Res Inst Sweden, Gothenburg, Sweden..
    Online feature selection for rapid, low-overhead learning in networked systems. 2020. In: 2020 16th international conference on network and service management (CNSM) / [ed] Zincir-Heywood, N; Ulema, M; Sayit, M; Clayman, S; Kim, MS; Cetinkaya, C, IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    Data-driven functions for operation and management often require measurements collected through monitoring for model training and prediction. The number of data sources can be very large, which requires a significant communication and computing overhead to continuously extract and collect this data, as well as to train and update the machine-learning models. We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources, which allows for rapid, low-overhead, and effective learning and prediction. OSFS is instantiated with a feature ranking algorithm and applies the concept of a stable feature set, which we introduce in the paper. We perform an extensive experimental evaluation of our method on data from an in-house testbed. We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude, from which models are trained with acceptable prediction accuracy. While our method is heuristic and can be improved in many ways, the results clearly suggest that many learning tasks do not require a lengthy monitoring phase and expensive offline training.
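
    The notion of a stable feature set can be illustrated with a small sketch. This is not the OSFS algorithm from the paper, whose details are not given in the abstract: the variance-based ranking, the fixed batch size, and the stopping rule (the top-k set is unchanged between two consecutive batches) are assumptions made purely for this example, and the function names are hypothetical.

```python
from statistics import pvariance
from typing import FrozenSet, List, Optional, Sequence, Tuple


def top_k_by_variance(samples: Sequence[Sequence[float]], k: int) -> FrozenSet[int]:
    """Rank features by variance (an unsupervised criterion) and keep the top k."""
    n_features = len(samples[0])
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return frozenset(ranked[:k])


def first_stable_set(samples: List[Sequence[float]], k: int,
                     batch: int) -> Optional[Tuple[int, FrozenSet[int]]]:
    """Process samples sequentially in batches; stop once the top-k set repeats.

    Returns (number of samples used, feature indices), or None if the
    top-k set never stabilises within the available samples.
    """
    previous: Optional[FrozenSet[int]] = None
    for n in range(batch, len(samples) + 1, batch):
        current = top_k_by_variance(samples[:n], k)
        if current == previous:
            return n, current
        previous = current
    return None
```

    The point of such a stopping rule is that monitoring of the discarded data sources can end as soon as the selected set stops changing, rather than after a fixed, lengthy collection phase.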
