kth.sePublications
  • 1.
    Adam, Constantin
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    A Middleware Design for Large-scale Clusters offering Multiple Services. 2006. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 3, no 1, p. 1-12. Article in journal (Refereed)
    Abstract [en]

    We present a decentralized design that dynamically allocates resources to multiple services inside a global server cluster. The design supports QoS objectives (maximum response time and maximum loss rate) for each service. A system administrator can modify policies that assign relative importance to services and, in this way, control the resource allocation process. Distinctive features of our design are the use of an epidemic protocol to disseminate state and control information, as well as the decentralized evaluation of utility functions to control resource partitioning among services. Simulation results show that the system operates both effectively and efficiently; it meets the QoS objectives and dynamically adapts to load changes and to failures. In case of overload, the service quality degrades gracefully, controlled by the cluster policies.

  • 2.
    Adam, Constantin
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Adaptable Server Clusters with QoS Objectives. 2005. In: Integrated Network Management IX - MANAGING NEW NETWORKED WORLDS / [ed] Clemm A, Festor O, Pras A, New York: IEEE, 2005, p. 149-163. Conference paper (Refereed)
    Abstract [en]

    We present a decentralized design for a server cluster that supports a single service with response time guarantees. Three distributed mechanisms represent the key elements of our design. Topology construction maintains a dynamic overlay of cluster nodes. Request routing directs service requests towards available servers. Membership control allocates/releases servers to/from the cluster, in response to changes in the external load. We advocate a decentralized approach, because it is scalable, fault-tolerant, and has a lower configuration complexity than a centralized solution. We demonstrate through simulations that our system operates efficiently by comparing it to an ideal centralized system. In addition, we show that our system rapidly adapts to changing load. We found that the interaction of the various mechanisms in the system leads to desirable global properties. More precisely, for a fixed connectivity c (i.e., the number of neighbors of a node in the overlay), the average experienced delay in the cluster is independent of the external load. In addition, increasing c increases the average delay but decreases the system size for a given load. Consequently, the cluster administrator can use c as a management parameter that permits control of the tradeoff between a small system size and a small experienced delay for the service.

  • 3.
    Adam, Constantin
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Externally Controllable, Self-Organizing Server Clusters. 2005. In: Designing a Scalable, Self-organizing Middleware for Server Clusters (NGNM05): in the scope of Networking 2005, 2005, p. 1-12. Chapter in book (Other academic)
  • 4.
    Adam, Constantin
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Implementation and evaluation of a middleware for self-organizing decentralized web services. 2006. In: Integrated Network Management IX: MANAGING NEW NETWORKED WORLDS, 2006, Vol. 3996, p. 1-14. Conference paper (Refereed)
    Abstract [en]

    We present the implementation of Chameleon, a peer-to-peer middleware for self-organizing web services, and we provide evaluation results from a test bed. The novel aspect of Chameleon is that key functions, including resource allocation, are decentralized, which facilitates scalability and robustness of the overall system. Chameleon is implemented in Java on the Tomcat web server environment. The implementation is non-intrusive in the sense that it does not require code modifications in Tomcat or in the underlying operating system. We evaluate the system by running the TPC-W benchmark. We show that the middleware dynamically and effectively reconfigures in response to changes in load patterns and server failures, while enforcing operating policies, namely, QoS objectives and service differentiation under overload.

  • 5.
    Adam, Constantin
    et al.
    KTH, Superseded Departments (pre-2005), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, Superseded Departments (pre-2005), Microelectronics and Information Technology, IMIT.
    Patterns for Routing and Self-Stabilization. 2004. In: NOMS 2004: IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM - MANAGING NEXT GENERATION CONVERGENCE NETWORKS AND SERVICES, New York: IEEE, 2004, p. 61-74. Conference paper (Refereed)
    Abstract [en]

    This paper contributes towards engineering self-stabilizing networks and services. We propose the use of navigation patterns, which define how information for state updates is disseminated in the system, as fundamental building blocks for self-stabilizing systems. We present two navigation patterns for self-stabilization: the progressive wave pattern and the stationary wave pattern. The progressive wave pattern defines the update dissemination in Internet routing systems running the DUAL and OSPF protocols. Similarly, the stationary wave pattern defines the interactions of peer nodes in structured peer-to-peer systems, including Chord, Pastry, Tapestry, and CAN. It turns out that both patterns are related. They both disseminate information in the form of waves, i.e., sets of messages that originate from single events. Patterns can be instrumented to obtain wave statistics, which enables monitoring the process of self-stabilization in a system. We focus on Internet routing and peer-to-peer systems in this work, since we believe that studying these (existing) systems can lead to engineering principles for self-stabilizing systems in various application areas.

  • 6.
    Adam, Constantin
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Service middleware for self-managing large-scale systems. 2007. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 4, no 3, p. 50-64. Article in journal (Refereed)
    Abstract [en]

    Resource management poses particular challenges in large-scale systems, such as server clusters that simultaneously process requests from a large number of clients. A resource management scheme for such systems must scale both in the number of cluster nodes and the number of applications the cluster supports. Current solutions do not exhibit both of these properties at the same time. Many are centralized, which limits their scalability in terms of the number of nodes, or they are decentralized but rely on replicated directories, which also reduces their ability to scale. In this paper, we propose novel solutions to request routing and application placement, two key mechanisms in a scalable resource management scheme. Our solution to request routing is based on selective update propagation, which ensures that the control load on a cluster node is independent of the system size. Application placement is approached in a decentralized manner, by using a distributed algorithm that maximizes resource utilization and allows for service differentiation under overload. The paper demonstrates how the above solutions can be integrated into an overall design for a peer-to-peer management middleware that exhibits properties of self-organization. Through complexity analysis and simulation, we show to what extent the system design is scalable. We have built a prototype using accepted technologies and have evaluated it using a standard benchmark. The testbed measurements show that the implementation, within the parameter range tested, operates efficiently, quickly adapts to a changing environment and allows for effective service differentiation by a system administrator.

  • 7.
    Adam, Constantin
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Tang, Chunqiang
    Steinder, Malgorzata
    Spreitzer, Michael
    A service middleware that scales in system size and applications. 2007. In: 2007 10TH IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2009): VOLS 1 AND 2, NEW YORK: IEEE, 2007, p. 70-79. Conference paper (Refereed)
    Abstract [en]

    We present a peer-to-peer service management middleware that dynamically allocates system resources to a large set of applications. The system achieves scalability in number of nodes (1000s or more) through three decentralized mechanisms that run on different time scales. First, overlay construction interconnects all nodes in the system for exchanging control and state information. Second, request routing directs requests to nodes that offer the corresponding applications. Third, application placement controls the set of offered applications on each node, in order to achieve efficient operation and service differentiation. The design supports a large number of applications (100s or more) through selective propagation of configuration information needed for request routing. The control load on a node increases linearly with the number of applications in the system. Service differentiation is achieved through assigning a utility to each application which influences the application placement process. Simulation studies show that the system operates efficiently for different sizes, adapts fast to load changes and failures and effectively differentiates between different applications under overload.

  • 8. Ahmed, J.
    et al.
    Johnsson, A.
    Moradi, F.
    Pasquini, R.
    Flinta, C.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Online approach to performance fault localization for cloud and datacenter services. 2017. In: Proceedings of the IM 2017 - 2017 IFIP/IEEE International Symposium on Integrated Network and Service Management, Institute of Electrical and Electronics Engineers Inc., 2017, p. 873-874. Conference paper (Refereed)
    Abstract [en]

    Automated detection and diagnosis of the performance faults in cloud and datacenter environments is a crucial task to maintain smooth operation of different services and minimize downtime. We demonstrate an effective machine learning approach based on detecting metric correlation stability violations (CSV) for automated localization of performance faults for datacenter services running under dynamic load conditions.

  • 9. Ahmed, J.
    et al.
    Johnsson, A.
    Yanggratoke, Rerngvit
    KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. KTH, School of Electrical Engineering (EES), Communication Networks.
    Ardelius, J.
    Flinta, C.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Predicting SLA conformance for cluster-based services using distributed analytics. 2016. In: Proceedings of the NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, IEEE conference proceedings, 2016, p. 848-852. Conference paper (Refereed)
    Abstract [en]

    Service assurance for the telecom cloud is a challenging task and is continuously being addressed by academia and industry. One promising approach is to utilize machine learning to predict service quality in order to take early mitigation actions. In previous work we have shown how to predict service-level metrics, such as frame rate for a video application on the client side, from operational data gathered at the server side. This gives the service provider early indications on whether the platform can support the current load demand. This paper extends previous work by addressing scalability issues for cluster-based services. Operational data being generated in large volumes, from several sources, and at high velocity puts strain on computational and communication resources. We propose and evaluate a distributed machine learning system based on the Winnow algorithm to tackle scalability issues, and then compare the new distributed solution with the previously proposed centralized solution. We show that network overhead and computational execution time are substantially reduced while maintaining high prediction accuracy, making it possible to achieve real-time service quality predictions in large systems.
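The abstract above names the Winnow algorithm as the basis of its distributed learner. As a point of reference only, here is a minimal sketch of the classic single-node Winnow classifier over binary features. The class name, the threshold choice (number of features), the promotion factor of 2 and the `train` helper are illustrative assumptions; the paper's distributed system and feature encoding are not described in this listing and are not reproduced here.

```python
from typing import List, Sequence, Tuple


class Winnow:
    """Textbook Winnow classifier for binary (0/1) feature vectors."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.alpha = alpha                      # multiplicative update factor (assumed value)
        self.threshold = float(n_features)      # common textbook threshold choice
        self.weights: List[float] = [1.0] * n_features

    def predict(self, x: Sequence[int]) -> int:
        # Sum the weights of the active features and compare to the threshold.
        score = sum(w for w, xi in zip(self.weights, x) if xi)
        return 1 if score >= self.threshold else 0

    def update(self, x: Sequence[int], label: int) -> bool:
        """Update on one sample; return True if the model made a mistake."""
        if self.predict(x) == label:
            return False
        # Promote active weights on a missed positive, demote on a false positive.
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        self.weights = [w * factor if xi else w for w, xi in zip(self.weights, x)]
        return True


def train(model: Winnow, samples: List[Tuple[Sequence[int], int]], epochs: int = 10) -> Winnow:
    """Repeat passes over the samples until a pass produces no mistakes."""
    for _ in range(epochs):
        mistakes = sum(model.update(x, y) for x, y in samples)
        if mistakes == 0:
            break
    return model
```

Because updates are multiplicative and touch only active features, each update is cheap, which is one plausible reason such a learner suits high-volume operational data.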

  • 10.
    Ahmed, J.
    et al.
    Ericsson Research, Sweden.
    Josefsson, T.
    Uppsala University, Sweden.
    Johnsson, A.
    Ericsson Research, Sweden.
    Flinta, C.
    Ericsson Research, Sweden.
    Moradi, F.
    Ericsson Research, Sweden.
    Pasquini, R.
    Faculty of Computing (FACOM/UFU), Uberlândia, MG, Brazil.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, ACCESS Linnaeus Centre. RISE Swedish Institute of Computer Science (SICS), Sweden.
    Automated diagnostic of virtualized service performance degradation. 2018. In: Proceedings 2018 IEEE/IFIP Network Operations and Management Symposium, NOMS 2018: Cognitive Management in a Cyber World, NOMS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 1-9. Conference paper (Refereed)
    Abstract [en]

    Service assurance for cloud applications is a challenging task and is an active area of research for academia and industry. One promising approach is to utilize machine learning for service quality prediction and fault detection so that suitable mitigation actions can be executed. In our previous work, we have shown how to predict service-level metrics in real-time just from operational data gathered at the server side. This gives the service provider early indications on whether the platform can support the current load demand. This paper provides the logical next step where we extend our work by proposing an automated detection and diagnostic capability for the performance faults manifesting themselves in cloud and datacenter environments. This is a crucial task to maintain the smooth operation of running services and minimize downtime. We demonstrate the effectiveness of our approach, which exploits the interpretative capabilities of Self-Organizing Maps (SOMs) to automatically detect and localize different performance faults for cloud services.

  • 11. Baliosian, J.
    et al.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Decentralized configuration of neighboring cells for radio access networks. 2007. In: 2007 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, WOWMOM, IEEE, 2007, p. 4351740-. Conference paper (Refereed)
    Abstract [en]

    In order to execute handover processes in a Radio Access Network, each cell has a configured list of neighbors to which such handovers are made. Rapid re-configuration of the neighborhood list in response to network failures and other events is currently not possible. To address this problem, this paper suggests an autonomic approach for dynamically configuring neighboring cell lists and introduces a decentralized, three-layered framework. As a key element of this framework, a novel probabilistic protocol that detects and continuously tracks the coverage overlaps among cells is presented and evaluated. The protocol, called DOC, maintains a distributed graph of overlapping cells. Due to using Bloom filters and aggregation techniques, it exhibits a low traffic and computational overhead. A first series of simulation studies suggests that DOC is scalable with respect to network size and the number of terminals.
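The abstract above credits Bloom filters for DOC's low traffic and computational overhead. The sketch below shows a generic Bloom filter only. How DOC actually encodes terminals or cells is not stated in this listing, so the class, its parameters (`num_bits`, `num_hashes`) and the hashing scheme are assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        """Merge two filters with identical parameters via bitwise OR."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged
```

A union costs one bitwise OR regardless of how many items each filter holds, which makes such summaries cheap to aggregate. For example, two cells could each summarize the (hypothetical) terminal identifiers they observe and exchange only the fixed-size filters.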

  • 12. Baliosian, Javier
    et al.
    Matusikova, Katarina
    Quinn, Karl
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Policy-based self-healing for radio access networks. 2008. In: 2008 IEEE NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, IEEE, 2008, p. 1007-1010. Conference paper (Refereed)
    Abstract [en]

    Various centralized, distributed or cooperative management systems have been proposed to address the demands of wireless telecommunication networks. However, considering the size, complexity and heterogeneity that those networks will have in the future, current solutions either do not scale properly, or have no support for automation, or lack the flexibility and simple control that operators will need for managing future networks in a cost-effective way. To address this problem, we designed Omega, a distributed and policy-based network management system that uses rich knowledge-modeling techniques to develop self-configuration capabilities. Omega also implements a novel conflict-resolution method that uses high-level goals and machine learning techniques to optimize its policy-based decisions. Using simulations, in this paper we show how Omega reduces the impact of a node crash on the overall availability of a radio access network by optimizing the lists of neighboring cells of the nodes in the vicinity.

  • 13.
    Baliosian, Javier
    et al.
    Ericsson Ireland Research Centre, Athlone, Ireland.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Method of Discovering Overlapping Cells. 2007. Patent (Other (popular science, discussion, etc.))
  • 14. Brunner, M
    et al.
    Galis, A
    Cheng, L
    Colas, J A
    Ahlgren, B
    Gunnar, A
    Abrahamsson, H
    Szabo, R
    Csaba, S
    Nielsen, J
    Prieto, Alberto Gonzalez
    KTH, Superseded Departments (pre-2005), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, Superseded Departments (pre-2005), Microelectronics and Information Technology, IMIT.
    Molnar, G
    Ambient networks management challenges and approaches. 2004. In: MOBILITY AWARE TECHNOLOGIES AND APPLICATIONS, PROCEEDINGS / [ed] Karmouch, A; Korba, L; Madeira, ERM, BERLIN: SPRINGER, 2004, Vol. 3284, p. 196-216. Conference paper (Refereed)
    Abstract [en]

    System management addresses the provision of functions required for controlling, planning, allocating, monitoring, and deploying the resources of a network and of its services in order to optimize its efficiency and productivity and to safeguard its operation. It is also an enabler for the creation and sustenance of new business models and value chains, reflecting the different roles the service providers and users of a network can assume. Ambient Network represents a new networking approach and it aims to enable the cooperation of heterogeneous networks, on demand and transparently, to the potential users, without the need for pre-configuration or offline negotiation between network operators. To achieve these goals, ambient network management systems have to become dynamic, adaptive, autonomic and responsive to the network and its ambience. This paper discusses relationships between the concepts of autonomous and self-manageability and those of ambient networking, and the challenges and benefits that arise from their employment.

  • 15. Brunner, M
    et al.
    Galis, A
    Cheng, L
    Colas, J A
    Ahlgren, B
    Gunnar, A
    Abrahamsson, H
    Szabo, R
    Csaba, S
    Nielsen, J
    Schuetz, S
    Gonzalez Prieto, Alberto
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Molnar, G
    Towards Ambient Networks Management. 2005. In: MOBILITY AWARE TECHNOLOGIES AND APPLICATIONS, PROCEEDINGS, 2005, Vol. 3744, p. 215-229. Conference paper (Refereed)
    Abstract [en]

    Ambient Networks (AN) are under development and they are based on novel networking concepts and systems that will enable a wide range of user and business communication scenarios beyond today's fixed, 3rd generation mobile and IP standards. Central to this project is the concept of Ambient Control Space (ACS) and the Domain Manager control function, which manages the underlying data transfer capabilities and presents a set of interfaces towards the supported services and applications. Network Management Systems of Ambient Networks must work in an environment where heterogeneous networks compose and cooperate, on demand and transparently, without the need for manual (pre- or re-)configuration or offline negotiations between network operators. To achieve these goals, ambient network management systems must become dynamic, distributed, self-managing and responsive to the network and its ambience. This paper describes the different management research challenges and four complementary solution approaches (i.e. Pattern-based Management, Peer-to-Peer Management, (Un)PnP Management, Traffic Engineering Management Application Approaches) that enable efficient management of ambient networks, and the relationships between them, and presents the main results achieved so far.

  • 16.
    Burgess, Mark
    et al.
    Oslo Univ Coll, Oslo, Norway..
    Disney, Matthew
    Oslo Univ Coll, Oslo, Norway..
    Stadler, Rolf
    KTH.
    Network patterns in cfengine and scalable data aggregation. 2007. In: USENIX ASSOCIATION PROCEEDING OF THE 21ST LARGE INSTALLATION SYSTEMS ADMINISTRATION CONFERENCE, USENIX ASSOC, 2007, p. 275-+. Conference paper (Refereed)
    Abstract [en]

    Network patterns are based on generic algorithms that execute on tree-based overlays. A set of such patterns has been developed at KTH to support distributed monitoring in networks with non-trivial topologies. We consider the use of this approach in logical peer networks in cfengine as a way of scaling aggregation of data to large organizations. Use of 'deep' network structures can lead to temporal anomalies. We show how to minimize temporal fragmentation during data aggregation by using time offsets and what effect these choices might have on power consumption. We offer proof of concept for this technology to initiate either multicast or inverse multicast pulses through sensor networks.

  • 17.
    Chemouil, Prosper
    et al.
    Orange Labs, Convergent Network Control Lab, F-92320 Chatillon, France..
    Hui, Pan
    Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland.;Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China..
    Kellerer, Wolfgang
    Tech Univ Munich, Dept Elect & Comp Engn, D-80333 Munich, Germany..
    Li, Yong
    Tsinghua Univ, Dept Elect Engn, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China..
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Tao, Dacheng
    Univ Sydney, Fac Engn & IT, Sch Comp Sci, UBTECH Sydney AI Ctr, Sydney, NSW 2008, Australia..
    Wen, Yonggang
    Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore..
    Zhang, Ying
    Facebook, Network Infrastruct, Menlo Pk, CA 94025 USA..
    Special Issue on Artificial Intelligence and Machine Learning for Networking and Communications. 2019. In: IEEE Journal on Selected Areas in Communications, ISSN 0733-8716, E-ISSN 1558-0008, Vol. 37, no 6, p. 1185-1191. Article in journal (Refereed)
  • 18. Chemouil, Prosper
    et al.
    Hui, Pan
    Kellerer, Wolfgang
    Limam, Noura
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, ACCESS Linnaeus Centre.
    Wen, Yonggang
    Special Issue on Advances in Artificial Intelligence and Machine Learning for Networking. 2020. In: IEEE Journal on Selected Areas in Communications, ISSN 0733-8716, E-ISSN 1558-0008, Vol. 38, no 10, p. 2229-2233. Article in journal (Other academic)
    Abstract [en]

    Artificial Intelligence (AI) and Machine Learning (ML) approaches have emerged in the networking domain with great expectation. They can be broadly divided into AI/ML techniques for network engineering and management, network designs for AI/ML applications, and system concepts. AI/ML techniques for networking and management improve the way we address networking. They support efficient, rapid, and trustworthy engineering, operations, and management. As such, they meet the current interest in softwarization and network programmability that fuels the need for improved network automation in agile infrastructures, including edge and fog environments. Network design and optimization for AI/ML applications addresses the complementary topic of supporting AI/ML-based systems through novel networking techniques, including new architectures and algorithms. The third topic area is system implementation and open-source software development.

  • 19. Clemm, A.
    et al.
    Granville, L. Z.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Managing virtualization of networks and services. 2015. In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 4785. Article in journal (Refereed)
  • 20. Clemm, A.
    et al.
    Granville, L. Z.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Shaping the network management Research agenda-report on DSOM 2007. 2008. In: Journal of Network and Systems Management, ISSN 1064-7570, Vol. 16, no 2, p. 223-225. Article in journal (Refereed)
    Abstract [en]

    The 18th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2007) was held at San Jose, California, USA, from October 29-31, 2007. The aim of DSOM workshops is to bring together researchers from industry and academia in the areas of network, systems, and service management, in order to discuss recent advances and foster growth. The workshops have a single-track program in order to enable intense interaction among participants. DSOM 2007 continued its tradition of giving a platform to papers that address general topics related to the management of distributed systems. It included sessions on decentralized and peer-to-peer management, fault detection and diagnosis, service accounting and auditing, problem detection and mitigation, and web services and management. DSOM 2008 will be held between September 22-26, 2008, at Samos Island, Greece. The theme will be Managing Large-scale Service Deployment, which takes up a key aspect of current research in network management.

  • 21.
    Dam, Mads
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    A Generic Protocol for Network State Aggregation. 2005. Conference paper (Refereed)
    Abstract [en]

    Aggregation functions, which compute global parameters, such as the sum, minimum or average of local device variables, are needed for many network monitoring and management tasks. As networks grow larger and become more dynamic, it is crucial to compute these functions in a scalable and robust manner. To this end, we have developed GAP (Generic Aggregation Protocol), a novel protocol that computes aggregates of device variables for network management purposes. GAP supports continuous estimation of aggregates in a network where local state variables and the network graph may change. Aggregates are computed in a decentralized way using an aggregation tree. We have performed a functional evaluation of GAP in a simulation environment and have identified configuration choices that potentially allow us to control the performance characteristics of the protocol.
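The abstract above describes computing aggregates over an aggregation tree. The following is a minimal, centralized sketch of that idea for SUM: each node's partial aggregate is its local value plus the partial aggregates of its children, and a local change is pushed up the path to the root. It is not GAP itself, which is a distributed, asynchronous protocol that also maintains the tree; the function names and data layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def subtree_sums(children: Dict[str, List[str]],
                 local: Dict[str, float],
                 root: str) -> Dict[str, float]:
    """Return, for every node, the sum of local values in its subtree."""
    partial: Dict[str, float] = {}
    stack = [(root, False)]
    while stack:  # iterative post-order traversal: children before parent
        node, expanded = stack.pop()
        if expanded:
            partial[node] = local[node] + sum(partial[c] for c in children.get(node, []))
        else:
            stack.append((node, True))
            for child in children.get(node, []):
                stack.append((child, False))
    return partial


def propagate_change(parent: Dict[str, Optional[str]],
                     partial: Dict[str, float],
                     node: str,
                     delta: float) -> None:
    """Apply a local change of `delta` at `node` to it and all its ancestors."""
    current: Optional[str] = node
    while current is not None:
        partial[current] += delta
        current = parent[current]
```

An update touches only the nodes on one root path, so its cost grows with tree depth rather than with network size.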

  • 22.
    Dan, Jurca
    et al.
    NTT DOCOMO Eurolabs in Munich, Germany.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES).
    H-GAP: Estimating Histograms of Local Variables with Accuracy Objectives for Distributed Real-Time Monitoring. 2010. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 7, no 2, p. 83-95. Article in journal (Refereed)
    Abstract [en]

    We present H-GAP, a protocol for continuous monitoring, which provides a management station with the value distribution of local variables across the network. The protocol estimates the histogram of local state variables for a given accuracy and with minimal overhead. H-GAP is decentralized and asynchronous to achieve robustness and scalability, and it executes on an overlay interconnecting management processes in network devices. On this overlay, the protocol maintains a spanning tree and updates the histogram through incremental aggregation. The protocol is tunable in the sense that it allows controlling, at runtime, the trade-off between protocol overhead and an accuracy objective. This functionality is realized through dynamic configuration of local filters that control the flow of updates towards the management station. The paper includes an analysis of the problem of histogram aggregation over aggregation trees, a formulation of the global optimization problem, and a distributed solution containing heuristic, tree-based algorithms. Using SUM as an example, we show how general aggregation functions over local variables can be efficiently computed with H-GAP. We evaluate our protocol through simulation using real traces. The results demonstrate the controllability of H-GAP in a selection of scenarios and its efficiency in large-scale networks.
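The abstract above attributes the overhead/accuracy trade-off to local filters that control the flow of updates. The sketch below illustrates that general idea for a single scalar variable: a node forwards a new value only when it differs from the last forwarded value by more than a filter width. H-GAP's actual filters operate on histograms and are configured by a distributed optimization, which is not shown; `FilteredReporter`, `count_reports` and the sample trace are hypothetical.

```python
from typing import Iterable, Optional


class FilteredReporter:
    """Suppress updates that stay within `width` of the last reported value."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.last_reported: Optional[float] = None

    def observe(self, value: float) -> bool:
        """Return True if this observation must be sent upstream."""
        if self.last_reported is None or abs(value - self.last_reported) > self.width:
            self.last_reported = value
            return True
        return False  # upstream view is stale by at most `width`


def count_reports(trace: Iterable[float], width: float) -> int:
    """Count how many updates a given filter width lets through for a trace."""
    reporter = FilteredReporter(width)
    return sum(reporter.observe(v) for v in trace)
```

A wider filter sends fewer updates at the price of a larger bounded error in the upstream view; a width of zero forwards every change.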

  • 23. Di Fatta, G.
    et al.
    Liotta, A.
    Agoulmine, N.
    Agrawal, G.
    Berthold, M. R.
    Bordini, R. H.
    Borgelt, C.
    Boutaba, R.
    Calzarossa, M. C.
    Cannataro, M.
    Choudhary, A.
    Cortes, U.
    Dagiuklas, T.
    De Turck, F.
    De Vleeschauwer, B.
    Demestichas, P.
    Dhoedt, B.
    Festor, O.
    Fortino, G.
    Friderikos, V.
    Giunchiglia, F.
    Gravier, C.
    Guo, Y.
    Hunter, D.
    Karypis, G.
    Krishnaswamy, S.
    Limam, N.
    Medhi, D.
    Merani, M. L.
    Nürnberger, A.
    Pardede, E.
    Parthasarathy, S.
    Gaspary, L. P.
    Ranc, D.
    Sivakumar, K.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stiller, B.
    Strassner, J.
    Syed, A.
    Talia, D.
    Urso, M. A.
    Van Der Meer, S.
    Wolff, R.
    Granville, L. Z.
    Preface. 2011. In: IEEE International Conference on Data Mining. Proceedings, ISSN 1550-4786, p. xlviii-xlvix, article id 6137551. Article in journal (Refereed)
  • 24.
    Fetahi, Wuhib
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Dam, Mads
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Alexander, Clemm
    Cisco Systems, San Jose, CA USA.
    Robust Monitoring of Network-wide Aggregates through Gossiping. 2009. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 6, no 2, p. 95-109. Article in journal (Refereed)
    Abstract [en]

    We investigate the use of gossip protocols for continuous monitoring of network-wide aggregates under crash failures. Aggregates are computed from local management variables using functions such as SUM, MAX, or AVERAGE. For this type of aggregation, crash failures offer a particular challenge due to the problem of mass loss, namely, how to correctly account for contributions from nodes that have failed. In this paper we give a partial solution. We present G-GAP, a gossip protocol for continuous monitoring of aggregates, which is robust against failures that are discontiguous in the sense that neighboring nodes do not fail within a short period of each other. We give formal proofs of correctness and convergence, and we evaluate the protocol through simulation using real traces. The simulation results suggest that the design goals for this protocol have been met. For instance, the tradeoff between estimation accuracy and protocol overhead can be controlled, and a high estimation accuracy (below some 5% error in our measurements) is achieved by the protocol, even for large networks and frequent node failures. Further, we perform a comparative assessment of G-GAP against a tree-based aggregation protocol using simulation. Surprisingly, we find that the tree-based aggregation protocol consistently outperforms the gossip protocol for comparative overhead, both in terms of accuracy and robustness.
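
    To make the notion of gossip-based aggregation and "mass" more concrete, the following is a minimal sketch of a generic push-sum style averaging scheme. It is not G-GAP and it does not handle crash failures; the function name, the fully connected peer selection and the parameter values are assumptions made for illustration. Every node holds a (sum, weight) pair, keeps half of it each round and pushes the other half to a random peer; the ratio sum/weight at each node approaches the global average as long as no mass is lost.

```python
import random


def push_sum_average(values, rounds=100, seed=0):
    """Estimate the global average at every node by push-sum style gossip.

    Assumes a fully connected overlay and no failures (illustrative only).
    """
    rng = random.Random(seed)
    n = len(values)
    sums = [float(v) for v in values]
    weights = [1.0] * n
    for _ in range(rounds):
        new_sums = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            peer = rng.randrange(n)  # random gossip partner (may be i itself)
            half_sum, half_weight = sums[i] / 2, weights[i] / 2
            # Keep one half locally and push the other half to the peer.
            new_sums[i] += half_sum
            new_weights[i] += half_weight
            new_sums[peer] += half_sum
            new_weights[peer] += half_weight
        sums, weights = new_sums, new_weights
    return [s / w for s, w in zip(sums, weights)]


estimates = push_sum_average([1, 2, 3, 4, 5])
```

    In this sketch the total of all sums and the total of all weights stay constant across rounds. If a node crashed, the mass it currently holds would vanish and the remaining estimates would become biased, which is the mass-loss problem the abstract refers to.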

  • 25. Flinta, C.
    et al.
    Johnsson, A.
    Ahmed, J.
    Moradi, F.
    Pasquini, R.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Real-time resource prediction engine for cloud management. 2017. In: Proceedings of the IM 2017 - 2017 IFIP/IEEE International Symposium on Integrated Network and Service Management, Institute of Electrical and Electronics Engineers Inc., 2017, p. 877-878. Conference paper (Refereed)
    Abstract [en]

    Predicting resource requirements for cloud services is critical for dimensioning, anomaly detection and service assurance. We demonstrate a system for real-time estimation of the needed amount of infrastructure resources, such as CPU and memory, for a given service. Statistical learning methods on server statistics and load parameters of the service are used for learning a resource prediction model. The model can be used as a guideline for service deployment and for real-time identification of resource bottlenecks. 

  • 26.
    Gonzales Prieto, Alberto
    et al.
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Stadler, Rolf
    KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.
    Design and Implementation of Performance Policies for SMS Systems. 2005. In: AMBIENT NETWORKS, Berlin: Springer Verlag, 2005, p. 169-180. Conference paper (Refereed)
    Abstract [en]

    We present a design for policy-based performance management of SMS Systems. The design takes as input the operator's performance goals, which are expressed as policies that can be adjusted at run-time. In our specific design, an SMS administrator can specify the maximum delay for a message and the maximum percentage of messages that can be postponed during periods of congestion. The system attempts to maximize the overall throughput while adhering to the performance policies. It does so by periodically solving a linear optimization problem that takes as input the policies and traffic statistics and computes a new configuration. We show that the computational cost for solving this problem is low, even for large system configurations. We have evaluated the design through extensive simulations in various scenarios. It has proved effective in achieving the administrator's performance goals and fast in adapting to changing network conditions. A prototype has been developed on a commercial SMS platform, which proves the validity of our design.

  • 27.
    Gonzales Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES).
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES).
    Distributed real-time monitoring with accuracy objectives. 2006. In: NETWORKING 2006: NETWORKING TECHNOLOGIES, SERVICES, AND PROTOCOLS; PERFORMANCE OF COMPUTER AND COMMUNICATION NETWORKS; MOBILE AND WIRELESS COMMUNICATIONS SYSTEMS / [ed] Boavida F, Plagemann T, Stiller B, Westphal C, Monteiro E, Berlin: Springer Verlag, 2006, p. 1246-1251. Conference paper (Refereed)
    Abstract [en]

    We introduce A-GAP, a protocol for continuous monitoring of network state variables with configurable accuracy. Network state variables are computed from device counters using aggregation functions, such as SUM, AVERAGE and MAX. In A-GAP, the accuracy is expressed in terms of the average error and is controlled by dynamically configuring filters in the management nodes. The protocol follows the push approach to monitoring and uses the concept of incremental aggregation on a self-stabilizing spanning tree. A-GAP is decentralized and asynchronous to achieve robustness and scalability. We provide some results from evaluating the protocol for an ISP topology (Abovenet) in several scenarios through simulation. The results show that we can effectively control the fundamental trade-off between accuracy and overhead. The protocol overhead can be reduced significantly by allowing only small error objectives.

  • 28.
    Gonzales Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES).
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES).
    Scalable policy distribution for ambient networks. 2005. In: Proceedings of the 14th IST Mobile and Wireless Communication Summit, 2005. Conference paper (Refereed)
    Abstract [en]

    The characteristics of policy-based management make it an interesting candidate for managing Ambient Networks, which are characterized by being highly dynamic and heterogeneous. However, current policy-based approaches are not scalable, which is a must for such dynamic scenarios. A key aspect for developing scalable systems is policy distribution, the mechanism that provides the right policies at the right locations in the network when they are needed. In this paper, we present a scalable framework for policy distribution for Ambient Networks. The framework is based on aggregating the addresses of the policies and applying multipoint communication techniques. The aggregation is based on grouping the managed elements by the role they play in the network and distributing policies that apply to all the elements in a group. We show the validity of the framework by applying it to a case study.

  • 29.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Dudkowski, Dominique
    Meirosu, Catalin
    Mingardi, Chiara
    Nunzi, Giorgio
    Brunner, Marcus
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Decentralized In-Network Management for the Future Internet. 2009. In: 2009 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION WORKSHOPS, NEW YORK: IEEE, 2009, p. 803-807. Conference paper (Refereed)
    Abstract [en]

    In-network management (INM) is a new paradigm for the management of the future Internet that is based on the principles of decentralization and self-organization. Its goal is to overcome the limitations of traditional network management and to achieve scalable and robust management systems with low complexity for large-scale, dynamic network environments. In this paper, we describe a framework for INM that provides a systematic approach to the embedding of management algorithms within the elements of a communication network. In addition, we demonstrate the benefits of decentralized management in the context of two key management functions, namely real-time monitoring and event handling.

  • 30.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Adaptive distributed monitoring with accuracy objectives. 2006. In: Proceedings of the 2006 SIGCOMM Workshop on Internet Network Management, INM'06, 2006, Vol. 2006, p. 65-70. Conference paper (Refereed)
    Abstract [en]

    We present A-GAP, a novel protocol for continuous monitoring of network state variables, which aims at achieving a given monitoring accuracy with minimal overhead. Network state variables are computed from device counters using aggregation functions, such as SUM, AVERAGE and MAX. The accuracy objective is expressed as the average estimation error. A-GAP is decentralized and asynchronous to achieve robustness and scalability. It executes on an overlay that interconnects management processes on the devices. On this overlay, the protocol maintains a spanning tree and updates the network state variables through incremental aggregation. It dynamically configures local filters that control whether an update is sent towards the root of the tree. It reduces the overhead by attempting to minimize the maximum processing load over all management processes. We evaluate A-GAP through simulation using an ISP topology and real traces. The results show that we can effectively control the trade-off between accuracy and protocol overhead, that the overhead can be reduced significantly by allowing small errors, and that an accurate estimation of the error distribution can be provided in real-time.

  • 31.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Adaptive Performance Management for SMS Systems. 2009. In: Journal of Network and Systems Management, ISSN 1064-7570, E-ISSN 1573-7705, Vol. 17, no 4, p. 397-421. Article in journal (Refereed)
    Abstract [en]

    We present a design for performance management of SMS systems. The design takes as input the administrator's performance objectives, which can be adjusted at run-time. Based on these objectives, the design takes the necessary actions to achieve them and it dynamically adapts to changing networking conditions. It does so by periodically solving a linear optimization problem that computes a new configuration for the SMS system. We have evaluated the design through extensive simulations in various scenarios using traces from a production SMS system. It has proved effective in achieving the administrator's performance objectives, and efficient in terms of computational cost. Our experiments also show that the design is adaptive, i.e., it effectively adapts the system's configuration to changes in the networking conditions, in order to continuously meet the performance objectives. Finally, the feasibility of our design is proved through the development of a prototype on a commercial SMS platform.

  • 32.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    A-GAP: An Adaptive Protocol for Continuous Network Monitoring with Accuracy Objectives. 2007. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 4, no 1, p. 2-12. Article in journal (Refereed)
    Abstract [en]

    We present A-GAP, a novel protocol for continuous monitoring of network state variables, which aims at achieving a given monitoring accuracy with minimal overhead. Network state variables are computed from device counters using aggregation functions, such as SUM, AVERAGE and MAX. The accuracy objective is expressed as the average estimation error. A-GAP is decentralized and asynchronous to achieve robustness and scalability. It executes on an overlay that interconnects management processes on the devices. On this overlay, the protocol maintains a spanning tree and updates the network state variables through incremental aggregation. Based on a stochastic model, it dynamically configures local filters that control whether an update is sent towards the root of the tree. We evaluate A-GAP through simulation using real traces and two different types of topologies of up to 650 nodes. The results show that we can effectively control the trade-off between accuracy and protocol overhead, and that the overhead can be reduced by almost two orders of magnitude when allowing for small errors. The protocol quickly adapts to a node failure and exhibits short spikes in the estimation error. Lastly, it can provide an accurate estimate of the error distribution in real-time.
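
    The role of a local filter in trading accuracy for overhead can be shown with a minimal sketch. It is only an illustration of update suppression, not A-GAP's filter computation, which the abstract says is based on a stochastic model; the class name `FilteredReporter`, the fixed `width` and the sample readings are assumptions. A node forwards a new value towards the root only when it differs from the last value it sent by more than the filter width.

```python
class FilteredReporter:
    """Suppress updates that stay within `width` of the last value sent.

    Hypothetical helper for illustration; the width is fixed here, whereas
    the abstract describes filters that are configured dynamically.
    """

    def __init__(self, width):
        self.width = width
        self.last_sent = None

    def observe(self, value):
        """Return `value` if an update should be sent, otherwise None."""
        if self.last_sent is None or abs(value - self.last_sent) > self.width:
            self.last_sent = value
            return value
        return None  # change is inside the filter: no message, bounded error


reporter = FilteredReporter(width=2.0)
readings = [10, 11, 12, 13, 9, 10]  # assumed sample measurements
sent = [v for v in readings if reporter.observe(v) is not None]
```

    With these sample readings only three of six values are forwarded, while the value known upstream never deviates from the true reading by more than the filter width. A wider filter would suppress more updates at the cost of a larger error.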

  • 33.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Controlling Performance Trade-offs in Adaptive Network Monitoring. 2009. In: 11th IFIP/IEEE International Symposium on Integrated Network Management (IM 2009), IEEE, 2009, p. 359-366. Conference paper (Refereed)
    Abstract [en]

    A key requirement for autonomic (i.e., self-*) management systems is a short adaptation time to changes in the networking conditions. In this paper, we show that the adaptation time of a distributed monitoring protocol can be controlled. We show this for A-GAP, a protocol for continuous monitoring of global metrics with controllable accuracy. We demonstrate through simulations that, for the case of A-GAP, the choice of the topology of the aggregation tree controls the tradeoff between adaptation time and protocol overhead in steady-state. Generally, allowing a larger adaptation time permits reducing the protocol overhead. Our results suggest that the adaptation time primarily depends on the height of the aggregation tree and that the protocol overhead is strongly influenced by the number of internal nodes. We outline how A-GAP can be extended to dynamically self-configure and to continuously adapt its configuration to changing conditions, in order to meet a set of performance objectives, including adaptation time, protocol overhead, and estimation accuracy.

  • 34.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Monitoring Flow Aggregates with Controllable Accuracy. 2007. In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 4787, p. 64-75. Article in journal (Refereed)
    Abstract [en]

    In this paper, we show the feasibility of real-time flow monitoring with controllable accuracy in today’s IP networks. Our approach is based on Netflow and A-GAP. A-GAP is a protocol for continuous monitoring of network state variables, which are computed from device metrics using aggregation functions, such as SUM, AVERAGE and MAX. A-GAP is designed to achieve a given monitoring accuracy with minimal overhead. A-GAP is decentralized and asynchronous to achieve robustness and scalability. The protocol incrementally computes aggregation functions inside the network and, based on a stochastic model, it dynamically configures local filters that control the overhead and accuracy. We evaluate a prototype in a testbed of 16 commercial routers and provide measurements from a scenario where the protocol continuously estimates the total number of FTP flows in the network. Local flow metrics are read out from Netflow buffers and aggregated in real-time. We evaluate the prototype for the following criteria. First, the ability to effectively control the trade-off between monitoring accuracy and processing overhead; second, the ability to accurately predict the distribution of the estimation error; third, the impact of a sudden change in topology on the performance of the protocol. The testbed measurements are consistent with simulation studies we performed for different topologies and network sizes, which proves the feasibility of the protocol design, and, more generally, the feasibility of effective and efficient real-time flow monitoring in large network environments.

  • 35.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Real-time Network Monitoring Supporting Percentile Error Objectives. 2007. In: 14th HP Software University Association (HP-SUA) Workshop, 8-11 July 2007, Munich, Germany, 2007. Conference paper (Refereed)
    Abstract [en]

    We report on the versatility of A-GAP for supporting different types of accuracy objectives. Previously, we considered accuracy objectives expressed in terms of the average error. In this paper, we focus on percentile error objectives. A-GAP is a protocol for continuous monitoring of network state variables. Network state variables are computed from device counters using aggregation functions, such as SUM, AVERAGE and MAX. A-GAP is designed to achieve a given monitoring accuracy with minimal overhead. A-GAP is decentralized and asynchronous to achieve robustness and scalability. It executes on an overlay that interconnects management processes on the devices. On this overlay, the protocol maintains a spanning tree and updates the network state variables through incremental aggregation. Based on a stochastic model, it dynamically configures local filters that control whether an update is sent towards the root of the tree. We evaluate A-GAP through simulation using real traces for an ISP topology (Abovenet). The results prove the versatility of A-GAP for supporting different types of accuracy objectives. The results also show that we can effectively control the trade-off between accuracy and protocol overhead, and that the overhead can be reduced significantly by allowing small errors.

  • 36.
    Gonzalez Prieto, Alberto
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Kersch, P.
    Szabo, R.
    Nunzi, G.
    Brunner, M.
    Schuetz, S.
    Distributed management in Ambient Networks. 2007. In: 2007 PROCEEDINGS OF THE 16TH IST MOBILE AND WIRELESS COMMUNICATIONS, NEW YORK: IEEE, 2007, p. 1091-1095. Conference paper (Refereed)
    Abstract [en]

    Traditional centralized management approaches are not suitable for Ambient Networks (ANs), since centralized management systems neither scale well nor adapt fast enough to changing topologies and network compositions. To meet the requirements for AN management systems, we propose the use of distributed approaches. Specifically, we demonstrate the validity of these approaches through three instantiations: (i) a solution for real-time AN monitoring, (ii) a solution for load balancing in wireless networks and (iii) a solution for resource discovery in ANs.

  • 37.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    A System for Interactive Examination of Learned Security Policies. 2022. In: Proceedings of the IEEE/IFIP Network Operations and Management Symposium 2022: Network and Service Management in the Era of Cloudification, Softwarization and Artificial Intelligence, NOMS 2022 / [ed] Varga, P Granville, LZ Galis, A Godor, I Limam, N Chemouil, P Francois, J Pahl, M, IEEE, 2022. Conference paper (Refereed)
    Abstract [en]

    We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.

  • 38.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    An Online Framework for Adapting Security Policies in Dynamic IT Environments. 2022. In: 2022 18th International Conference on Network and Service Management (CNSM 2022): INTELLIGENT MANAGEMENT OF DISRUPTIVE NETWORK TECHNOLOGIES AND SERVICES / [ed] Charalambides, M Papadimitriou, P Cerroni, W Kanhere, S Mamatas, L, IEEE, 2022. Conference paper (Refereed)
    Abstract [en]

    We present an online framework for learning and updating security policies in dynamic IT environments. It includes three components: a digital twin of the target system, which continuously collects data and evaluates learned policies; a system identification process, which periodically estimates system models based on the collected data; and a policy learning process that is based on reinforcement learning. To evaluate our framework, we apply it to an intrusion prevention use case that involves a dynamic IT infrastructure. Our results demonstrate that the framework automatically adapts security policies to changes in the IT infrastructure and that it outperforms a state-of-the-art method.

  • 39.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Digital Twins for Security Automation. 2023. In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2023, NOMS 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper (Refereed)
    Abstract [en]

    We present a novel emulation system for creating high-fidelity digital twins of IT infrastructures. The digital twins replicate key functionality of the corresponding infrastructures and allow security scenarios to be played out in a safe environment. We show that this capability can be used to automate the process of finding effective security policies for a target infrastructure. In our approach, a digital twin of the target infrastructure is used to run security scenarios and collect data. The collected data is then used to instantiate simulations of Markov decision processes and learn effective policies through reinforcement learning, whose performance is validated in the digital twin. This closed-loop learning process executes iteratively and provides continuously evolving and improving security policies. We apply our approach to an intrusion response scenario. Our results show that the digital twin provides the necessary evaluative feedback to learn near-optimal intrusion response policies.

  • 40.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH Ctr Cyber Def & Informat Secur, Stockholm, Sweden.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH Ctr Cyber Def & Informat Secur, Stockholm, Sweden.
    Finding Effective Security Strategies through Reinforcement Learning and Self-Play. 2020. In: 2020 16th international conference on network and service management (CNSM) / [ed] ZincirHeywood, N Ulema, M Sayit, M Clayman, S Kim, MS Cetinkaya, C, IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    We present a method to automatically find security strategies for the use case of intrusion prevention. Following this method, we model the interaction between an attacker and a defender as a Markov game and let attack and defense strategies evolve through reinforcement learning and self-play without human intervention. Using a simple infrastructure configuration, we demonstrate that effective security strategies can emerge from self-play. This shows that self-play, which has been applied in other domains with great success, can be effective in the context of network security. Inspection of the converged policies shows that the emerged policies reflect common-sense knowledge and are similar to strategies of humans. Moreover, we address known challenges of reinforcement learning in this domain and present an approach that uses function approximation, an opponent pool, and an autoregressive policy representation. Through evaluations we show that our method is superior to two baseline methods but that policy convergence in self-play remains a challenge.

  • 41.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    Intrusion Prevention Through Optimal Stopping. 2022. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 19, no 3, p. 2333-2348. Article in journal (Refereed)
    Abstract [en]

    We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal threshold policy. We introduce T-SPSA, an efficient reinforcement learning algorithm that learns threshold policies through stochastic approximation. We show that T-SPSA outperforms state-of-the-art algorithms for our use case. Our overall method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned policies are evaluated. We show that this approach can produce effective defender policies for a practical IT infrastructure.
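
    What a threshold policy for a multiple stopping problem might look like in code can be sketched as follows. This is an illustration only: it does not implement T-SPSA or the paper's model, and the function name, the belief sequence and the threshold values are hypothetical. The sketch assumes the defender tracks a belief (a probability that an intrusion is ongoing) and takes its next stop action as soon as the belief reaches the threshold associated with that stop.

```python
def run_threshold_policy(beliefs, thresholds):
    """Return the time steps at which the defender takes a stop action.

    beliefs:    assumed sequence of intrusion beliefs in [0, 1], one per step.
    thresholds: thresholds[k] is the (hypothetical) belief level that
                triggers the (k+1)-th stop action.
    """
    stops = []
    for t, belief in enumerate(beliefs):
        if len(stops) == len(thresholds):
            break  # all available stop actions have been used
        if belief >= thresholds[len(stops)]:
            stops.append(t)
    return stops


beliefs = [0.1, 0.3, 0.6, 0.5, 0.8, 0.95]  # assumed belief trajectory
stop_times = run_threshold_policy(beliefs, thresholds=[0.5, 0.7, 0.9])
```

    A policy of this form is described by a handful of numbers, which suggests why learning the thresholds directly can be more tractable than learning an unrestricted policy.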

  • 42.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Intrusion tolerance for networked systems through two-level feedback control. 2024. In: Proceedings - 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 338-352. Conference paper (Refereed)
    Abstract [en]

    We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.

  • 43.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Learning Intrusion Prevention Policies through Optimal Stopping. 2021. In: Proceedings of the 2021 17th International Conference on Network and Service Management (CNSM 2021): Smart Management For Future Networks And Services / [ed] Chemouil, P Ulema, M Clayman, S Sayit, M Cetinkaya, C Secci, S, IEEE, 2021, p. 509-517. Conference paper (Refereed)
    Abstract [en]

    We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal stopping problem. This formulation allows us insight into the structure of the optimal policies, which turn out to be threshold based. Since the computation of the optimal defender policy using dynamic programming is not feasible for practical cases, we approximate the optimal policy through reinforcement learning in a simulation environment. To define the dynamics of the simulation, we emulate the target infrastructure and collect measurements. Our evaluations show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.

  • 44.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Cyber Defence and Information Security CDIS.
    Learning Near-Optimal Intrusion Responses Against Dynamic Attackers. 2024. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 21, no 1, p. 1158-1177. Article in journal (Refereed)
    Abstract [en]

    We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e., an attacker that adapts its strategy in response to the defender strategy. Further, the optimal stopping formulation allows us to prove that best response strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.

  • 45.
    Hammar, Kim
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Scalable Learning of Intrusion Response Through Recursive Decomposition. 2023. In: Decision and Game Theory for Security - 14th International Conference, GameSec 2023, Proceedings, Springer Nature, 2023, p. 172-192. Conference paper (Refereed)
    Abstract [en]

    We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work prove the feasibility of this approach for small infrastructures but do not scale to realistic scenarios due to the exponential growth in computational complexity with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames with low computational complexity which can be solved in parallel. Applying optimal stopping theory we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game we introduce an algorithm called Decompositional Fictitious Self-Play (dfsp), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that dfsp significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.

  • 46.
    Javier, Baliosian
    et al.
    Ericsson Ireland Research Centre, Athlone, Ireland.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Distributed Auto-configuration of Neighboring Cell Graphs in Radio Access Networks. 2010. In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 7, no 3, p. 145-157. Article in journal (Refereed)
    Abstract [en]

    In order to execute a handover process in a GSM or UMTS Radio Access Network, each cell has a list of neighbors to which such handovers may be made. Today, these lists are statically configured during network planning, which does not allow for dynamic adaptation of the network to changes and unexpected events such as a cell failure. This paper advocates an autonomic, decentralized approach to dynamically configure neighboring cell lists. The main contribution of this work is a novel protocol, called DOC, which detects and continuously tracks the coverage overlaps among cells. The protocol executes on a spanning tree where the nodes are radio base stations and the links represent communication channels. Over this tree, nodes periodically exchange information about terminals that are in their respective coverage area. Bloom filters are used for efficient representations of terminal sets and efficient set operations. The protocol aggregates Bloom filters to reduce the communication overhead and also for routing messages along the tree. Using simulation, we study the system in steady state, when a base station is added or a base station fails, and also during the initialization phase where the system self-configures.
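
    The Bloom-filter idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the DOC protocol: the class name, the filter size `m`, the hash count `k`, and the salted SHA-256 hashing are all assumptions made for illustration. It only shows how terminal sets might be represented compactly and merged with bitwise operations.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; m (bits) and k (hashes) are assumed values."""

    def __init__(self, m=1024, k=3, bits=0):
        self.m, self.k, self.bits = m, k, bits

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # Bitwise OR yields exactly the filter of the union of both sets,
        # which is what makes aggregation up a tree cheap.
        return BloomFilter(self.m, self.k, self.bits | other.bits)

    def overlap_hint(self, other):
        # Bitwise AND only approximates the intersection; a nonzero result
        # hints that the two sets may share elements.
        return (self.bits & other.bits) != 0
```

    In this sketch, two cells that both observe the same terminal always produce a positive `overlap_hint`, while a positive hint alone does not prove an overlap, because Bloom filters admit false positives.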

  • 47. Jennings, Brendan
    et al.
    Stadler, Rolf
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, ACCESS Linnaeus Centre.
    Resource Management in Clouds: Survey and Research Challenges. 2015. In: Journal of Network and Systems Management, ISSN 1064-7570, E-ISSN 1573-7705, Vol. 23, no 3, p. 567-619. Article in journal (Refereed)
    Abstract [en]

    Resource management in a cloud environment is a hard problem, due to: the scale of modern data centers; the heterogeneity of resource types and their interdependencies; the variability and unpredictability of the load; as well as the range of objectives of the different actors in a cloud ecosystem. Consequently, both academia and industry began significant research efforts in this area. In this paper, we survey the recent literature, covering 250+ publications, and highlighting key results. We outline a conceptual framework for cloud resource management and use it to structure the state-of-the-art review. Based on our analysis, we identify five challenges for future investigation. These relate to: providing predictable performance for cloud-hosted applications; achieving global manageability for cloud systems; engineering scalable resource management systems; understanding economic behavior and cloud pricing; and developing solutions for the mobile cloud paradigm.

  • 48.
    Johansson, Björn
    et al.
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Adam, Constantin
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Johansson, Mikael
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Distributed resource allocation strategies for achieving quality of service in server clusters. 2006. In: Proceedings of the 45th IEEE Conference on Decision and Control, 2006, p. 1990-1995. Conference paper (Refereed)
    Abstract [en]

    We investigate the resource allocation problem for large-scale server clusters with quality-of-service objectives, where key functions are decentralized. Specifically, the optimal service selection is posed as a discrete utility maximization problem that reflects management objectives and resource constraints. We develop an efficient centralized algorithm that solves this problem, and we propose three suboptimal schemes that operate with local information. The performance of the suboptimal schemes is evaluated in simulations, both under idealized conditions and in a full-scale system simulator.

  • 49.
    Jurca, Dan
    et al.
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Computing Histograms of Local Variables for Real-Time Monitoring using Aggregation Trees. 2009. In: 2009 IFIP/IEEE International Symposium on Integrated Network Management (IM 2009), Vols 1 and 2, New York: IEEE, 2009, p. 367-374. Conference paper (Refereed)
    Abstract [en]

    In this paper we present a protocol for the continuous monitoring of a local network state variable. Our aim is to provide a management station with the value distribution of the local variables across the network, by means of partial histogram aggregation, with minimum protocol overhead. Our protocol is decentralized and asynchronous to achieve robustness and scalability, and it executes on an overlay interconnecting management processes in network devices. On this overlay, the protocol maintains a spanning tree and updates the histogram of the network state variables through incremental aggregation. The protocol allows to control the trade-off between protocol overhead and a global accuracy objective. This functionality is implemented by a dynamic configuration of local error filters that control whether an update is sent towards the management station or not. We evaluate our protocol by means of simulations. Our results demonstrate the controllability of our method in a wide selection of scenarios, and the scalability of our protocol for large-scale networks.
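
    The partial histogram aggregation described in this abstract can be sketched as follows. This is a simplified, synchronous illustration and not the paper's protocol: it omits the asynchronous incremental updates and the local error filters, and the function names, the `bin_width` parameter, and the dictionary-based tree are assumptions.

```python
from collections import Counter


def local_histogram(value, bin_width):
    # A node's own contribution: one count in the bin holding its value.
    return Counter({int(value // bin_width): 1})


def aggregate(tree, values, node, bin_width):
    """Return the histogram of all values in the subtree rooted at node.

    tree maps a node to its list of children; values maps a node to its
    local state variable. Each node merges its children's partial
    histograms with its own, so only one histogram travels up per link.
    """
    hist = local_histogram(values[node], bin_width)
    for child in tree.get(node, []):
        hist += aggregate(tree, values, child, bin_width)
    return hist
```

    Calling `aggregate` on the root yields the value distribution across the whole tree, which is what a management station attached to the root would see in this simplified setting.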

  • 50. Krishnamurthy, Supriya
    et al.
    Ardelius, John
    Aurell, Erik
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Dam, Mads
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Stadler, Rolf
    KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Wuhib, Fetahi Zebenigus
    KTH, School of Electrical Engineering (EES), Communication Networks. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Brief Announcement: The Accuracy of Tree-based Counting in Dynamic Networks. 2010. In: PODC 2010: Proceedings of the 2010 ACM Symposium on Principles of Distributed Computing, New York: Assoc Computing Machinery, 2010, p. 291-292. Conference paper (Refereed)
    Abstract [en]

    We study a simple Bellman-Ford-like protocol which performs network size estimation over a tree-shaped overlay. A continuous time Markov model is constructed which allows key protocol characteristics to be estimated under churn, including the expected number of nodes at a given (perceived) distance to the root and, for each such node, the expected (perceived) size of the subnetwork rooted at that node. We validate the model by simulations, using a range of network sizes, node degrees, and churn-to-protocol rates, with convincing results.
