A-GAP: An Adaptive Protocol for Continuous Network Monitoring with Accuracy Objectives
Alberto Gonzalez Prieto, Rolf Stadler
KTH, School of Electrical Engineering (EES), Communication Networks.
2007 (English). In: IEEE Transactions on Network and Service Management, ISSN 1932-4537, Vol. 4, no. 1, pp. 2-12. Article in journal (Refereed), Published.
Abstract [en]

We present A-GAP, a novel protocol for continuous monitoring of network state variables, which aims at achieving a given monitoring accuracy with minimal overhead. Network state variables are computed from device counters using aggregation functions, such as SUM, AVERAGE and MAX. The accuracy objective is expressed as the average estimation error. A-GAP is decentralized and asynchronous to achieve robustness and scalability. It executes on an overlay that interconnects management processes on the devices. On this overlay, the protocol maintains a spanning tree and updates the network state variables through incremental aggregation. Based on a stochastic model, it dynamically configures local filters that control whether an update is sent towards the root of the tree. We evaluate A-GAP through simulation using real traces and two different types of topologies of up to 650 nodes. The results show that we can effectively control the trade-off between accuracy and protocol overhead, and that the overhead can be reduced by almost two orders of magnitude when allowing for small errors. The protocol quickly adapts to a node failure and exhibits short spikes in the estimation error. Lastly, it can provide an accurate estimate of the error distribution in real-time.
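
The local filters mentioned in the abstract can be illustrated with a minimal sketch. It is not the A-GAP algorithm itself: A-GAP derives its filter settings dynamically from a stochastic model, whereas here the filter width is a fixed, hypothetical parameter, and the names `FilteredNode`, `observe`, and `run` are invented for illustration. The sketch only shows the basic trade-off: a wider filter suppresses more updates at the cost of a larger estimation error at the parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilteredNode:
    """Hypothetical node that reports a local value to its parent."""
    filter_width: float                    # assumed fixed; A-GAP tunes this dynamically
    last_reported: Optional[float] = None  # value the parent currently holds
    sent: int = 0                          # number of updates actually sent

    def observe(self, value: float) -> bool:
        """Return True if an update is sent towards the root."""
        if (self.last_reported is None
                or abs(value - self.last_reported) > self.filter_width):
            self.last_reported = value
            self.sent += 1
            return True
        return False  # change absorbed by the filter


def run(trace, filter_width):
    """Replay a trace; return (updates sent, mean absolute error at parent)."""
    node = FilteredNode(filter_width)
    total_error = 0.0
    for value in trace:
        node.observe(value)
        total_error += abs(value - node.last_reported)
    return node.sent, total_error / len(trace)


trace = [10, 11, 10, 12, 15, 15, 14, 20, 21, 20]  # made-up sample values
exact = run(trace, filter_width=0)   # every change is reported
coarse = run(trace, filter_width=2)  # small changes are suppressed
print(exact, coarse)
```

On this made-up trace, widening the filter from 0 to 2 cuts the number of updates from 9 to 3 while the mean absolute error at the parent rises from 0 to 0.5.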

Place, publisher, year, edition, pages
2007. Vol. 4, no 1, 2-12 p.
Keyword [en]
Distributed management, real-time monitoring, large-scale distributed systems, adaptive systems
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-9464
DOI: 10.1109/TNSM.2007.030101
Scopus ID: 2-s2.0-34547880763
OAI: oai:DiVA.org:kth-9464
DiVA: diva2:114071
Note
QC 20100727
Available from: 2008-11-05. Created: 2008-11-05. Last updated: 2010-07-27. Bibliographically approved.
In thesis
1. Adaptive Real-time Monitoring for Large-scale Networked Systems
2008 (English). Doctoral thesis, comprehensive summary (Other scientific).
Abstract [en]

Large-scale networked systems, such as the Internet and server clusters, are omnipresent today. They increasingly deliver services that are critical to both businesses and the society at large, and therefore their continuous and correct operation must be guaranteed. Achieving this requires the realization of adaptive management systems, which continuously reconfigure such large-scale dynamic systems, in order to maintain their state near a desired operating point, despite changes in the networking conditions.

The focus of this thesis is continuous real-time monitoring, which is essential for the realization of adaptive management systems in large-scale dynamic environments. Real-time monitoring provides the necessary input to the decision-making process of network management, enabling management systems to perform self-configuration and self-healing tasks.

We have developed, implemented, and evaluated a design for real-time continuous monitoring of global metrics with performance objectives, such as monitoring overhead and estimation accuracy. Global metrics describe the state of the system as a whole, in contrast to local metrics, such as device counters or local protocol states, which capture the state of a local entity. Global metrics are computed from local metrics using aggregation functions, such as SUM, AVERAGE and MAX.

Our approach is based on in-network aggregation, where global metrics are incrementally computed using spanning trees. Performance objectives are achieved through filtering updates to local metrics that are sent along that tree. A key part in the design is a model for the distributed monitoring process that relates performance metrics to parameters that tune the behavior of a monitoring protocol. The model allows us to describe the behavior of individual nodes in the spanning tree in their steady state. The model has been instrumental in designing a monitoring protocol that is controllable and achieves given performance objectives.

We have evaluated our protocol, called A-GAP, experimentally, through simulation and testbed implementation. It has proved to be effective in meeting performance objectives, efficient, adaptive to changes in the networking conditions, controllable along different performance dimensions, and scalable. We have implemented a prototype on a testbed of commercial routers. The testbed measurements are consistent with simulation studies we performed for different topologies and network sizes. This proves the feasibility of the design, and, more generally, the feasibility of effective and efficient real-time monitoring in large network environments.
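
The in-network aggregation described in the abstract, where a global metric is computed incrementally along a spanning tree, can be sketched as follows. This is an illustrative toy under stated assumptions, not the thesis implementation: the class `TreeNode`, its methods, and the four-node topology are hypothetical, messages are modeled as direct method calls rather than asynchronous network updates, and no filtering is applied. Local metrics are assumed to be non-negative, since each node starts with a local value of 0.

```python
from typing import Callable, Dict, Iterable, Optional


class TreeNode:
    """Hypothetical management process in an aggregation tree."""

    def __init__(self, name: str,
                 aggregate: Callable[[Iterable[float]], float],
                 parent: Optional["TreeNode"] = None):
        self.name = name
        self.aggregate = aggregate
        self.parent = parent
        self.local = 0.0  # local metric, e.g. a device counter
        self.child_partials: Dict[str, float] = {}  # last partial per child

    def partial(self) -> float:
        """Aggregate of the subtree rooted at this node."""
        return self.aggregate([self.local, *self.child_partials.values()])

    def set_local(self, value: float) -> None:
        self.local = value
        self._push_up()

    def _receive(self, child: str, value: float) -> None:
        self.child_partials[child] = value
        self._push_up()

    def _push_up(self) -> None:
        # Only the path from the changed node to the root is touched.
        if self.parent is not None:
            self.parent._receive(self.name, self.partial())


def build(aggregate):
    """Build a small tree: root -> {a, b}, a -> {c}."""
    root = TreeNode("root", aggregate)
    a = TreeNode("a", aggregate, parent=root)
    b = TreeNode("b", aggregate, parent=root)
    c = TreeNode("c", aggregate, parent=a)
    return root, a, b, c


def demo(aggregate):
    """Return the root's aggregate before and after one leaf update."""
    root, a, b, c = build(aggregate)
    c.set_local(5)
    a.set_local(3)
    b.set_local(2)
    before = root.partial()
    c.set_local(1)  # a single change propagates c -> a -> root
    return before, root.partial()


sum_result = demo(sum)
max_result = demo(max)
print(sum_result, max_result)
```

SUM and MAX fit this scheme directly because a partial aggregate is a single number. AVERAGE would need each node to forward a (sum, count) pair instead, which this sketch leaves out.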

Place, publisher, year, edition, pages
Stockholm: KTH, 2008. 46 p.
Series
Trita-EE, ISSN 1653-5146 ; 2008:051
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-9459
ISBN: 978-91-7415-168-8
Public defence
2008-11-21, Salongen, KTHB, Osquars backe 31, KTH, Stockholm, 10:00 (English)
Note
QC 20100727
Available from: 2008-11-05. Created: 2008-11-05. Last updated: 2010-07-27. Bibliographically approved.

Authors
Gonzalez Prieto, Alberto; Stadler, Rolf