Distributed Monitoring and Resource Management for Large Cloud Environments
KTH, School of Electrical Engineering (EES), Communication Networks. (Distributed Management)
2010 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last decade, the number, size and complexity of large-scale networked systems have been growing fast, and this trend is expected to accelerate. The best known example of a large-scale networked system is probably the Internet, while large datacenters for cloud services are among the most recent. In such environments, a key challenge is to develop scalable and adaptive technologies for management functions. This thesis addresses the challenge by engineering several protocols for distributed monitoring and resource management that are suitable for large-scale networked systems. First, we present G-GAP, a gossip-based protocol we developed for continuous monitoring of aggregates that are computed from device variables. We prove the robustness of this protocol to node failures and validate, through simulations, that its estimation accuracy does not change with increasing size of the monitored system under certain conditions. Second, we present TCA-GAP, a tree-based protocol, and TG-GAP, a gossip-based protocol, both for monitoring threshold crossings of aggregates. For both protocols, we prove correctness properties and show, again through simulations, that they are efficient: their overhead is at least two orders of magnitude smaller than that of a naive approach in cases where the monitored aggregate is sufficiently far from the threshold. Third, we present a gossip-based protocol for resource management in cloud environments. The protocol allocates CPU and memory resources to sites that are hosted by the cloud. We prove that the resource allocation computed by the protocol converges exponentially fast to an optimal allocation in cases where sufficient memory is available. Through simulations, we show that the quality of the resource allocation approaches that of an ideal system when the total memory demand decreases significantly below the memory capacity of the entire system.
In addition, we validate that the quality of the allocation does not change as the number of hosted sites and machines increases, provided that both are scaled proportionally. Finally, we compare two approaches (tree-based and gossip-based) to engineering protocols for distributed management, for the case of real-time monitoring. Results of our simulation studies indicate that, regardless of the system size and failure rates in the monitored system, gossip protocols incur a significantly larger overhead than tree-based protocols for achieving the same monitoring quality (e.g., estimation accuracy or detection delay).

Place, publisher, year, edition, pages
Stockholm: KTH, 2010. vi, 26 p.
Series
Trita-EE, ISSN 1653-5146 ; 2010:051
Keyword [en]
decentralized management, engineering protocols, distributed monitoring, resource management
National Category
Telecommunications; Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-26207
ISBN: 978-91-7415-794-9 (print)
OAI: oai:DiVA.org:kth-26207
DiVA: diva2:371527
Public defence
2010-12-10, Q2, Osquldas väg 10, plan 2, KTH, Stockholm, 14:00 (English)
Opponent
Supervisors
Note
QC 20101124
Available from: 2010-11-24 Created: 2010-11-21 Last updated: 2012-03-22
Bibliographically approved
List of papers
1. Record not found.
2. Record not found.
3. Decentralized detection of global threshold crossings using aggregation trees
2008 (English)
In: Computer Networks, ISSN 1389-1286, E-ISSN 1872-7069, Vol. 52, no 9, 1745-1761 p.
Article in journal (Refereed) Published
Abstract [en]

The timely detection that a monitored variable has crossed a given threshold is a fundamental requirement for many network management applications. A challenge is the detection of threshold crossings of network-wide variables, which are computed from device counters across the network, using aggregation functions such as SUM, MAX and AVERAGE. This paper contains a detailed description and a comprehensive evaluation of TCA-GAP, a protocol for detecting threshold crossings of network-wide aggregates in a distributed way. Elements of its design include tree-based incremental aggregation for estimating the value of aggregates, a local hysteresis mechanism to reduce overhead and dynamic recomputation of local thresholds to ensure correctness. The protocol is evaluated through extensive simulation using real traces in scenarios with network sizes up to 5232 nodes. From the measurements, we conclude that the protocol is efficient in the sense that the overhead is negligible when the aggregate is far from the threshold. It is scalable as the protocol overhead is independent of the system size for the network sizes and scenario configurations considered. We demonstrate that the local hysteresis parameter can be used to control the tradeoff between protocol overhead and detection delay. We further report results on how node failures affect the overhead and detection quality of the protocol.
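
The idea of tree-based incremental aggregation with a local suppression margin can be illustrated with a minimal Python sketch. This is not TCA-GAP itself: the class TreeNode, the margin parameter and the helper functions are hypothetical names chosen for illustration, only SUM is shown, and the sketch omits the dynamic recomputation of local thresholds that the abstract says is needed for correctness. Each node forwards its partial sum to its parent only when that sum has moved by more than the margin since the last report, so small local changes generate no messages.

```python
class TreeNode:
    """One node of an aggregation tree (illustrative, not the TCA-GAP design)."""

    def __init__(self, margin=0.0):
        self.value = 0.0        # local device variable
        self.margin = margin    # hypothetical local suppression margin
        self.parent = None
        self.children = []
        self.reported = 0.0     # partial sum last sent to the parent
        self.sent = 0           # number of update messages sent upward

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def partial(self):
        # Local value plus whatever each child last reported.
        return self.value + sum(c.reported for c in self.children)

    def set_value(self, v):
        self.value = v
        self._maybe_report()

    def _maybe_report(self):
        current = self.partial()
        if self.parent is None:
            # The root holds the estimate of the global SUM.
            self.reported = current
            return
        if abs(current - self.reported) > self.margin:
            self.reported = current
            self.sent += 1
            self.parent._maybe_report()


def build_demo_tree(margin):
    """Root with two inner nodes, each with two leaves."""
    root = TreeNode()
    leaves = []
    for _ in range(2):
        inner = root.add_child(TreeNode(margin))
        for _ in range(2):
            leaves.append(inner.add_child(TreeNode(margin)))
    return root, leaves


def total_messages(root):
    """Count update messages sent by all nodes in the tree."""
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        count += node.sent
        stack.extend(node.children)
    return count


def threshold_crossed(root, threshold):
    """Compare the root's estimate of the aggregate with a global threshold."""
    return root.reported >= threshold
```

In this simplified form the root's estimate can lag the true sum by up to the sum of the margins of all non-root nodes, which is the price paid for the suppressed messages. A larger margin means fewer messages but a coarser estimate, which loosely mirrors the overhead versus detection-delay tradeoff described in the abstract.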

Place, publisher, year, edition, pages
Elsevier, 2008
Keyword
decentralized network management, threshold crossing alerts, real-time, monitoring, tree-based aggregation protocols
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-17634 (URN)
10.1016/j.comnet.2008.02.015 (DOI)
000257012600006 ()
2-s2.0-43449096331 (Scopus ID)
Note
NOTICE: this is the author’s version of a work that was accepted for publication in . Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in PUBLICATION, VOL 52, ISSUE 9, 2008, DOI 10.1016/j.comnet.2008.02.015
QC 20100525 QC 20120213
Available from: 2012-02-13 Created: 2010-08-05 Last updated: 2017-12-12
Bibliographically approved
4. A Gossiping Protocol for Detecting Global Threshold Crossings
2010 (English)
In: IEEE Transactions on Network and Service Management, ISSN 1932-4537, E-ISSN 1932-4537, Vol. 7, no 1, 42-57 p.
Article in journal (Refereed) Published
Abstract [en]

We investigate the use of gossip protocols for the detection of network-wide threshold crossings. Our design goals are low protocol overhead, small detection delay, low probability of false positives and negatives, scalability, robustness to node failures and controllability of the trade-off between overhead and detection delay. Based on push-synopses, a gossip protocol introduced by Kempe et al., we present a protocol that indicates whether a global aggregate of static local values is above or below a given threshold. For this protocol, we prove correctness and show that it converges to a state with no overhead when the aggregate is sufficiently far from the threshold. Then, we introduce an extension we call TG-GAP, a protocol that (1) executes in a dynamic network environment where local values change and (2) implements hysteresis behavior with upper and lower thresholds. Key elements of its design are the construction of snapshots of the global aggregate for threshold detection and a mechanism for synchronizing local states, both of which are realized through the underlying gossip protocol. Simulation studies suggest that TG-GAP is efficient in that the protocol overhead is minimal when the aggregate is sufficiently far from the threshold, that its overhead and the detection delay are largely independent of the system size, and that the trade-off between overhead and detection quality can be effectively controlled. Lastly, we perform a comparative evaluation of TG-GAP against a tree-based protocol. We conclude that, for detecting global threshold crossings in the type of scenarios investigated, the tree-based protocol incurs a significantly lower overhead and a smaller detection delay than a gossip protocol such as TG-GAP.
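
Two ingredients named in the abstract, gossip-based aggregation and hysteresis with upper and lower thresholds, can be sketched in Python as follows. This is a simplified illustration and not TG-GAP: it simulates synchronous rounds of a push-sum style averaging step (commonly attributed to Kempe et al.) over static local values, and all names (push_sum_round, gossip_average, HysteresisDetector) are hypothetical. The snapshot construction and state synchronization that the abstract describes are not modelled.

```python
import random


def push_sum_round(s, w, rng):
    """One synchronous round: every node keeps half of its (sum, weight)
    pair and pushes the other half to a peer chosen uniformly at random."""
    n = len(s)
    new_s = [0.0] * n
    new_w = [0.0] * n
    for i in range(n):
        j = rng.randrange(n)  # assumption: the peer may be the node itself
        half_s, half_w = s[i] / 2.0, w[i] / 2.0
        new_s[i] += half_s
        new_w[i] += half_w
        new_s[j] += half_s
        new_w[j] += half_w
    return new_s, new_w


def gossip_average(values, rounds, seed=0):
    """Return each node's estimate of the global average after `rounds` rounds."""
    rng = random.Random(seed)
    s = [float(v) for v in values]
    w = [1.0] * len(values)
    for _ in range(rounds):
        s, w = push_sum_round(s, w, rng)
    # A node always keeps half of its own weight, so w[i] stays positive.
    return [si / wi for si, wi in zip(s, w)]


class HysteresisDetector:
    """Two-threshold detector: switches to 'above' at `upper` and back
    to 'below' only at `lower`, which avoids flapping in between."""

    def __init__(self, lower, upper):
        if lower > upper:
            raise ValueError("lower threshold must not exceed upper threshold")
        self.lower = lower
        self.upper = upper
        self.above = False

    def observe(self, estimate):
        if not self.above and estimate >= self.upper:
            self.above = True
        elif self.above and estimate <= self.lower:
            self.above = False
        return self.above
```

In such a setup every node could feed its own running estimate into a local HysteresisDetector. Because the total sum and total weight are conserved in each round, all estimates approach the same global average, so the local detectors eventually agree.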

Place, publisher, year, edition, pages
IEEE, 2010
Keyword
Distributed monitoring, threshold detection, gossip protocol
National Category
Computer Systems; Communication Systems; Telecommunications
Identifiers
urn:nbn:se:kth:diva-86051 (URN)
10.1109/TNSM.2010.I9P0329 (DOI)
2-s2.0-77249131803 (Scopus ID)
Note

© 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. QC 20120215

Available from: 2012-02-15 Created: 2012-02-13 Last updated: 2017-12-07
Bibliographically approved
5. Gossip-based Resource Management for Cloud Environments
2010 (English)
In: International Conference on Network and Service Management, 2010, 1-8 p.
Conference paper, Published paper (Refereed)
Abstract [en]

We address the problem of resource management for a large-scale cloud environment that hosts sites. Our contribution centers around outlining a distributed middleware architecture and presenting one of its key elements, a gossip protocol that meets our design goals: fairness of resource allocation with respect to hosted sites, efficient adaptation to load changes and scalability in terms of both the number of machines and sites. We formalize the resource allocation problem as that of dynamically maximizing the cloud utility under CPU and memory constraints. While we can show that an optimal solution without considering memory constraints is straightforward (but not useful), we provide an efficient heuristic solution for the complete problem instead. We evaluate the protocol through simulation and find its performance to be well-aligned with our design goals.
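
To see why the problem without memory constraints can be straightforward, consider the following Python sketch. It rests on assumptions that this record does not confirm: the utility is taken to be the smallest ratio of allocated to demanded CPU over all sites (a fairness-style objective), CPU is treated as one divisible pool, and the function names are hypothetical. Under these assumptions, giving every site the same fraction of its demand maximizes the utility.

```python
def fair_cpu_shares(demands, capacity):
    """Give every site the same fraction of its CPU demand.

    demands:  dict mapping site name -> CPU demand (assumed non-negative)
    capacity: total CPU capacity of the cloud (assumed one divisible pool)
    """
    total = sum(demands.values())
    if total <= 0:
        return {site: 0.0 for site in demands}
    # Never allocate more than a site asks for.
    ratio = min(1.0, capacity / total)
    return {site: demand * ratio for site, demand in demands.items()}


def utility(demands, shares):
    """Assumed utility: the minimum allocated/demanded ratio over all
    sites with positive demand (1.0 if no site demands anything)."""
    ratios = [shares[site] / demand
              for site, demand in demands.items() if demand > 0]
    return min(ratios) if ratios else 1.0
```

Once memory is taken into account, a site can only run on machines that hold its software in memory, so the CPU pool is no longer freely divisible. This is consistent with the abstract's remark that the memory-free optimum is not useful in practice and that a heuristic is used for the complete problem.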

Keyword
cloud computing, distributed management, resource allocation, gossip protocols
National Category
Telecommunications; Computer Systems; Communication Systems
Identifiers
urn:nbn:se:kth:diva-26205 (URN)
10.1109/CNSM.2010.5691347 (DOI)
2-s2.0-79951608881 (Scopus ID)
Conference
International Conference on Network and Service Management, Niagara Falls, ON, Canada, 25-29 Oct. 2010
Note
“© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”
QC 20120124
Available from: 2010-11-21 Created: 2010-11-21 Last updated: 2012-03-12
Bibliographically approved

Open Access in DiVA

fulltext (649 kB), 1946 downloads
File information
File name: FULLTEXT02.pdf
File size: 649 kB
Checksum SHA-512:
179763200c5320e4c94301f14e2892d5f97c3912e3f74080eceb6e02a576a9570905a40830656d51c3c5db1e46a4e3837ca63a2da4512d8105214945d06ed12d
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Wuhib, Fetahi Zebenigus
By organisation
Communication Networks
Telecommunications; Computer Science
