Publications (10 of 56)
Farshin, A., Roozbeh, A., Maguire Jr., G. Q. & Kostic, D. (2019). Make the Most out of Last Level Cache in Intel Processors. In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019. Paper presented at EuroSys'19. ACM Digital Library
Make the Most out of Last Level Cache in Intel Processors
2019 (English). In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019. ACM Digital Library, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In modern (Intel) processors, Last Level Cache (LLC) is divided into multiple slices and an undocumented hashing algorithm (aka Complex Addressing) maps different parts of memory address space among these slices to increase the effective memory bandwidth. After a careful study of Intel’s Complex Addressing, we introduce a slice-aware memory management scheme, wherein frequently used data can be accessed faster via the LLC. Using our proposed scheme, we show that a key-value store can potentially improve its average performance by ∼12.2% and ∼11.4% for 100% and 95% GET workloads, respectively. Furthermore, we propose CacheDirector, a network I/O solution which extends Direct Data I/O (DDIO) and places the packet’s header in the slice of the LLC that is closest to the relevant processing core. We implemented CacheDirector as an extension to DPDK and evaluated our proposed solution for latency-critical applications in Network Function Virtualization (NFV) systems. Evaluation results show that CacheDirector makes packet processing faster by reducing tail latencies (90-99th percentiles) by up to 119 µs (∼21.5%) for optimized NFV service chains running at 100 Gbps. Finally, we analyze the effectiveness of slice-aware memory management in realizing cache isolation.
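To illustrate the mechanism the abstract describes, the sketch below shows the general shape of an XOR-reduction slice hash (the style of reverse-engineered Complex Addressing functions) and how a slice-aware allocator could keep only buffers landing in a preferred slice. The bit masks and the 4-slice configuration are invented for illustration; they are not Intel's actual, undocumented hash.

```python
# Hypothetical sketch of slice-aware memory management. The masks below
# are made up for illustration; real Complex Addressing masks differ per
# processor model and are undocumented.

def parity(x: int) -> int:
    """XOR of all bits in x (returns 0 or 1)."""
    p = 0
    while x:
        p ^= x & 1
        x >>= 1
    return p

# One hypothetical mask per output bit of the slice index (4 slices -> 2 bits).
MASKS = [0x1B5F575440, 0x2EB5FAA880]

def slice_of(addr: int) -> int:
    """Map an address to a slice index via per-bit XOR reductions."""
    return sum(parity(addr & m) << i for i, m in enumerate(MASKS))

def pick_buffers(candidates, target_slice):
    """Slice-aware allocation: keep only candidate buffer addresses that
    hash to the slice assumed closest to the consuming core."""
    return [a for a in candidates if slice_of(a) == target_slice]
```

A slice-aware allocator along these lines would hand out only addresses from `pick_buffers`, so frequently used data stays in the LLC slice with the lowest access latency for its core.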

Place, publisher, year, edition, pages
ACM Digital Library, 2019
Keywords
Slice-aware Memory Management, Last Level Cache, Non-Uniform Cache Architecture, CacheDirector, DDIO, DPDK, Network Function Virtualization, Cache Partitioning, Cache Allocation Technology, Key-Value Store.
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-244750 (URN); 10.1145/3302424.3303977 (DOI); 000470898700008 (); 2-s2.0-85063919722 (Scopus ID); 9781450362818 (ISBN)
Conference
EuroSys'19
Projects
Time-Critical Clouds; ULTRA; WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; EU, Horizon 2020, 770889
Note

QC 20190226

Available from: 2019-02-24. Created: 2019-02-24. Last updated: 2019-07-29. Bibliographically approved.
Liu, S., Steinert, R. & Kostic, D. (2018). Control under Intermittent Network Partitions. In: 2018 IEEE International Conference on Communications (ICC). Paper presented at 2018 IEEE International Conference on Communications, ICC 2018, Kansas City, United States, 20 May 2018 through 24 May 2018. Institute of Electrical and Electronics Engineers (IEEE), Article ID 8422615.
Control under Intermittent Network Partitions
2018 (English). In: 2018 IEEE International Conference on Communications (ICC), Institute of Electrical and Electronics Engineers (IEEE), 2018, article id 8422615. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a novel distributed leader election algorithm to deal with the controller and control service availability issues in programmable networks, such as Software Defined Networks (SDN) or programmable Radio Access Network (RAN). Our approach can deal with a wide range of network failures, especially intermittent network partitions, where splitting and merging of a network repeatedly occur. In contrast to traditional leader election algorithms that mainly focus on the (eventual) consensus on one leader, the proposed algorithm aims at optimizing control service availability and stability and at reducing the controller state synchronization effort during intermittent network partitioning situations. To this end, we design a new framework that enables dynamic leader election based on real-time estimates acquired from statistical monitoring. With this framework, the proposed leader election algorithm can be flexibly configured to achieve different optimization objectives, while adapting to various failure patterns. Compared with two existing algorithms, our approach can significantly reduce the synchronization overhead (up to 12x) due to controller state updates, and keep up to twice as many nodes under a controller.
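The core idea of estimate-driven leader election can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: every node scores reachable candidates from monitored estimates (here, an assumed partition-stability estimate and a state-synchronization cost) and deterministically picks the best one, so no extra agreement rounds are needed once estimates are shared.

```python
# Minimal sketch of availability-aware leader election (illustrative only).
# `stability` and `sync_cost` stand in for real-time monitoring estimates.

def score(candidate, stability, sync_cost, w_stab=1.0, w_sync=0.5):
    """Higher is better: favour candidates likely to stay reachable,
    penalise expensive controller state synchronization. The weights are
    illustrative knobs for different optimization objectives."""
    return w_stab * stability[candidate] - w_sync * sync_cost[candidate]

def elect(reachable, stability, sync_cost):
    """Every node in the same partition computes the same winner from the
    shared estimates; ties break deterministically on the lowest node id."""
    return max(sorted(reachable),
               key=lambda n: (score(n, stability, sync_cost), -n))
```

During a partition, each side re-runs `elect` over its reachable set; after a merge, the combined set yields a single leader again without manual reconfiguration.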

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-234101 (URN); 10.1109/ICC.2018.8422615 (DOI); 2-s2.0-85051430600 (Scopus ID); 9781538631805 (ISBN)
Conference
2018 IEEE International Conference on Communications, ICC 2018, Kansas City, United States, 20 May 2018 through 24 May 2018
Note

QC 20180905

Available from: 2018-09-05. Created: 2018-09-05. Last updated: 2018-09-05. Bibliographically approved.
Bogdanov, K., Reda, W., Maguire Jr., G. Q., Kostic, D. & Canini, M. (2018). Fast and accurate load balancing for geo-distributed storage systems. In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing. Paper presented at 2018 ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, United States, 11 October 2018 through 13 October 2018 (pp. 386-400). Association for Computing Machinery (ACM)
Fast and accurate load balancing for geo-distributed storage systems
2018 (English). In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing, Association for Computing Machinery (ACM), 2018, p. 386-400. Conference paper, Published paper (Refereed)
Abstract [en]

The increasing density of globally distributed datacenters reduces the network latency between neighboring datacenters and allows replicated services deployed across neighboring locations to share workload when necessary, without violating strict Service Level Objectives (SLOs). We present Kurma, a practical implementation of a fast and accurate load balancer for geo-distributed storage systems. At run-time, Kurma integrates network latency and service time distributions to accurately estimate the rate of SLO violations for requests redirected across geo-distributed datacenters. Using these estimates, Kurma solves a decentralized rate-based performance model enabling fast load balancing (on the order of seconds) while taming global SLO violations. We integrate Kurma with Cassandra, a popular storage system. Using real-world traces along with a geo-distributed deployment across Amazon EC2, we demonstrate Kurma’s ability to effectively share load among datacenters while reducing SLO violations by up to a factor of 3 under high load, or reducing the cost of running the service by up to 17%.
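The central estimate the abstract mentions, combining a network latency distribution with a service time distribution to predict SLO violations for redirected requests, can be sketched as a Monte-Carlo convolution of two empirical samples. This is an assumed simplification for illustration, not Kurma's actual model.

```python
# Illustrative sketch: estimate the SLO-violation rate of redirecting
# requests to a remote datacenter by combining two empirical distributions.

import random

def violation_rate(rtt_samples, service_samples, slo_ms, trials=10000, seed=7):
    """Draw an RTT and a remote service time independently and count how
    often the total exceeds the SLO deadline (Monte-Carlo convolution)."""
    rng = random.Random(seed)
    bad = sum(
        rng.choice(rtt_samples) + rng.choice(service_samples) > slo_ms
        for _ in range(trials)
    )
    return bad / trials
```

A load balancer could redirect traffic to a remote datacenter only while this estimated rate stays below the SLO budget, which is the kind of decision the rate-based model automates.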

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2018
Keywords
Cloud Computing, Distributed Systems, Server Load Balancing, Service Level Objectives, Wide Area Networks
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-241481 (URN); 10.1145/3267809.3267820 (DOI); 2-s2.0-85059006718 (Scopus ID); 9781450360111 (ISBN)
Conference
2018 ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, United States, 11 October 2018 through 13 October 2018
Funder
EU, Horizon 2020, 770889; Swedish Foundation for Strategic Research
Note

QC 20190123

Available from: 2019-01-23. Created: 2019-01-23. Last updated: 2019-04-29. Bibliographically approved.
Liu, S., Steinert, R. & Kostic, D. (2018). Flexible distributed control plane deployment. In: IEEE/IFIP Network Operations and Management Symposium: Cognitive Management in a Cyber World, NOMS 2018. Paper presented at 2018 IEEE/IFIP Network Operations and Management Symposium, NOMS 2018, 23 April 2018 through 27 April 2018 (pp. 1-7). Institute of Electrical and Electronics Engineers Inc.
Flexible distributed control plane deployment
2018 (English). In: IEEE/IFIP Network Operations and Management Symposium: Cognitive Management in a Cyber World, NOMS 2018, Institute of Electrical and Electronics Engineers Inc., 2018, p. 1-7. Conference paper, Published paper (Refereed)
Abstract [en]

For large-scale programmable networks, flexible deployment of distributed control planes is essential for service availability and performance. However, existing approaches only focus on placing controllers, whereas the consequent control traffic is often ignored. In this paper, we propose a black-box optimization framework offering the additional steps for quantifying the effect of the consequent control traffic when deploying a distributed control plane. Evaluating different implementations of the framework over real-world topologies shows that close to optimal solutions can be achieved. Moreover, experiments indicate that running a method for controller placement without considering the control traffic causes excessive bandwidth usage (worst cases between 20.1% and 50.1% more) and congestion, compared to our approach.
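The effect the abstract quantifies can be illustrated with a toy placement evaluator: score a candidate controller placement by both switch-to-controller traffic and controller-to-controller synchronization traffic over shortest paths, then search placements black-box style. The cost model and rates here are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch: controller placement that accounts for control traffic.
# dist[u][v] is the hop count between nodes u and v on the topology.

import itertools

def placement_cost(dist, placement, switch_rate=1.0, sync_rate=0.5):
    """Total control-traffic cost: each switch talks to its nearest
    controller, and controllers synchronize state pairwise."""
    nodes = range(len(dist))
    to_ctrl = sum(switch_rate * min(dist[s][c] for c in placement) for s in nodes)
    sync = sum(sync_rate * dist[a][b]
               for a, b in itertools.combinations(placement, 2))
    return to_ctrl + sync

def best_placement(dist, k):
    """Exhaustive black-box search over k-controller placements; fine for
    tiny topologies, while a real framework would plug in a smarter optimizer."""
    nodes = range(len(dist))
    return min(itertools.combinations(nodes, k),
               key=lambda p: placement_cost(dist, p))
```

Dropping the `sync` term reduces this to a plain controller-placement objective, which is exactly the setting where the abstract reports excessive bandwidth usage.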

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
Optimization, Traffic congestion, Bandwidth usage, Black-box optimization, Control traffic, Controller placements, Distributed control planes, Optimal solutions, Programmable network, Service availability, Controllers
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-238085 (URN); 10.1109/NOMS.2018.8406150 (DOI); 2-s2.0-85050656041 (Scopus ID); 9781538634165 (ISBN)
Conference
2018 IEEE/IFIP Network Operations and Management Symposium, NOMS 2018, 23 April 2018 through 27 April 2018
Note


QC 20190111

Available from: 2019-01-11. Created: 2019-01-11. Last updated: 2019-01-11. Bibliographically approved.
Bogdanov, K., Reda, W., Kostic, D., Maguire Jr., G. Q. & Canini, M. (2018). Kurma: Fast and Efficient Load Balancing for Geo-Distributed Storage Systems: Evaluation of Convergence and Scalability.
Kurma: Fast and Efficient Load Balancing for Geo-Distributed Storage Systems: Evaluation of Convergence and Scalability
2018 (English). Report (Other academic)
Abstract [en]

This report provides an extended evaluation of Kurma, a practical implementation of a geo-distributed load balancer for backend storage systems. In this report we demonstrate the ability of distributed Kurma instances to accurately converge to the same solutions within 1% of the total datacenter’s capacity and the ability of Kurma to scale up to 8 datacenters using a single CPU core at each datacenter.

National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-222289 (URN)
Note

QR 20180212

Available from: 2018-02-05. Created: 2018-02-05. Last updated: 2018-02-12. Bibliographically approved.
Roozbeh, A., Soares, J., Maguire Jr., G. Q., Wuhib, F., Padala, C., Mahloo, M., . . . Kostic, D. (2018). Software-Defined "Hardware" Infrastructures: A Survey on Enabling Technologies and Open Research Directions. IEEE Communications Surveys and Tutorials, 20(3), 2454-2485
Software-Defined "Hardware" Infrastructures: A Survey on Enabling Technologies and Open Research Directions
2018 (English). In: IEEE Communications Surveys and Tutorials, ISSN 1553-877X, E-ISSN 1553-877X, Vol. 20, no 3, p. 2454-2485. Article in journal (Refereed). Published.
Abstract [en]

This paper provides an overview of software-defined "hardware" infrastructures (SDHI). SDHI builds upon the concept of hardware (HW) resource disaggregation. HW resource disaggregation breaks today's physical server-oriented model where the use of a physical resource (e.g., processor or memory) is constrained to a physical server's chassis. SDHI extends the definition of software-defined infrastructures (SDI) and brings greater modularity, flexibility, and extensibility to cloud infrastructures, thus allowing cloud operators to employ resources more efficiently and allowing applications not to be bound by the physical infrastructure's layout. This paper aims to be an initial introduction to SDHI and its associated technological advancements. It starts with an overview of the cloud domain and puts into perspective some of the most prominent efforts in the area. Then, it presents a set of differentiating use-cases that SDHI enables. Next, it states the fundamentals behind SDI and SDHI, and elaborates on why SDHI is of great interest today. Moreover, it provides an overview of the functional architecture of a cloud built on SDHI, exploring how the impact of this transformation goes far beyond the cloud infrastructure level, extending to platforms, execution environments, and applications. Finally, an in-depth assessment is made of the technologies behind SDHI, the impact of these technologies, and the associated challenges and potential future directions of SDHI.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
CR-software-defined infrastructure, resource disaggregation, cloud infrastructure, rack-scale, hyperscale computing, disaggregated DC
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-235270 (URN); 10.1109/COMST.2018.2834731 (DOI); 000443030500033 (); 2-s2.0-85046804138 (Scopus ID)
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation
Note

QC 20180919

Available from: 2018-09-19. Created: 2018-09-19. Last updated: 2018-11-23. Bibliographically approved.
Peresini, P., Kuzniar, M. & Kostic, D. (2015). Monocle: Dynamic, Fine-Grained Data Plane Monitoring. In: Proceedings of the 11th International Conference on emerging Networking EXperiments and Technologies (ACM CoNEXT). Paper presented at The 11th International Conference on emerging Networking EXperiments and Technologies (ACM CoNEXT). Association for Computing Machinery (ACM)
Monocle: Dynamic, Fine-Grained Data Plane Monitoring
2015 (English). In: Proceedings of the 11th International Conference on emerging Networking EXperiments and Technologies (ACM CoNEXT), Association for Computing Machinery (ACM), 2015. Conference paper, Published paper (Refereed)
Abstract [en]

Ensuring network reliability is important for satisfying service-level objectives. However, diagnosing network anomalies in a timely fashion is difficult due to the complex nature of network configurations. We present Monocle, a system that uncovers forwarding problems due to hardware or software failures in switches, by verifying that the data plane corresponds to the view that an SDN controller installs via the control plane. Monocle works by systematically probing the switch data plane; the probes are constructed by formulating the switch forwarding table logic as a Boolean satisfiability (SAT) problem. Our SAT formulation quickly generates probe packets targeting a particular rule, considering both existing and new rules. Monocle can monitor not only static flow tables (as is currently typically the case), but also dynamic networks with frequent flow table changes. Our evaluation shows that Monocle is capable of fine-grained monitoring for the majority of rules, and it can identify a rule suddenly missing from the data plane or misbehaving in a matter of seconds. Also, during network updates Monocle helps controllers cope with switches that exhibit transient inconsistencies between their control and data plane states.
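The probe-construction idea can be illustrated with brute force in place of a SAT solver: find a packet header that matches the rule under test but no strictly-higher-priority rule, so the forwarding behavior observed for that probe reveals whether the rule is really installed in the data plane. The rule encoding below is a toy assumption; Monocle itself encodes this search as a Boolean satisfiability problem for efficiency.

```python
# Toy sketch of data-plane probe generation (brute force instead of SAT).
# A rule is (priority, pattern, action); `pattern` maps bit positions of the
# header to required values, and unlisted bits are wildcards.

def matches(rule, header):
    """Does this header satisfy every fixed bit of the rule's pattern?"""
    _, pattern, _ = rule
    return all((header >> bit) & 1 == v for bit, v in pattern.items())

def probe_for(rules, target, header_bits=8):
    """Return a header that hits `target` and no higher-priority rule,
    or None if the target is fully shadowed (no distinguishing probe)."""
    higher = [r for r in rules if r[0] > target[0]]
    for h in range(1 << header_bits):
        if matches(target, h) and not any(matches(r, h) for r in higher):
            return h
    return None
```

Injecting the returned probe and watching where the switch forwards it then distinguishes "rule installed" from "rule missing or misbehaving", which is the check Monocle runs continuously.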

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2015
National Category
Communication Systems; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-176478 (URN); 978-1-4503-3412-9 (ISBN)
Conference
The 11th International Conference on emerging Networking EXperiments and Technologies (ACM CoNEXT)
Funder
EU, European Research Council, 259110
Note

QC 20151110

Available from: 2015-11-05. Created: 2015-11-05. Last updated: 2018-01-10. Bibliographically approved.
Peresini, P., Kuzniar, M. & Kostic, D. (2015). Rule-Level Data Plane Monitoring With Monocle. Computer communication review, 45(4), 595-596
Rule-Level Data Plane Monitoring With Monocle
2015 (English). In: Computer communication review, ISSN 0146-4833, E-ISSN 1943-5819, Vol. 45, no 4, p. 595-596. Article in journal (Refereed). Published.
Abstract [en]

We present Monocle, a system that systematically monitors the network data plane, and verifies that it corresponds to the view that the SDN controller builds and tries to enforce in the switches. Our evaluation shows that Monocle is capable of fine-grained per-rule monitoring for the majority of rules. In addition, it can help controllers to cope with switches that exhibit transient inconsistencies between their control plane and data plane states.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2015
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-184991 (URN); 10.1145/2829988.2790012 (DOI); 000370556200077 (); 2-s2.0-84962326807 (Scopus ID)
Funder
EU, European Research Council
Note

QC 20160407

Available from: 2016-04-07. Created: 2016-04-07. Last updated: 2018-01-10. Bibliographically approved.
Peresini, P., Kuzniar, M., Canini, M., Venzano, D., Kostic, D. & Rexford, J. (2015). Systematically Testing OpenFlow Controller Applications. Computer Networks, 92
Systematically Testing OpenFlow Controller Applications
2015 (English). In: Computer Networks, ISSN 1389-1286, E-ISSN 1872-7069, Vol. 92. Article in journal (Refereed). Published.
Abstract [en]

The emergence of OpenFlow-capable switches enables exciting new network functionality, at the risk of programming errors that make communication less reliable. The centralized programming model, where a single controller program manages the network, seems to reduce the likelihood of bugs. However, the system is inherently distributed and asynchronous, with events happening at different switches and end hosts, and inevitable delays affecting communication with the controller. In this paper, we present efficient, systematic techniques for testing unmodified controller programs. Our NICE tool applies model checking to explore the state space of the entire system: the controller, the switches, and the hosts. Scalability is the main challenge, given the diversity of data packets, the large system state, and the many possible event orderings. To address this, we propose a novel way to augment model checking with symbolic execution of event handlers (to identify representative packets that exercise code paths on the controller). We also present a simplified OpenFlow switch model (to reduce the state space), and effective strategies for generating event interleavings likely to uncover bugs. Our prototype tests Python applications on the popular NOX platform. In testing three real applications (a MAC-learning switch, in-network server load balancing, and energy-efficient traffic engineering), we uncover thirteen bugs.
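Why event orderings matter can be shown with a far simpler toy than NICE: enumerate every interleaving of asynchronous events against a tiny controller model and flag the orderings that break an invariant. The controller logic and invariant below are invented for illustration.

```python
# Toy illustration of interleaving exploration (vastly simplified vs. NICE).

import itertools

def run(events):
    """A toy 'controller': it can install a forwarding entry for h1 only
    after h1 has been learned; a pkt_in that arrives first is dropped."""
    state = {'learned': set(), 'installed': set(), 'dropped': 0}
    for ev in events:
        if ev.startswith('learn:'):
            state['learned'].add(ev.split(':', 1)[1])
        elif ev == 'pkt_in:h1':
            if 'h1' in state['learned']:
                state['installed'].add('h1')
            else:
                state['dropped'] += 1
    return state

def find_bad_interleavings(events, invariant):
    """Brute-force 'model checking': try every event ordering and return
    the orderings whose final state violates the invariant."""
    return [p for p in itertools.permutations(events)
            if not invariant(run(p))]
```

Even this two-event example has an ordering that drops traffic; real controllers face exponentially many orderings, which is why NICE prunes the search with symbolic execution and a simplified switch model.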

Place, publisher, year, edition, pages
Elsevier, 2015
National Category
Communication Systems; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-176481 (URN); 10.1016/j.comnet.2015.03.019 (DOI); 000366785500007 (); 2-s2.0-84948569552 (Scopus ID)
Funder
EU, European Research Council, 259110
Note

QC 20151110

Available from: 2015-11-05. Created: 2015-11-05. Last updated: 2018-01-10. Bibliographically approved.
Bogdanov, K., Peón-Quirós, M., Maguire Jr., G. Q. & Kostic, D. (2015). The Nearest Replica Can Be Farther Than You Think. In: Proceedings of the ACM Symposium on Cloud Computing 2015. Paper presented at ACM Symposium on Cloud Computing, August 27-29, 2015, Hawaii (pp. 16-29). Association for Computing Machinery (ACM)
The Nearest Replica Can Be Farther Than You Think
2015 (English). In: Proceedings of the ACM Symposium on Cloud Computing 2015, Association for Computing Machinery (ACM), 2015, p. 16-29. Conference paper, Published paper (Refereed)
Abstract [en]

Modern distributed systems are geo-distributed for reasons of increased performance, reliability, and survivability. At the heart of many such systems, e.g., the widely used Cassandra and MongoDB data stores, is an algorithm for choosing a closest set of replicas to service a client request. Suboptimal replica choices due to dynamically changing network conditions result in reduced performance as a result of increased response latency. We present GeoPerf, a tool that tries to automate the process of systematically testing the performance of replica selection algorithms for geo-distributed storage systems. Our key idea is to combine symbolic execution and lightweight modeling to generate a set of inputs that can expose weaknesses in replica selection. As part of our evaluation, we analyzed network round trip times between geographically distributed Amazon EC2 regions, and showed a significant number of daily changes in nearest-K replica orders. We tested Cassandra and MongoDB using our tool, and found bugs in each of these systems. Finally, we use our collected Amazon EC2 latency traces to quantify the time lost due to these bugs. For example, due to the bug in Cassandra, the median wasted time for 10% of all requests is above 50 ms.
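The failure mode GeoPerf hunts for can be sketched in a few lines: a replica-selection algorithm ranking replicas by a smoothed (and therefore stale) RTT estimate can disagree with the true current nearest-K order, and the gap is wasted latency. The EWMA selector and the "last sample is the truth" assumption below are illustrative simplifications, not the algorithms GeoPerf actually tested.

```python
# Illustrative sketch: stale smoothed RTT estimates vs. the current order.

def smoothed(history, alpha=0.8):
    """EWMA over RTT samples, as many replica-selection algorithms use."""
    est = history[0]
    for x in history[1:]:
        est = alpha * est + (1 - alpha) * x
    return est

def wasted_time(histories, k=1):
    """Compare the k replicas chosen by the smoothed estimate against the
    truly fastest k right now (here: the last sample), and return the
    extra latency the stale choice would incur."""
    chosen = sorted(histories, key=lambda r: smoothed(histories[r]))[:k]
    actual = sorted(histories, key=lambda r: histories[r][-1])[:k]
    return (sum(histories[r][-1] for r in chosen)
            - sum(histories[r][-1] for r in actual))
```

GeoPerf's contribution is generating RTT inputs like these systematically (via symbolic execution) rather than hoping production traffic happens to expose the divergence.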

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2015
Keywords
Geo-Distributed Systems, Replica Selection Algorithms, Symbolic Execution
National Category
Communication Systems; Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-171434 (URN); 10.1145/2806777.2806939 (DOI); 000380606400002 (); 2-s2.0-84958960133 (Scopus ID)
Conference
ACM Symposium on Cloud Computing, August 27-29, 2015, Hawaii
Funder
EU, European Research Council, 259110
Note

To obtain the data used in this work please contact dmk@kth.se and kirillb@kth.se.

QC 20150812

Available from: 2015-08-03. Created: 2015-08-03. Last updated: 2018-10-07. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-1256-1070
