Publications (8 of 8)
Reda, W. (2022). Accelerating Distributed Storage in Heterogeneous Settings. (Doctoral dissertation). Kista, Stockholm, Sweden: KTH Royal Institute of Technology
Accelerating Distributed Storage in Heterogeneous Settings
2022 (English) Doctoral thesis, monograph (Other academic)
Abstract [en]

Heterogeneity in cloud environments is a fact of life: workload skews, network path changes, and the diversity of server hardware components all affect the performance of distributed storage. In this dissertation, we identify heterogeneity as one of the primary causes of service degradation for storage systems. We then tackle this challenge by building next-generation distributed storage systems that operate amidst heterogeneity while providing fast and predictable response times. First, we study skews in cloud workloads and propose latency-optimizing scheduling strategies for key-value stores. We then conduct a measurement study in one of the largest cloud provider networks to quantify variations in network latencies and their possible implications for storage services. Next, with fast non-volatile RAM (NVRAM) becoming commercially available, we examine how storage systems can deal with the increasing diversity of storage technologies. We design and evaluate a distributed file system that manages data across NVRAM and other types of storage while providing low latency and high scalability. Lastly, we build a framework that turns commodity Remote Direct Memory Access (RDMA) NICs into Turing machines capable of performing arbitrary computations. This provides yet another compute resource on server machines, and we show how to leverage it to accelerate common storage tasks as well as real storage applications.


Place, publisher, year, edition, pages
Kista, Stockholm, Sweden: KTH Royal Institute of Technology, 2022. p. 177
Series
TRITA-EECS-AVL ; 2022:32
Keywords
Distributed storage, File systems, Hardware offload, Persistent memory
National Category
Computer Systems; Communication Systems; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-311963 (URN) 978-91-8040-230-9 (ISBN)
Public defence
2022-05-30, https://kth-se.zoom.us/meeting/register/u5Iqd-yhqTorGNLFoosJXIJmTXZl3rsxS55J, Sal C, Electrum, Kungliga Tekniska Högskolan, Kistagången 16, Kista, Stockholm, 15:00 (English)
Funder
EU, European Research Council, 770889
Note

This work was also supported by a fellowship from the Erasmus Mundus Joint Doctorate in Distributed Computing (EMJD-DC), funded by the European Commission (EACEA) (FPA 2012-0030). QC 20220509

Available from: 2022-05-09 Created: 2022-05-05 Last updated: 2022-06-25 Bibliographically approved
Reda, W., Canini, M., Kostic, D. & Peter, S. (2022). RDMA is Turing complete, we just did not know it yet! In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI'22). Paper presented at 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA (pp. 71-85). Renton, WA, USA: USENIX - The Advanced Computing Systems Association
RDMA is Turing complete, we just did not know it yet!
2022 (English) In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI'22), Renton, WA, USA: USENIX - The Advanced Computing Systems Association, 2022, p. 71-85 Conference paper, Published paper (Refereed)
Abstract [en]

Distributed systems increasingly exploit offload to reduce load on the CPU, and Remote Direct Memory Access (RDMA) offload in particular has become widespread. However, RDMA still requires CPU intervention for complex offloads that go beyond simple remote memory access. As such, the offload potential is limited, and RDMA-based systems usually have to work around this limitation.

We present RedN, a principled, practical approach to implementing complex RDMA offloads, without requiring any hardware modifications. Using self-modifying RDMA chains, we lift the existing RDMA verbs interface to a Turing-complete set of programming abstractions. We explore what is possible in terms of offload complexity and performance with a commodity RDMA NIC. We show how to integrate these RDMA chains into applications, such as the Memcached key-value store, allowing us to offload complex tasks such as key lookups. RedN can reduce the latency of key-value get operations by up to 2.6× compared to state-of-the-art KV designs that use one-sided RDMA primitives (e.g., FaRM-KV), as well as traditional RPC-over-RDMA approaches. Moreover, compared to these baselines, RedN provides performance isolation and, in the presence of contention, can reduce latency by up to 35× while providing applications with failure resiliency to OS and process crashes.
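The key idea behind the self-modifying chains mentioned above is that one RDMA work request can write into the descriptor of a later, already-posted work request, which makes a chain of verbs data-dependent and lets it follow pointers or branch without CPU help. The Python sketch below simulates that mechanism for a pointer-chasing lookup, loosely in the spirit of the offloaded Memcached key lookups in the abstract; it is not ibverbs code, and the opcodes, field names, and toy remote memory are assumptions made purely for illustration.

```python
# Toy, host-side simulation of a self-modifying work-request chain: one request
# deposits data into the descriptor of a later, already-posted request, so the
# chain can follow a pointer (e.g., a hash-table lookup) without CPU help.
# NOT ibverbs code; every name below is invented for illustration.

from dataclasses import dataclass, field

# Stand-in for remote memory: a bucket head pointer and the record it points to.
remote = {
    "bucket_addr": "slot_2",        # pointer to the record for the probed key
    "slot_2": ("apple", 42),        # the (key, value) record itself
    "reply": None,                  # where the final answer gets written
}

@dataclass
class WorkRequest:
    opcode: str                     # only "READ" is needed in this toy example
    args: dict = field(default_factory=dict)

def run_chain(chain):
    """Process work requests in posting order. A READ may deposit its result
    either into data memory or into a field of a later work request; the
    latter is the self-modification that makes the chain data-dependent."""
    for wr in chain:
        if wr.opcode == "READ":
            value = remote[wr.args["src"]]
            dst = wr.args["dst"]
            if isinstance(dst, tuple):            # patch a later descriptor
                later_index, field_name = dst
                chain[later_index].args[field_name] = value
            else:                                  # ordinary data destination
                remote[dst] = value

# WR0 reads the bucket pointer and patches WR1's source address; WR1 then
# reads the record that pointer refers to and delivers it as the reply.
chain = [
    WorkRequest("READ", {"src": "bucket_addr", "dst": (1, "src")}),
    WorkRequest("READ", {"src": None, "dst": "reply"}),   # src patched at runtime
]
run_chain(chain)
print(remote["reply"])              # -> ('apple', 42)
```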

Place, publisher, year, edition, pages
Renton, WA, USA: USENIX - The Advanced Computing Systems Association, 2022
Keywords
distributed systems, storage systems, RDMA, network hardware
National Category
Computer Systems; Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309404 (URN) 000876762200005 () 2-s2.0-85138055291 (Scopus ID)
Conference
19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA
Funder
EU, European Research Council, 770889
Note

Part of proceedings: ISBN 978-1-939133-27-4

QC 20220317

Available from: 2022-03-02 Created: 2022-03-02 Last updated: 2024-03-18 Bibliographically approved
Kim, J., Jang, I., Reda, W., Im, J., Canini, M., Kostic, D., . . . Witchel, E. (2021). LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism. In: ACM SIGOPS 28th Symposium on Operating Systems Principles. Paper presented at SOSP.
LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism
2021 (English) In: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021 Conference paper, Published paper (Refereed)
Abstract [en]

In multi-tenant systems, the CPU overhead of distributed file systems (DFSes) is increasingly a burden to application performance. CPU and memory interference cause degraded and unstable application and storage performance, in particular for operation latency. Recent client-local DFSes for persistent memory (PM) accelerate this trend. DFS offload to SmartNICs is a promising solution to these problems, but it is challenging to fit the complex demands of a DFS onto simple SmartNIC processors located across PCIe.

We present LineFS, a SmartNIC-offloaded, high-performance DFS with support for client-local PM. To fully leverage the SmartNIC architecture, we decompose DFS operations into execution stages that can be offloaded to a parallel datapath execution pipeline on the SmartNIC. LineFS offloads CPU-intensive DFS tasks, such as replication, compression, data publication, and index and consistency management, to a SmartNIC. We implement LineFS on the Mellanox BlueField SmartNIC and compare it to Assise, a state-of-the-art PM DFS. LineFS improves LevelDB latency by up to 80% and Filebench throughput by up to 79%, while providing extended DFS availability during host system failures.
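As a rough sketch of the pipeline-parallel decomposition described above, the Python example below splits work into compress, replicate, and publish stages (stage names taken from the abstract) that run concurrently on different items, connected by queues. It only illustrates pipeline parallelism in general; the queues, the zlib stand-in for compression, and the in-memory replicas are assumptions, not LineFS's actual SmartNIC datapath.

```python
# Minimal illustration of pipeline parallelism: work is split into stages that
# run concurrently, each stage handling a different item at the same time.
# Stage names follow the abstract; everything else is illustrative only.

import queue, threading, zlib

def stage(worker, inbox, outbox):
    """Generic pipeline stage: pull an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is None:               # sentinel: shut this stage down
            if outbox is not None:
                outbox.put(None)
            return
        result = worker(item)
        if outbox is not None:
            outbox.put(result)

def compress(item):
    name, data = item
    return name, zlib.compress(data)

replicas = {}                           # stand-in for remote replica storage

def replicate(item):
    name, blob = item
    replicas[name] = blob               # pretend this is a network transfer
    return item

def publish(item):
    name, _ = item
    print(f"published {name} ({len(replicas[name])} compressed bytes)")
    return item

# Wire the stages together with queues and run each on its own thread,
# mimicking a parallel datapath pipeline.
q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(compress, q1, q2)),
    threading.Thread(target=stage, args=(replicate, q2, q3)),
    threading.Thread(target=stage, args=(publish, q3, None)),
]
for t in threads:
    t.start()

for i in range(3):                      # three log chunks flow through the pipeline
    q1.put((f"chunk-{i}", bytes(1024)))
q1.put(None)                            # shutdown propagates through the stages
for t in threads:
    t.join()
```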

Keywords
cloud computing, data centers, distributed file systems, storage systems, operating systems, networking
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304425 (URN) 10.1145/3477132.3483565 (DOI) 2-s2.0-85118308655 (Scopus ID)
Conference
ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP), 2021
Funder
EU, European Research Council, 770889
Note

Part of proceedings: ISBN 9781450387095, QC 20230117

Available from: 2021-11-04 Created: 2021-11-04 Last updated: 2024-03-18 Bibliographically approved
Anderson, T., Canini, M., Kim, J., Kostic, D., Kwon, Y., Peter, S., . . . Witchel, E. (2020). Assise: Performance and Availability via Client-local NVM in a Distributed File System. In: USENIX Association (Ed.). Paper presented at 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), November 4–6, 2020 (pp. 1011-1027). USENIX - The Advanced Computing Systems Association
Assise: Performance and Availability via Client-local NVM in a Distributed File System
2020 (English) In: / [ed] USENIX Association, USENIX - The Advanced Computing Systems Association, 2020, p. 1011-1027 Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of low latency persistent memory modules (PMMs) upends the long-established model of remote storage for distributed file systems. Instead, by colocating computation with PMM storage, we can provide applications with much higher IO performance, sub-second application failover, and strong consistency. To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol that manages client-local PMM as a linearizable and crash-recoverable cache between applications and slower (and possibly remote) storage. Assise maximizes locality for all file IO by carrying out IO on process-local, socket-local, and client-local PMM whenever possible. Assise minimizes coherence overhead by maintaining consistency at IO operation granularity, rather than at fixed block sizes.

We compare Assise to Ceph/BlueStore, NFS, and Octopus on a cluster with Intel Optane DC PMMs and SSDs for common cloud applications and benchmarks, such as LevelDB, Postfix, and FileBench. We find that Assise improves write latency up to 22x, throughput up to 56x, fail-over time up to 103x, and scales up to 6x better than its counterparts, while providing stronger consistency semantics.
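A minimal sketch of the locality preference described above: serve each read from the nearest tier that holds the data (process-local, then socket-local, then client-local PM), fall back to slower, possibly remote storage only when necessary, and promote what was fetched so the next access is local. The tier contents and the naive promotion policy below are assumptions for illustration; Assise's actual linearizable, crash-recoverable coherence protocol is far more involved.

```python
# Illustrative read path that prefers the nearest storage tier, in the spirit
# of the locality hierarchy the abstract describes. Tier names follow the
# abstract; their contents and the promotion policy are made up.

from collections import OrderedDict

# Tiers ordered from nearest/fastest to farthest/slowest.
tiers = OrderedDict([
    ("process-local PM", {"/a/log": b"hot"}),
    ("socket-local PM",  {"/a/db":  b"warm"}),
    ("client-local PM",  {}),
    ("remote storage",   {"/a/log": b"hot", "/a/db": b"warm", "/a/cold": b"cold"}),
])

def read(path):
    for tier_name, store in tiers.items():
        if path in store:
            data = store[path]
            # Promote into the nearest tier so the next read is local
            # (a deliberately naive stand-in for cache admission).
            next(iter(tiers.values()))[path] = data
            return tier_name, data
    raise FileNotFoundError(path)

print(read("/a/cold"))   # served from remote storage, then cached locally
print(read("/a/cold"))   # now served from process-local PM
```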

Place, publisher, year, edition, pages
USENIX - The Advanced Computing Systems Association, 2020
Keywords
non-volatile memory, distributed storage, file systems, data centers, reliability, fault tolerance, scalability
National Category
Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-285598 (URN) 000668979500057 ()
Conference
14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), November 4–6, 2020
Funder
EU, European Research Council, 770889
Note

QC 20201109

Available from: 2020-11-07 Created: 2020-11-07 Last updated: 2024-03-18 Bibliographically approved
Reda, W., Bogdanov, K., Milolidakis, A., Ghasemirahni, H., Chiesa, M., Maguire Jr., G. Q. & Kostic, D. (2020). Path Persistence in the Cloud: A Study of the Effects of Inter-Region Traffic Engineering in a Large Cloud Provider's Network. Computer Communication Review, 50(2), 11-23
Path Persistence in the Cloud: A Study of the Effects of Inter-Region Traffic Engineering in a Large Cloud Provider's Network
2020 (English) In: Computer Communication Review, ISSN 0146-4833, E-ISSN 1943-5819, Vol. 50, no 2, p. 11-23 Article in journal, Editorial material (Refereed) Published
Abstract [en]

A commonly held belief is that traffic engineering and routing changes are infrequent. However, based on our measurements over a number of years of traffic between data centers in one of the largest cloud provider's networks, we found that it is common for flows to change paths at ten-second intervals or even faster. These frequent path and, consequently, latency variations can negatively impact the performance of cloud applications, specifically, latency-sensitive and geo-distributed applications.

Our recent measurements and analysis focused on observing path changes and latency variations between different Amazon AWS regions. To this end, we devised a path change detector that we validated using both ad hoc experiments and feedback from cloud networking experts. The results provide three main insights: (1) Traffic Engineering (TE) frequently moves (TCP and UDP) flows among network paths of different latency, (2) flows experience unfair performance, where a subset of flows between two machines can suffer large latency penalties (up to 32% at the 95th percentile) or an excessive number of latency changes, and (3) tenants may have incentives to selfishly move traffic to low-latency classes (to boost the performance of their applications). We showcase this third insight with an example using rsync synchronization.

To the best of our knowledge, this is the first paper to reveal the high frequency of TE activity within a large cloud provider's network. Based on these observations, we expect our paper to spur discussions and future research on how cloud providers and their tenants can ultimately reconcile their independent and possibly conflicting objectives. Our data is publicly available for reproducibility and further analysis at http://goo.gl/25BKte.
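A generic sketch of the kind of latency-shift detection the abstract's path change detector performs: compare the median RTT of adjacent sample windows and flag a change when they differ by more than a threshold. The window size, threshold, and synthetic trace below are assumptions, not the detector the paper validated.

```python
# Generic latency-level-shift detector in the spirit of the path change
# detector mentioned in the abstract. Window size and threshold are
# illustrative assumptions, not the paper's parameters.

from statistics import median

def detect_path_changes(rtts_ms, window=5, threshold_ms=2.0):
    """Flag positions where the median RTT of the next `window` samples differs
    from the median of the previous `window` samples by more than threshold_ms."""
    changes = []
    i = window
    while i + window <= len(rtts_ms):
        before = median(rtts_ms[i - window:i])
        after = median(rtts_ms[i:i + window])
        if abs(after - before) > threshold_ms:
            changes.append(i)
            i += window              # skip past the detected transition
        else:
            i += 1
    return changes

# Synthetic trace: RTT jumps from ~30 ms to ~37 ms at index 20, as might happen
# when traffic engineering moves a flow onto a longer inter-region path.
trace = [30.1, 29.8, 30.3, 30.0, 29.9] * 4 + [37.2, 36.9, 37.4, 37.0, 37.1] * 4
print(detect_path_changes(trace))    # -> [18], close to the true shift at index 20
```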

Place, publisher, year, edition, pages
New York, NY, United States: Association for Computing Machinery (ACM), 2020
Keywords
Traffic engineering, Cloud provider networks, Inter-datacenter traffic, Latency
National Category
Communication Systems; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-273758 (URN) 10.1145/3402413.3402416 (DOI) 000582604500005 () 2-s2.0-85086379414 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, TCC
EU, European Research Council, 770889
Note

QC 20200602

Available from: 2020-05-28 Created: 2020-05-28 Last updated: 2024-03-18 Bibliographically approved
Bogdanov, K., Reda, W., Maguire Jr., G. Q., Kostic, D. & Canini, M. (2018). Fast and accurate load balancing for geo-distributed storage systems. In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing. Paper presented at 2018 ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, United States, 11 October 2018 through 13 October 2018 (pp. 386-400). Association for Computing Machinery (ACM)
Fast and accurate load balancing for geo-distributed storage systems
2018 (English) In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing, Association for Computing Machinery (ACM), 2018, p. 386-400 Conference paper, Published paper (Refereed)
Abstract [en]

The increasing density of globally distributed datacenters reduces the network latency between neighboring datacenters and allows replicated services deployed across neighboring locations to share workload when necessary, without violating strict Service Level Objectives (SLOs). We present Kurma, a practical implementation of a fast and accurate load balancer for geo-distributed storage systems. At run-time, Kurma integrates network latency and service time distributions to accurately estimate the rate of SLO violations for requests redirected across geo-distributed datacenters. Using these estimates, Kurma solves a decentralized rate-based performance model enabling fast load balancing (in the order of seconds) while taming global SLO violations. We integrate Kurma with Cassandra, a popular storage system. Using real-world traces along with a geo-distributed deployment across Amazon EC2, we demonstrate Kurma’s ability to effectively share load among datacenters while reducing SLO violations by up to a factor of 3 in high load settings or reducing the cost of running the service by up to 17%.
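A minimal sketch of the estimation step described above: combine a network latency distribution with a service time distribution and estimate the probability that a request redirected to a given datacenter misses its SLO. The distributions, the SLO value, and the sampling approach below are placeholders; Kurma itself solves a decentralized rate-based performance model rather than sampling at run time.

```python
# Sketch of combining a network latency distribution with a service time
# distribution to estimate the SLO violation rate of redirected requests.
# The lognormal parameters, SLO, and datacenter labels are placeholders.

import random

random.seed(0)

def violation_rate(rtt_ms_samples, service_ms_samples, slo_ms):
    """Estimate P(network RTT + service time > SLO) by pairing samples."""
    n = min(len(rtt_ms_samples), len(service_ms_samples))
    violations = sum(
        1 for rtt, svc in zip(rtt_ms_samples[:n], service_ms_samples[:n])
        if rtt + svc > slo_ms
    )
    return violations / n

# Placeholder distributions: local requests see ~1 ms RTT, remote ~25 ms RTT,
# and the storage service time is lognormally distributed with a long tail.
service = [random.lognormvariate(1.5, 0.6) for _ in range(100_000)]
local_rtt = [random.gauss(1.0, 0.2) for _ in range(100_000)]
remote_rtt = [random.gauss(25.0, 2.0) for _ in range(100_000)]

SLO_MS = 30.0
print(f"local  violation rate: {violation_rate(local_rtt, service, SLO_MS):.4f}")
print(f"remote violation rate: {violation_rate(remote_rtt, service, SLO_MS):.4f}")
# A load balancer could redirect traffic to a neighboring datacenter only while
# the estimated violation rate there stays within the SLO budget.
```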

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2018
Keywords
Cloud Computing, Distributed Systems, Server Load Balancing, Service Level Objectives, Wide Area Networks
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-241481 (URN) 10.1145/3267809.3267820 (DOI) 000458692200031 () 2-s2.0-85059006718 (Scopus ID) 9781450360111 (ISBN)
Conference
2018 ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, United States, 11 October 2018 through 13 October 2018
Funder
EU, Horizon 2020, 770889
Swedish Foundation for Strategic Research
Note

QC 20190123

Available from: 2019-01-23 Created: 2019-01-23 Last updated: 2024-03-18 Bibliographically approved
Bogdanov, K., Reda, W., Kostic, D., Maguire Jr., G. Q. & Canini, M. (2018). Kurma: Fast and Efficient Load Balancing for Geo-Distributed Storage Systems: Evaluation of Convergence and Scalability.
Kurma: Fast and Efficient Load Balancing for Geo-Distributed Storage Systems: Evaluation of Convergence and Scalability
2018 (English) Report (Other academic)
Abstract [en]

This report provides an extended evaluation of Kurma, a practical implementation of a geo-distributed load balancer for backend storage systems. We demonstrate that distributed Kurma instances accurately converge to the same solutions within 1% of the total datacenter capacity, and that Kurma scales to up to 8 datacenters while using a single CPU core at each datacenter.

National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-222289 (URN)
Note

QC 20180212

Available from: 2018-02-05 Created: 2018-02-05 Last updated: 2022-06-26 Bibliographically approved
Reda, W., Canini, M., Suresh, L., Kostic, D. & Braithwaite, S. (2017). Rein: Taming Tail Latency in Key-Value Stores via Multiget Scheduling. Paper presented at The Twelfth European Conference on Computer Systems (EuroSys).
Rein: Taming Tail Latency in Key-Value Stores via Multiget Scheduling
2017 (English) Conference paper, Published paper (Refereed)
Abstract [en]

We tackle the problem of reducing tail latencies in distributed key-value stores, such as the popular Cassandra database. We focus on workloads of multiget requests, which batch together access to several data elements and parallelize read operations across the data store machines. We first analyze a production trace of a real system and quantify the skew due to multiget sizes, key popularity, and other factors. We then proceed to identify opportunities for reduction of tail latencies by recognizing the composition of aggregate requests and by carefully scheduling bottleneck operations that can otherwise create excessive queues. We design and implement a system called Rein, which reduces latency via inter-multiget scheduling using low overhead techniques. We extensively evaluate Rein via experiments in Amazon Web Services (AWS) and simulations. Our scheduling algorithms reduce the median, 95th, and 99th percentile latencies by factors of 1.5, 1.5, and 1.9, respectively.
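A toy illustration of the inter-multiget scheduling idea above: a multiget completes only when its slowest per-server sub-request finishes, so ordering multigets by the size of their largest per-server fan-out keeps small requests from queuing behind large bottleneck operations. The shortest-bottleneck-first ordering, hash-based key placement, and request mix below are illustrative assumptions rather than Rein's actual policies.

```python
# Toy inter-multiget scheduling: order multigets by their bottleneck, i.e.
# the number of keys they send to their most-loaded server. Key placement
# and the request mix are hypothetical.

from collections import Counter

def bottleneck(multiget, key_to_server):
    """Size of the largest per-server batch this multiget generates."""
    per_server = Counter(key_to_server(k) for k in multiget)
    return max(per_server.values())

def schedule(multigets, key_to_server):
    """Order multigets so those with the smallest bottleneck go first."""
    return sorted(multigets, key=lambda m: bottleneck(m, key_to_server))

# Hypothetical 4-server cluster with keys placed by a simple hash.
key_to_server = lambda key: hash(key) % 4

multigets = [
    ["user:1", "user:2"],                 # small request
    [f"item:{i}" for i in range(40)],     # large fan-out request
    ["cart:7", "cart:8", "cart:9"],       # small request
]
for m in schedule(multigets, key_to_server):
    print(len(m), "keys, bottleneck =", bottleneck(m, key_to_server))
```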

Keywords
key-value, distributed, storage, multiget, scheduling
National Category
Computer Sciences Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-204646 (URN) 10.1145/3064176.3064209 (DOI) 000626240500007 () 2-s2.0-85019245076 (Scopus ID)
Conference
The Twelfth European Conference on Computer Systems (EuroSys)
Projects
TCC, WASP
Funder
Swedish Foundation for Strategic Research
Note

QC 20170502

Available from: 2017-03-30 Created: 2017-03-30 Last updated: 2024-03-15 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-5890-9629
