KTH Publications (DiVA)
Publications (3 of 3)
Girondi, M., Scazzariello, M., Maguire Jr., G. Q. & Kostic, D. (2024). Toward GPU-centric Networking on Commodity Hardware. In: 7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece. New York: ACM Digital Library
Toward GPU-centric Networking on Commodity Hardware
2024 (English). In: 7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece. New York: ACM Digital Library, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

GPUs are emerging as the most popular accelerator for many applications, powering the core of machine learning workloads. In networked GPU-accelerated applications, input and output data typically traverse the CPU and the OS network stack multiple times, getting copied across the system's main memory. These transfers increase application latency and consume expensive CPU cycles, reducing the system's efficiency and increasing overall response times. These inefficiencies become even more significant in latency-bounded deployments or at high throughput, where copy times can quickly inflate the response time of modern GPUs. We leverage the efficiency and kernel-bypass benefits of RDMA to transfer data in and out of GPUs without using any CPU cycles or synchronization. We demonstrate the ability of modern GPUs to saturate a 100-Gbps link, and evaluate the network processing time in the context of an inference serving application.
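To make the copy overhead concrete, here is a toy back-of-the-envelope model (illustrative only, not from the paper; payload size and traversal counts are assumptions): it counts how many times a request's bytes cross main memory when data is staged through host buffers versus DMA-written directly into GPU memory, and the wire time needed to deliver the same payload on a 100-Gbps link.

```python
def bytes_moved(payload_bytes: int, traversals: int) -> int:
    """Total bytes crossing the memory system for one request."""
    return payload_bytes * traversals

PAYLOAD = 1 << 20  # a 1 MiB inference input (assumed size)

# CPU-staged path: NIC -> kernel buffer -> user buffer -> copy to GPU.
staged = bytes_moved(PAYLOAD, traversals=3)

# GPU-centric path: the NIC DMA-writes straight into GPU memory via RDMA.
direct = bytes_moved(PAYLOAD, traversals=1)

# Time to deliver the payload once on a 100-Gbps link, in microseconds.
wire_us = PAYLOAD * 8 / 100e9 * 1e6

print(staged // PAYLOAD, direct // PAYLOAD, round(wire_us, 1))  # → 3 1 83.9
```

Under this (simplified) model the staged path moves the payload three times where the direct path moves it once; at high request rates those extra traversals compete with the GPU copy engines and the CPU for memory bandwidth.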

Place, publisher, year, edition, pages
New York: ACM Digital Library, 2024
Keywords
GPUs, Commodity Hardware, Inference Serving, RDMA
National Category
Communication Systems; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-345624 (URN); 10.1145/3642968.3654820 (DOI); 001234771200008; 2-s2.0-85192024363 (Scopus ID)
Conference
7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece 
Note

QC 20240415

Part of ISBN 979-8-4007-0539-7

Available from: 2024-04-15. Created: 2024-04-15. Last updated: 2024-08-28. Bibliographically approved
Ghasemirahni, H., Barbette, T., Katsikas, G. P., Farshin, A., Roozbeh, A., Girondi, M., . . . Kostic, D. (2022). Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets. In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2022), April 4-6, 2022, Renton, WA (pp. 807-827). USENIX - The Advanced Computing Systems Association
Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets
2022 (English). In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022. USENIX - The Advanced Computing Systems Association, 2022, pp. 807-827. Conference paper, Published paper (Refereed)
Abstract [en]

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.

In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
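The core idea can be sketched in a few lines: briefly buffer a batch of packets, then emit them grouped by flow so that same-flow packets become consecutive and cache-friendly. This is a simplified illustration under assumed packet and flow representations, not Reframer's actual implementation.

```python
from collections import defaultdict

def reframe(batch):
    """Group a batch of packets by flow, preserving per-flow arrival order.

    A packet is modeled as a (flow_id, seq) tuple; real packets would be
    keyed by their connection 5-tuple. Illustrative only.
    """
    by_flow = defaultdict(list)
    for pkt in batch:
        by_flow[pkt[0]].append(pkt)
    # Python dicts iterate in first-insertion order, so flows are emitted
    # in the order their first packet arrived.
    return [pkt for flow in by_flow.values() for pkt in flow]

interleaved = [("A", 1), ("B", 1), ("A", 2), ("C", 1), ("B", 2), ("A", 3)]
print(reframe(interleaved))
# → [('A', 1), ('A', 2), ('A', 3), ('B', 1), ('B', 2), ('C', 1)]
```

The buffering window bounds the added delay: in the paper's setting the reordering costs only µs-scale delays per packet, yet downstream processing sees long same-flow runs instead of interleaved traffic.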

Place, publisher, year, edition, pages
USENIX - The Advanced Computing Systems Association, 2022
Keywords
packet ordering, spatial and temporal locality, packet scheduling, batch processing, high-speed networking
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304656 (URN); 000876762200046; 2-s2.0-85140983450 (Scopus ID)
Conference
19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA
Projects
ULTRA; WASP; Time-Critical Clouds
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation; EU, European Research Council
Note

QC 20230619

Available from: 2021-11-09. Created: 2021-11-09. Last updated: 2023-06-19. Bibliographically approved
Girondi, M., Chiesa, M. & Barbette, T. (2021). High-speed Connection Tracking in Modern Servers. In: 2021 IEEE 22nd International Conference on High Performance Switching and Routing (IEEE HPSR 2021).
High-speed Connection Tracking in Modern Servers
2021 (English). In: 2021 IEEE 22nd International Conference on High Performance Switching and Routing (IEEE HPSR 2021), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

The rise of commodity servers equipped with high-speed network interface cards poses increasing demands on the efficient implementation of connection tracking, i.e., the task of associating the connection identifier of an incoming packet to the state stored for that connection. In this work, we thoroughly investigate and compare the performance obtainable by different implementations of connection tracking using high-speed real traffic traces. Based on a load balancer use case, our results show that connection tracking is an expensive operation, achieving at most 24 Gbps on a single core. Core-sharding and lock-free hash tables emerge as the only suitable multi-thread approaches for enabling 100 Gbps packet processing. In contrast to recent beliefs, we observe that newly proposed techniques to "lazily" delete connection states are not more effective than properly tuned traditional deletion techniques based on timer wheels.
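As a minimal illustration of the core-sharding approach the abstract identifies as the winning design (names and the toy 5-tuple key are assumptions, not the paper's code): each core owns a private connection table, and a flow is steered to its owning core by hashing the connection identifier, so table accesses need no locks.

```python
import hashlib

NUM_CORES = 4  # assumed core count

# One private table per core: connection 5-tuple -> per-flow state.
tables = [dict() for _ in range(NUM_CORES)]

def shard(five_tuple) -> int:
    """Pick the core that owns this flow (RSS-style hashing, simplified)."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return digest[0] % NUM_CORES

def track(five_tuple):
    """Connection tracking: look up (or create) this flow's state."""
    core = shard(five_tuple)
    state = tables[core].setdefault(five_tuple, {"packets": 0})
    state["packets"] += 1
    return core, state["packets"]

flow = ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")
print(track(flow))
print(track(flow))  # same flow always lands on the same core
```

Because every packet of a flow hashes to the same core, each table is single-writer; in real deployments the NIC's receive-side scaling performs the steering in hardware, and expired entries still have to be reclaimed, e.g. with timer wheels as the abstract discusses.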

Keywords
connection tracking, load balancer, fastclick, hash tables, packet classification
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-295413 (URN); 10.1109/HPSR52026.2021.9481841 (DOI); 000806753600048; 2-s2.0-85113858387 (Scopus ID)
Conference
IEEE HPSR 2021
Projects
Time Critical Cloud
Funder
Swedish Foundation for Strategic Research, TCC; European Commission, 770889
Note

QC 20220627

Available from: 2021-05-20. Created: 2021-05-20. Last updated: 2022-06-27. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-9400-324X
