Publications (10 of 13)
Barbette, T., Wu, E., Kostic, D., Maguire Jr., G. Q., Papadimitratos, P. & Chiesa, M. (2022). Cheetah: A High-Speed Programmable Load-Balancer Framework with Guaranteed Per-Connection-Consistency. IEEE/ACM Transactions on Networking, 30(1), 354-367
Cheetah: A High-Speed Programmable Load-Balancer Framework with Guaranteed Per-Connection-Consistency
2022 (English). In: IEEE/ACM Transactions on Networking, ISSN 1063-6692, E-ISSN 1558-2566, Vol. 30, no. 1, p. 354-367. Article in journal (Refereed), Published
Abstract [en]

Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers, which requires support for advanced load-balancing mechanisms, and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, simultaneously meeting these requirements has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution. This paper presents Cheetah, a load balancer that supports advanced load-balancing mechanisms and PCC while being scalable, memory efficient, fast at processing packets, and as resilient to clogging attacks as today's load balancers. The Cheetah LB design guarantees PCC for any realizable server-selection load-balancing mechanism and can be deployed in both stateless and stateful manners, depending on operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load-balancing mechanisms that reduce the flow completion time by a factor of 2-3×.
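
To make the stateless mode concrete, the sketch below models the cookie idea in plain C++: the first packet of a connection runs an arbitrary server-selection policy, and the chosen server's index is sealed into a small cookie that later packets carry back, so any load-balancer instance can recover the mapping without per-flow state. This is a minimal sketch, not the paper's implementation: the class and function names are invented, and the real system carries the cookie in packet headers and obfuscates it with a hash rather than the toy XOR used here.

```cpp
// Illustrative model of Cheetah-style stateless PCC, in plain C++.
// All names are hypothetical; the real system runs on FastClick and
// Tofino data planes and obfuscates the cookie with a hash, not XOR.
#include <cstdint>
#include <iostream>
#include <vector>

struct Server { uint32_t id; uint32_t active_conns; };

class CheetahLikeLB {
    std::vector<Server> pool_;   // indexable list of backend servers
    uint16_t secret_;            // obfuscates the index inside the cookie
public:
    CheetahLikeLB(size_t n, uint16_t secret) : secret_(secret) {
        for (size_t i = 0; i < n; ++i) pool_.push_back({uint32_t(i), 0});
    }
    // First packet of a connection: any server-selection policy may run
    // here (least-loaded as an example); the chosen index is sealed into
    // a cookie that the client echoes on every subsequent packet.
    uint16_t new_connection() {
        size_t best = 0;
        for (size_t i = 1; i < pool_.size(); ++i)
            if (pool_[i].active_conns < pool_[best].active_conns) best = i;
        pool_[best].active_conns++;
        return uint16_t(uint16_t(best) ^ secret_);   // the cookie
    }
    // Later packets: route purely from the cookie, with no per-flow state
    // on the LB, so PCC holds even if servers or LBs are added/removed.
    uint32_t route(uint16_t cookie) const {
        return pool_[(cookie ^ secret_) % pool_.size()].id;
    }
};

int main() {
    CheetahLikeLB lb(4, 0xBEEF);
    uint16_t cookie = lb.new_connection();
    std::cout << "routed to server " << lb.route(cookie) << "\n";
}
```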

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Cloud networks, Layer 4 load balancing, P4, Per-connection-consistency, Programmable networks, QUIC, Stateful classification, Stateless load balancing, TCP, Electric power plant loads, Network layers, Servers, Load modeling, Load-Balancing, Programmable network, Resilience, Hash functions
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-312304 (URN), 10.1109/TNET.2021.3113370 (DOI), 000732385800001 (ISI), 2-s2.0-85116873307 (Scopus ID)
Note

QC 20220530

Available from: 2022-05-30. Created: 2022-05-30. Last updated: 2022-06-25. Bibliographically approved.
Ghasemirahni, H., Barbette, T., Katsikas, G. P., Farshin, A., Roozbeh, A., Girondi, M., . . . Kostic, D. (2022). Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets. In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022. Paper presented at 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA (pp. 807-827). USENIX - The Advanced Computing Systems Association
Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets
2022 (English). In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022, USENIX - The Advanced Computing Systems Association, 2022, p. 807-827. Conference paper, Published paper (Refereed)
Abstract [en]

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.

In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
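
The sketch below illustrates the delay-and-reorder idea under simple assumptions (a fixed 20 μs hold budget, a hypothetical Packet type): packets are buffered briefly, grouped by flow, and released flow-by-flow so that downstream stages see runs of same-flow packets. Reframer itself operates on raw packets inside a FastClick/DPDK pipeline with a more refined flushing policy.

```cpp
// A minimal sketch of a delay-and-reorder stage, assuming a bounded
// hold time of 20 μs. Types and names are hypothetical.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

struct Packet { uint64_t flow_id; /* headers and payload elided */ };

class ReframerSketch {
    std::map<uint64_t, std::vector<Packet>> by_flow_;  // one bucket per flow
    std::chrono::steady_clock::time_point oldest_;
    bool empty_ = true;
public:
    static constexpr std::chrono::microseconds kMaxHold{20};

    void push(Packet p) {
        if (empty_) { oldest_ = std::chrono::steady_clock::now(); empty_ = false; }
        by_flow_[p.flow_id].push_back(p);
    }
    // Flush once the oldest buffered packet has waited kMaxHold: packets
    // leave flow-by-flow, so downstream stages see consecutive same-flow
    // packets and operate on warm caches.
    std::vector<Packet> maybe_flush() {
        std::vector<Packet> out;
        if (empty_ || std::chrono::steady_clock::now() - oldest_ < kMaxHold)
            return out;
        for (auto& [flow, pkts] : by_flow_)
            out.insert(out.end(), pkts.begin(), pkts.end());
        by_flow_.clear();
        empty_ = true;
        return out;
    }
};

int main() {
    ReframerSketch r;
    r.push({7}); r.push({3}); r.push({7});   // interleaved flows
    while (r.maybe_flush().empty()) {}       // spin until the hold expires
    std::cout << "flushed in flow order\n";
}
```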

Place, publisher, year, edition, pages
USENIX - The Advanced Computing Systems Association, 2022
Keywords
packet ordering, spatial and temporal locality, packet scheduling, batch processing, high-speed networking
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304656 (URN), 000876762200046 (ISI), 2-s2.0-85140983450 (Scopus ID)
Conference
19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA
Projects
ULTRA, WASP, Time-Critical Clouds
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation; EU, European Research Council
Note

QC 20230619

Available from: 2021-11-09. Created: 2021-11-09. Last updated: 2023-06-19. Bibliographically approved.
Barbette, T., Soldani, C. & Mathy, L. (2021). Combined stateful classification and session splicing for high-speed NFV service chaining. IEEE/ACM Transactions on Networking, 29(6), 2560-2573
Combined stateful classification and session splicing for high-speed NFV service chaining
2021 (English). In: IEEE/ACM Transactions on Networking, ISSN 1063-6692, E-ISSN 1558-2566, Vol. 29, no. 6, p. 2560-2573. Article in journal (Refereed), Published
Abstract [en]

Network functions such as firewalls, NATs, DPI, content-aware optimizers, and load balancers are increasingly realized as software to reduce costs and enable outsourcing. To meet performance requirements, these virtual network functions (VNFs) often bypass the kernel and use their own user-space networking stack. A naïve realization of a chain of VNFs exchanges raw packets, leading to many redundant operations that waste resources. In this work, we design a system to execute a pipeline of VNFs. We provide the user with facilities to define (i) a traffic class of interest for the VNF, (ii) a session to group the packets (such as the TCP 4-tuple), and (iii) the amount of space per session. The system synthesizes a classifier and builds an efficient flow table that, when possible, is automatically and partially offloaded to, and accelerated by, the network interface. We utilize an abstract view of flows to support seamless inspection and modification of the content of any flow (such as TCP or HTTP). By applying only surgical modifications to the protocol headers, we avoid the need for a complex, hard-to-maintain user-space TCP stack and can chain multiple VNFs without reconstructing the stream multiple times, allowing up to a 5× improvement over standard approaches.
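
The following sketch illustrates facilities (ii) and (iii) from the abstract, assuming the session key is the TCP 4-tuple and the VNF author requested 16 scratch bytes per session; the names are invented, and the classifier synthesis and NIC offload described in the paper are out of scope here.

```cpp
// A sketch of a per-session state facility keyed on the TCP 4-tuple.
// Hypothetical and simplified; the paper's system also synthesizes the
// classifier and offloads parts of it to the NIC when possible.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <unordered_map>
#include <vector>

struct FourTuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    bool operator==(const FourTuple& o) const {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port;
    }
};
struct TupleHash {
    size_t operator()(const FourTuple& t) const {
        uint64_t h = (uint64_t(t.src_ip) << 32) | t.dst_ip;
        h ^= (uint64_t(t.src_port) << 16) ^ t.dst_port;
        return std::hash<uint64_t>{}(h);
    }
};

class SessionTable {
    size_t scratch_bytes_;   // per-session space chosen by the VNF author
    std::unordered_map<FourTuple, std::vector<uint8_t>, TupleHash> table_;
public:
    explicit SessionTable(size_t scratch) : scratch_bytes_(scratch) {}
    // Returns this session's private scratch area, created zeroed on the
    // session's first packet.
    uint8_t* lookup(const FourTuple& key) {
        auto [it, inserted] = table_.try_emplace(key, scratch_bytes_, 0);
        return it->second.data();
    }
};

int main() {
    SessionTable sessions(16);
    FourTuple t{0x0A000001, 0x0A000002, 1234, 80};
    uint8_t* slot = sessions.lookup(t);
    uint32_t count;                        // per-session packet counter
    std::memcpy(&count, slot, sizeof count);
    ++count;
    std::memcpy(slot, &count, sizeof count);
    std::cout << "packets on session: " << count << "\n";
}
```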

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Middleboxes, Protocols, Monitoring, Software, Payloads, Technological innovation, Splicing, Computer networks, network function virtualization, internet
National Category
Communication Systems Computer Systems
Research subject
Telecommunication; Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-299200 (URN), 10.1109/TNET.2021.3099240 (DOI), 000731147300017 (ISI), 2-s2.0-85112647277 (Scopus ID)
Projects
ePI, ULTRA
Funder
European Commission, 770889
Note

QC 20250429

Available from: 2021-08-04. Created: 2021-08-04. Last updated: 2025-04-29. Bibliographically approved.
Girondi, M., Chiesa, M. & Barbette, T. (2021). High-speed Connection Tracking in Modern Servers. In: 2021 IEEE 22nd International Conference on High Performance Switching and Routing (HPSR) (IEEE HPSR'21). Paper presented at IEEE HPSR 2021.
High-speed Connection Tracking in Modern Servers
2021 (English). In: 2021 IEEE 22nd International Conference on High Performance Switching and Routing (HPSR) (IEEE HPSR'21), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

The rise of commodity servers equipped with high-speed network interface cards poses increasing demands on the efficient implementation of connection tracking, i.e., the task of associating the connection identifier of an incoming packet to the state stored for that connection. In this work, we thoroughly investigate and compare the performance obtainable by different implementations of connection tracking using high-speed real traffic traces. Based on a load balancer use case, our results show that connection tracking is an expensive operation, achieving at most 24 Gbps on a single core. Core-sharding and lock-free hash tables emerge as the only suitable multi-thread approaches for enabling 100 Gbps packet processing. In contrast to recent beliefs, we observe that newly proposed techniques to "lazily" delete connection states are not more effective than properly tuned traditional deletion techniques based on timer wheels.
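
A minimal sketch of the core-sharding approach that the evaluation singles out, assuming the NIC's RSS hash already steers all packets of a connection to one core: each core then owns a private connection table, so lookups need no locks or atomics. The names are illustrative, not the paper's API.

```cpp
// Core-sharded connection tracking: one private table per core, no
// synchronization, relying on RSS to keep a connection on one core.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct ConnState { uint64_t packets = 0; /* timers, backend, ... */ };

class ShardedConnTracker {
    std::vector<std::unordered_map<uint64_t, ConnState>> shards_;
public:
    explicit ShardedConnTracker(unsigned cores) : shards_(cores) {}
    // In a real deployment the NIC computes this hash (RSS) and delivers
    // the packet to the queue of core `owner`, so each shard is only
    // ever touched by the core that owns it.
    unsigned owner(uint64_t conn_hash) const {
        return unsigned(conn_hash % shards_.size());
    }
    ConnState& track(unsigned core, uint64_t conn_hash) {
        return shards_[core][conn_hash];   // no locks needed
    }
};

int main() {
    ShardedConnTracker ct(4);
    uint64_t h = 0x9E3779B97F4A7C15ull;    // example connection hash
    ct.track(ct.owner(h), h).packets++;
}
```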

Keywords
connection tracking, load balancer, fastclick, hash tables, packet classification
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-295413 (URN), 10.1109/HPSR52026.2021.9481841 (DOI), 000806753600048 (ISI), 2-s2.0-85113858387 (Scopus ID)
Conference
IEEE HPSR 2021
Projects
Time Critical Cloud
Funder
Swedish Foundation for Strategic Research, TCC; European Commission, 770889
Note

QC 20220627

Available from: 2021-05-20. Created: 2021-05-20. Last updated: 2022-06-27. Bibliographically approved.
Katsikas, G. P., Barbette, T., Kostic, D., Maguire Jr., G. Q. & Steinert, R. (2021). Metron: High-Performance NFV Service Chaining Even in the Presence of Blackboxes. ACM Transactions on Computer Systems, 38(1-2), 1-45, Article ID 3.
Metron: High-Performance NFV Service Chaining Even in the Presence of Blackboxes
2021 (English). In: ACM Transactions on Computer Systems, ISSN 0734-2071, E-ISSN 1557-7333, Vol. 38, no. 1-2, p. 1-45, article id 3. Article in journal (Refereed), Published
Abstract [en]

Deployment of 100 Gigabit Ethernet (GbE) links challenges the packet processing limits of commodity hardware used for Network Functions Virtualization (NFV). Moreover, realizing chained network functions (i.e., service chains) necessitates the use of multiple CPU cores, or even multiple servers, to process packets from such high-speed links.

Our system Metron jointly exploits the underlying network and commodity servers' resources: (i) offloading part of the packet processing logic to the network, (ii) using smart tagging to set up and exploit the affinity of traffic classes, and (iii) using tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers' cores, with zero inter-core communication. Moreover, Metron transparently integrates, manages, and load-balances proprietary "blackboxes" together with Metron service chains.

Metron realizes stateful network functions at the speed of 100 GbE network cards on a single server, while elastically and rapidly adapting to changing workload volumes. Our experiments demonstrate that Metron service chains can coexist with heterogeneous blackboxes, while still leveraging Metron's accurate dispatching and load balancing. In summary, Metron has (i) 2.75-8× better efficiency, (ii) up to 4.7× lower latency, and (iii) 7.8× higher throughput than OpenBox, a state-of-the-art NFV system.
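
The sketch below models the tag-based dispatching step in isolation, under the assumption that an upstream switch has already classified each packet and written a small tag: the server merely maps the tag to a core, so each traffic class stays on one core. The real system installs these mappings from a controller and dispatches in NIC hardware; the names here are invented.

```cpp
// Illustrative model of tag-based dispatching: an upstream switch
// classifies each packet and writes a small tag; the server then
// dispatches on the tag so each traffic class is handled by one core
// with zero inter-core communication.
#include <cstdint>
#include <iostream>
#include <vector>

struct TaggedPacket { uint16_t tag; /* payload elided */ };

class TagDispatcher {
    std::vector<unsigned> tag_to_core_;   // installed by the controller
public:
    explicit TagDispatcher(std::vector<unsigned> map)
        : tag_to_core_(std::move(map)) {}
    unsigned core_for(const TaggedPacket& p) const {
        return tag_to_core_[p.tag % tag_to_core_.size()];
    }
};

int main() {
    TagDispatcher d({0, 1, 2, 3});        // tags 0..3 mapped to cores 0..3
    std::cout << "enqueue on core " << d.core_for({2}) << "\n";
}
```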

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
elasticity, service chains, hardware offloading, accurate dispatching, 100 GbE, load balancing, tagging, blackboxes, NFV
National Category
Communication Systems Computer Sciences
Identifiers
urn:nbn:se:kth:diva-298691 (URN), 10.1145/3465628 (DOI), 000679809300003 (ISI), 2-s2.0-85111657554 (Scopus ID)
Projects
European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 770889); Swedish Foundation for Strategic Research (SSF)
Note

QC 20210712

Available from: 2021-07-11 Created: 2021-07-11 Last updated: 2024-03-15
Farshin, A., Barbette, T., Roozbeh, A., Maguire Jr., G. Q. & Kostic, D. (2021). PacketMill: Toward Per-Core 100-Gbps Networking. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Paper presented at 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21), 19–23 April, 2021, Virtual/Online. ACM Digital Library
PacketMill: Toward Per-Core 100-Gbps Networking
2021 (English). In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ACM Digital Library, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network drivers (specifically DPDK), to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput (by up to 36.4 Gbps, i.e., 70%), reduces latency (by up to 101 μs, i.e., 28%), and enables nontrivial packet processing (e.g., a router) at ~100 Gbps, when new packets arrive >10× faster than main memory access times, while using only one processing core.
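
As a loose illustration of the metadata model (i), the sketch below assumes the driver can be specialized, as PacketMill's X-Change does for DPDK drivers, to fill the application's own metadata layout directly at receive time instead of producing a generic descriptor that must be converted per packet. The struct and function here are invented for illustration.

```cpp
// Hypothetical specialized receive path: the driver writes directly
// into the one metadata layout this network function needs, so hot
// fields share a cache line and no per-packet conversion is required.
#include <cstdint>

struct alignas(64) AppMetadata {
    uint8_t* payload;
    uint32_t length;
    uint32_t rss_hash;
};

// Stand-in for a specialized driver filling metadata in place.
void rx_fill(AppMetadata& md, uint8_t* buf, uint32_t len, uint32_t hash) {
    md.payload = buf;
    md.length = len;
    md.rss_hash = hash;
}

int main() {
    static uint8_t frame[64] = {0};       // pretend received frame
    AppMetadata md;
    rx_fill(md, frame, sizeof frame, 0x1234u);
    return md.length == sizeof frame ? 0 : 1;
}
```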

Place, publisher, year, edition, pages
ACM Digital Library, 2021
Keywords
PacketMill, X-Change, Packet Processing, Metadata Management, 100-Gbps Networking, Middleboxes, Commodity Hardware, LLVM, Compiler Optimizations, Full-Stack Optimization, FastClick, DPDK
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-289665 (URN), 10.1145/3445814.3446724 (DOI), 000829871000001 (ISI), 2-s2.0-85104694209 (Scopus ID)
Conference
26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21), 19–23 April, 2021, Virtual/Online
Projects
Time-Critical Clouds, ULTRA, WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; EU, Horizon 2020, 770889
Note

Part of proceedings: ISBN 978-1-4503-8317-2

QC 20210210

Available from: 2021-02-10. Created: 2021-02-10. Last updated: 2024-03-15. Bibliographically approved.
Katsikas, G. P., Barbette, T., Chiesa, M., Kostic, D. & Maguire Jr., G. Q. (2021). What you need to know about (Smart) Network Interface Cards. In: Springer International Publishing (Ed.), Proceedings Passive and Active Measurement - 22nd International Conference, PAM 2021. Paper presented at Passive and Active Measurement - 22nd International Conference, PAM 2021, Virtual Event, March 29 - April 1, 2021. Springer Nature
What you need to know about (Smart) Network Interface Cards
2021 (English). In: Proceedings Passive and Active Measurement - 22nd International Conference, PAM 2021 / [ed] Springer International Publishing, Springer Nature, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Network interface cards (NICs) are fundamental components of modern high-speed networked systems, supporting multi-100 Gbps speeds and increasing programmability. Offloading computation from a server's CPU to a NIC frees a substantial amount of the server's CPU resources, making NICs key to offering competitive cloud services.

Therefore, understanding the performance benefits and limitations of offloading a networking application to a NIC is of paramount importance. In this paper, we measure the performance of four different NICs from one of the largest NIC vendors worldwide, supporting 100 Gbps and 200 Gbps. We show that while today's NICs can easily support multi-hundred-gigabit throughputs, performing frequent update operations of a NIC's packet classifier, as network address translators (NATs) and load balancers would do for each incoming connection, results in a dramatic throughput reduction of up to 70 Gbps or complete denial of service. Our conclusion is that none of the tested NICs can support high-speed networking applications that require keeping track of a large number of frequently arriving incoming connections. Furthermore, we show a variety of counter-intuitive performance artefacts, including the performance impact of using multiple tables to classify flows of packets.
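
To illustrate the kind of measurement behind the classifier-update result, the hedged sketch below times rule insertions per second, as a NAT or load balancer would need for each new connection; a software hash table stands in for the NIC's hardware flow table, which the paper drives instead.

```cpp
// Measure sustained classifier rule-update rate. A software table is a
// stand-in here; the paper updates real NIC flow tables.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<uint64_t, uint32_t> classifier;   // stand-in table
    constexpr uint64_t kRules = 1'000'000;
    auto t0 = std::chrono::steady_clock::now();
    for (uint64_t i = 0; i < kRules; ++i)
        classifier.emplace(i, uint32_t(i));              // "install rule"
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    std::cout << double(kRules) / dt.count()
              << " rule updates/s (software stand-in)\n";
}
```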

Place, publisher, year, edition, pages
Springer Nature, 2021
Series
Lecture Notes in Computer Science ; 12671
Keywords
Network interface cards, hardware classifier, offloading, rule operations, performance, benchmarking, 100 GbE
National Category
Computer Systems Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-292353 (URN), 10.1007/978-3-030-72582-2_19 (DOI), 000788003900019 (ISI), 2-s2.0-85107297942 (Scopus ID)
Conference
Passive and Active Measurement - 22nd International Conference, PAM 2021, Virtual Event, March 29 - April 1, 2021
Funder
European Commission, 770889; Swedish Foundation for Strategic Research, TCC
Note

QC 20220524

Available from: 2021-03-30 Created: 2021-03-30 Last updated: 2022-06-25
Barbette, T., Tang, C., Yao, H., Kostic, D., Maguire Jr., G. Q., Papadimitratos, P. & Chiesa, M. (2020). A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency. In: USENIX Association (Ed.), Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. Paper presented at 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020, Santa Clara, 25-27 February 2020 (pp. 667-683). Santa Clara, CA, USA: USENIX Association
A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency
2020 (English). In: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020 / [ed] USENIX Association, Santa Clara, CA, USA: USENIX Association, 2020, p. 667-683. Conference paper, Published paper (Refereed)
Abstract [en]

Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, meeting both these requirements at the same time has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution.

This paper presents Cheetah, a load balancer that supports uniform load distribution and PCC while being scalable, memory efficient, resilient to clogging attacks, and fast at processing packets. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and stateful manner, depending on the operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2–3×.

Place, publisher, year, edition, pages
Santa Clara, CA, USA: USENIX Association, 2020
Keywords
load-balancer, cheetah, high-speed, connection consistency, pcc, p4, fastclick
National Category
Communication Systems Computer Systems Computer Sciences
Research subject
Computer Science; Telecommunication
Identifiers
urn:nbn:se:kth:diva-268968 (URN), 000570979600040 (ISI), 2-s2.0-85091845586 (Scopus ID)
Conference
17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020, Santa Clara, 25-27 February 2020
Funder
Swedish Foundation for Strategic Research, TCC; EU, European Research Council, 770889
Note

Part of proceedings: ISBN 978-1-939133-13-7

QC 20200302

Available from: 2020-03-01. Created: 2020-03-01. Last updated: 2022-06-26. Bibliographically approved.
Barbette, T., Chiesa, M., Maguire Jr., G. Q. & Kostic, D. (2020). Stateless CPU-aware datacenter load-balancing. Poster presented at the International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20) (pp. 548-549). Association for Computing Machinery (ACM)
Stateless CPU-aware datacenter load-balancing
2020 (English). In: Poster: Stateless CPU-aware datacenter load-balancing, Association for Computing Machinery (ACM), 2020, p. 548-549. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Today, datacenter operators deploy load balancers (LBs) to efficiently utilize server resources, but must over-provision server resources (by up to 30%) because of load imbalances and the desire to bound tail service latency. We posit that one of the reasons for these imbalances is the lack of per-core load statistics in existing LBs. As a first step, we designed CrossRSS, a CPU core-aware LB that dynamically assigns incoming connections to the least-loaded cores in the server pool. CrossRSS leverages knowledge of how each server's Network Interface Card (NIC) dispatches packets to specific cores to reduce imbalances by more than an order of magnitude compared to existing LBs in a proof-of-concept datacenter environment, processing 12% more packets with the same number of cores.
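
The sketch below models the selection step only, assuming the LB receives per-core load reports from every server: it picks the least-loaded core in the whole pool, leaving out the RSS-steering step by which CrossRSS actually makes the NIC deliver the connection to that core. All names are hypothetical.

```cpp
// Pick the least-loaded {server, core} pair across the pool.
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

struct ServerCores { std::vector<uint32_t> core_load; };

// Returns {server index, core index} with the smallest reported load.
std::pair<size_t, size_t> pick(const std::vector<ServerCores>& pool) {
    size_t bs = 0, bc = 0;
    for (size_t s = 0; s < pool.size(); ++s)
        for (size_t c = 0; c < pool[s].core_load.size(); ++c)
            if (pool[s].core_load[c] < pool[bs].core_load[bc]) { bs = s; bc = c; }
    return {bs, bc};
}

int main() {
    std::vector<ServerCores> pool = {{{5, 2, 7}}, {{1, 9}}};
    auto [s, c] = pick(pool);
    std::cout << "dispatch to server " << s << ", core " << c << "\n";
}
```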

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Series
CoNEXT '20
Keywords
networking, load-balancing, packet scheduling, high-speed networking, intra-server load-balancing, receive side scaling, network function virtualization, RSS++
National Category
Computer Systems Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-286811 (URN), 10.1145/3386367.3431672 (DOI), 2-s2.0-85097602615 (Scopus ID)
Conference
International Conference on emerging Networking EXperiments and Technologies
Funder
EU, European Research Council, ULTRA; Swedish Foundation for Strategic Research, TCC
Note

QC 20210824

Available from: 2020-11-30. Created: 2020-11-30. Last updated: 2022-06-25. Bibliographically approved.
Katsikas, G. P., Barbette, T., Kostic, D., Steinert, R. & Maguire Jr., G. Q. (2019). Metron: NFV service chains at the true speed of the underlying hardware. Paper presented at the 15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018 (pp. 171-186).
Metron: NFV service chains at the true speed of the underlying hardware
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present Metron, a Network Functions Virtualization (NFV) platform that achieves high resource utilization by jointly exploiting the underlying network and commodity servers' resources. This synergy allows Metron to: (i) offload part of the packet processing logic to the network, (ii) use smart tagging to set up and exploit the affinity of traffic classes, and (iii) use tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers' fastest cache(s), with zero inter-core communication. Metron also introduces a novel resource allocation scheme that minimizes the resource allocation overhead for large-scale NFV deployments. With commodity hardware assistance, Metron deeply inspects traffic at 40 Gbps and realizes stateful network functions at the speed of a 100 GbE network card on a single server. Metron has 2.75-6.5× better efficiency than OpenBox, a state-of-the-art NFV system, while ensuring key requirements such as elasticity, fine-grained load balancing, and flexible traffic steering.

National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-268276 (URN), 000471023700012 (ISI), 2-s2.0-85076796090 (Scopus ID)
Conference
15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018
Note

QC 20200324

Available from: 2020-03-24. Created: 2020-03-24. Last updated: 2022-06-26. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-1269-2190
