Katsikas, Georgios P. (ORCID iD: orcid.org/0000-0002-3890-6583)
Publications (8 of 8)
Ghasemirahni, H., Barbette, T., Katsikas, G. P., Farshin, A., Roozbeh, A., Girondi, M., . . . Kostic, D. (2022). Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets. In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022: . Paper presented at 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA (pp. 807-827). USENIX - The Advanced Computing Systems Association
2022 (English). In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022, USENIX - The Advanced Computing Systems Association, 2022, p. 807-827. Conference paper, Published paper (Refereed)
Abstract [en]

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.

In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
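The idea of delaying packets briefly so that packets needing the same processing leave back to back can be illustrated with a toy model. The following is a minimal sketch only, not Reframer's implementation: the `(arrival_us, flow_id)` packet representation, the fixed `window_us` buffering window, and both function names are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Iterable, List, Tuple

# Hypothetical packet representation: (arrival time in microseconds, flow id).
Packet = Tuple[float, str]


def reorder_by_flow(packets: Iterable[Packet], window_us: float) -> List[Packet]:
    """Buffer packets for up to window_us, then release them grouped by flow.

    Packets of the same flow keep their relative order; only the interleaving
    between different flows changes.
    """
    out: List[Packet] = []
    buffered: "OrderedDict[str, List[Packet]]" = OrderedDict()
    window_start = None

    def flush() -> None:
        # Emit each flow's packets consecutively, in first-seen flow order.
        for flow_packets in buffered.values():
            out.extend(flow_packets)
        buffered.clear()

    for pkt in packets:
        arrival, flow = pkt
        if window_start is None:
            window_start = arrival
        elif arrival - window_start >= window_us:
            # The window expired: release what was held back, start a new one.
            flush()
            window_start = arrival
        buffered.setdefault(flow, []).append(pkt)
    flush()
    return out


def count_flow_switches(packets: List[Packet]) -> int:
    """Count how often consecutive packets belong to different flows."""
    return sum(1 for a, b in zip(packets, packets[1:]) if a[1] != b[1])


trace = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (3.0, "b"), (12.0, "a")]
reordered = reorder_by_flow(trace, window_us=10.0)
print(count_flow_switches(trace), count_flow_switches(reordered))
```

In this toy trace, two interleaved flows are regrouped within a 10 μs window, so the number of flow switches seen by the downstream application drops while each flow's internal packet order is preserved.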

Place, publisher, year, edition, pages
USENIX - The Advanced Computing Systems Association, 2022
Keywords
packet ordering, spatial and temporal locality, packet scheduling, batch processing, high-speed networking
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304656 (URN); 000876762200046 (); 2-s2.0-85140983450 (Scopus ID)
Conference
19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA
Projects
ULTRA; WASP; Time-Critical Clouds
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation; EU, European Research Council
Note

QC 20230619

Available from: 2021-11-09 Created: 2021-11-09 Last updated: 2023-06-19. Bibliographically approved
Katsikas, G. P., Barbette, T., Kostic, D., Maguire Jr., G. Q. & Steinert, R. (2021). Metron: High-Performance NFV Service Chaining Even in the Presence of Blackboxes. ACM Transactions on Computer Systems, 38(1-2), 1-45, Article ID 3.
2021 (English). In: ACM Transactions on Computer Systems, ISSN 0734-2071, E-ISSN 1557-7333, Vol. 38, no 1-2, p. 1-45, article id 3. Article in journal (Refereed), Published
Abstract [en]

Deployment of 100 Gigabit Ethernet (GbE) links challenges the packet processing limits of commodity hardware used for Network Functions Virtualization (NFV). Moreover, realizing chained network functions (i.e., service chains) necessitates the use of multiple CPU cores, or even multiple servers, to process packets from such high speed links.

Our system Metron jointly exploits the underlying network and commodity servers' resources: (i) to offload part of the packet processing logic to the network, (ii) by using smart tagging to setup and exploit the affinity of traffic classes, and (iii) by using tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers' cores, with zero inter-core communication. Moreover, Metron transparently integrates, manages, and load balances proprietary "blackboxes" together with Metron service chains.

Metron realizes stateful network functions at the speed of 100 GbE network cards on a single server, while elastically and rapidly adapting to changing workload volumes. Our experiments demonstrate that Metron service chains can coexist with heterogeneous blackboxes, while still leveraging Metron's accurate dispatching and load balancing. In summary, Metron has (i) 2.75-8× better efficiency, up to (ii) 4.7× lower latency, and (iii) 7.8× higher throughput than OpenBox, a state-of-the-art NFV system.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
elasticity, service chains, hardware offloading, accurate dispatching, 100 GbE, load balancing, tagging, blackboxes, NFV
National Category
Communication Systems Computer Sciences
Identifiers
urn:nbn:se:kth:diva-298691 (URN); 10.1145/3465628 (DOI); 000679809300003 (); 2-s2.0-85111657554 (Scopus ID)
Projects
European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 770889); Swedish Foundation for Strategic Research (SSF)
Note

QC 20210712

Available from: 2021-07-11 Created: 2021-07-11 Last updated: 2024-03-15
Katsikas, G. P., Barbette, T., Chiesa, M., Kostic, D. & Maguire Jr., G. Q. (2021). What you need to know about (Smart) Network Interface Cards. In: Springer International Publishing (Ed.), Proceedings Passive and Active Measurement - 22nd International Conference, PAM 2021: . Paper presented at Passive and Active Measurement - 22nd International Conference, PAM 2021, Virtual Event, March 29 - April 1, 2021. Springer Nature
2021 (English). In: Proceedings Passive and Active Measurement - 22nd International Conference, PAM 2021 / [ed] Springer International Publishing, Springer Nature, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Network interface cards (NICs) are fundamental components of modern high-speed networked systems, supporting multi-100 Gbps speeds and increasing programmability. Offloading computation from a server’s CPU to a NIC frees a substantial amount of the server’s CPU resources, making NICs key to offering competitive cloud services. Therefore, understanding the performance benefits and limitations of offloading a networking application to a NIC is of paramount importance.

In this paper, we measure the performance of four different NICs from one of the largest NIC vendors worldwide, supporting 100 Gbps and 200 Gbps. We show that while today’s NICs can easily support multi-hundred-gigabit throughputs, performing frequent update operations of a NIC’s packet classifier (as network address translators (NATs) and load balancers would do for each incoming connection) results in a dramatic throughput reduction of up to 70 Gbps or complete denial of service. Our conclusion is that none of the tested NICs can support high-speed networking applications that require keeping track of a large number of frequently arriving incoming connections. Furthermore, we show a variety of counter-intuitive performance artefacts, including the performance impact of using multiple tables to classify flows of packets.

Place, publisher, year, edition, pages
Springer Nature, 2021
Series
Lecture Notes in Computer Science ; 12671
Keywords
Network interface cards, hardware classifier, offloading, rule operations, performance, benchmarking, 100 GbE
National Category
Computer Systems Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-292353 (URN); 10.1007/978-3-030-72582-2_19 (DOI); 000788003900019 (); 2-s2.0-85107297942 (Scopus ID)
Conference
Passive and Active Measurement - 22nd International Conference, PAM 2021, Virtual Event, March 29 - April 1, 2021
Funder
European Commission, 770889; Swedish Foundation for Strategic Research, TCC
Note

QC 20220524

Available from: 2021-03-30 Created: 2021-03-30 Last updated: 2022-06-25
Katsikas, G. P., Barbette, T., Kostic, D., Steinert, R. & Maguire Jr., G. Q. (2019). Metron: NFV service chains at the true speed of the underlying hardware. Paper presented at the 15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018 (pp. 171-186).
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present Metron, a Network Functions Virtualization (NFV) platform that achieves high resource utilization by jointly exploiting the underlying network and commodity servers’ resources. This synergy allows Metron to: (i) offload part of the packet processing logic to the network, (ii) use smart tagging to setup and exploit the affinity of traffic classes, and (iii) use tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers’ fastest cache(s), with zero inter-core communication. Metron also introduces a novel resource allocation scheme that minimizes the resource allocation overhead for large-scale NFV deployments. With commodity hardware assistance, Metron deeply inspects traffic at 40 Gbps and realizes stateful network functions at the speed of a 100 GbE network card on a single server. Metron has 2.75-6.5x better efficiency than OpenBox, a state of the art NFV system, while ensuring key requirements such as elasticity, fine-grained load balancing, and flexible traffic steering.

National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-268276 (URN); 000471023700012 (); 2-s2.0-85076796090 (Scopus ID)
Conference
Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018
Note

QC 20200324

Available from: 2020-03-24 Created: 2020-03-24 Last updated: 2022-06-26. Bibliographically approved
Barbette, T., Katsikas, G. P., Maguire Jr., G. Q. & Kostic, D. (2019). RSS++: load and state-aware receive side scaling. In: ACM (Ed.), Proceedings of the 15th International Conference on emerging Networking EXperiments and Technologies: . Paper presented at CoNEXT '19: The 15th International Conference on Emerging Networking Experiments And Technologies, Orlando, United States, 9-12 December 2019. Orlando, FL, USA: Association for Computing Machinery (ACM)
2019 (English). In: Proceedings of the 15th International Conference on emerging Networking EXperiments and Technologies / [ed] ACM, Orlando, FL, USA: Association for Computing Machinery (ACM), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

While the current literature typically focuses on load-balancing among multiple servers, in this paper, we demonstrate the importance of load-balancing within a single machine (potentially with hundreds of CPU cores). In this context, we propose a new load-balancing technique (RSS++) that dynamically modifies the receive side scaling (RSS) indirection table to spread the load across the CPU cores in a more optimal way. RSS++ incurs up to 14x lower 95th percentile tail latency and orders of magnitude fewer packet drops compared to RSS under high CPU utilization. RSS++ allows higher CPU utilization and dynamic scaling of the number of allocated CPU cores to accommodate the input load, while avoiding the typical 25% over-provisioning. RSS++ has been implemented for both (i) DPDK and (ii) the Linux kernel. Additionally, we implement a new state migration technique, which facilitates sharding and reduces contention between CPU cores accessing per-flow data. RSS++ keeps the flow-state by groups that can be migrated at once, leading to a 20% higher efficiency than a state of the art shared flow table.
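To make the notion of rewriting the RSS indirection table concrete, the following is a minimal sketch under stated assumptions. It uses a simple greedy heuristic chosen for illustration; it is not the algorithm from the paper, and it does not talk to a real NIC or driver. The function name `rebalance`, the list-based `table` (bucket index to core id), and the per-bucket load measurements in `bucket_load` are all hypothetical.

```python
from typing import List


def rebalance(table: List[int], bucket_load: List[float], num_cores: int) -> List[int]:
    """Return a new indirection table with buckets moved off the busiest core.

    table[b] is the core that currently receives bucket b, and bucket_load[b]
    is the load measured for that bucket (hypothetical inputs).
    """
    new_table = list(table)
    core_load = [0.0] * num_cores
    for bucket, core in enumerate(new_table):
        core_load[core] += bucket_load[bucket]

    while True:
        src = max(range(num_cores), key=core_load.__getitem__)  # busiest core
        dst = min(range(num_cores), key=core_load.__getitem__)  # idlest core
        gap = core_load[src] - core_load[dst]
        # Only consider buckets whose move strictly narrows the gap.
        candidates = [
            b for b, core in enumerate(new_table)
            if core == src and 0 < bucket_load[b] < gap
        ]
        if not candidates:
            return new_table
        # Moving a bucket that carries about half the gap evens out the pair.
        best = min(candidates, key=lambda b: abs(bucket_load[b] - gap / 2))
        new_table[best] = dst
        core_load[src] -= bucket_load[best]
        core_load[dst] += bucket_load[best]


table = [0, 0, 0, 0, 1, 1, 1, 1]
loads = [40, 30, 20, 10, 0, 0, 0, 0]
balanced = rebalance(table, loads, num_cores=2)
print(balanced)
```

Each move strictly reduces the sum of squared per-core loads, so the loop terminates. Moving whole buckets rather than individual flows keeps the flows that hash to one bucket together on one core, which is the granularity the indirection table offers.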

Place, publisher, year, edition, pages
Orlando, FL, USA: Association for Computing Machinery (ACM), 2019
Keywords
networking, load-balancing, packet scheduling, high-speed networking, intra-server load-balancing, receive side scaling, network function virtualization, RSS++
National Category
Communication Systems Computer Systems Computer Sciences
Research subject
Information and Communication Technology; Computer Science
Identifiers
urn:nbn:se:kth:diva-263941 (URN); 10.1145/3359989.3365412 (DOI); 000526082300028 (); 2-s2.0-85077231875 (Scopus ID)
Conference
CoNEXT '19: The 15th International Conference on Emerging Networking Experiments And Technologies, Orlando, United States, 9-12 December 2019
Funder
Swedish Foundation for Strategic Research, TCC; EU, European Research Council, 770889
Note

QC 20191126

Part of ISBN 978-1-4503-6998-5

Available from: 2019-11-20 Created: 2019-11-20 Last updated: 2024-10-23. Bibliographically approved
Katsikas, G. P., Barbette, T., Kostic, D., Steinert, R. & Maguire Jr., G. Q. (2018). Metron: NFV Service Chains at the True Speed of the Underlying Hardware. Paper presented at The 15th USENIX Symposium on Networked Systems Design and Implementation.
2018 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present Metron, a Network Functions Virtualization (NFV) platform that achieves high resource utilization by jointly exploiting the underlying network and commodity servers’ resources. This synergy allows Metron to: (i) offload part of the packet processing logic to the network, (ii) use smart tagging to setup and exploit the affinity of traffic classes, and (iii) use tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers’ fastest cache(s), with zero inter-core communication. Metron also introduces a novel resource allocation scheme that minimizes the resource allocation overhead for large-scale NFV deployments. With commodity hardware assistance, Metron deeply inspects traffic at 40 Gbps and realizes stateful network functions at the speed of a 100 GbE network card on a single server. Metron has 2.75-6.5x better efficiency than OpenBox, a state of the art NFV system, while ensuring key requirements such as elasticity, fine-grained load balancing, and flexible traffic steering.

Keywords
NFV, service chains, offloading, hardware dispatching, high performance
National Category
Computer Sciences Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-223543 (URN)
Conference
The 15th USENIX Symposium on Networked Systems Design and Implementation
Projects
Time-Critical Clouds; WASP
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation
Available from: 2018-02-22 Created: 2018-02-22 Last updated: 2024-03-15. Bibliographically approved
Katsikas, G. P., Maguire Jr., G. Q. & Kostic, D. (2017). Profiling and accelerating commodity NFV service chains with SCC. Journal of Systems and Software, 127(C), 12-27
2017 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 127, no C, p. 12-27. Article in journal (Refereed), Published
Abstract [en]

Recent approaches to network functions virtualization (NFV) have shown that commodity network stacks and drivers struggle to keep up with increasing hardware speed. Despite this, popular cloud networking services still rely on commodity operating systems (OSs) and device drivers.

Taking into account the underlying hardware of commodity servers, we built an NFV profiler that tracks the movement of packets across the system’s memory hierarchy by collecting key hardware and OS-level performance counters.

Leveraging the profiler’s data, our Service Chain Coordinator’s (SCC) runtime accelerates user-space NFV service chains, based on commodity drivers. To do so, SCC combines multiplexing of system calls with scheduling strategies, taking time, priority, and processing load into account.

By granting longer time quanta to chained network functions (NFs), combined with I/O multiplexing, SCC reduces unnecessary scheduling and I/O overheads, resulting in a three-fold latency reduction due to cache and main memory utilization improvements. More importantly, SCC reduces the latency variance of NFV service chains by up to 40x compared to standard FastClick chains by making the average case for an NFV chain perform as well as the best case. These improvements are possible because of our profiler’s accuracy.

Place, publisher, year, edition, pages
Elsevier, 2017
Keywords
NFV, service chains, profiler, scheduling, I/O multiplexing.
National Category
Engineering and Technology
Research subject
Computer Science; Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-199894 (URN); 10.1016/j.jss.2017.01.005 (DOI); 000397689000002 (); 2-s2.0-85012296964 (Scopus ID)
Projects
European Research Council (ERC) PROPHET; European Union Horizon 2020 BEhavioural BAsed forwarding (BEBA)
Funder
EU, European Research Council, 259110; EU, Horizon 2020, 644122
Note

QC 20170316

Available from: 2017-02-02 Created: 2017-01-17 Last updated: 2024-03-15. Bibliographically approved
Katsikas, G. P., Enguehard, M., Kuźniar, M., Maguire Jr, G. Q. & Kostic, D. (2016). SNF: synthesizing high performance NFV service chains. PeerJ Computer Science, 1-30
2016 (English). In: PeerJ Computer Science, ISSN 2376-5992, p. 1-30. Article in journal (Refereed), Published
Abstract [en]

In this paper we introduce SNF, a framework that synthesizes (S) network function (NF) service chains by eliminating redundant I/O and repeated elements, while consolidating stateful cross layer packet operations across the chain. SNF uses graph composition and set theory to determine traffic classes handled by a service chain composed of multiple elements. It then synthesizes each traffic class using a minimal set of new elements that apply single-read-single-write and early-discard operations. Our SNF prototype takes a baseline state of the art network functions virtualization (NFV) framework to the level of performance required for practical NFV service deployments. Software-based SNF realizes long (up to 10 NFs) and stateful service chains that achieve line-rate 40 Gbps throughput (up to 8.5x greater than the baseline NFV framework). Hardware-assisted SNF, using a commodity OpenFlow switch, shows that our approach scales at 40 Gbps for Internet Service Provider-level NFV deployments.

Place, publisher, year, edition, pages
PeerJ, Inc., San Diego, CA 92191, San Francisco, USA, 2016
Keywords
NFV, Service chains, Synthesis, Single-read-single-write, Line-rate, 40 Gbps
National Category
Computer Systems Communication Systems
Research subject
Computer Science; Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-196219 (URN); 10.7717/peerj-cs.98 (DOI); 000437460800004 (); 2-s2.0-85015574186 (Scopus ID)
Projects
European Union Horizon 2020 BEhavioural BAsed forwarding (BEBA); European Research Council (ERC) PROPHET
Funder
EU, Horizon 2020, 644122; EU, European Research Council, 259110
Note

QC 20170626

Available from: 2016-11-14 Created: 2016-11-14 Last updated: 2024-03-15. Bibliographically approved