Publications (6 of 6)
Sheikholeslami, S., Ghasemirahni, H., Payberah, A. H., Wang, T., Dowling, J. & Vlassov, V. (2025). Utilizing Large Language Models for Ablation Studies in Machine Learning and Deep Learning. Paper presented at The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys). ACM Digital Library
2025 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In Machine Learning (ML) and Deep Learning (DL) research, ablation studies are typically performed to provide insight into the individual contributions of the building blocks and components of an ML/DL system (e.g., a deep neural network), and to justify that particular additions or modifications to an existing system are responsible for the reported performance improvements. Although dedicated frameworks for ablation studies have been introduced in recent years, conducting such experiments remains tedious, typically requiring the maintenance of redundant, nearly identical versions of code that correspond to different ablation trials. Inspired by the recent promising performance of Large Language Models (LLMs) in generating and analyzing ML/DL code, in this paper we discuss the potential of LLMs as facilitators of ablation study experiments for scientific research projects that involve ML and DL models. We first discuss the different ways in which LLMs can be utilized for ablation studies, and then present a prototype tool called AblationMage that leverages LLMs to semi-automate the overall process of conducting ablation study experiments. We showcase the usability of AblationMage through three experiments, including one in which we reproduce the ablation studies from a recently published applied DL paper.
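The "leave-one-out" grid that such tools help automate can be illustrated without any LLM at all. The sketch below is purely illustrative (it is not part of AblationMage); it enumerates single-component ablation variants of a hypothetical model configuration:

```python
def ablation_configs(base_config):
    """Enumerate single-component ablations of a model configuration.

    Each variant disables exactly one enabled component, mirroring the
    typical leave-one-out ablation grid described in the abstract.
    """
    variants = []
    for component, enabled in base_config.items():
        if enabled:  # only ablate components that are switched on
            variant = dict(base_config)
            variant[component] = False
            variants.append((component, variant))
    return variants

# Hypothetical configuration; the component names are illustrative only.
base = {"dropout": True, "batch_norm": True, "skip_connections": True}
grid = ablation_configs(base)
```

Each `(name, config)` pair in `grid` corresponds to one ablation trial; the tedious part that the paper targets is generating and maintaining the training code for each such variant, not the enumeration itself.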

Place, publisher, year, edition, pages
ACM Digital Library, 2025
Keywords
Ablation Studies, Deep Learning, Feature Ablation, Model Ablation, Large Language Models
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-360719 (URN)
10.1145/3721146.3721957 (DOI)
001477868300025 ()
2-s2.0-105003634645 (Scopus ID)
Conference
The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys)
Funder
Vinnova, 2016-05193
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01
Ghasemirahni, H., Farshin, A., Scazzariello, M., Maguire Jr., G. Q., Kostic, D. & Chiesa, M. (2024). FAJITA: Stateful Packet Processing at 100 Million pps. Proceedings of the ACM on Networking, 2(CoNEXT3), 1-22
2024 (English) In: Proceedings of the ACM on Networking, E-ISSN 2834-5509, Vol. 2, no. CoNEXT3, p. 1-22. Article in journal (Refereed), Published
Abstract [en]

Data centers increasingly utilize commodity servers to deploy low-latency Network Functions (NFs). However, the emergence of multi-hundred-gigabit-per-second network interface cards (NICs) has drastically increased the performance expected from commodity servers. Additionally, recently introduced systems that store packet payloads in temporary off-CPU locations (e.g., programmable switches, NICs, and RDMA servers) further increase the load on NF servers, making packet processing even more challenging. This paper demonstrates the existing bottlenecks and challenges of state-of-the-art stateful packet processing frameworks and proposes a system, called FAJITA, to tackle these challenges and accelerate stateful packet processing on commodity hardware. FAJITA introduces an optimized processing pipeline for stateful network functions that minimizes memory accesses and overcomes the overheads of accessing shared data structures, while ensuring efficient batch processing at every stage of the pipeline. Furthermore, FAJITA provides a performant architecture for deploying high-performance service chains containing stateful network functions with different state granularities. FAJITA improves the throughput and latency of high-speed stateful network functions by ~2.43× compared to the most performant state-of-the-art solutions, enabling commodity hardware to process up to ~178 million 64-B packets per second (pps) using 16 cores.
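The kind of batched, cache-friendly state access the abstract describes can be sketched in a few lines. This is a simplified illustration, not FAJITA's actual pipeline: packets in a batch are grouped by flow first, so the shared state table is touched once per flow rather than once per packet.

```python
from collections import defaultdict

def process_batch(batch, state):
    """Process a batch of (flow_id, payload_len) packets.

    Grouping by flow before touching `state` means each state-table
    entry is looked up and updated once per flow per batch, the kind
    of access-amortizing pattern batched stateful pipelines rely on.
    """
    by_flow = defaultdict(list)
    for flow_id, length in batch:
        by_flow[flow_id].append(length)
    for flow_id, lengths in by_flow.items():
        counters = state.setdefault(flow_id, {"packets": 0, "bytes": 0})
        counters["packets"] += len(lengths)
        counters["bytes"] += sum(lengths)
    return state

state = process_batch([(1, 100), (2, 60), (1, 40)], {})
```

In a real framework the per-flow entry would hold NF-specific state (connection tracking, NAT mappings, etc.) rather than simple counters, and the grouping would be done without per-batch allocations.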

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
packet processing frameworks, stateful network functions
National Category
Communication Systems; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-357087 (URN)
10.1145/3676861 (DOI)
Projects
ULTRA
Funder
EU, Horizon 2020, 770889
Swedish Research Council, 2021-04212
Vinnova, 2023-03003
Note

QC 20241206

Available from: 2024-12-04 Created: 2024-12-04 Last updated: 2024-12-06. Bibliographically approved
Ghasemirahni, H. (2024). Realizing High-Performance Stateful Network Function Chains on Commodity Hardware: Improving Packet Processing Frameworks by Minimizing Memory Access Overheads. (Doctoral dissertation). Stockholm, Sweden: KTH Royal Institute of Technology
2024 (English) Doctoral thesis, monograph (Other academic)
Alternative title [sv]
Realisering av högpresterande tillståndsbaserade nätverksfunktionskedjor på standardhårdvara: Förbättra ramverk för paketbearbetning genom att minimera minnesåtkomstoverhead
Abstract [en]

Data centers increasingly deploy commodity servers with high-speed network interface cards to enable network services with low latency and high throughput capabilities. However, existing packet processing systems still suffer from high I/O and memory access overheads, especially when deploying stateful network functions, resulting in degraded performance.

This doctoral dissertation describes our efforts to improve the performance of stateful network functions deployed on commodity servers by carefully studying traffic properties in heterogeneous data centers, performing low-level analyses of existing systems' bottlenecks, and finally proposing solutions that optimize stateful packet processing and alleviate the systems' memory and I/O overheads.

The first contribution of this dissertation studies the impact of temporal and spatial traffic locality on the performance of commodity servers. Accordingly, we propose Reframer, a system that deliberately delays and reorders packets to increase traffic locality. By deploying Reframer in front of a network function chain, the system achieves up to 84% higher throughput and reduces the flow completion time of a web server by 11%.

The second contribution of this dissertation focuses on optimizing packet processing frameworks when deploying a chain of stateful network functions with various flow definitions. We identify three commonly practiced principles that are essential to achieving high performance. We propose FAJITA, a cache-friendly stateful packet processing framework that improves the performance of stateful network function service chains compared to existing state-of-the-art solutions by at least 2.4× and 1.5× when using shared-nothing and shared architectures, respectively.

The third contribution of this dissertation goes one step further in optimizing packet processing frameworks by automatically configuring Receive Side Scaling (RSS) before deploying a stateful network function chain on a commodity server. We propose FlowMage, a system that leverages Large Language Models (LLMs) to perform code analysis and extract essential information from stateful network functions. FlowMage uses this data to find an efficient configuration of a network function chain while preserving the semantics of the NF chain. FlowMage achieves a significant performance improvement (up to 11×) compared to the default configuration of the system.
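The RSS dispatching that FlowMage configures can be sketched as a flow-key hash that pins every packet of a flow to the same worker core. This is an illustration, not FlowMage itself; CRC32 stands in for the NIC's Toeplitz hash, and the five-tuple below is a made-up example.

```python
import zlib

def rss_core(five_tuple, num_cores=4):
    """Map a packet's flow key to a worker core, RSS-style.

    All packets of one flow hash to the same core, which preserves
    per-flow state locality across the chain. zlib.crc32 is a stand-in
    for the Toeplitz hash a real NIC computes in hardware.
    """
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % num_cores

flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")  # hypothetical 5-tuple
core = rss_core(flow)
```

What FlowMage automates is the choice of *which* header fields to hash for each NF chain, so that flows sharing state land on the same core without breaking the chain's semantics.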


Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2024. p. 158
Series
TRITA-EECS-AVL ; 2024:82
Keywords
Stateful Network Functions, Packet Processing Frameworks, Traffic Locality, Cache Utilization, Tillståndsberoende nätverksfunktioner, Paketbehandlingsramverk, Trafiklokalitet, Cacheanvändning
National Category
Communication Systems; Computer Systems
Research subject
Information and Communication Technology; Computer Science
Identifiers
urn:nbn:se:kth:diva-355029 (URN)
978-91-8106-083-6 (ISBN)
Public defence
2024-11-18, Sal C (Sven-Olof Öhrvik), Zoom seminar: https://kth-se.zoom.us/j/65526555288, Kistagången 16, plan 2, KTH Kista, Stockholm, 15:00 (English)
Opponent
Supervisors
Projects
ULTRA
Funder
EU, Horizon 2020, 770889
Swedish Research Council, 2021-04212
Vinnova, 2023-03003
Note

QC 20241018

Available from: 2024-10-18 Created: 2024-10-18 Last updated: 2024-10-21. Bibliographically approved
Scazzariello, M., Caiazzi, T., Ghasemirahni, H., Barbette, T., Kostic, D. & Chiesa, M. (2023). A High-Speed Stateful Packet Processing Approach for Tbps Programmable Switches. In: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23). Paper presented at NSDI'23 - 20th USENIX Symposium on Networked Systems Design and Implementation, April 17–19, 2023, Boston, MA, USA (pp. 1237-1255). The USENIX Association
2023 (English) In: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23), The USENIX Association, 2023, p. 1237-1255. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

High-speed ASIC switches hold great promise for offloading complex packet processing pipelines directly in the high-speed data plane. Yet, a large variety of today's packet processing pipelines, including stateful network functions and packet schedulers, require storing some (or all) of the packets for short amounts of time in a programmatic manner. Such a programmable buffer feature is missing on today's high-speed ASIC switches.

In this work, we present RIBOSOME, a system that extends programmable switches with external memory (to store packets) and external general-purpose packet processing devices such as CPUs or FPGAs (to perform stateful operations). As today's packet processing devices are bottlenecked by their network interface speeds, RIBOSOME carefully transmits only the relevant bits to these devices. RIBOSOME leverages spare bandwidth from any directly connected servers to store the incoming payloads through RDMA. Our evaluation shows that RIBOSOME can process 300G of traffic through a stateful packet processing pipeline (e.g., firewall, load balancer, packet scheduler) by running the pipeline logic on a single server equipped with one 100G interface.
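The header/payload split at the heart of this design can be sketched as follows. This is a plain-Python illustration, not RIBOSOME's implementation: in the real system the payload store is RDMA-attached server memory reached via the switch, not a local dict, and the 64-byte header length is just an example.

```python
payload_store = {}  # stand-in for RDMA-attached server memory

def offload(pkt_id, packet, header_len=64):
    """Park the payload off-CPU and forward only the header bytes.

    Only the header crosses the (bandwidth-limited) NIC of the
    processing device; the bulky payload stays elsewhere.
    """
    header, payload = packet[:header_len], packet[header_len:]
    payload_store[pkt_id] = payload
    return header

def reassemble(pkt_id, processed_header):
    """Rejoin the processed header with its parked payload on egress."""
    return processed_header + payload_store.pop(pkt_id)

pkt = bytes(range(100))
hdr = offload(7, pkt)
out = reassemble(7, hdr)
```

The economics follow directly: if headers are a small fraction of each packet, a server with one 100G interface can keep up with several hundred gigabits of switch-side traffic.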

Place, publisher, year, edition, pages
The USENIX Association, 2023
National Category
Computer Systems; Communication Systems
Identifiers
urn:nbn:se:kth:diva-326619 (URN)
001066630000065 ()
2-s2.0-85159326513 (Scopus ID)
Conference
NSDI'23 - 20th USENIX Symposium on Networked Systems Design and Implementation, April 17–19, 2023, Boston, MA, USA
Funder
Swedish Research Council, 2021-04212
EU, European Research Council, 770889
Note

Part of proceedings ISBN 978-1-939133-33-5

QC 20230807

Available from: 2023-05-07 Created: 2023-05-07 Last updated: 2023-10-16. Bibliographically approved
Ghasemirahni, H., Barbette, T., Katsikas, G. P., Farshin, A., Roozbeh, A., Girondi, M., . . . Kostic, D. (2022). Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets. In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022: . Paper presented at 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA (pp. 807-827). USENIX - The Advanced Computing Systems Association
2022 (English) In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022, USENIX - The Advanced Computing Systems Association, 2022, p. 807-827. Conference paper, Published paper (Refereed)
Abstract [en]

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.

In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
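The core reordering idea can be sketched in a few lines. This is an illustration, not Reframer's implementation (which works on real NIC queues with microsecond-scale timers): a briefly buffered batch is regrouped so same-flow packets become consecutive, while arrival order within each flow is preserved.

```python
def reframe(batch):
    """Reorder (flow_id, pkt) pairs so same-flow packets are adjacent.

    Flows keep their order of first appearance, and packets keep their
    arrival order within each flow, so per-flow semantics are intact
    while downstream caches see long same-flow runs.
    """
    order, groups = [], {}
    for flow_id, pkt in batch:
        if flow_id not in groups:
            groups[flow_id] = []
            order.append(flow_id)
        groups[flow_id].append(pkt)
    return [(f, p) for f in order for p in groups[f]]

batch = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
reordered = reframe(batch)
```

The trade-off the paper quantifies is exactly this buffering delay: a few microseconds of added queueing buys long same-flow runs, which the downstream NF chain converts into cache hits and higher throughput.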

Place, publisher, year, edition, pages
USENIX - The Advanced Computing Systems Association, 2022
Keywords
packet ordering, spatial and temporal locality, packet scheduling, batch processing, high-speed networking
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304656 (URN)
000876762200046 ()
2-s2.0-85140983450 (Scopus ID)
Conference
19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA
Projects
ULTRA
WASP
Time-Critical Clouds
Funder
Swedish Foundation for Strategic Research
Knut and Alice Wallenberg Foundation
EU, European Research Council
Note

QC 20230619

Available from: 2021-11-09 Created: 2021-11-09 Last updated: 2023-06-19. Bibliographically approved
Reda, W., Bogdanov, K., Milolidakis, A., Ghasemirahni, H., Chiesa, M., Maguire Jr., G. Q. & Kostic, D. (2020). Path Persistence in the Cloud: A Study of the Effects of Inter-Region Traffic Engineering in a Large Cloud Provider's Network. Computer communication review, 50(2), 11-23
2020 (English) In: Computer Communication Review, ISSN 0146-4833, E-ISSN 1943-5819, Vol. 50, no. 2, p. 11-23. Article in journal, Editorial material (Refereed), Published
Abstract [en]

A commonly held belief is that traffic engineering and routing changes are infrequent. However, based on our measurements over a number of years of traffic between data centers in one of the largest cloud provider's networks, we found that it is common for flows to change paths at ten-second intervals or even faster. These frequent path and, consequently, latency variations can negatively impact the performance of cloud applications, specifically, latency-sensitive and geo-distributed applications.

Our recent measurements and analysis focused on observing path changes and latency variations between different Amazon AWS regions. To this end, we devised a path change detector that we validated using both ad hoc experiments and feedback from cloud networking experts. The results provide three main insights: (1) Traffic Engineering (TE) frequently moves (TCP and UDP) flows among network paths of different latency, (2) flows experience unfair performance, where a subset of flows between two machines can suffer large latency penalties (up to 32% at the 95th percentile) or an excessive number of latency changes, and (3) tenants may have incentives to selfishly move traffic to low-latency classes (to boost the performance of their applications). We showcase this third insight with an example using rsync synchronization.

To the best of our knowledge, this is the first paper to reveal the high frequency of TE activity within a large cloud provider's network. Based on these observations, we expect our paper to spur discussions and future research on how cloud providers and their tenants can ultimately reconcile their independent and possibly conflicting objectives. Our data is publicly available for reproducibility and further analysis at http://goo.gl/25BKte.
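A minimal stand-in for such a detector (illustrative only; the paper's validated detector is more sophisticated than a single threshold) flags abrupt shifts in an RTT series as candidate path changes:

```python
def detect_path_changes(rtts, threshold_ms=5.0):
    """Return indices where consecutive RTT samples jump by more than
    threshold_ms, treating each jump as a candidate path change.

    A crude sketch: real detectors must separate routing-induced
    baseline shifts from transient queueing noise, e.g. by comparing
    windowed medians instead of adjacent samples.
    """
    changes = []
    for i in range(1, len(rtts)):
        if abs(rtts[i] - rtts[i - 1]) > threshold_ms:
            changes.append(i)
    return changes

# Hypothetical RTT trace (ms): a path change near index 3, back at 5.
rtts = [10.0, 10.2, 9.9, 22.0, 21.8, 10.1]
events = detect_path_changes(rtts)
```

Applied to inter-region probe streams at ten-second granularity, this kind of detector is what reveals how frequently TE moves flows between paths of different latency.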

Place, publisher, year, edition, pages
New York, NY, United States: Association for Computing Machinery (ACM), 2020
Keywords
Traffic engineering, Cloud provider networks, Inter-datacenter traffic, Latency
National Category
Communication Systems; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-273758 (URN)
10.1145/3402413.3402416 (DOI)
000582604500005 ()
2-s2.0-85086379414 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, TCC
EU, European Research Council, 770889
Note

QC 20200602

Available from: 2020-05-28 Created: 2020-05-28 Last updated: 2024-03-18. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-0034-5098
