Publications (10 of 96)
Verardo, G., Boman, M., Bruchfeld, S., Chiesa, M., Koch, S., Maguire Jr., G. Q. & Kostic, D. (2025). FMM-Head: Enhancing Autoencoder-Based ECG Anomaly Detection with Prior Knowledge. In: Pattern Recognition and Artificial Intelligence - 4th International Conference, ICPRAI 2024, Proceedings: . Paper presented at 4th International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2024, Jeju Island, Korea, Jul 3 2024 - Jul 6 2024 (pp. 18-32). Springer Nature
FMM-Head: Enhancing Autoencoder-Based ECG Anomaly Detection with Prior Knowledge
2025 (English) In: Pattern Recognition and Artificial Intelligence - 4th International Conference, ICPRAI 2024, Proceedings, Springer Nature, 2025, p. 18-32. Conference paper, Published paper (Refereed)
Abstract [en]

Detecting anomalies in electrocardiogram data is crucial to identify deviations from normal heartbeat patterns and provide timely intervention to at-risk patients. Various AutoEncoder models (AE) have been proposed to tackle the anomaly detection task with machine learning (ML). However, these models do not explicitly consider the specific patterns of ECG leads, thus compromising learning efficiency. In contrast, we replace the decoding part of the AE with a reconstruction head (namely, FMM-Head) based on prior knowledge of the ECG shape. Our model consistently achieves higher anomaly detection capabilities than state-of-the-art models, with up to a 0.31 increase in area under the ROC curve (AUROC), with as little as half the original model size, and with explainable extracted features. The processing time of our model is four orders of magnitude lower than solving an optimization problem to obtain the same parameters, thus making it suitable for real-time ECG parameter extraction and anomaly detection. The code is available at: https://github.com/giacomoverardo/FMM-Head.
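As a rough illustration of the prior knowledge such a head builds on: the FMM model represents each ECG wave as a Möbius-transformed cosine, and a beat as a baseline plus one wave per P, Q, R, S, and T component. The NumPy sketch below reconstructs a beat from FMM parameters; the parameter values are illustrative only, not taken from the paper.

```python
import numpy as np

def fmm_wave(t, A, alpha, beta, omega):
    """One FMM wave: a cosine whose phase goes through a Mobius transform."""
    phase = beta + 2.0 * np.arctan(omega * np.tan((t - alpha) / 2.0))
    return A * np.cos(phase)

def fmm_beat(t, M, wave_params):
    """ECG beat = baseline M + one FMM wave per component (P, Q, R, S, T)."""
    return M + sum(fmm_wave(t, *p) for p in wave_params)

t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
# Illustrative (A, alpha, beta, omega) values only -- not fitted parameters.
waves = [( 0.20, 5.0, 2.9, 0.20),   # P
         (-0.40, 5.9, 3.1, 0.05),   # Q
         ( 1.00, 6.0, 3.0, 0.03),   # R
         (-0.30, 6.1, 3.1, 0.05),   # S
         ( 0.35, 1.5, 2.8, 0.25)]   # T
beat = fmm_beat(t, 0.0, waves)      # what a parameter-predicting head decodes
```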

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
AutoEncoders, ECG anomaly detection, Machine Learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-361152 (URN)
10.1007/978-981-97-8702-9_2 (DOI)
2-s2.0-85219192392 (Scopus ID)
Conference
4th International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2024, Jeju Island, Korea, Jul 3 2024 - Jul 6 2024
Note

Part of ISBN 9789819787012

QC 20250313

Available from: 2025-03-12 Created: 2025-03-12 Last updated: 2025-03-13. Bibliographically approved
Siavashi, M., Dindarloo, F. K., Kostic, D. & Chiesa, M. (2025). Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference. In: Proceedings of the 5th Workshop on Machine Learning and Systems, EUROMLSYS 2025: . Paper presented at the 5th Workshop on Machine Learning and Systems (EuroMLSys 2025), March 30 - April 3, 2025, Rotterdam, Netherlands (pp. 132-138). Association for Computing Machinery (ACM)
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
2025 (English) In: Proceedings of the 5th Workshop on Machine Learning and Systems, EUROMLSYS 2025, Association for Computing Machinery (ACM), 2025, p. 132-138. Conference paper, Published paper (Refereed)
Abstract [en]

Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and best-effort (BE) jobs. Existing inference systems employ iteration-level first-come-first-served scheduling, causing head-of-line blocking when BE jobs delay LS jobs. We introduce QLLM, a novel inference system designed for Mixture of Experts (MoE) models, featuring a fine-grained, priority-aware preemptive scheduler. QLLM enables expert-level preemption, deferring BE job execution while minimizing LS time-to-first-token (TTFT). Our approach removes iteration-level scheduling constraints, enabling the scheduler to preempt jobs at any layer based on priority. Evaluations on an Nvidia A100 GPU show that QLLM significantly improves performance. It reduces LS TTFT by an average of 65.5x and meets the SLO at up to 7 requests/sec, whereas the baseline fails to do so under the tested workload. Additionally, it cuts LS turnaround time by up to 12.8x without impacting throughput. QLLM is modular, extensible, and seamlessly integrates with Hugging Face MoE models.
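To make the scheduling idea concrete, here is a toy, layer-granular preemptive loop in Python. It is a sketch of the policy as described in the abstract, not QLLM's implementation; the job fields and the two priority classes are our own simplification.

```python
import heapq
from dataclasses import dataclass, field

LS, BE = 0, 1  # lower value = higher priority

@dataclass(order=True)
class Job:
    priority: int
    arrival: int
    layer: int = field(default=0, compare=False)       # next layer to execute
    num_layers: int = field(default=4, compare=False)

def run(arrivals, steps):
    """Layer-granular preemptive loop: before every layer, the highest-priority
    ready job is picked, so a newly arrived LS job displaces a BE job mid-model."""
    ready, trace = [], []
    for t in range(steps):
        for job in arrivals.pop(t, []):
            heapq.heappush(ready, job)
        if not ready:
            continue
        job = heapq.heappop(ready)
        job.layer += 1                                  # run exactly one layer
        trace.append((t, "LS" if job.priority == LS else "BE", job.layer))
        if job.layer < job.num_layers:
            heapq.heappush(ready, job)                  # may now lose to an LS job
    return trace

# A BE job arrives first; the LS job arriving at t=1 preempts it at a layer boundary.
print(run({0: [Job(BE, 0)], 1: [Job(LS, 1)]}, steps=10))
```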

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Large Language Models, Mixture-of-Experts, Preemptive Scheduling, Latency-Sensitive Inference, GPU Acceleration, Priority-Aware Scheduling
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-364688 (URN)
10.1145/3721146.3721956 (DOI)
001477868300014 (ISI)
2-s2.0-105003631563 (Scopus ID)
979-8-4007-1538-9 (ISBN)
Conference
5th Workshop on Machine Learning and Systems (EuroMLSys 2025), March 30 - April 3, 2025, Rotterdam, Netherlands
Note

QC 20250701

Available from: 2025-07-01 Created: 2025-07-01 Last updated: 2025-07-01. Bibliographically approved
Verardo, G., Perez-Ramirez, D. F., Bruchfeld, S., Boman, M., Chiesa, M., Koch, S., . . . Kostic, D. (2025). Reducing the Number of Leads for ECG Imaging with Graph Neural Networks and Meaningful Latent Space. In: Statistical Atlases and Computational Models of the Heart. Workshop, CMRxRecon and MBAS Challenge Papers. - 15th International Workshop, STACOM 2024, Held in Conjunction with MICCAI 2024, Revised Selected Papers: . Paper presented at 15th International Workshop on Statistical Atlases and Computational Models of the Heart, STACOM 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 10, 2024 (pp. 301-312). Springer Nature
Reducing the Number of Leads for ECG Imaging with Graph Neural Networks and Meaningful Latent Space
2025 (English) In: Statistical Atlases and Computational Models of the Heart. Workshop, CMRxRecon and MBAS Challenge Papers. - 15th International Workshop, STACOM 2024, Held in Conjunction with MICCAI 2024, Revised Selected Papers, Springer Nature, 2025, p. 301-312. Conference paper, Published paper (Refereed)
Abstract [en]

ECG Imaging (ECGI) is a technique for cardiac electrophysiology that allows reconstructing the electrical propagation through different parts of the heart using electrodes on the body surface. Although ECGI is non-invasive, it has not become clinically routine due to the large number of leads required to produce a fine-grained estimate of the cardiac activation map. Using fewer leads could make ECGI practical for clinical patient care. We propose to tackle the lead reduction problem by enhancing Neural Network (NN) models with Graph Neural Network (GNN)-based gating. Our approach encodes the leads into a meaningful representation and then gates the latent space with a GNN. In our evaluation on a state-of-the-art dataset, we show that keeping only the most important leads does not increase the cardiac reconstruction and onset detection error: despite dropping almost 140 leads out of 260, our model achieves the same performance as an NN baseline. Our code is available at github.com/giacomoverardo/ecg-imaging.
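A minimal sketch of GNN-based gating over a per-lead latent space, assuming mean-aggregation message passing and a sigmoid gate (our simplification; the paper's architecture may differ):

```python
import numpy as np

def gnn_gate(z, adj, W):
    """One mean-aggregation message-passing round over the lead graph, followed
    by a sigmoid gate that scales each lead's latent vector.
    z: (leads, d) per-lead latents; adj: (leads, leads) 0/1 adjacency."""
    h = (adj @ z) / (adj.sum(axis=1, keepdims=True) + 1e-9)
    scores = 1.0 / (1.0 + np.exp(-(h @ W)))     # (leads, 1) gate values in (0, 1)
    return z * scores, scores.ravel()

rng = np.random.default_rng(0)
leads, d = 8, 16
z = rng.normal(size=(leads, d))                          # encoded leads
adj = (rng.random((leads, leads)) < 0.3).astype(float)   # toy lead graph
gated, scores = gnn_gate(z, adj, rng.normal(size=(d, 1)))
keep = np.argsort(scores)[-4:]   # indices of the most important leads to retain
```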

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Deep Learning, ECG Imaging, Graph Neural Networks
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-363463 (URN)
10.1007/978-3-031-87756-8_30 (DOI)
2-s2.0-105004252914 (Scopus ID)
Conference
15th International Workshop on Statistical Atlases and Computational Models of the Heart, STACOM 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 10, 2024
Note

Part of ISBN 9783031877551

QC 20250516

Available from: 2025-05-15 Created: 2025-05-15 Last updated: 2025-05-16. Bibliographically approved
Ghasemirahni, H., Farshin, A., Scazzariello, M., Chiesa, M. & Kostic, D. (2024). Deploying Stateful Network Functions Efficiently using Large Language Models. In: EuroMLSys 2024 - Proceedings of the 2024 4th Workshop on Machine Learning and Systems: . Paper presented at 4th Workshop on Machine Learning and Systems, EuroMLSys 2024, held in conjunction with ACM EuroSys 2024, Athens, Greece, Apr 22 2024 (pp. 28-38). Association for Computing Machinery (ACM)
Deploying Stateful Network Functions Efficiently using Large Language Models
2024 (English) In: EuroMLSys 2024 - Proceedings of the 2024 4th Workshop on Machine Learning and Systems, Association for Computing Machinery (ACM), 2024, p. 28-38. Conference paper, Published paper (Refereed)
Abstract [en]

Stateful network functions are increasingly used in data centers. However, their scalability remains a significant challenge, since parallelizing packet processing across multiple cores requires careful configuration to avoid compromising the application's semantics or performance. This challenge is particularly important when deploying multiple stateful functions on multi-core servers. This paper proposes FlowMage, a system that leverages Large Language Models (LLMs) to perform code analysis and extract essential information from stateful network functions (NFs) prior to their deployment on a server. FlowMage uses this data to find an efficient configuration of an NF chain that maximizes performance while preserving the semantics of the NF chain. Our evaluation shows that, utilizing GPT-4, FlowMage is able to find and apply an optimized configuration when deploying a stateful NF chain on a server, resulting in significant performance improvement (up to 11×) in comparison to the default configuration of the system.
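A hedged sketch of the two-step idea: ask an LLM to classify an NF's state, then map that classification to a dispatching strategy. The prompt schema, the `llm` callable, and the three strategy names are hypothetical stand-ins, not FlowMage's actual interface.

```python
import json

PROMPT = """You are analyzing a stateful network function.
Given the source code below, answer in JSON with two fields:
  "state_scope": one of "per-flow", "per-source", "global"
  "access": one of "read-only", "read-write"
Source:
{code}
"""

def analyze_nf(source: str, llm) -> dict:
    """`llm` is any prompt->text callable (a stand-in for the GPT-4 call)."""
    return json.loads(llm(PROMPT.format(code=source)))

def pick_dispatching(info: dict) -> str:
    """Toy policy: per-flow state can be sharded across cores via RSS;
    read-only state can be replicated; global read-write state cannot."""
    if info["state_scope"] == "per-flow":
        return "rss-shard"      # each core exclusively owns its flows' state
    if info["access"] == "read-only":
        return "replicate"      # copy the state to every core, no locking
    return "shared-lock"        # fall back to a shared, synchronized table
```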

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Intra-Server Load Balancing, LLMs, RSS Configuration, Stateful Network Functions, Static Code Analysis
National Category
Computer Systems, Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346539 (URN)
10.1145/3642970.3655836 (DOI)
001221134800004 (ISI)
2-s2.0-85192276579 (Scopus ID)
Conference
4th Workshop on Machine Learning and Systems, EuroMLSys 2024, held in conjunction with ACM EuroSys 2024, Athens, Greece, Apr 22 2024
Projects
ULTRA
Funder
EU, Horizon 2020, 770889
Note

Part of ISBN 979-8-4007-0541-0

QC 20240520

Available from: 2024-05-16 Created: 2024-05-16 Last updated: 2024-12-06. Bibliographically approved
Ghasemirahni, H., Farshin, A., Scazzariello, M., Maguire Jr., G. Q., Kostic, D. & Chiesa, M. (2024). FAJITA: Stateful Packet Processing at 100 Million pps. Proceedings of the ACM on Networking, 2(CoNEXT3), 1-22
FAJITA: Stateful Packet Processing at 100 Million pps
2024 (English) In: Proceedings of the ACM on Networking, E-ISSN 2834-5509, Vol. 2, no CoNEXT3, p. 1-22. Article in journal (Refereed), Published
Abstract [en]

Data centers increasingly utilize commodity servers to deploy low-latency Network Functions (NFs). However, the emergence of multi-hundred-gigabit-per-second network interface cards (NICs) has drastically increased the performance expected from commodity servers. Additionally, recently introduced systems that store packet payloads in temporary off-CPU locations (e.g., programmable switches, NICs, and RDMA servers) further increase the load on NF servers, making packet processing even more challenging. This paper demonstrates the existing bottlenecks and challenges of state-of-the-art stateful packet processing frameworks and proposes a system, called FAJITA, to tackle these challenges and accelerate stateful packet processing on commodity hardware. FAJITA proposes an optimized processing pipeline for stateful network functions that minimizes memory accesses and overcomes the overheads of accessing shared data structures, while ensuring efficient batch processing at every stage of the pipeline. Furthermore, FAJITA provides a performant architecture to deploy high-performance network function service chains containing stateful elements with different state granularities. FAJITA improves the throughput and latency of high-speed stateful network functions by ~2.43x compared to the most performant state-of-the-art solutions, enabling commodity hardware to process up to ~178 Million 64-B packets per second (pps) using 16 cores.
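The batching discipline the abstract describes can be sketched as processing a whole batch through each chain element before moving on, so each element's state is touched in one burst per batch rather than once per packet. The toy NAT element below is ours, for illustration only:

```python
class Nat:
    """Toy per-flow NAT: one table keyed by flow (per-flow state granularity)."""
    def __init__(self):
        self.table = {}

    def __call__(self, batch):
        for pkt in batch:
            key = (pkt["src"], pkt["dst"])
            pkt["src_port"] = self.table.setdefault(key, 10_000 + len(self.table))
        return batch

def run_chain(batch, chain):
    """Run the whole batch through each element before the next element, so
    shared state is accessed in bursts instead of per packet per element."""
    for nf in chain:
        batch = nf(batch)
    return batch

pkts = [{"src": "10.0.0.1", "dst": "10.0.0.2"},
        {"src": "10.0.0.3", "dst": "10.0.0.2"}]
print(run_chain(pkts, [Nat()]))
```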

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
packet processing frameworks, stateful network functions
National Category
Communication Systems, Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-357087 (URN)
10.1145/3676861 (DOI)
Projects
ULTRA
Funder
EU, Horizon 2020, 770889
Swedish Research Council, 2021-04212
Vinnova, 2023-03003
Note

QC 20241206

Available from: 2024-12-04 Created: 2024-12-04 Last updated: 2024-12-06. Bibliographically approved
Wang, C., Scazzariello, M., Farshin, A., Ferlin, S., Kostic, D. & Chiesa, M. (2024). NetConfEval: Can LLMs Facilitate Network Configuration? Proceedings of the ACM on Networking, 2(CoNEXT2), Article ID 7.
NetConfEval: Can LLMs Facilitate Network Configuration?
2024 (English) In: Proceedings of the ACM on Networking, E-ISSN 2834-5509, Vol. 2, no CoNEXT2, article id 7. Article in journal (Refereed), Published
Abstract [en]

This paper explores opportunities to utilize Large Language Models (LLMs) to make network configuration human-friendly, simplifying the configuration of network devices and the development of routing algorithms, and minimizing errors. We design a set of benchmarks (NetConfEval) to examine the effectiveness of different models in facilitating and automating network configuration. More specifically, we focus on scenarios where LLMs translate high-level policies, requirements, and descriptions (i.e., specified in natural language) into low-level network configurations and Python code. NetConfEval considers four tasks that could potentially facilitate network configuration: (i) translating high-level requirements into a formal specification format, (ii) generating API/function calls from high-level requirements, (iii) developing routing algorithms based on high-level descriptions, and (iv) generating low-level configuration for existing and new protocols based on input documentation. Learning from the results of our study, we propose a set of principles for designing LLM-based systems to configure networks. Finally, we present two GPT-4-based prototypes to (i) automatically configure P4-enabled devices from a set of high-level requirements and (ii) integrate LLMs into existing network synthesizers.
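For task (i), a minimal sketch of requirement-to-specification translation; the JSON schema and the `llm` callable are illustrative assumptions, not NetConfEval's actual format:

```python
import json

def to_spec(requirement: str, llm) -> list[dict]:
    """Translate one natural-language requirement into a machine-checkable
    policy list; `llm` is any prompt->text callable."""
    prompt = (
        "Translate the requirement into JSON: a list of objects with keys "
        '"type" ("reachability", "waypoint" or "loadbalancing"), "src", '
        '"dst", and optional "via" (list of middleboxes).\n'
        "Requirement: " + requirement
    )
    return json.loads(llm(prompt))

# Expected shape for "traffic from h1 to h3 must traverse firewall fw1":
# [{"type": "waypoint", "src": "h1", "dst": "h3", "via": ["fw1"]}]
```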

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
benchmark, code generation, function calling, large language models (LLMs), network configuration, network synthesizer, P4, RAG, routing algorithms
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-357124 (URN)
10.1145/3656296 (DOI)
Projects
Digital Futures
Funder
Vinnova, 2023-03003
EU, European Research Council, 770889
Swedish Research Council, 2021-04212
Note

QC 20241211

Available from: 2024-12-04 Created: 2024-12-04 Last updated: 2024-12-11. Bibliographically approved
Girondi, M., Scazzariello, M., Maguire Jr., G. Q. & Kostic, D. (2024). Toward GPU-centric Networking on Commodity Hardware. In: 7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece: . Paper presented at 7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece. New York: ACM Digital Library
Toward GPU-centric Networking on Commodity Hardware
2024 (English) In: 7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece, New York: ACM Digital Library, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

GPUs are emerging as the most popular accelerator for many applications, powering the core of machine learning applications. In networked GPU-accelerated applications, input and output data typically traverse the CPU and the OS network stack multiple times, getting copied across the system's main memory. These transfers increase application latency and require expensive CPU cycles, reducing the system's efficiency and increasing the overall response times. These inefficiencies become of greater importance in latency-bounded deployments, or at high throughput, where copy times could quickly inflate the response time of modern GPUs. We leverage the efficiency and kernel-bypass benefits of RDMA to transfer data in and out of GPUs without using any CPU cycles or synchronization. We demonstrate the ability of modern GPUs to saturate a 100-Gbps link, and evaluate the network processing time in the context of an inference serving application.
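The data-path contrast can be sketched with CuPy: the conventional path stages data in host memory and pays a CPU-mediated copy, while RDMA with direct GPU placement would let the NIC write into device memory. Sketch only; registering GPU memory with an RDMA NIC is outside what plain CuPy expresses.

```python
import numpy as np
import cupy as cp  # assumes a CUDA-capable machine

# Conventional path: the NIC delivers data into host memory through the OS
# network stack, and the application then copies it to the GPU over PCIe,
# spending CPU cycles on every request.
host_buf = np.empty(1 << 20, dtype=np.uint8)   # filled by recv() in practice
dev_buf = cp.asarray(host_buf)                  # explicit host-to-device copy

# GPU-centric path: with RDMA and direct placement, the NIC DMAs straight into
# GPU memory, so the application only ever touches a device buffer and no CPU
# cycles are spent on the data path. The line below only marks where the data
# would land; the RDMA registration itself needs verbs-level APIs.
dev_direct = cp.empty(1 << 20, dtype=cp.uint8)
```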

Place, publisher, year, edition, pages
New York: ACM Digital Library, 2024
Keywords
GPUs, Commodity Hardware, Inference Serving, RDMA
National Category
Communication Systems, Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-345624 (URN)
10.1145/3642968.3654820 (DOI)
001234771200008 (ISI)
2-s2.0-85192024363 (Scopus ID)
Conference
7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece 
Note

QC 20240415

Part of ISBN 979-8-4007-0539-7

Available from: 2024-04-15 Created: 2024-04-15 Last updated: 2024-08-28. Bibliographically approved
Scazzariello, M., Caiazzi, T., Ghasemirahni, H., Barbette, T., Kostic, D. & Chiesa, M. (2023). A High-Speed Stateful Packet Processing Approach for Tbps Programmable Switches. In: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23): . Paper presented at NSDI'23 - 20th USENIX Symposium on Networked Systems Design and Implementation, April 17–19, 2023, Boston, MA, USA (pp. 1237-1255). The USENIX Association
A High-Speed Stateful Packet Processing Approach for Tbps Programmable Switches
2023 (English) In: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23), The USENIX Association, 2023, p. 1237-1255. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

High-speed ASIC switches hold great promise for offloading complex packet processing pipelines directly in the high-speed data-plane. Yet, a large variety of today's packet processing pipelines, including stateful network functions and packet schedulers, require storing some (or all the) packets for short amounts of time in a programmatic manner. Such a programmable buffer feature is missing on today's high-speed ASIC switches. In this work, we present RIBOSOME, a system that extends programmable switches with external memory (to store packets) and external general-purpose packet processing devices such as CPUs or FPGAs (to perform stateful operations). As today's packet processing devices are bottlenecked by their network interface speeds, RIBOSOME carefully transmits only the relevant bits to these devices. RIBOSOME leverages spare bandwidth from any directly connected servers to store the incoming payloads through RDMA. Our evaluation shows that RIBOSOME can process 300G of traffic through a stateful packet processing pipeline (e.g., firewall, load balancer, packet scheduler) by running the pipeline logic on a single server equipped with one 100G interface.
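A toy sketch of the header/payload split: payloads go to out-of-band storage (RDMA-attached servers in RIBOSOME) while only headers traverse the stateful pipeline. The names and the 64-byte header length are illustrative assumptions.

```python
HEADER_LEN = 64        # bytes the stateful pipeline actually needs (assumed)
payload_store = {}     # stand-in for payloads parked on RDMA-attached servers

def split(pkt_id: int, pkt: bytes) -> bytes:
    """Park the payload out of band and forward only the header bits to the
    packet-processing device, as the switch does in RIBOSOME."""
    payload_store[pkt_id] = pkt[HEADER_LEN:]
    return pkt[:HEADER_LEN]

def reassemble(pkt_id: int, header: bytes) -> bytes:
    """After the stateful pipeline has processed the header, fetch the payload
    back and emit the full packet."""
    return header + payload_store.pop(pkt_id)

pkt = bytes(range(256)) * 6                      # a fake 1536-byte packet
hdr = split(1, pkt)                              # only 64 bytes reach the NF
assert reassemble(1, hdr) == pkt
```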

Place, publisher, year, edition, pages
The USENIX Association, 2023
National Category
Computer Systems, Communication Systems
Identifiers
urn:nbn:se:kth:diva-326619 (URN)
001066630000065 (ISI)
2-s2.0-85159326513 (Scopus ID)
Conference
NSDI'23 - 20th USENIX Symposium on Networked Systems Design and Implementation, April 17–19, 2023, Boston, MA, USA
Funder
Swedish Research Council, 2021-04212
EU, European Research Council, 770889
Note

Part of proceedings ISBN 978-1-939133-33-5

QC 20230807

Available from: 2023-05-07 Created: 2023-05-07 Last updated: 2023-10-16. Bibliographically approved
Perez Ramirez, D. F., Pérez Penichet, C., Tsiftes, N., Voigt, T., Kostic, D. & Boman, M. (2023). DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks. In: IPSN 2023 - Proceedings of the 2023 22nd International Conference on Information Processing in Sensor Networks: . Paper presented at 22nd ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2023, San Antonio, United States of America, May 9 2023 - May 12 2023 (pp. 163-176). Association for Computing Machinery (ACM)
DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks
2023 (English) In: IPSN 2023 - Proceedings of the 2023 22nd International Conference on Information Processing in Sensor Networks, Association for Computing Machinery (ACM), 2023, p. 163-176. Conference paper, Published paper (Refereed)
Abstract [en]

Novel backscatter communication techniques enable battery-free sensor tags to interoperate with unmodified standard IoT devices, extending a sensor network's capabilities in a scalable manner. Without requiring additional dedicated infrastructure, the battery-free tags harvest energy from the environment, while the IoT devices provide them with the unmodulated carrier they need to communicate. A schedule coordinates the provision of carriers for the communications of battery-free devices with IoT nodes. Optimal carrier scheduling is an NP-hard problem that limits the scalability of network deployments. Thus, existing solutions waste energy and other valuable resources by scheduling the carriers suboptimally. We present DeepGANTT, a deep learning scheduler that leverages graph neural networks to efficiently provide near-optimal carrier scheduling. We train our scheduler with optimal schedules of relatively small networks obtained from a constraint optimization solver, achieving a performance within 3% of the optimum. Without the need to retrain, our scheduler generalizes to networks 6× larger in the number of nodes and 10× larger in the number of tags than those used for training. DeepGANTT breaks the scalability limitations of the optimal scheduler and reduces carrier utilization by up to compared to the state-of-the-art heuristic. As a consequence, our scheduler efficiently reduces energy and spectrum utilization in backscatter networks.
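The imitation-learning setup relies on optimal labels for small instances. Below is a brute-force sketch of minimum-slot carrier scheduling under pairwise interference, of the kind only feasible at label-generation scale; this is our simplified formulation, not the paper's constraint model.

```python
from itertools import product

def optimal_schedule(carriers, interferes):
    """Brute-force the minimum-slot carrier schedule: tag i is served by node
    carriers[i] in its assigned slot, and two interfering carriers must not be
    active in the same slot. Exponential, hence only usable on the small
    instances that provide training labels for the GNN."""
    n = len(carriers)
    for num_slots in range(1, n + 1):
        for assign in product(range(num_slots), repeat=n):
            if all(not (assign[i] == assign[j]
                        and interferes(carriers[i], carriers[j]))
                   for i in range(n) for j in range(i + 1, n)):
                return assign                     # slot index per tag

carriers = ["n1", "n2", "n1"]                     # node providing each carrier
print(optimal_schedule(carriers, lambda a, b: a == b))  # -> e.g. (0, 0, 1)
```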

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
combinatorial optimization, machine learning, scheduling, wireless backscatter communications
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-338648 (URN)
10.1145/3583120.3586957 (DOI)
001112123000013 (ISI)
2-s2.0-85160025874 (Scopus ID)
Conference
22nd ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2023, San Antonio, United States of America, May 9 2023 - May 12 2023
Note

Part of ISBN 9798400701184

QC 20231023

Available from: 2023-10-23 Created: 2023-10-23 Last updated: 2024-03-12. Bibliographically approved
Verardo, G., Barreira, D., Chiesa, M., Kostic, D. & Maguire Jr., G. Q. (2023). Fast Server Learning Rate Tuning for Coded Federated Dropout. In: Goebel, R., Yu, H., Faltings, B., Fan, L., Xiong, Z. (Eds.), FL 2022: Trustworthy Federated Learning. Paper presented at 1st International Workshop on Trustworthy Federated Learning (FL), JUL 23, 2022, Vienna, AUSTRIA (pp. 84-99). Springer Nature, 13448
Fast Server Learning Rate Tuning for Coded Federated Dropout
2023 (English) In: FL 2022: Trustworthy Federated Learning / [ed] Goebel, R., Yu, H., Faltings, B., Fan, L., Xiong, Z., Springer Nature, 2023, Vol. 13448, p. 84-99. Conference paper, Published paper (Refereed)
Abstract [en]

In Federated Learning (FL), clients with low computational power train a common machine learning model by exchanging parameters via updates instead of transmitting potentially private data. Federated Dropout (FD) is a technique that improves the communication efficiency of an FL session by selecting a subset of model parameters to be updated in each training round. However, compared to standard FL, FD produces considerably lower accuracy and suffers from a longer convergence time. In this chapter, we leverage coding theory to enhance FD by allowing different sub-models to be used at each client. We also show that by carefully tuning the server learning rate hyper-parameter, we can achieve higher training speed while also reaching the same final accuracy as the no-dropout case. Evaluations on the EMNIST dataset show that our mechanism achieves 99.6% of the final accuracy of the no-dropout case while requiring 2.43x less bandwidth to achieve this level of accuracy.
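The server learning rate enters as a scale on the aggregated client updates. A minimal FedAvg-style sketch with an explicit server learning rate, to show where the hyper-parameter acts (illustrative; the chapter's fast tuning procedure itself is not shown):

```python
import numpy as np

def server_step(w, client_deltas, server_lr):
    """FedAvg-style aggregation with an explicit server learning rate:
    w <- w + eta_s * mean_k(delta_k). Here eta_s is a fixed constant."""
    return w + server_lr * np.mean(client_deltas, axis=0)

rng = np.random.default_rng(0)
w = np.zeros(10)
for _ in range(5):                        # five toy federated rounds
    deltas = rng.normal(size=(8, 10))     # stand-ins for 8 clients' updates
    w = server_step(w, deltas, server_lr=1.5)
```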

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Artificial Intelligence, ISSN 2945-9133
Keywords
Federated Learning, Hyper-parameter tuning, Coding Theory
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-330513 (URN)
10.1007/978-3-031-28996-5_7 (DOI)
000999818400007 (ISI)
2-s2.0-85152560522 (Scopus ID)
Conference
1st International Workshop on Trustworthy Federated Learning (FL), JUL 23, 2022, Vienna, AUSTRIA
Note

QC 20230630

Available from: 2023-06-30 Created: 2023-06-30 Last updated: 2023-06-30. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-1256-1070