Towards Efficient Distributed Intelligence: Cost-Aware Sensing and Offloading for Inference at the Edge
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Information Science and Engineering. ORCID iD: 0000-0002-2739-5060
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The ongoing proliferation of intelligent systems, driven by artificial intelligence (AI) and 6G, is leading to a surge in closed-loop inference tasks performed on distributed compute nodes. These systems operate under strict latency and energy constraints, extending the challenge beyond achieving high accuracy to enabling timely and energy-efficient inference. This thesis examines how distributed inference can be optimised through two key decisions: when to sample the environment and when to offload computation to a more accurate remote model. These decisions are guided by the semantics of the underlying environment and its associated costs. The semantics are kept abstract, and pre-trained inference models are employed, ensuring a platform-independent formulation adaptable to the rapid evolution of distributed intelligence and wireless technologies.

Regarding sampling, we studied the trade-off between sampling cost and detection delay in event-detection systems without sufficient local inference capabilities. The problem was posed as an optimisation over sampling instants under a stochastic event sequence and analysed at different levels of modelling complexity, ranging from periodic to aperiodic sampling. Closed-form, algorithmic, and approximate solutions were developed, with some results of independent mathematical interest. Simulations in realistic settings showed marked gains in efficiency over systems that neglect event semantics. In particular, aperiodic sampling achieved a stable improvement of ~10% over optimised periodic policies across parameter variations.
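
The periodic special case of this trade-off can be illustrated with a short numerical sketch. This is our own toy formulation, not the thesis's exact model: we assume an exponentially distributed event time and hypothetical cost weights, and trade the expected number of samples against the expected detection delay:

```python
import math

def expected_cost(tau, lam=1.0, w_sample=0.2, w_delay=1.0):
    """Expected cost of sampling every tau time units when the event
    time is Exponential(lam).  Under this model,
    E[#samples] = 1/(1 - e^(-lam*tau)) and
    E[detection delay] = tau/(1 - e^(-lam*tau)) - 1/lam."""
    p = 1.0 - math.exp(-lam * tau)
    n_samples = 1.0 / p
    delay = tau / p - 1.0 / lam
    return w_sample * n_samples + w_delay * delay

# crude grid search over candidate sampling intervals
taus = [0.01 * k for k in range(1, 1000)]
tau_star = min(taus, key=expected_cost)
print(f"optimal interval ~ {tau_star:.2f}")
```

Dense sampling (small tau) drives the delay term to zero but pays for many samples; sparse sampling does the opposite, so the cost is minimised at an interior interval.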

Regarding offloading, we introduced a novel Hierarchical Inference (HI) framework, which makes sequential offload decisions between a low-latency, energy-efficient local model and a high-accuracy remote model using locally available confidence measures. We proposed HI algorithms based on thresholds and ambiguity regions learned online by suitably extending the Prediction with Expert Advice (PEA) approaches to continuous expert spaces and partial feedback. HI algorithms minimise the expected cost across inference rounds, combining offloading and misclassification costs, and are shown to achieve a uniformly sublinear regret of O(T^{2/3}). The proposed algorithms are agnostic to model architecture and communication systems, do not alter model training, and support model updates during operation. Benchmarks on standard classification tasks using the softmax output as a confidence measure showed that HI adaptively distributes inference based on offloading costs, achieving results close to the offline optimum. HI is shown to add resilience to distribution changes and model mismatches, especially when asymmetric misclassification costs are present.
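
The offload decision itself reduces to a confidence test. A minimal sketch follows; the threshold value and the softmax vectors are illustrative placeholders, and the online learning of the threshold is omitted:

```python
def hi_decide(softmax_probs, theta=0.9):
    """Hierarchical Inference decision rule: accept the local model's
    prediction when its top softmax probability reaches the threshold
    theta; otherwise offload the sample to the remote model."""
    conf = max(softmax_probs)
    if conf >= theta:
        return "local", softmax_probs.index(conf)
    return "offload", None

# hypothetical local-model outputs for three samples
outputs = [[0.05, 0.93, 0.02],   # confident -> keep local
           [0.40, 0.35, 0.25],   # ambiguous -> offload
           [0.97, 0.02, 0.01]]
print([hi_decide(p)[0] for p in outputs])  # ['local', 'offload', 'local']
```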

In summary, this thesis presents efficient approaches for sampling and offloading of inference tasks, where various performance metrics are combined into a single cost structure. The work extends beyond conventional inference problems to areas with similar trade-offs, advancing toward efficient distributed intelligence that infers at the right time and in the right place. Future work includes conceptual extensions like joint sampling-offloading design, and integration with collaborative model-training architectures.

Abstract [sv]

The ongoing spread of intelligent systems, driven by artificial intelligence (AI) and 6G, is leading to an increase in closed-loop inference tasks performed on distributed compute nodes. These systems operate under strict latency and energy requirements, so the challenge is not only to achieve high accuracy but also to enable fast and energy-efficient inference. This thesis examines how distributed inference can be optimised through two key decisions: when to sample the environment and when to offload computation to a more accurate remote model. These decisions are guided by the semantic properties of the environment and their associated costs. The semantics are kept at an abstract level, and pre-trained inference models are used, enabling a platform-independent formulation adaptable to the rapid development of distributed intelligence and wireless communication.

Regarding sampling, the trade-off between sampling cost and detection delay was studied in event-detection systems that lack sufficient local inference capability. An optimisation problem over sampling instants was formulated for stochastic events and analysed at different levels of modelling complexity, from periodic to aperiodic sampling. Closed-form, algorithmic, and approximate solutions were developed, some of which are also of general mathematical interest. Simulations in realistic systems showed clear efficiency gains compared with systems that ignore event semantics. In particular, aperiodic sampling achieved a stable improvement of about 10% over periodic strategies across different system parameters.

Regarding offloading, a novel Hierarchical Inference (HI) framework was introduced, which makes sequential offloading decisions between a local model with low latency and energy consumption and a remote model with higher accuracy, based on local confidence measures. We proposed HI algorithms based on thresholds and ambiguity regions learned online by extending Prediction with Expert Advice (PEA) methods to continuous expert spaces with partial feedback. The HI algorithms minimise the expected cost over multiple inference rounds by combining offloading and misclassification costs, and achieve O(T^{2/3}) sublinear regret. The proposed algorithms are independent of model architecture and communication systems, require no change to model training, and support model updates during operation. Comparisons on standard classification tasks using the softmax output as a confidence measure showed that HI distributes inference adaptively according to offloading costs and reaches results close to the offline optimum computed in hindsight. HI was also shown to increase robustness to distribution changes and model mismatches, especially in cases with asymmetric misclassification costs.

In summary, the thesis presents efficient methods for sampling and offloading of inference tasks, where different performance metrics are combined into a common cost structure. The work extends beyond conventional inference problems to areas with similar trade-offs, and contributes to the development of efficient distributed intelligence that makes decisions at the right time and in the right place. Future work includes conceptual extensions such as the joint design of sampling and offloading, and integration with collaborative model-training architectures.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. xiii, 87 pp.
Series
TRITA-EECS-AVL ; 2026:4
Keywords [en]
Artificial intelligence, communication, distributed intelligence, inference offloading
National subject category
Systems Science, Information Systems and Informatics
Research subject
Electrical and Systems Engineering
Identifiers
URN: urn:nbn:se:kth:diva-373298
ISBN: 978-91-8106-482-7 (printed)
OAI: oai:DiVA.org:kth-373298
DiVA, id: diva2:2017058
Public defence
2026-01-16, https://kth-se.zoom.us/s/61617488895, Salongen, Osquars backe 31, KTH Campus, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20251127

Available from: 2025-11-27 Created: 2025-11-27 Last updated: 2025-12-09 Bibliographically approved
List of papers
1. Energy-Optimal Sampling of Edge-Based Feedback Systems
2021 (English) In: 2021 IEEE International Conference on Communications Workshops, ICC Workshops 2021 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2021, article id 9473894. Conference paper, Published paper (Refereed)
Abstract [en]

We study a problem of optimizing the sampling interval in an edge-based feedback system, where sensor samples are offloaded to a back-end server that processes them and generates feedback for a user. Sampling the system at maximum frequency results in the detection of events of interest with minimum delay but incurs higher energy costs due to the communication and processing of redundant samples. On the other hand, a lower sampling frequency results in a higher delay in detecting an event of interest, thus increasing the idle energy usage and degrading the quality of experience. We propose a method to quantify this trade-off and compute the optimal sampling interval, and use simulation to demonstrate the energy savings.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Energy conservation, event detection, feedback system, mobile edge computing, optimal sampling, Economic and social effects, Quality of service, Back-end servers, Feedback systems, Maximum frequency, Quality of experience (QoE), Redundant samples, Sampling frequencies, Sampling interval, Feedback control
National subject category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-311170 (URN)
10.1109/ICCWorkshops50388.2021.9473894 (DOI)
000848412200354 ()
2-s2.0-85112856823 (Scopus ID)
Conference
2021 IEEE International Conference on Communications Workshops, ICC Workshops 2021, 14 June 2021 through 23 June 2021
Note

QC 20221110

Part of proceedings: ISBN 978-1-7281-9441-7

Available from: 2022-05-17 Created: 2022-05-17 Last updated: 2025-11-27 Bibliographically approved
2. Energy Efficient Sampling Policies for Edge Computing Feedback Systems
2023 (English) In: IEEE Transactions on Mobile Computing, ISSN 1536-1233, E-ISSN 1558-0660, Vol. 22, no. 8, pp. 4634-4647. Article in journal (Refereed) Published
Abstract [en]

We study the problem of finding efficient sampling policies in an edge-based feedback system, where sensor samples are offloaded to a back-end server that processes them and generates feedback for a user. Sampling the system at maximum frequency results in the detection of events of interest with minimum delay but incurs higher energy costs due to the communication and processing of redundant samples. On the other hand, a lower sampling frequency results in a higher delay in detecting the event, thus increasing the idle energy usage and degrading the quality of experience. We quantify this trade-off as a weighted function of the number of samples and the sampling interval. We solve the minimisation problem when the random time to the event of interest follows an exponential or a Rayleigh distribution. We prove the convexity of the objective functions using novel techniques, which may be of independent interest. We argue that adding an initial offset to the periodic sampling can further reduce the energy consumption, and we jointly compute the optimum offset and sampling interval. We apply our framework to two practically relevant applications and show energy savings of up to 36% compared to an existing periodic scheme.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
cyber physical systems, Delays, Edge computing, Energy consumption, Energy minimisation, Event detection, feedback systems, Image edge detection, Monitoring, optimal sampling, video analytics systems, Visual analytics, Cyber Physical System, Economic and social effects, Edge detection, Embedded systems, Energy efficiency, Feedback, Green computing, Quality of service, Analytics systems, Cybe-physical systems, Cyber-physical systems, Delay, Energy minimization, Energy-consumption, Events detection, Video analytic system, Video analytics, Energy utilization
National subject category
Electrical Engineering and Electronics
Identifiers
urn:nbn:se:kth:diva-322988 (URN)
10.1109/TMC.2022.3165852 (DOI)
001022084500019 ()
2-s2.0-85128254606 (Scopus ID)
Note

QC 20251222

Available from: 2023-01-11 Created: 2023-01-11 Last updated: 2025-12-22 Bibliographically approved
3. Energy-Optimal Sampling for Edge Computing Feedback Systems: Aperiodic Case
2022 (English) In: 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 322-328. Conference paper, Published paper (Refereed)
Abstract [en]

We study the problem of optimal sampling in an edge-based video analytics system (VAS), where sensor samples collected at a terminal device are offloaded to a back-end server that processes them and generates feedback for a user. Sampling the system with the maximum allowed frequency results in the timely detection of relevant events with minimum delay. However, it incurs high energy costs and causes unnecessary usage of network and compute resources via the communication and processing of redundant samples. On the other hand, infrequent sampling results in a higher delay in detecting the relevant event, thus increasing the idle energy usage and degrading the quality of experience in terms of the responsiveness of the system. We quantify this sampling frequency trade-off as a weighted function between the number of samples and the responsiveness. We propose an energy-optimal aperiodic sampling policy that improves over the state-of-the-art optimal periodic sampling policy. Numerically, we show that the proposed policy provides a consistent improvement of more than 10% over the state-of-the-art.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Event detection, energy minimisation, edge computing, optimal sampling, aperiodic sampling, feedback systems
National subject category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-324345 (URN)
10.1109/SEC54971.2022.00047 (DOI)
000918607200040 ()
2-s2.0-85144782948 (Scopus ID)
Conference
IEEE/ACM 7th Symposium on Edge Computing (SEC), DEC 05-08, 2022, Seattle, WA
Note

QC 20230228

Available from: 2023-02-28 Created: 2023-02-28 Last updated: 2025-11-27 Bibliographically approved
4. Getting the Best Out of Both Worlds: Algorithms for Hierarchical Inference at the Edge
2024 (English) In: IEEE Transactions on Machine Learning in Communications and Networking, E-ISSN 2831-316X, Vol. 2, pp. 280-297. Article in journal (Refereed) Published
Abstract [en]

We consider a resource-constrained Edge Device (ED), such as an IoT sensor or a microcontroller unit, embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of L-ML, offloading all the data samples to the ES results in high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forfeits the benefits of reduced latency, bandwidth savings, and energy efficiency of doing local inference. To get the best out of both worlds, i.e., the benefits of doing inference on the ED and the benefits of doing inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein S-ML inference is only accepted when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible, as the correctness of the S-ML inference is not known to the ED. We thus propose an online meta-learning framework that the ED can use to predict the correctness of the S-ML inference. In particular, we propose to use the probability corresponding to the maximum probability class output by S-ML for a data sample and decide whether to offload it or not. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with continuous expert space. For a full feedback scenario, where the ED receives feedback on the correctness of the S-ML once it accepts the inference, we propose the HIL-F algorithm and prove a sublinear regret bound of √(n ln(1/λ_min)/2) without any assumption on the smoothness of the loss function, where n is the number of data samples and λ_min is the minimum difference between any two distinct maximum probability values across the data samples. For a no-local feedback scenario, where the ED does not receive the ground truth for the classification, we propose the HIL-N algorithm and prove that it has an O(n^{2/3} ln^{1/3}(1/λ_min)) regret bound. We evaluate and benchmark the performance of the proposed algorithms for the image classification application using four datasets, namely, Imagenette and Imagewoof [1], MNIST [2], and CIFAR-10 [3].
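
The flavour of the full-feedback setting can be conveyed with standard exponential weighting over a discretised set of candidate thresholds. This is a simplified stand-in for HIL-F, which handles the continuous expert space directly; the calibrated-confidence data model, learning rate, and offload cost below are our illustrative choices:

```python
import math, random

random.seed(0)
thetas = [i / 100 for i in range(101)]   # discretised expert space
weights = [1.0] * len(thetas)
eta, beta = 0.5, 0.3                     # learning rate, offload cost

def loss(theta, conf, correct):
    # accept the local inference when confidence >= theta
    if conf >= theta:
        return 0.0 if correct else 1.0   # misclassification cost
    return beta                          # offload cost

for _ in range(2000):
    conf = random.random()               # top softmax probability
    correct = random.random() < conf     # toy model: confidence is calibrated
    weights = [w * math.exp(-eta * loss(t, conf, correct))
               for w, t in zip(weights, thetas)]
    s = sum(weights)
    weights = [w / s for w in weights]   # normalise to avoid underflow

best = thetas[max(range(len(thetas)), key=lambda i: weights[i])]
print(f"learned threshold ~ {best:.2f}")
```

With this calibrated toy data, the expected per-round loss is minimised near theta = 1 - beta, and the weight distribution concentrates around that threshold.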

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National subject category
Electrical Engineering and Electronics; Other Electrical Engineering and Electronics
Identifiers
urn:nbn:se:kth:diva-343505 (URN)
10.1109/tmlcn.2024.3366501 (DOI)
001488462500001 ()
2-s2.0-105027874517 (Scopus ID)
Funder
Vinnova, 2019-00031; Vetenskapsrådet, 2022-03922
Note

QC 20260128

Available from: 2024-02-15 Created: 2024-02-15 Last updated: 2026-01-28 Bibliographically approved
5. Inference Offloading for Cost-Sensitive Binary Classification at the Edge
(English) Manuscript (preprint) (Other academic)
Abstract [en]

We investigate a binary classification problem in an edge intelligence system where false negatives are more costly than false positives. The system features a compact, locally deployed model, supplemented by a larger, remote model that is accessible via the network, albeit at an offloading cost. For each sample, our system first uses the locally deployed model for inference. Based on the output of the local model, the sample may be offloaded to the remote model. This work aims to understand the fundamental trade-off between classification accuracy and the offloading costs within such a hierarchical inference (HI) system. To optimise this system, we propose an online learning framework that continuously adapts a pair of thresholds on the local model's confidence scores. These thresholds determine the prediction of the local model and whether a sample is classified locally or offloaded to the remote model. We present a closed-form solution for the setting where the local model is calibrated. For the more general case of uncalibrated models, we introduce H2T2, an online two-threshold hierarchical inference policy, and prove it achieves sublinear regret. H2T2 is model-agnostic, requires no training, and learns during the inference phase using limited feedback. Simulations on real-world datasets show that H2T2 consistently outperforms naive and single-threshold HI policies, sometimes even surpassing single-threshold offline optima. The policy also demonstrates robustness to distribution shifts and adapts effectively to mismatched classifiers.
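
The two-threshold decision rule at the core of this setup can be sketched as follows. This is a minimal illustration: the threshold values are hypothetical placeholders, and the online adaptation that defines H2T2 is omitted:

```python
def two_threshold_decide(score, theta_lo=0.2, theta_hi=0.8):
    """Two-threshold hierarchical-inference rule on the local model's
    positive-class confidence score: classify confident samples locally
    and offload the ambiguity region (theta_lo, theta_hi) to the remote
    model.  Asymmetric misclassification costs would shift the two
    thresholds independently."""
    if score >= theta_hi:
        return "local", 1        # confident positive
    if score <= theta_lo:
        return "local", 0        # confident negative
    return "offload", None       # ambiguous -> remote model

print([two_threshold_decide(s) for s in (0.05, 0.5, 0.95)])
# [('local', 0), ('offload', None), ('local', 1)]
```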

National subject category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:kth:diva-373295 (URN)
10.48550/arXiv.2509.15674 (DOI)
Note

QC 20251127

Available from: 2025-11-27 Created: 2025-11-27 Last updated: 2025-11-27 Bibliographically approved

Open Access in DiVA

fulltext (FULLTEXT03.pdf, 3287 kB, application/pdf)

Person

Moothedath, Vishnu Narayanan
