kth.se Publications, KTH
Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Information Science and Engineering (Cyber Physical Networking). ORCID iD: 0000-0001-7220-5353
University of Helsinki, Finland. ORCID iD: 0009-0008-4243-4751
IMDEA Networks Institute, Spain. ORCID iD: 0009-0008-2449-5884
IMDEA Networks Institute, Spain.
2025 (English). In: IEEE Internet of Things Journal, ISSN 2327-4662. Article in journal (refereed). Epub ahead of print.
Abstract [en]

On-device inference offers significant benefits in edge ML systems, such as improved energy efficiency, responsiveness, and privacy, compared to traditional centralized approaches. However, the resource constraints of embedded devices limit their use to simple inference tasks, creating a trade-off between efficiency and capability. In this context, the Hierarchical Inference (HI) system has emerged as a promising solution that augments the capabilities of the local ML model by offloading selected samples to an edge server/cloud for remote ML inference. Existing works, primarily based on simulations, demonstrate that HI improves accuracy. However, they fail to account for the latency and energy consumption of real-world deployments, nor do they consider three key heterogeneous components that characterize ML-enabled IoT systems: hardware, network connectivity, and models. To bridge this gap, this paper systematically evaluates HI against standalone on-device inference by analyzing accuracy, latency, and energy trade-offs across five devices and three image classification datasets. Our findings show that, for a given accuracy requirement, the HI approach we designed achieved up to 73% lower latency and up to 77% lower device energy consumption than an on-device inference system. Despite these gains, HI introduces a fixed energy and latency overhead because every sample is first processed on-device. To address this, we propose a hybrid system called Early Exit with HI (EE-HI) and demonstrate that, compared to HI, EE-HI reduces latency by up to 59.7% and lowers the device's energy consumption by up to 60.4%. These findings demonstrate the potential of HI and EE-HI to enable more efficient ML in IoT systems.
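The HI and EE-HI ideas in the abstract can be sketched in code. This record does not give the paper's actual offloading criterion, so the sketch below assumes a common confidence-threshold rule: accept the on-device prediction when its top softmax probability clears a threshold, otherwise offload the sample to the remote model. The function names (`hi_decision`, `ee_hi_decision`) and threshold values are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hi_decision(local_logits, threshold=0.8):
    """Hierarchical Inference (HI) sketch: run the on-device model on
    every sample; keep the local prediction if it is confident enough,
    otherwise offload the sample for remote inference."""
    probs = softmax(local_logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] >= threshold:
        return ("local", top)
    return ("offload", top)


def ee_hi_decision(exit_logits, final_logits,
                   exit_threshold=0.9, hi_threshold=0.8):
    """EE-HI sketch: an early-exit head can accept a sample before the
    full on-device model runs, avoiding HI's fixed per-sample overhead;
    uncertain samples fall through to the usual HI rule."""
    probs = softmax(exit_logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] >= exit_threshold:
        return ("early-exit", top)
    return hi_decision(final_logits, hi_threshold)
```

In this toy form, a confidently classified sample exits at the early head, a moderately hard one is finished by the full local model, and only the remaining hard samples pay the network cost of offloading, which mirrors the latency and energy savings the abstract reports.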

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025.
Keywords [en]
Machine learning, on-device inference, TinyML, Hierarchical Inference, Early Exit, processing time and energy measurements
National Category
Embedded Systems; Computer Vision and Learning Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-367344
DOI: 10.1109/jiot.2025.3583477
Scopus ID: 2-s2.0-105009619553
OAI: oai:DiVA.org:kth-367344
DiVA, id: diva2:1984565
Note

QC 20250718

Available from: 2025-07-16. Created: 2025-07-16. Last updated: 2025-07-18. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Beherae, Adarsh Prasad

Search in DiVA

By author/editor
Beherae, Adarsh Prasad; Daubaris, Paulius; Bravo, Iñaki; Morabito, Roberto; Widmer, Joerg; Champati, Jaya Prakash
By organisation
Information Science and Engineering
In the same journal
IEEE Internet of Things Journal
Embedded Systems; Computer Vision and Learning Systems

Search outside of DiVA

Google
Google Scholar
