
Collaborating Authors: edge computing


Continual Learning at the Edge: An Agnostic IIoT Architecture

García-Santaclara, Pablo, Fernández-Castro, Bruno, Díaz-Redondo, Rebeca P., Calvo-Moa, Carlos, Mariño-Bodelón, Henar

arXiv.org Machine Learning

The exponential growth of Internet-connected devices has presented challenges to traditional centralized computing systems due to latency and bandwidth limitations. Edge computing has evolved to address these difficulties by bringing computation closer to the data source. Additionally, traditional machine learning algorithms are not suitable for edge-computing systems, where data usually arrives in a dynamic and continual way; incremental learning offers a good solution for these settings. We introduce a new approach that applies the incremental learning philosophy within an edge-computing scenario for the industrial sector with a specific purpose: real-time quality control in a manufacturing system. By applying continual learning, we reduce the impact of catastrophic forgetting and provide an efficient and effective solution.
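The paper's exact continual-learning method is not reproduced in the abstract; one common way to reduce catastrophic forgetting on a streaming edge workload is experience replay, where a small reservoir of past samples is rehearsed alongside new data. A minimal sketch (the reservoir buffer and nearest-centroid model are illustrative stand-ins, not the paper's architecture):

```python
import random

class ReplayBuffer:
    """Fixed-size reservoir of past (x, y) samples (illustrative sketch)."""
    def __init__(self, capacity=100, seed=0):
        self.capacity, self.items, self.seen = capacity, [], 0
        self.rng = random.Random(seed)

    def add(self, x, y):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((x, y))
        else:  # reservoir sampling keeps a uniform sample of the stream
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (x, y)

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))

class CentroidClassifier:
    """Nearest-centroid model that can be updated incrementally."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        def dist(c):
            n = self.counts[c]
            return sum((x[i] - self.sums[c][i] / n) ** 2 for i in range(len(x)))
        return min(self.counts, key=dist)

def train_stream(stream, replay_k=5):
    """Incrementally fit on the stream, rehearsing a few old samples each step."""
    buf, model = ReplayBuffer(), CentroidClassifier()
    for x, y in stream:
        model.update(x, y)
        for rx, ry in buf.sample(replay_k):  # rehearse past data
            model.update(rx, ry)
        buf.add(x, y)
    return model
```

Even when the stream shifts from one class to another (as in a changing production line), rehearsal keeps the earlier class represented in the model's statistics.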


Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models

Deng, Zechun, Liu, Ziwei, Bi, Ziqian, Song, Junhao, Liang, Chia Xin, Yeong, Joe, Song, Xinyuan, Hao, Junfeng

arXiv.org Artificial Intelligence

This paper investigates real-time decision support systems that leverage low-latency AI models, bringing together recent progress in holistic AI-driven decision tools, integration with Edge-IoT technologies, and approaches for effective human-AI teamwork. It looks into how large language models can assist decision-making, especially when resources are limited. The research also examines the effects of technical developments such as DeLLMa, methods for compressing models, and improvements for analytics on edge devices, while also addressing issues like limited resources and the need for adaptable frameworks. Through a detailed review, the paper offers practical perspectives on development strategies and areas of application, adding to the field by pointing out opportunities for more efficient and flexible AI-supported systems. The conclusions set the stage for future breakthroughs in this fast-changing area, highlighting how AI can reshape real-time decision support.


Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding

Koh, Jungyeon, Yang, Hyun Jong

arXiv.org Artificial Intelligence

The growing demand for on-device large language model (LLM) inference highlights the need for efficient mobile edge computing (MEC) solutions, especially in resource-constrained settings. Speculative decoding offers a promising solution by partitioning token generation between a lightweight draft model on mobile devices and a powerful target model on edge servers, but suffers from communication overhead and asynchronous delays. This paper is the first to propose a unified framework that jointly optimizes user association and resource allocation (UARA) to support efficient parallel speculative decoding. We solve the UARA problem using a multi-agent deep reinforcement learning algorithm. To evaluate our approach under realistic conditions, we conduct experiments using the Sionna simulator. Results show that our method achieves up to 28.0% and an average of 23.7% reduction in end-to-end latency without compromising inference accuracy, enabling scalable and low-latency LLM services in MEC systems.
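The paper's UARA optimization is not detailed in the abstract, but the speculative-decoding loop it parallelizes can be sketched in its greedy form: a cheap on-device draft model proposes a burst of tokens, and the edge-side target model verifies them, keeping the longest agreeing prefix. The toy next-token functions below are hypothetical stand-ins for real models:

```python
def speculative_decode(draft_next, target_next, prompt, n_tokens, k=4):
    """Greedy speculative decoding sketch. draft_next/target_next map a
    token sequence to its next token; the draft proposes k tokens per
    round and the target accepts until the first disagreement."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        # draft proposes k tokens autoregressively (cheap, on-device)
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(seq + proposal))
        # target verifies (edge server): accept matching prefix
        accepted = []
        for tok in proposal:
            t = target_next(seq + accepted)
            if t == tok:
                accepted.append(tok)
            else:
                accepted.append(t)  # fall back to the target's token
                break
        seq.extend(accepted)
    return seq[len(prompt):][:n_tokens]
```

When draft and target agree, up to k tokens are committed per round trip; when they diverge, output still matches plain greedy decoding with the target model, which is why accuracy is preserved.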


AirFed: A Federated Graph-Enhanced Multi-Agent Reinforcement Learning Framework for Multi-UAV Cooperative Mobile Edge Computing

Wang, Zhiyu, Raj, Suman, Buyya, Rajkumar

arXiv.org Artificial Intelligence

Multiple Unmanned Aerial Vehicles (UAVs) cooperative Mobile Edge Computing (MEC) systems face critical challenges in coordinating trajectory planning, task offloading, and resource allocation while ensuring Quality of Service (QoS) under dynamic and uncertain environments. Existing approaches suffer from limited scalability, slow convergence, and inefficient knowledge sharing among UAVs, particularly when handling large-scale IoT device deployments with stringent deadline constraints. This paper proposes AirFed, a novel federated graph-enhanced multi-agent reinforcement learning framework that addresses these challenges through three key innovations. First, we design dual-layer dynamic Graph Attention Networks (GATs) that explicitly model spatial-temporal dependencies among UAVs and IoT devices, capturing both service relationships and collaborative interactions within the network topology. Second, we develop a dual-Actor single-Critic architecture that jointly optimizes continuous trajectory control and discrete task offloading decisions. Third, we propose a reputation-based decentralized federated learning mechanism with gradient-sensitive adaptive quantization, enabling efficient and robust knowledge sharing across heterogeneous UAVs. Extensive experiments demonstrate that AirFed achieves 42.9% reduction in weighted cost compared to state-of-the-art baselines, attains over 99% deadline satisfaction and 94.2% IoT device coverage rate, and reduces communication overhead by 54.5%. Scalability analysis confirms robust performance across varying UAV numbers, IoT device densities, and system scales, validating AirFed's practical applicability for large-scale UAV-MEC deployments.
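AirFed's gradient-sensitive adaptive quantization is described only at a high level; a toy version of the idea is to quantize each gradient vector with a bit-width chosen from its magnitude, so informative updates keep more precision while small ones cost less bandwidth. The bit-allocation rule below is an assumption for illustration, not the paper's:

```python
def quantize(grad, n_bits):
    """Uniform symmetric quantization of a gradient vector to n_bits."""
    levels = (1 << (n_bits - 1)) - 1
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return list(grad)
    return [round(g / scale * levels) * scale / levels for g in grad]

def adaptive_quantize(grad, lo=4, hi=8, threshold=1.0):
    """Gradient-sensitive sketch: spend more bits on large-norm gradients
    (hypothetical rule; the paper's exact allocation differs)."""
    norm = sum(g * g for g in grad) ** 0.5
    return quantize(grad, hi if norm > threshold else lo)
```

A UAV would transmit the quantized values (plus the scale) instead of full-precision gradients, trading a bounded per-element error for a large reduction in communication overhead.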


A Hybrid Proactive And Predictive Framework For Edge Cloud Resource Management

Kumar, Hrikshesh, Garg, Anika, Gupta, Anshul, Agarwal, Yashika

arXiv.org Artificial Intelligence

Traditional edge-cloud workload resource management is reactive: relying on static thresholds means either overspending on resources that are not needed or degrading performance when they are lacking. We therefore develop a proactive framework that anticipates problems instead of reacting to them. Our hybrid architecture combines two components: a CNN-LSTM model for time-series forecasting and an orchestrator based on multi-agent Deep Reinforcement Learning (DRL). The novelty lies in how we combine them: the predictive forecast from the CNN-LSTM is embedded directly into the DRL agent's state space. This foresight lets the orchestrator make better long-term decisions about where to run tasks, balancing cost savings against system health and application responsiveness, so that it follows a smooth plan rather than lurching from one problem to the next. Our experiments show that the proposed system outperforms reactive baselines, particularly on complex multi-objective decisions that trade off cost, latency, and reliability.
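The core idea, embedding a workload forecast into the RL agent's observation, can be shown with a stub forecaster standing in for the CNN-LSTM (assumption: any callable predictor could be slotted in; a moving average is used here only to keep the sketch self-contained):

```python
from collections import deque

class ForecastAugmentedState:
    """Sketch of forecast-in-state: the agent observes the current load
    concatenated with predicted future loads. A naive moving-average
    forecaster stands in for the paper's CNN-LSTM."""
    def __init__(self, horizon=3, window=4):
        self.history = deque(maxlen=window)
        self.horizon = horizon

    def forecast(self):
        if not self.history:
            return [0.0] * self.horizon
        avg = sum(self.history) / len(self.history)
        return [avg] * self.horizon  # flat forecast over the horizon

    def observe(self, current_load):
        self.history.append(current_load)
        # the DRL agent sees [current load, predicted future loads...]
        return [current_load] + self.forecast()
```

The agent's policy network then conditions on future demand, which is what lets it pre-provision capacity before a spike rather than after.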


Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge

Koch, Fernando, Djuhera, Aladin, Binotto, Alecio

arXiv.org Artificial Intelligence

Large Foundation Models (LFMs), including multi-modal and generative models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments, such as Multi-access Edge Computing (MEC), presents significant challenges for workload orchestration due to time-varying network, compute, and storage conditions. In particular, current split inference strategies, which partition LFM layers across nodes, are not designed to adapt to fluctuating workloads, dynamic bandwidth conditions, or evolving privacy constraints in high-utilization MEC environments. In this work, we propose a novel adaptive split inference orchestration framework that elevates both the placement and partitioning of LFM layers to runtime-tunable variables. Specifically, our framework enables real-time, quality-of-service (QoS)-aware management of inference workloads by extending conventional orchestrators with three key services: (1) Capacity-aware workload distribution, which continuously profiles node resources and selects an optimal subset of MEC nodes; (2) Dynamic partition migration, which transparently relocates pre-cut LFM segments in response to changes in utilization or network conditions; (3) Real-time reconfiguration, which dynamically re-splits LFM layers to balance latency, throughput, and privacy. We formalize the joint placement-partitioning problem, outline a reference architecture and algorithmic workflow, and discuss applicability in representative smart city, V2X, and industrial edge scenarios.
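The joint placement-partitioning problem is only formalized in the paper itself; a toy version of the partitioning half is a greedy cutter that assigns contiguous layer segments to nodes in order, starting a new segment when a node's capacity would be exceeded (a sketch under simplified assumptions — real orchestration would also weigh bandwidth, utilization, and privacy):

```python
def partition_layers(layer_costs, node_capacities):
    """Greedily split an ordered list of per-layer compute costs into
    contiguous segments, one per node, cutting when the running load
    would exceed the current node's capacity. Returns (start, end)
    index ranges. Toy sketch of split-inference partitioning."""
    segments, start, load, node = [], 0, 0.0, 0
    for i, c in enumerate(layer_costs):
        if node < len(node_capacities) - 1 and load + c > node_capacities[node]:
            segments.append((start, i))  # cut before layer i
            start, load, node = i, 0.0, node + 1
        load += c
    segments.append((start, len(layer_costs)))
    return segments
```

Making the cut points runtime-tunable, as the framework proposes, amounts to re-running such a partitioner whenever profiled capacities or network conditions change and migrating the affected segments.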


Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing

Gao, Song, Jing, Shusen, Zhang, Shuai, Wang, Yue, Zhou, Xiangwei, Zhang, Songyang

arXiv.org Artificial Intelligence

Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, the substantial computational resources and large-scale training data required to train LAMs conflict with the limited storage and computational capacity of edge devices, posing significant challenges to training and deploying LAMs at the edge. In this work, we introduce the Networked Mixture-of-Experts (NMoE) system, in which clients infer collaboratively by distributing tasks to suitable neighbors based on their expertise and aggregating the returned results. To train the NMoE, we propose a federated learning framework that integrates both supervised and self-supervised learning to balance personalization and generalization, while preserving communication efficiency and data privacy. We conduct extensive experiments to demonstrate the efficacy of the proposed NMoE system, providing insights and benchmarks for NMoE training algorithms.
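The routing step of a networked MoE can be sketched as gate-scored dispatch to the top-k neighbors followed by a weighted aggregation of their answers. The gate and expert callables below are hypothetical stand-ins for each neighbor's local models:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def nmoe_infer(task, neighbors, top_k=2):
    """Networked-MoE sketch: each neighbor is (gate_fn, expert_fn).
    Route the task to the top_k neighbors with the highest gate scores
    and aggregate their outputs weighted by the softmaxed scores."""
    scores = [gate(task) for gate, _ in neighbors]
    ranked = sorted(range(len(neighbors)), key=lambda i: -scores[i])[:top_k]
    weights = softmax([scores[i] for i in ranked])
    return sum(w * neighbors[i][1](task) for w, i in zip(weights, ranked))
```

Only the top-k experts run, which is what keeps per-task compute and communication bounded on resource-limited clients.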


An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0

Martinez-Gil, Jorge, Pichler, Mario, Bountouni, Nefeli, Koussouris, Sotiris, Barreiro, Marielena Márquez, Gusmeroli, Sergio

arXiv.org Artificial Intelligence

We present a novel framework for Industry 5.0 that simplifies the deployment of AI models on edge devices in various industrial settings. The design reduces latency and avoids external data transfer by enabling local inference and real-time processing. Our implementation is agent-based, meaning that individual agents, whether human, algorithmic, or collaborative, are responsible for well-defined tasks, enabling flexibility and simplifying integration. Moreover, our framework supports modular integration and maintains low resource requirements. Preliminary evaluations in real food-industry scenarios indicate improved deployment time and system adaptability. The source code is publicly available at https://github.com/


Reinforcement Learning-based Task Offloading in the Internet of Wearable Things

Qaim, Waleed Bin, Ometov, Aleksandr, Campolo, Claudia, Molinaro, Antonella, Lohan, Elena Simona, Nurmi, Jari

arXiv.org Artificial Intelligence

Over the years, significant contributions have been made by the research and industrial sectors to improve wearable devices towards the Internet of Wearable Things (IoWT) paradigm. However, wearables are still facing several challenges. Many stem from the limited battery power and insufficient computation resources available on wearable devices. On the other hand, with the popularity of smart wearables, there is a consistent increase in the development of new computationally intensive and latency-critical applications. In such a context, task offloading allows wearables to leverage the resources available on nearby edge devices to enhance the overall user experience. This paper proposes a framework for Reinforcement Learning (RL)-based task offloading in the IoWT. We formulate the task offloading process considering the tradeoff between energy consumption and task accomplishment time. Moreover, we model the task offloading problem as a Markov Decision Process (MDP) and utilize the Q-learning technique to enable the wearable device to make optimal task offloading decisions without prior knowledge. We evaluate the performance of the proposed framework through extensive simulations for various applications and system configurations conducted in the ns-3 network simulator. We also show how varying the main system parameters of the Q-learning algorithm affects the overall performance in terms of average task accomplishment time, average energy consumption, and percentage of tasks offloaded.
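The formulation described, an MDP over task states with a binary local/offload action and a reward trading off energy against completion time, maps directly onto tabular Q-learning. A self-contained sketch (the cost function and state discretization are illustrative assumptions, not the paper's exact model):

```python
import random

def q_learning_offload(cost, n_states, episodes=2000, alpha=0.1,
                       gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning sketch for binary offloading. States are
    discretized task levels; action 0 = run locally, 1 = offload.
    cost(state, action) returns a weighted sum of latency and energy,
    and the reward is its negative. Returns the learned greedy policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda x: q[s][x])
        r = -cost(s, a)
        s2 = rng.randrange(n_states)  # next task arrives at a random level
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
    return [max((0, 1), key=lambda a: q[s][a]) for s in range(n_states)]
```

With a local cost that grows with task size and a roughly fixed offloading overhead, the learned policy keeps small tasks on the wearable and ships large ones to the edge, with no prior model of either cost.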


Percepta: High Performance Stream Processing at the Edge

Sousa, Clarisse, Fonseca, Tiago, Ferreira, Luis Lino, Venâncio, Ricardo, Severino, Ricardo

arXiv.org Artificial Intelligence

The rise of real-time data and the proliferation of Internet of Things (IoT) devices have highlighted the limitations of cloud-centric solutions, particularly regarding latency, bandwidth, and privacy. These challenges have driven the growth of Edge Computing. IoT brings with it a further set of problems: data-rate harmonization between multiple sources, protocol conversion, handling the loss of data, and integration with Artificial Intelligence (AI) models. This paper presents Percepta, a lightweight Data Stream Processing (DSP) system tailored to support AI workloads at the edge, with a particular focus on Reinforcement Learning (RL). It introduces specialized features such as reward function computation, data storage for model retraining, and real-time data preparation to support continuous decision-making. Additional functionalities include data normalization, harmonization across heterogeneous protocols and sampling rates, and robust handling of missing or incomplete data, making it well suited to the challenges of edge-based AI deployment.