Goto

Collaborating Authors

 latency and energy consumption


RL-Driven Security-Aware Resource Allocation Framework for UAV-Assisted O-RAN

arXiv.org Artificial Intelligence

The integration of Unmanned Aerial Vehicles (UAVs) into Open Radio Access Networks (O-RAN) enhances communication in disaster management and Search and Rescue (SAR) operations by ensuring connectivity when infrastructure fails. However, SAR scenarios demand stringent security and low-latency communication, as delays or breaches can compromise mission success. While UAVs serve as mobile relays, they introduce challenges in energy consumption and resource management, necessitating intelligent allocation strategies. Existing UAV-assisted O-RAN approaches often overlook the joint optimization of security, latency, and energy efficiency in dynamic environments. This paper proposes a novel Reinforcement Learning (RL)-based framework for dynamic resource allocation in UAV relays, explicitly addressing these trade-offs. Our approach formulates an optimization problem that integrates security-aware resource allocation, latency minimization, and energy efficiency, which is solved using RL. Unlike heuristic or static methods, our framework adapts in real-time to network dynamics, ensuring robust communication. Simulations demonstrate superior performance compared to heuristic baselines, achieving enhanced security and energy efficiency while maintaining ultra-low latency in SAR scenarios.


Spatial Computing Communications for Multi-User Virtual Reality in Distributed Mobile Edge Computing Network

arXiv.org Artificial Intelligence

Abstract--Immersive virtual reality (VR) applications impose stringent requirements on latency, energy efficiency, and computational resources, particularly in multi-user interactive scenarios. T o address these challenges, we introduce the concept of spatial computing communications (SCC), a framework designed to meet the latency and energy demands of multi-user VR over distributed mobile edge computing (MEC) networks. SCC jointly represents the physical space, defined by users and base stations, and the virtual space, representing shared immersive environments, using a probabilistic model of user dynamics and resource requirements. The resource deployment task is then formulated as a multi-objective combinatorial optimization (MOCO) problem that simultaneously minimizes system latency and energy consumption across distributed MEC resources. T o solve this problem, we propose MO-CMPO, a multi-objective consistency model with policy optimization that integrates supervised learning and reinforcement learning (RL) fine-tuning guided by preference weights. Leveraging a sparse graph neural network (GNN), MO-CMPO efficiently generates Pareto-optimal solutions. Simulations with real-world New Radio base station datasets demonstrate that MO-CMPO achieves superior hypervolume performance and significantly lower inference latency than baseline methods. Furthermore, the analysis reveals practical deployment patterns: latency-oriented solutions favor local MEC execution to reduce transmission delay, while energy-oriented solutions minimize redundant placements to save energy.


A Model Aware AIGC Task Offloading Algorithm in IIoT Edge Computing

arXiv.org Artificial Intelligence

The integration of the Industrial Internet of Things (IIoT) with Artificial Intelligence-Generated Content (AIGC) offers new opportunities for smart manufacturing, but it also introduces challenges related to computation-intensive tasks and low-latency demands. Traditional generative models based on cloud computing are difficult to meet the real-time requirements of AIGC tasks in IIoT environments, and edge computing can effectively reduce latency through task offloading. However, the dynamic nature of AIGC tasks, model switching delays, and resource constraints impose higher demands on edge computing environments. To address these challenges, this paper proposes an AIGC task offloading framework tailored for IIoT edge computing environments, considering the latency and energy consumption caused by AIGC model switching for the first time. IIoT devices acted as multi-agent collaboratively offload their dynamic AIGC tasks to the most appropriate edge servers deployed with different generative models. A model aware AIGC task offloading algorithm based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG-MATO) is devised to minimize the latency and energy. Experimental results show that MADDPG-MATO outperforms baseline algorithms, achieving an average reduction of 6.98% in latency, 7.12% in energy consumption, and a 3.72% increase in task completion rate across four sets of experiments with model numbers ranging from 3 to 6, it is demonstrated that the proposed algorithm is robust and efficient in dynamic, high-load IIoT environments.


EfficientLLM: Efficiency in Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale. Conducted on a production-class cluster (48xGH200, 8xH200 GPUs), our study systematically explores three key axes: (1) architecture pretraining (efficient attention variants: MQA, GQA, MLA, NSA; sparse Mixture-of-Experts (MoE)), (2) fine-tuning (parameter-efficient methods: LoRA, RSLoRA, DoRA), and (3) inference (quantization methods: int4, float16). We define six fine-grained metrics (Memory Utilization, Compute Utilization, Latency, Throughput, Energy Consumption, Compression Rate) to capture hardware saturation, latency-throughput balance, and carbon cost. Evaluating over 100 model-technique pairs (0.5B-72B parameters), we derive three core insights: (i) Efficiency involves quantifiable trade-offs: no single method is universally optimal; e.g., MoE reduces FLOPs and improves accuracy but increases VRAM by 40%, while int4 quantization cuts memory/energy by up to 3.9x at a 3-5% accuracy drop. (ii) Optima are task- and scale-dependent: MQA offers optimal memory-latency trade-offs for constrained devices, MLA achieves lowest perplexity for quality-critical tasks, and RSLoRA surpasses LoRA efficiency only beyond 14B parameters. (iii) Techniques generalize across modalities: we extend evaluations to Large Vision Models (Stable Diffusion 3.5, Wan 2.1) and Vision-Language Models (Qwen2.5-VL), confirming effective transferability. By open-sourcing datasets, evaluation pipelines, and leaderboards, EfficientLLM provides essential guidance for researchers and engineers navigating the efficiency-performance landscape of next-generation foundation models.


Intelligent Task Offloading in VANETs: A Hybrid AI-Driven Approach for Low-Latency and Energy Efficiency

arXiv.org Artificial Intelligence

Intelligent Task Offloading in V ANETs: A Hybrid AI-Driven Approach for Low-Latency and Energy Efficiency Tariq Qayyum, Asadullah Tariq, Muhammad Ali, Mohamed Adel Serhani, Zouheir Trabelsi, Maite L opez-S anchez College of Information Technology, United Arab Emirates University, Al Ain, Abu Dhabi, UAE Department of Computer Science, SCIT, Beaconhouse National University, Lahore Pakistan College of Computing and Informatics, University of Sharjah, Sharjah, UAE Department of Mathematics, University of Barcelona, Gran Via de les Corts Catalanes, Barcelona Abstract --V ehicular Ad-hoc Networks (V ANETs) are integral to intelligent transportation systems, enabling vehicles to offload computational tasks to nearby roadside units (RSUs) and mobile edge computing (MEC) servers for real-time processing. However, the highly dynamic nature of V ANETs introduces challenges, such as unpredictable network conditions, high latency, energy inefficiency, and task failure. Extensive simulations demonstrate that the proposed framework achieves significant reductions in latency and energy usage while improving task success rates and network throughput. By offering an efficient, and scalable solution, this framework sets the foundation for enhancing real-time applications in dynamic vehicular environments. Index T erms--V ANET, T ask offloading, vehicular communication, resource allocation, scalable I.


Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks

arXiv.org Artificial Intelligence

Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks Y uelin Liu, Haiyuan Li, Xenofon V asilakos, Rasheed Hussain, and Dimitra Simeonidou High Performance Networks (HPN) Research Group, Smart Internet Lab, University of Bristol, Bristol, UK Email: { name }. {surname}@bristol.ac.uk Abstract --Future networks (including 6G) are poised to accelerate the realisation of Internet of Everything. The latter will imply a high demand for computational resources to support new services. Mobile Edge Computing (MEC) is a promising solution that enables offloading computation-intensive tasks to nearby edge servers from the end-user devices, thereby reducing latency and energy consumption . Nevertheless, relying solely on a single MEC server for task offloading can lead to uneven resource utilisation and suboptimal performance in complex scenarios. Additionally, traditional task offloading strategies specialise in centralised policy decisions, which unavoidably entails extreme transmission latency and reach computational bottleneck. T o address these gaps, we propose a latency-efficient and energy-efficient Cooperative T ask Offloading framework with Transformer-driven Prediction (CTO-TP), leveraging asynchronous multi-agent deep reinforcement learning to address these challenges. This approach fosters edge-edge cooperation and decreases the synchronous waiting time by performing asynchronous training, optimis-ing task offloading, and resource allocation across distributed networks. The performance evaluation demonstrates that the proposed CTO-TP algorithm reduces up to 80% overall system latency and 87% energy consumption compared to the baseline schemes.


Latenrgy: Model Agnostic Latency and Energy Consumption Prediction for Binary Classifiers

arXiv.org Artificial Intelligence

Machine learning systems increasingly drive innovation across scientific fields and industry, yet challenges in compute overhead, specifically during inference, limit their scalability and sustainability. Responsible AI guardrails, essential for ensuring fairness, transparency, and privacy, further exacerbate these computational demands. This study addresses critical gaps in the literature, chiefly the lack of generalized predictive techniques for latency and energy consumption, limited cross-comparisons of classifiers, and unquantified impacts of RAI guardrails on inference performance. Using Theory Construction Methodology, this work constructed a model-agnostic theoretical framework for predicting latency and energy consumption in binary classification models during inference. The framework synthesizes classifier characteristics, dataset properties, and RAI guardrails into a unified analytical instrument. Two predictive equations are derived that capture the interplay between these factors while offering generalizability across diverse classifiers. The proposed framework provides foundational insights for designing efficient, responsible ML systems. It enables researchers to benchmark and optimize inference performance and assists practitioners in deploying scalable solutions. Finally, this work establishes a theoretical foundation for balancing computational efficiency with ethical AI principles, paving the way for future empirical validation and broader applications.


Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical

arXiv.org Artificial Intelligence

On-device inference holds great potential for increased energy efficiency, responsiveness, and privacy in edge ML systems. However, due to less capable ML models that can be embedded in resource-limited devices, use cases are limited to simple inference tasks such as visual keyword spotting, gesture recognition, and predictive analytics. In this context, the Hierarchical Inference (HI) system has emerged as a promising solution that augments the capabilities of the local ML by offloading selected samples to an edge server or cloud for remote ML inference. Existing works demonstrate through simulation that HI improves accuracy. However, they do not account for the latency and energy consumption on the device, nor do they consider three key heterogeneous dimensions that characterize ML systems: hardware, network connectivity, and models. In contrast, this paper systematically compares the performance of HI with on-device inference based on measurements of accuracy, latency, and energy for running embedded ML models on five devices with different capabilities and three image classification datasets. For a given accuracy requirement, the HI systems we designed achieved up to 73% lower latency and up to 77% lower device energy consumption than an on-device inference system. The key to building an efficient HI system is the availability of small-size, reasonably accurate on-device models whose outputs can be effectively differentiated for samples that require remote inference. Despite the performance gains, HI requires on-device inference for all samples, which adds a fixed overhead to its latency and energy consumption. Therefore, we design a hybrid system, Early Exit with HI (EE-HI), and demonstrate that compared to HI, EE-HI reduces the latency by up to 59.7% and lowers the device's energy consumption by up to 60.4%.


Global Big Data Conference

#artificialintelligence

Edge AI devices, systems that combine artificial intelligence (AI) and edge computing techniques, are becoming an essential part of the rapidly growing Internet of Things (IoT) ecosystem. These devices include smart speakers, smart phones, robots, self-driven cars, drones and data-processing surveillance cameras. While these technologies have become increasingly advanced over the past few years, most of them exhibit limited energy efficiencies, inference accuracies, and battery lifetimes. Non-volatile computing-in-memory (nvCIM) architectures, an emerging class of approaches that minimize the movement of data between processors and memory components, could help to significantly reduce the latency and energy consumption associated with complex AI computations. Researchers at the Taiwan Semiconductor Manufacturing Company (TSMC) recently developed a new four-megabit (4Mb) nvCIM approach that could help to improve the overall performance of edge AI devices.