
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Li, Gaotang, Qiu, Ruizhong, Chen, Xiusi, Ji, Heng, Tong, Hanghang

arXiv.org Artificial Intelligence

Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log likelihood (NLL). While NLL is classically optimal when training from scratch, post-training operates in a different regime that can violate its optimality assumptions: models already encode task-relevant priors, and supervision can be long and noisy. To this end, we study a general family of probability-based objectives and characterize their effectiveness under different conditions. Through comprehensive experiments and extensive ablation studies across 7 model backbones, 14 benchmarks, and 3 domains, we uncover a critical dimension that governs objective behavior: the model-capability continuum. Our theoretical analysis further elucidates how objectives trade places across the continuum, providing a principled foundation for adapting objectives to model capability.

Supervised fine-tuning (SFT) has become a standard approach for post-training large language models (LLMs), widely used to elicit and strengthen their capabilities (Zhang et al., 2023; Chung et al., 2024). Despite its popularity, many existing studies find that SFT often exhibits limited generalization (Ouyang et al., 2022; Chu et al., 2025). Nevertheless, this limitation may not arise from the SFT paradigm itself. Instead, we find that it may stem from its default training objective: negative log likelihood (NLL, -log p). Surprisingly, we find that other objectives significantly outperform NLL on some tasks, as shown in Tab. 1. Table 1: Other objectives can significantly outperform NLL.
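As a concrete illustration of what a "probability-based objective" other than NLL can look like, the sketch below contrasts per-token NLL (-log p) with a plain likelihood objective (-p). The -p variant is one illustrative member of such a family, not necessarily the paper's exact choice; the point is that a single low-probability (e.g., noisy) target token dominates NLL but barely moves -p:

```python
import math

def nll_loss(token_probs):
    # standard SFT objective: mean negative log likelihood over target tokens
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def neg_prob_loss(token_probs):
    # an illustrative probability-based alternative (-p): low-probability
    # (often noisy) tokens contribute a bounded penalty instead of blowing up
    return -sum(token_probs) / len(token_probs)

clean = [0.9, 0.8, 0.5]   # targets the model already assigns mass to
noisy = [0.9, 0.8, 0.01]  # one low-probability (possibly noisy) target token

print(nll_loss(clean), nll_loss(noisy))            # NLL jumps sharply on the noisy token
print(neg_prob_loss(clean), neg_prob_loss(noisy))  # -p changes only slightly
```

The bounded per-token penalty of -p is what makes it behave differently from NLL when supervision is long and noisy, as the abstract discusses.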


Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

Neural Information Processing Systems

Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods. As our first main result, we show that an important subset of the latter methodology is, in fact, a limiting special case of a general formulation of the former methodology; optimistic policy iteration encompasses not only most of the greedy value function methods but also natural actor-critic methods, and permits one to directly interpolate between them. The resulting continuum adjusts the strength of the Markov assumption in policy improvement and, as such, can be seen as dual in spirit to the continuum in TD($\lambda$)-style algorithms in policy evaluation. As our second main result, we show for a substantial subset of soft-greedy value function approaches that, while having the potential to avoid policy oscillation and policy chattering, this subset can never converge toward any optimal policy, except in a certain pathological case. Consequently, in the context of approximations, the majority of greedy value function methods seem doomed to suffer either from the risk of oscillation/chattering or from the presence of systematic sub-optimality.
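The interpolation the abstract describes can be written schematically (a textbook-style sketch of optimistic policy iteration, not the paper's exact formulation): a (soft-)greedy improvement step followed by a partial, "optimistic" evaluation step,

```latex
\begin{aligned}
\pi_{k+1} &\approx \operatorname{greedy}(Q_k)
  && \text{(soft-greedy policy improvement)} \\
Q_{k+1} &= (1-\alpha)\, Q_k + \alpha\, T^{\pi_{k+1}} Q_k
  && \text{(optimistic partial evaluation, } 0 < \alpha \le 1\text{)}
\end{aligned}
```

Setting $\alpha = 1$ recovers full policy iteration, while small $\alpha$ yields small incremental value updates reminiscent of actor-critic schemes; sweeping $\alpha$ traces out a continuum between the two method families in the spirit of the one described above.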




A Novel Compound AI Model for 6G Networks in 3D Continuum

Gravara, Milos, Stanisic, Andrija, Nastic, Stefan

arXiv.org Artificial Intelligence

The 3D continuum presents a complex environment that spans the terrestrial, aerial and space domains, with 6G networks serving as a key enabling technology. Current AI approaches for network management rely on monolithic models that fail to capture cross-domain interactions, lack adaptability, and demand prohibitive computational resources. This paper presents a formal model of Compound AI systems, introducing a novel tripartite framework that decomposes complex tasks into specialized, interoperable modules. The proposed modular architecture provides essential capabilities to address the unique challenges of 6G networks in the 3D continuum, where heterogeneous components require coordinated, yet distributed, intelligence. This approach introduces a fundamental trade-off between model and system performance, which must be carefully addressed. Furthermore, we identify key challenges faced by Compound AI systems within 6G networks operating in the 3D continuum, including cross-domain resource orchestration, adaptation to dynamic topologies, and the maintenance of consistent AI service quality across heterogeneous environments.


DCentNet: Decentralized Multistage Biomedical Signal Classification using Early Exits

Li, Xiaolin, Huang, Binhua, Cardiff, Barry, John, Deepu

arXiv.org Artificial Intelligence

This paper presents DCentNet, a novel decentralized multistage signal classification approach for biomedical data obtained from Internet of Things (IoT) wearable sensors, utilizing early exit points (EEPs) to improve both energy efficiency and processing speed. Traditionally, IoT sensor data is processed in a centralized manner on a single node, Cloud-native or Edge-native, which comes with several restrictions, such as significant energy consumption on the edge sensor and greater latency. To address these limitations, we propose DCentNet, a decentralized method based on Convolutional Neural Network (CNN) classifiers, where a single CNN model is partitioned into several sub-networks using one or more EEPs. Our method introduces encoder-decoder pairs at EEPs, which compress large feature maps before transferring them to the next sub-network, drastically reducing wireless data transmission and power consumption. When the input can be confidently classified at an EEP, processing can terminate early without traversing the entire network. To minimize sensor energy consumption and overall complexity, the initial sub-networks can be deployed in the fog or on the edge. We also explore different EEP locations and demonstrate that the choice of EEP can be altered to achieve a trade-off between performance and complexity using a genetic algorithm approach. DCentNet addresses the limitations of centralized processing in IoT wearable sensor data analysis, offering improved efficiency and performance. The experimental results of electrocardiogram (ECG) classification validate the success of our proposed method. With one EEP, the system saves 94.54% of wireless data transmission and achieves a corresponding 21% decrease in complexity, while classification accuracy and sensitivity remain almost unaffected at their original levels. When employing two EEPs, the system demonstrates a sensitivity of 98.36% and an accuracy of 97.74%, concurrently achieving a 91.86% reduction in wireless data transmission and a 22% reduction in complexity. DCentNet is implemented on an ARM Cortex-M4 based microcontroller unit (MCU).
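The control flow of confidence-based early exiting can be sketched as follows. This is a minimal illustration of the general EEP idea, with hypothetical stage/head interfaces (plain callables over toy scalar "features"), not DCentNet's actual CNN partitioning or encoder-decoder compression:

```python
def classify_with_early_exits(x, subnetworks, exit_heads, threshold=0.9):
    """Run sub-networks in sequence; return (label, stage) at the first
    exit head whose top-class confidence clears the threshold."""
    h = x
    for stage, (subnet, head) in enumerate(zip(subnetworks, exit_heads)):
        h = subnet(h)          # forward through this sub-network
        probs = head(h)        # class probabilities at this exit point
        conf = max(probs)
        if conf >= threshold:  # confident enough: skip remaining stages
            return probs.index(conf), stage
    return probs.index(conf), stage  # otherwise the final head decides

# toy usage: two stages over scalar "features" (illustrative only)
stages = [lambda v: v + 1, lambda v: v + 1]
heads = [
    lambda v: [0.95, 0.05] if v >= 1 else [0.5, 0.5],  # confident first exit
    lambda v: [0.2, 0.8],                               # final classifier
]
label, exited_at = classify_with_early_exits(0, stages, heads)
```

Lowering the threshold trades accuracy for earlier exits, which is the performance/complexity knob the genetic-algorithm search in the paper tunes over EEP placements.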


Adaptive AI-based Decentralized Resource Management in the Cloud-Edge Continuum

Li, Lanpei, Bell, Jack, Coppola, Massimo, Lomonaco, Vincenzo

arXiv.org Artificial Intelligence

The increasing complexity of application requirements and the dynamic nature of the Cloud-Edge Continuum present significant challenges for efficient resource management. These challenges stem from the ever-changing infrastructure, which is characterized by additions, removals, and reconfigurations of nodes and links, as well as the variability of application workloads. Traditional centralized approaches struggle to adapt to these changes due to their static nature, while decentralized solutions face challenges such as limited global visibility and coordination overhead. This paper proposes a hybrid decentralized framework for dynamic application placement and resource management. The framework utilizes Graph Neural Networks (GNNs) to embed resource and application states, enabling comprehensive representation and efficient decision-making. It employs a collaborative multi-agent reinforcement learning (MARL) approach, where local agents optimize resource management in their neighborhoods and a global orchestrator ensures system-wide coordination. By combining decentralized application placement with centralized oversight, our framework addresses the scalability, adaptability, and accuracy challenges inherent in the Cloud-Edge Continuum. This work contributes to the development of decentralized application placement strategies, the integration of GNN embeddings, and collaborative MARL systems, providing a foundation for efficient, adaptive and scalable resource management.


Optimizing LoRa for Edge Computing with TinyML Pipeline for Channel Hopping

Grunewald, Marla, Bensalem, Mounir, Jukan, Admela

arXiv.org Artificial Intelligence

We propose to integrate a long-distance LongRange (LoRa) communication solution for sending data from IoT devices to the edge computing system, taking advantage of its unlicensed nature and the potential for open-source implementations that are common in edge computing. We propose a channel hopping optimization model, apply a TinyML-based channel hopping model to LoRa transmissions, and experimentally study a fast predictive algorithm for finding free channels between edge and IoT devices. In an open-source experimental setup that includes LoRa, TinyML, and the IoT-edge-cloud continuum, we integrate a novel application workflow and cloud-friendly protocol solutions in a case study of a plant recommender application that combines concepts of microfarming and urban computing. In a LoRa-optimized edge computing setup, we engineer the application workflow and apply collaborative filtering and various machine learning algorithms to the collected application data to identify and recommend the planting schedule for a specific microfarm in an urban area. In the LoRa experiments, we measure the occurrence of packet loss, RSSI, and SNR, using a random channel hopping scheme as a baseline for our proposed TinyML method. The results show that it is feasible to use TinyML in microcontrollers for channel hopping, and they prove the effectiveness of TinyML in learning to predict the best channel for LoRa transmission, improving the RSSI by up to 63% and the SNR by up to 44% in comparison with a random hopping mechanism.