PPO-BR: Dual-Signal Entropy-Reward Adaptation for Trust Region Policy Optimization

arXiv.org Artificial Intelligence

PPO-BR establishes a new paradigm in adaptive RL by fusing exploration and convergence signals into a single bounded trust region, a theoretically grounded innovation (Theorem 1) that outperforms 5 SOTA baselines with <2% overhead (Fig 3). This work bridges a critical gap in phase-aware learning, enabling real-world deployment in safety-critical systems such as robotic surgery (Appendix E). The mechanism combines two signals: (1) entropy-driven expansion of ϵ promotes exploration in high-uncertainty states, while (2) reward-guided contraction of ϵ enforces stability during convergence (Theorem 1). On 6 diverse benchmarks (MuJoCo/Atari/sparse-reward), PPO-BR achieves 29.1% faster convergence (p < 0.001, Wilcoxon test), 2.3× lower reward variance vs PPO (Fig 3), and <1.8% runtime overhead with just 5 lines of code change (Algorithm 1). PPO-BR's plug-and-play simplicity and theoretical guarantees (Lemma 2) make it ready to deploy in safety-critical systems, from surgical robotics to autonomous drones, where adaptive stability is non-negotiable. In contrast to recent methods such as Group Relative Policy Optimization (GRPO), PPO-BR offers a unified entropy-reward adaptive mechanism applicable to both language models and general reinforcement learning environments.
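
The dual-signal idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything here is an assumption made for illustration: the function names `adaptive_clip` and `clipped_surrogate`, the `tanh` squashing, the weights `lam_explore` and `lam_stable`, and the bounds `eps_min` and `eps_max` are hypothetical and are not taken from the paper. Only the clipped surrogate itself follows the standard PPO objective.

```python
import math


def adaptive_clip(entropy, max_entropy, reward_delta,
                  eps0=0.2, lam_explore=0.5, lam_stable=0.3,
                  eps_min=0.05, eps_max=0.4):
    """Hypothetical bounded clip threshold driven by two signals.

    entropy / max_entropy : policy entropy and its maximum (e.g. log of the
                            number of actions), used as an exploration signal.
    reward_delta          : recent improvement in smoothed return, used as a
                            convergence signal (assumed form, not the paper's).
    """
    # Normalise entropy to [0, 1]; guard against a zero maximum.
    h = entropy / max_entropy if max_entropy > 0 else 0.0
    # Expansion term: higher uncertainty widens the trust region.
    expand = lam_explore * math.tanh(h)
    # Contraction term: positive reward progress narrows it.
    contract = lam_stable * math.tanh(max(reward_delta, 0.0))
    eps = eps0 * (1.0 + expand - contract)
    # Keep the threshold inside fixed bounds.
    return min(max(eps, eps_min), eps_max)


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped objective for one sample, with a given eps."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Usage: a near-uniform policy over 4 actions with no reward progress gets a
# wider clip range than a deterministic policy whose return is improving.
eps_wide = adaptive_clip(math.log(4), math.log(4), reward_delta=0.0)
eps_narrow = adaptive_clip(0.0, math.log(4), reward_delta=5.0)
```

Under these assumptions the threshold always stays within `[eps_min, eps_max]`, which is one simple way to read the "bounded trust region" claim; the paper's actual bound may differ.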


Effective Layer Pruning Through Similarity Metric Perspective

arXiv.org Artificial Intelligence

Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks. Such models, however, are restricted by a high computational overhead, limiting their applicability and hindering advancements in the field. Extensive research has demonstrated that pruning structures from these models is a straightforward approach to reducing network complexity. In this direction, most efforts focus on removing weights or filters. Studies have also been devoted to layer pruning, as it promotes superior computational gains. However, layer pruning often hurts the network's predictive ability (i.e., accuracy) at high compression rates. This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods. Our method estimates the relative importance of a layer using the Centered Kernel Alignment (CKA) metric, employed to measure the similarity between the representations of the unpruned model and a candidate layer for pruning. We confirm the effectiveness of our method on standard architectures and benchmarks, in which it outperforms existing layer-pruning strategies and other state-of-the-art pruning techniques. In particular, we remove more than 75% of computation while improving predictive ability. At higher compression regimes, our method exhibits a negligible accuracy drop, while other methods notably deteriorate model accuracy. Apart from these benefits, our pruned models exhibit robustness to adversarial and out-of-distribution samples.
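
As a rough illustration of the similarity criterion named in this abstract, the sketch below computes linear CKA between two feature matrices and ranks candidates by it. The linear CKA formula is the standard one; the rest is assumed for illustration. In particular, `rank_layers_by_cka`, the `candidates` mapping, and the reading that a higher score marks a safer layer to remove are hypothetical and may not match the authors' exact procedure.

```python
import numpy as np


def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two feature matrices.

    x, y : arrays of shape (n_samples, n_features); the feature
           dimensions may differ, the sample dimension must match.
    """
    # Centre each feature column so the comparison ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def rank_layers_by_cka(reference, candidates):
    """Sort candidates by similarity to the unpruned representation.

    reference  : features from the unpruned model.
    candidates : dict mapping a (hypothetical) layer name to the features
                 obtained when that layer is the pruning candidate.
    """
    scores = {name: linear_cka(reference, feats)
              for name, feats in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with synthetic features: "a" barely changes the representation,
# "b" replaces it with unrelated noise.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 8))
ranking = rank_layers_by_cka(ref, {
    "a": ref + 0.01 * rng.normal(size=(50, 8)),
    "b": rng.normal(size=(50, 8)),
})
```

In this toy setup, the candidate whose features stay closest to the reference ranks first, which is the intuition behind treating it as the least important layer.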


Improving the Timing Resolution of Positron Emission Tomography Detectors Using Boosted Learning -- A Residual Physics Approach

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is entering medical imaging, mainly to enhance image reconstruction. Nevertheless, improvements throughout the entire processing chain, from signal detection to computation, potentially offer significant benefits. This work presents a novel and versatile approach to detector optimization using machine learning (ML) and residual physics. We apply the concept to positron emission tomography (PET), intending to improve the coincidence time resolution (CTR). PET visualizes metabolic processes in the body by detecting photons with scintillation detectors. Improved CTR performance offers the advantage of reducing radioactive dose exposure for patients. Modern PET detectors with sophisticated concepts and read-out topologies represent complex physical and electronic systems requiring dedicated calibration techniques. Traditional methods primarily depend on analytical formulations that successfully describe the main detector characteristics. However, when accounting for higher-order effects, additional complexities arise in matching theoretical models to experimental reality. Our work addresses this challenge by combining traditional calibration with AI and residual physics. We present a residual physics-based strategy using gradient tree boosting and physics-guided data generation. The explainable AI framework SHapley Additive exPlanations (SHAP) was used to relate learned patterns to known physical effects. In addition, the models were tested against basic physical laws. We were able to improve the CTR significantly (by more than 20%) for clinically relevant detectors of 19 mm height, reaching CTRs of 185 ps (450-550 keV).


A Survey on Multi-output Learning

arXiv.org Machine Learning

Multi-output learning aims to simultaneously predict multiple outputs given an input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. Inspired by big data, the 4Vs characteristics of multi-output data impose a set of challenges on multi-output learning, in terms of the volume, velocity, variety and veracity of the outputs. An increasing number of works in the literature have been devoted to the study of multi-output learning and the development of novel approaches for addressing the challenges encountered. However, the literature lacks a comprehensive overview of the different types of challenges that the characteristics of multiple outputs bring to multi-output learning, and of the techniques proposed to overcome them. This paper thus attempts to fill this gap by providing a comprehensive review of the area. We first introduce the different stages of the life cycle of the output labels. Then we present the paradigm of multi-output learning, including its myriad output structures, definitions of its different sub-problems, model evaluation metrics and popular data repositories used in the study. Subsequently, we review a number of state-of-the-art multi-output learning methods, which are categorized based on the challenges.