Appendix 1 Goal generation for executor training

Neural Information Processing Systems

Pseudo goal generation is introduced to train the executor without the coordinator. The scripted policy is allowed to access the grounded state, e.g., the absolute positions. Note that it is not an optimal policy for the executor: it fails when two targets are far apart. The notations used here are defined as follows. The objective is to maximize the number of covered targets. After this formulation, the target coverage problem can be solved as an integer linear programming (ILP) problem with the CBC optimizer. The primitive actions for all the sensors can then be derived from the ILP solution, as shown in Tab. 1.
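As a sketch of the coverage objective, the tiny brute-force example below enumerates joint sensor actions and maximizes the number of covered targets; the paper instead formulates this as an ILP and solves it with the CBC optimizer. The toy coverage sets and names are illustrative, not from the paper.

```python
from itertools import product

# Toy instance: each sensor picks one orientation (its primitive action);
# cover[s][a] is the set of targets sensor s covers under action a.
cover = [
    {0: {0}, 1: {0, 1}},   # sensor 0
    {0: {1}, 1: {2}},      # sensor 1
]

def best_joint_action(cover):
    """Enumerate joint actions and maximize the number of covered targets
    (the ILP objective; the paper solves it with the CBC solver instead)."""
    best, best_covered = None, -1
    actions = [list(c) for c in cover]
    for joint in product(*actions):
        covered = set().union(*(cover[s][a] for s, a in enumerate(joint)))
        if len(covered) > best_covered:
            best, best_covered = joint, len(covered)
    return best, best_covered

joint, n = best_joint_action(cover)
print(joint, n)  # → (1, 1) 3
```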





Attractor-merging Crises and Intermittency in Reservoir Computing

Kabayama, Tempei, Komuro, Motomasa, Kuniyoshi, Yasuo, Aihara, Kazuyuki, Nakajima, Kohei

arXiv.org Artificial Intelligence

Reservoir computing can embed attractors into random neural networks (RNNs), generating a ``mirror'' of a target attractor because of the networks' inherent symmetry constraints. In these RNNs, we report that an attractor-merging crisis, accompanied by intermittency, emerges simply by adjusting a global parameter. We further reveal the underlying mechanism through a detailed analysis of the phase-space structure and demonstrate that this bifurcation scenario is intrinsic to a general class of RNNs, independent of the training data.
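The ``mirror'' attractor follows from a simple symmetry: a closed-loop reservoir with an odd activation (tanh) and no bias terms is equivariant under a global sign flip, so the negation of any trajectory is itself a valid trajectory. A minimal NumPy check, with random weights standing in for a trained reservoir:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))  # reservoir weights
W_out = rng.normal(size=(1, n))                      # (stand-in) readout
W_fb = rng.normal(size=(n, 1))                       # output feedback weights

def step(x):
    # Closed-loop update: output fed back into the reservoir, no bias terms.
    y = W_out @ x
    return np.tanh(W @ x + W_fb @ y)

x0 = rng.normal(size=(n, 1))
xp, xm = x0.copy(), -x0.copy()
for _ in range(100):
    xp, xm = step(xp), step(xm)

# Odd symmetry: the sign-flipped state follows the mirrored trajectory.
print(np.allclose(xm, -xp))  # → True
```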


What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?

Shihab, Ibne Farabi, Akter, Sanjeda, Sharma, Anuj

arXiv.org Artificial Intelligence

Sparse-reward reinforcement learning (RL) remains fundamentally hard: without structure, any agent needs Ω(|S||A|/p) samples to recover rewards. We introduce Policy-Aware Matrix Completion (PAMC) as a first concrete step toward a structural reward learning framework. Our key idea is to exploit approximate low-rank + sparse structure in the reward matrix, under policy-biased (MNAR) sampling. We prove recovery guarantees with inverse-propensity weighting, and establish a visitation-weighted error-to-regret bound linking completion error to control performance. Importantly, when assumptions weaken, PAMC degrades gracefully: confidence intervals widen and the algorithm abstains, ensuring safe fallback to exploration. Empirically, PAMC improves sample efficiency across Atari-26 (10M steps), DM Control, MetaWorld MT50, D4RL offline RL, and preference-based RL benchmarks, outperforming DrQ-v2, DreamerV3, Agent57, T-REX/D-REX, and PrefPPO under compute-normalized comparisons. Our results highlight PAMC as a practical and principled tool when structural rewards exist, and as a concrete first instantiation of a broader structural reward learning perspective. What fundamental properties of reward functions determine the sample complexity of reinforcement learning?
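A minimal sketch of the core idea: low-rank completion of a reward matrix under policy-biased (MNAR) observation, with each observed residual reweighted by its inverse propensity. The sizes, sampling rates, and plain gradient descent here are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, r = 30, 12, 2
U_true, V_true = rng.normal(size=(S, r)), rng.normal(size=(A, r))
M = U_true @ V_true.T                       # ground-truth reward matrix

# Policy-biased sampling: each (s, a) observed with known propensity p[s, a].
p = rng.uniform(0.1, 0.9, size=(S, A))
obs = rng.uniform(size=(S, A)) < p

# Factored completion with inverse-propensity-weighted squared loss.
U = rng.normal(scale=0.1, size=(S, r))
V = rng.normal(scale=0.1, size=(A, r))
lr = 0.01
for _ in range(4000):
    R = (U @ V.T - M) * obs / p             # IPW residual on observed entries
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)

err = np.abs(U @ V.T - M).mean()            # error on ALL entries, seen or not
print(err)
```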


Temporal Sampling for Forgotten Reasoning in LLMs

Li, Yuetai, Xu, Zhangchen, Jiang, Fengqing, Ramasubramanian, Bhaskar, Niu, Luyao, Lin, Bill Yuchen, Yue, Xiang, Poovendran, Radha

arXiv.org Artificial Intelligence

Fine-tuning large language models (LLMs) is intended to improve their reasoning capabilities, yet we uncover a counterintuitive effect: models often forget how to solve problems they previously answered correctly during training. We term this phenomenon Temporal Forgetting and show that it is widespread across model sizes, fine-tuning methods (both Reinforcement Learning and Supervised Fine-Tuning), and multiple reasoning benchmarks. Our analysis reveals that 6.4% to 56.1% of final errors were once solved correctly at an earlier checkpoint. Inspired by this phenomenon, we propose Temporal Sampling, a simple decoding strategy that draws outputs from multiple checkpoints along the training trajectory. This approach recovers forgotten solutions without retraining or ensembling and leads to significant improvements in reasoning performance, with gains of 4 to 19 points in Pass@k and consistent gains for majority voting and Best-of-N across several benchmarks. To make Temporal Sampling deployment-friendly, we extend it to LoRA-adapted models. By leveraging the temporal diversity inherent in training, Temporal Sampling offers a practical, compute-efficient way to surface hidden reasoning ability and rethink how we evaluate LLMs. Figure 1: (a) We observed that during the RL training of the Deepseek-R1-1.5B model, 76.7% of AIME problems were solved correctly at some intermediate checkpoint, yet only 30% remained correct in the final model.
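A minimal sketch of the decoding strategy: allocate the k samples round-robin across several checkpoints rather than drawing all k from the final model, then aggregate by majority vote. The checkpoint functions below are mocked stand-ins for generation calls, not real LLM inference:

```python
import random
from collections import Counter

def temporal_sampling(checkpoints, prompt, k, rng):
    """Draw k samples round-robin from several checkpoints (e.g. the last
    few along the training trajectory) instead of k from the final model."""
    return [checkpoints[i % len(checkpoints)](prompt, rng) for i in range(k)]

def majority_vote(answers):
    return Counter(answers).most_common(1)[0][0]

# Mock checkpoints: the final model has "forgotten" an answer that a
# mid-training checkpoint still produces most of the time.
final = lambda prompt, rng: rng.choice(["wrong", "wrong", "right"])
mid   = lambda prompt, rng: rng.choice(["right", "right", "wrong"])

rng = random.Random(0)
answers = temporal_sampling([final, mid], "problem", k=8, rng=rng)
print(majority_vote(answers))
```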


Revolution of Wireless Signal Recognition for 6G: Recent Advances, Challenges and Future Directions

Zhang, Hao, Zhou, Fuhui, Du, Hongyang, Wu, Qihui, Yuen, Chau

arXiv.org Artificial Intelligence

Wireless signal recognition (WSR) is a crucial technique for intelligent communications and spectrum sharing in sixth-generation (6G) wireless communication networks. It can be utilized to enhance network performance and efficiency, improve quality of service (QoS), and strengthen network security and reliability. Additionally, WSR can be applied to military applications such as signal interception, signal race, and signal abduction. In the past decades, great efforts have been devoted to WSR research. Earlier works mainly focused on model-based methods, including likelihood-based (LB) and feature-based (FB) methods, which held the leading position for many years. With the emergence of artificial intelligence (AI), intelligent methods, including machine learning-based (ML-based) and deep learning-based (DL-based) methods, have been developed to extract the features of received signals and perform classification. In this work, we provide a comprehensive review of WSR from the view of applications, main tasks, recent advances, datasets and evaluation metrics, challenges, and future directions. Specifically, intelligent WSR methods are introduced from the perspectives of model, data, learning, and implementation. Moreover, we analyze the challenges for WSR posed by complex, dynamic, and open 6G wireless environments and discuss future directions for WSR. This survey is expected to provide a comprehensive overview of state-of-the-art WSR techniques and inspire new research directions for WSR in 6G networks.


The Gradient of Algebraic Model Counting

Maene, Jaron, De Raedt, Luc

arXiv.org Artificial Intelligence

Algebraic model counting unifies many inference tasks on logic formulas by exploiting semirings. Rather than focusing on inference, we consider learning, especially in statistical-relational and neurosymbolic AI, which combine logical, probabilistic and neural representations. Concretely, we show that the very same semiring perspective of algebraic model counting also applies to learning. This allows us to unify various learning algorithms by generalizing gradients and backpropagation to different semirings. Furthermore, we show how cancellation and ordering properties of a semiring can be exploited for more memory-efficient backpropagation. This allows us to obtain some interesting variations of state-of-the-art gradient-based optimisation methods for probabilistic logical models. We also discuss why algebraic model counting on tractable circuits does not lead to more efficient second-order optimization. Empirically, our algebraic backpropagation exhibits considerable speed-ups as compared to existing approaches.
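The semiring view of gradients can be made concrete with the gradient (dual-number) semiring: elements are pairs (value, derivative), with the sum and product rules below. A single forward pass over a circuit then yields both the weighted model count and a partial derivative. A minimal sketch; the formula and weights are illustrative:

```python
# Gradient (dual-number) semiring: pairs (value, derivative). plus/times
# generalize the circuit's sum and product nodes, so evaluating the circuit
# in this semiring computes the weighted model count AND its gradient entry.

def splus(x, y):
    return (x[0] + y[0], x[1] + y[1])

def stimes(x, y):
    return (x[0] * y[0], x[0] * y[1] + x[1] * y[0])

# Literal weights for WMC of (a AND b) OR (NOT a AND c); we differentiate
# with respect to p_a, so its tangent is 1 (and -1 for NOT a), others 0.
p_a, p_b, p_c = 0.3, 0.8, 0.5
a     = (p_a, 1.0)
not_a = (1 - p_a, -1.0)
b     = (p_b, 0.0)
c     = (p_c, 0.0)

wmc = splus(stimes(a, b), stimes(not_a, c))
print(wmc)  # value p_a*p_b + (1-p_a)*p_c, derivative p_b - p_c
```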


Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification

Hao, Xiaoyang, Feng, Zhixi, Peng, Tongqing, Yang, Shuyuan

arXiv.org Artificial Intelligence

Automatic modulation classification (AMC) is an effective way to counter physical-layer threats in the internet of things (IoT). However, labels are often noisy in practice, which significantly degrades the performance and robustness of deep neural networks (DNNs). In this paper, we propose a meta-learning guided label noise distillation method for robust AMC. Specifically, a teacher-student heterogeneous network (TSHN) framework is proposed to distill and reuse label noise. Based on the idea that labels are representations, the teacher network, with trusted meta-learning, divides and conquers untrusted label samples and then guides the student network to learn better by reassessing and correcting labels. Furthermore, we propose a multi-view signal (MVS) method to further improve performance on hard-to-classify categories with few-shot trusted label samples. Extensive experimental results show that our methods significantly improve the performance and robustness of signal AMC in various complex label-noise scenarios, which is crucial for securing IoT applications.
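A minimal caricature of the divide-and-conquer step: a teacher fit on a small trusted set reassesses the noisy labels and corrects only those it is confident about. Everything here (the centroid teacher, Gaussian-blob features, flip rate, and threshold) is an illustrative stand-in for the paper's deep networks:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for signal features: two classes as Gaussian blobs, a small
# trusted set, and a large set in which 30% of labels are flipped.
def blob(n, label):
    return rng.normal(loc=3.0 * label, scale=1.0, size=(n, 2))

X_trusted = np.vstack([blob(20, 0), blob(20, 1)])
y_trusted = np.r_[np.zeros(20, int), np.ones(20, int)]
X = np.vstack([blob(200, 0), blob(200, 1)])
y_clean = np.r_[np.zeros(200, int), np.ones(200, int)]
y_noisy = y_clean.copy()
flip = rng.uniform(size=400) < 0.3
y_noisy[flip] = 1 - y_noisy[flip]

# "Teacher" fit on the trusted set (nearest class centroid). It reassesses
# the noisy labels and corrects only those it is confident about.
centroids = np.stack([X_trusted[y_trusted == k].mean(axis=0) for k in (0, 1)])
d = ((X[:, None, :] - centroids) ** 2).sum(axis=-1)   # (400, 2) distances
confident = np.abs(d[:, 0] - d[:, 1]) > 2.0           # margin threshold
y_corrected = np.where(confident, d.argmin(axis=1), y_noisy)

print((y_noisy == y_clean).mean(), (y_corrected == y_clean).mean())
```

With well-separated classes the corrected labels agree with the clean ones far more often than the noisy ones do; the student would then be trained on `y_corrected` instead of `y_noisy`.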