AITopics | multiple importance sampling

Collaborating Authors

multiple importance sampling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Dong, Yihong, Jiang, Xue, Tao, Yongding, Liu, Huanyu, Zhang, Kechi, Mou, Lili, Cao, Rongyu, Ma, Yingwei, Chen, Jue, Li, Binhua, Jin, Zhi, Huang, Fei, Li, Yongbin, Li, Ge

arXiv.org Artificial IntelligenceOct-21-2025

Reinforcement Learning with V erifiable Reward (RL VR) has significantly advanced the complex reasoning abilities of Large Language Models (LLMs). However, it struggles to break through the inherent capability boundaries of the base LLM, due to its essentially on-policy strategy coupled with LLM's immense action space and sparse reward. Critically, RL VR can lead to the capability boundary collapse, narrowing the LLM's problem-solving scope. To address this problem, we propose RL-PLUS, a novel hybrid-policy optimization approach for LLMs that synergizes internal exploitation with external data to achieve stronger reasoning capabilities and surpass the boundaries of base models. RL-PLUS integrates two core components, i.e., Multiple Importance Sampling to address distributional mismatch from external data, and Exploration-Based Advantage Function to guide the model towards high-value, unexplored reasoning paths. We provide both theoretical analysis and extensive experiments to demonstrate the superiority and gen-eralizability of our approach. Compared with existing RL VR methods, RL-PLUS achieves 1) state-of-the-art performance on six math reasoning benchmarks; 2) superior performance on six out-of-distribution reasoning tasks; 3) consistent and significant gains across diverse model families, with average relative improvements up to 69.2%. Moreover, the analysis of Pass@k curves indicates that RL-PLUS effectively resolves the capability boundary collapse problem.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.00222

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Add feedback

Conservative Optimistic Policy Optimization via Multiple Importance Sampling

Azize, Achraf, Gaizi, Othman

arXiv.org Machine LearningMar-4-2021

Reinforcement Learning (RL) has been able to solve hard problems such as playing Atari games or solving the game of Go, with a unified approach. Yet modern deep RL approaches are still not widely used in real-world applications. One reason could be the lack of guarantees on the performance of the intermediate executed policies, compared to an existing (already working) baseline policy. In this paper, we propose an online model-free algorithm that solves conservative exploration in the policy optimization problem. We show that the regret of the proposed approach is bounded by $\tilde{\mathcal{O}}(\sqrt{T})$ for both discrete and continuous parameter spaces.

algorithm, conservative optimistic policy optimization, multiple importance sampling, (8 more...)

arXiv.org Machine Learning

2103.03307

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback