AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Thrampoulidis, Christos, Mahdavi, Sadegh, Deng, Wenlong

arXiv.org Artificial Intelligence | Nov-17-2025

This note reconciles two seemingly distinct approaches to policy gradient optimization for the Pass@K objective in reinforcement learning with verifiable rewards: (1) direct REINFORCE-style methods, and (2) advantage-shaping techniques that directly modify GRPO. We show that these are two sides of the same coin. By reverse-engineering existing advantage-shaping algorithms, we reveal that they implicitly optimize surrogate rewards. We specifically interpret practical "hard-example up-weighting" modifications to GRPO as reward-level regularization. Conversely, starting from surrogate reward objectives, we provide a simple recipe for deriving both existing and new advantage-shaping methods. This perspective provides a lens for RLVR policy gradient optimization beyond our original motivation of Pass@K.
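As a point of reference for the Pass@K objective named in this abstract, the sketch below estimates Pass@K from sampled responses. It assumes the usual reading of Pass@K (the probability that at least one of K sampled responses passes the verifier) and the standard combinatorial estimator from n samples of which c are correct. The function name `pass_at_k` is hypothetical, and none of this reproduces the paper's policy-gradient or advantage-shaping derivations.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@K from n sampled responses, c of which are correct.

    Assumed definition: the chance that a random size-k subset of the
    n samples contains at least one correct response.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets with no correct response;
    # it is 0 when fewer than k incorrect responses exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 4 samples, 1 correct: a pair of samples contains it half the time.
    print(pass_at_k(n=4, c=1, k=2))
```

With K = 1 the estimate reduces to the plain fraction of correct responses, c / n.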

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2510.23049

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift

McFadden, Shae, Foley, Myles, D'Onghia, Mario, Hicks, Chris, Mavroudis, Vasilios, Paoletti, Nicola, Pierazzi, Fabio

arXiv.org Artificial Intelligence | Nov-17-2025

Malware detection in real-world settings must deal with evolving threats, limited labeling budgets, and uncertain predictions. Traditional classifiers, without additional mechanisms, struggle to maintain performance under concept drift in malware domains, as their supervised learning formulation cannot optimize when to defer decisions to manual labeling and adaptation. Modern malware detection pipelines combine classifiers with monthly active learning (AL) and rejection mechanisms to mitigate the impact of concept drift. In this work, we develop a novel formulation of malware detection as a one-step Markov Decision Process and train a deep reinforcement learning (DRL) agent, simultaneously optimizing sample classification performance and rejecting high-risk samples for manual labeling. We evaluated the joint detection and drift mitigation policy learned by the DRL-based Malware Detection (DRMD) agent through time-aware evaluations on Android malware datasets subject to realistic drift requiring multi-year performance stability. The policies learned under these conditions achieve a higher Area Under Time (AUT) performance compared to standard classification approaches used in the domain, showing improved resilience to concept drift. Specifically, the DRMD agent achieved an average AUT improvement of 8.66 and 10.90 for the classification-only and classification-rejection policies, respectively. Our results demonstrate for the first time that DRL can facilitate effective malware detection and improved resiliency to concept drift in the dynamic setting of Android malware detection.
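The abstract reports results in Area Under Time (AUT) without defining it. The sketch below assumes the trapezoidal definition often used in time-aware malware evaluation: a metric such as F1 is measured once per test period, and AUT is the normalised area under that curve. The function name `area_under_time` and the sample scores are illustrative only, and the paper may scale its AUT values differently.

```python
def area_under_time(scores):
    """Normalised trapezoidal area under a per-period metric curve.

    `scores` holds one metric value per consecutive test period
    (for example, monthly F1). For metrics in [0, 1] the result is
    also in [0, 1]; a detector that never degrades scores its
    constant metric value.
    """
    if len(scores) < 2:
        raise ValueError("need at least two test periods")
    # Average each pair of neighbouring periods (trapezoid rule).
    total = sum((a + b) / 2 for a, b in zip(scores, scores[1:]))
    return total / (len(scores) - 1)


if __name__ == "__main__":
    # Hypothetical monthly F1 values that decay under concept drift.
    print(area_under_time([0.8, 0.6, 0.4]))
```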

machine learning, malware detection, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2508.18839

Country: Europe (0.46)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation

Salvi, Ameya, Krovi, Venkat

arXiv.org Artificial Intelligence | Nov-17-2025

Vision-based lane keeping is a topic of significant interest in the robotics and autonomous ground vehicle communities, with various on-road and off-road applications. The skid-steered vehicle architecture has served as a useful platform for human-controlled operations. However, systematic modeling, especially of the skid-slip wheel-terrain interactions (primarily in off-road settings), has created bottlenecks for automation deployment. End-to-end learning-based methods, such as imitation learning and deep reinforcement learning, have gained prominence as a viable deployment option to counter the lack of accurate analytical models. However, their systematic formulation and subsequent verification/validation in dynamic operation regimes (particularly for skid-steered vehicles) remain a work in progress. To this end, a novel approach to the structured formulation of learning for visual navigation is proposed and investigated in this work. Extensive software simulations, hardware evaluations, and ablation studies highlight the significantly improved performance of the proposed approach relative to contemporary literature.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TFR.2025.3599118

2506.21732

Country:

North America > United States (0.46)
Oceania > Australia (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare

Tang, Shengpu, Makar, Maggie, Sjoding, Michael W., Doshi-Velez, Finale

Neural Information Processing Systems | Nov-16-2025, 17:31:56 GMT

For example, in healthcare, an action may correspond to a combination of drugs and treatments.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

d3e2d61af1e9612ddecd099144e50404-Paper-Conference.pdf

Neural Information Processing Systems | Nov-16-2025, 09:03:17 GMT

curriculum, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria (0.04)
(10 more...)

Genre: Research Report (0.68)

Industry:

Education (1.00)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficient Risk-Averse Reinforcement Learning

Neural Information Processing Systems | Nov-16-2025, 08:04:07 GMT

In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
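To make "a risk measure of the returns" concrete, the sketch below computes the empirical Conditional Value at Risk (CVaR) of a batch of episode returns, that is, the mean of the worst alpha fraction. CVaR is used here only as one common example of a risk measure; the excerpt above does not say which measure the paper optimizes, and the function name `empirical_cvar` and the sample returns are hypothetical.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst `alpha` fraction of the observed returns.

    alpha = 1.0 recovers the ordinary (risk-neutral) mean return;
    smaller alpha focuses on the lowest-return episodes.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # lowest (worst) returns first
    tail_size = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:tail_size]) / tail_size


if __name__ == "__main__":
    episode_returns = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_cvar(episode_returns, alpha=0.5))  # worst half
    print(empirical_cvar(episode_returns, alpha=1.0))  # plain mean
```

A risk-averse agent would prefer the policy with the higher tail value even when another policy has a higher mean.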

cesor, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Improving Online Rent-or-Buy Algorithms with Sequential Decision Making and ML Predictions

Neural Information Processing Systems | Nov-15-2025, 15:21:31 GMT

In this work we study online rent-buy problems as a sequential decision making problem.
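For readers unfamiliar with rent-or-buy problems, the sketch below shows the classical ski-rental setting and its textbook break-even strategy: rent at unit cost per day until the rent paid would reach the purchase price, then buy. This is background only and is not the paper's algorithm, which the abstract says uses sequential decision making and ML predictions. The function names and the unit rental price are assumptions made for illustration.

```python
def break_even_cost(buy_price: int, num_days: int) -> int:
    """Online cost: rent (1 per day) for buy_price - 1 days, then buy.

    `num_days` is the true season length, unknown to the online player.
    """
    if buy_price < 1 or num_days < 0:
        raise ValueError("buy_price must be >= 1 and num_days >= 0")
    if num_days < buy_price:
        return num_days  # season ended before the purchase day
    return (buy_price - 1) + buy_price  # rent paid so far, plus purchase


def offline_optimal_cost(buy_price: int, num_days: int) -> int:
    """Cost with hindsight: rent throughout or buy on day one."""
    return min(num_days, buy_price)


if __name__ == "__main__":
    for days in (3, 10, 100):
        online = break_even_cost(10, days)
        best = offline_optimal_cost(10, days)
        print(days, online, best)
```

In this setting the online cost never exceeds (2 - 1/B) times the hindsight cost for purchase price B, which is the kind of worst-case guarantee that prediction-based methods try to improve on.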

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

7e0af0d1bc0ec2a90fc294be2e00447e-Paper-Conference.pdf

Neural Information Processing Systems | Nov-15-2025, 06:23:59 GMT

agent, arxiv preprint arxiv, matching, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
(4 more...)

Industry:

Transportation > Passenger (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Faster Deep Reinforcement Learning with Slower Online Network

Neural Information Processing Systems | Nov-15-2025, 06:23:18 GMT

Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping.
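The two-network arrangement described in this abstract can be sketched with two widely used ways of letting a target network lag behind the online network: a periodic hard copy and a soft (Polyak) average. Parameters are represented as plain lists of floats to keep the example dependency-free. The function names, `tau`, and `period` are illustrative assumptions, and the paper's own proposed modification is not reproduced here.

```python
def hard_update(target, online, step, period):
    """Copy the online parameters into the target every `period` steps."""
    if period < 1:
        raise ValueError("period must be >= 1")
    return list(online) if step % period == 0 else list(target)


def soft_update(target, online, tau):
    """Move each target parameter a fraction `tau` toward the online one."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


if __name__ == "__main__":
    online_params = [1.0, 2.0]
    target_params = [0.0, 0.0]
    # The target trails the online network instead of matching it at once.
    target_params = soft_update(target_params, online_params, tau=0.5)
    print(target_params)
```

Bootstrapped value targets are then computed from the slower-moving target parameters, while gradient steps are applied to the online parameters.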

algorithm, learning, proximal term, (15 more...)

Neural Information Processing Systems

Country: