RL task




Reinforcement Learning with Convex Constraints

Neural Information Processing Systems

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and relies only on the ability to approximately solve standard RL tasks. As a result, it can easily be adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms do not incorporate, such as diversity.
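
Such a scheme can be pictured as a repeated game between a policy player and a dual player. Below is a minimal sketch of one approachability-style loop in that spirit, assuming hypothetical callables solve_rl (any standard RL solver for a scalar reward), measure (an estimator of a policy's expected measurement vector), and project_onto_C (Euclidean projection onto the convex constraint set); the paper's actual algorithm differs in its details and guarantees.

```python
import numpy as np

def convex_constrained_rl(solve_rl, measure, project_onto_C, dim,
                          iters=100, tol=1e-3):
    """Sketch of an approachability-style loop for convex-constrained RL.

    solve_rl(w)      -- any RL solver that (approximately) maximizes the
                        scalar reward w . z(trajectory) and returns a policy
    measure(policy)  -- estimates the expected measurement vector z(policy)
    project_onto_C   -- Euclidean projection onto the convex constraint set C
    """
    lam = np.zeros(dim)                 # dual vector steering the policy player
    policies, measurements = [], []
    for _ in range(iters):
        pi = solve_rl(-lam)             # best response: minimize lam . z(pi)
        policies.append(pi)
        measurements.append(measure(pi))
        z_bar = np.mean(measurements, axis=0)
        gap = z_bar - project_onto_C(z_bar)   # gradient of dist(z_bar, C)
        if np.linalg.norm(gap) < tol:
            break                       # average measurement is (nearly) in C
        lam = gap / np.linalg.norm(gap)
    return policies
```

The returned list represents a mixed policy: executing a uniformly random member drives the average measurement vector toward the constraint set.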


Spectrum Random Masking for Generalization in Image-based Reinforcement Learning

Neural Information Processing Systems

Generalization in image-based reinforcement learning (RL) aims to learn a robust policy that can be applied directly to unseen visual environments, a challenging task since agents usually tend to overfit to their training environment. A natural way to handle this problem is to increase data diversity through image-based augmentations. However, unlike most vision tasks such as classification and detection, RL tasks are not always invariant to spatial augmentations, due to the entanglement of environment dynamics and visual appearance. In this paper, we argue for two principles for augmentations in RL: first, the augmented observations should facilitate learning a universal policy that is robust to various distribution shifts; second, the augmented data should be invariant to the learning signals such as action and reward. Following these rules, we revisit image-based RL tasks from the view of the frequency domain and propose a novel augmentation method, Spectrum Random Masking (SRM), which helps agents learn the whole frequency spectrum of observations in order to cope with various distributions, while remaining compatible with the pre-collected actions and rewards corresponding to the original observations. Extensive experiments conducted on the DMControl Generalization Benchmark demonstrate that the proposed SRM achieves state-of-the-art performance with strong generalization potential.
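
As a rough illustration of the idea, the sketch below masks random frequency components of an observation and maps the result back to pixel space, so the original action and reward labels still apply. The function name, the Bernoulli mask, and masking the raw complex spectrum are our assumptions; the paper's exact masking strategy may differ.

```python
import numpy as np

def spectrum_random_mask(obs, mask_ratio=0.5, rng=None):
    """Spectrum-masking augmentation in the spirit of SRM (illustrative).

    obs: float array of shape (H, W) or (C, H, W).  Returns an observation
    of the same shape whose frequency spectrum has been randomly masked,
    so the paired action/reward from the original frame remains valid.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fft2(obs, axes=(-2, -1))           # to frequency domain
    mask = rng.random(spectrum.shape[-2:]) > mask_ratio  # keep ~(1-ratio) freqs
    masked = spectrum * mask                             # broadcast over channels
    out = np.fft.ifft2(masked, axes=(-2, -1)).real       # back to pixel space
    return out.astype(obs.dtype)
```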


Transfer Q-learning

Chen, Elynn, Li, Sai, Jordan, Michael I.

arXiv.org Artificial Intelligence

Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare and business, often face challenges such as high-dimensional state spaces and time-inhomogeneity of the MDP process, compounded by insufficient sample availability, which complicates informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We develop transfer learning (TL) algorithms that are adaptable to both batch and online $Q$-learning, integrating valuable insights from offline source studies. The proposed transfer $Q$-learning algorithm contains a novel re-targeting step that enables cross-stage transfer along the multiple stages of an RL task, in addition to the usual cross-task transfer familiar from supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q^*$-function estimation in offline RL transfer, and a lower regret bound in offline-to-online RL transfer, under stage-wise reward similarity and mild design similarity across tasks. Empirical evidence from both synthetic and real datasets is presented to evaluate the proposed algorithm and support our theoretical results.
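
A hedged sketch of how such a batch transfer scheme might look: backward induction with a pool-then-debias fit at each stage. The linear features, the Lasso estimators, and the data layout are all illustrative choices of ours, not the paper's estimator; computing every regression label with the target task's continuation value loosely mimics the cross-stage re-targeting idea.

```python
import numpy as np
from sklearn.linear_model import Lasso

def transfer_fitted_q(target, sources, H, n_actions=2,
                      alpha_pool=0.01, alpha_debias=0.1):
    """Batch transfer Q-learning sketch via backward induction.

    target, sources: dicts mapping stage h -> (S, A, R, S_next), where A
    holds integer-coded actions.  At each stage, a pooled fit borrows
    strength from the source tasks and a sparse correction fit on target
    residuals debiases it.
    """
    def feats(S, A):
        return np.hstack([S, np.eye(n_actions)[A]])  # state + one-hot action

    def greedy_value(S, pooled, corr):
        preds = [pooled.predict(feats(S, np.full(len(S), a))) +
                 corr.predict(feats(S, np.full(len(S), a)))
                 for a in range(n_actions)]
        return np.max(preds, axis=0)                 # max over actions

    V_next = lambda S: np.zeros(len(S))              # value beyond horizon is 0
    fits = {}
    for h in reversed(range(H)):
        def labeled(data):                           # labels use target V_{h+1}
            S, A, R, S2 = data[h]
            return feats(S, A), R + V_next(S2)
        X_all, y_all = map(np.concatenate,
                           zip(*[labeled(d) for d in sources + [target]]))
        pooled = Lasso(alpha=alpha_pool).fit(X_all, y_all)
        Xt, yt = labeled(target)                     # debias on target residuals
        corr = Lasso(alpha=alpha_debias).fit(Xt, yt - pooled.predict(Xt))
        fits[h] = (pooled, corr)
        V_next = lambda S, p=pooled, c=corr: greedy_value(S, p, c)
    return fits
```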


RL in the Wild: Characterizing RLVR Training in LLM Deployment

Zhou, Jiecheng, Hu, Qinghao, Jin, Yuyang, Wang, Zerui, Sun, Peng, Gu, Yuzhe, Zhang, Wenwei, Zhai, Mingshu, Zhang, Xingcheng, Zhang, Weiming

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are now widely used across many domains. With their rapid development, Reinforcement Learning with Verifiable Rewards (RLVR) has surged in recent months as a way to enhance their reasoning and understanding abilities. However, its complex data flows and diverse tasks pose substantial challenges to RL training systems, and there is limited understanding of RLVR from a systems perspective. To thoroughly understand the system challenges introduced by RLVR, we present a characterization study of RLVR tasks in our LLM deployment. Specifically, we investigate the distribution and variation trends of workloads across different RL tasks and across training steps. We identify issues such as GPU idling caused by skewed sequence-length distributions, inefficient parallel strategies under dynamically varying workloads, inefficient data management mechanisms, and load imbalance. We describe our observations and call for further investigation into the remaining open challenges. Furthermore, we propose the PolyTrace benchmark suite to conduct evaluations with realistic workloads; a practical use case validates that PolyTrace exhibits 94.7% accuracy.
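
The GPU-idling observation is easy to reproduce in miniature: if a synchronous training step must wait for the longest rollout in each batch, a skewed sequence-length distribution wastes a large share of device slots. The sketch below estimates that idle fraction under an illustrative uniform-cost-per-token assumption; the distributions and numbers are ours, not measurements from the paper.

```python
import numpy as np

def idle_fraction(seq_lens, batch_size):
    """Fraction of per-step device slots left idle when every sequence in
    a batch must wait for the longest one (uniform cost per token)."""
    lens = np.asarray(seq_lens, dtype=float)
    waste, total = 0.0, 0.0
    for i in range(0, len(lens), batch_size):
        batch = lens[i:i + batch_size]
        step = batch.max() * len(batch)   # every slot is held for the max
        waste += step - batch.sum()       # slots not doing useful tokens
        total += step
    return waste / total

# A long-tailed (skewed) length distribution idles far more compute than a
# near-uniform one at a similar scale:
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=6.0, sigma=1.0, size=4096)
uniform = rng.uniform(low=200, high=600, size=4096)
print(f"idle fraction, skewed : {idle_fraction(skewed, 64):.1%}")
print(f"idle fraction, uniform: {idle_fraction(uniform, 64):.1%}")
```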



We will take the smaller points, such as the clarity of the equations and the use of minus signs and symbols, into account when we revise the paper.

Neural Information Processing Systems

We thank the reviewers for their constructive comments. Here we focus on the main concerns. This is a neuroscientific finding that has been reviewed previously. Feedback alignment fails on simple problems and is known not to work at all in deeper networks; AGREL dealt with only a single hidden layer.