AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

DeepEN: A Deep Reinforcement Learning Framework for Personalized Enteral Nutrition in Critical Care

Tan, Daniel Jason, Chen, Jiayang, Perera, Dilruk, See, Kay Choong, Feng, Mengling

arXiv.org Artificial Intelligence Nov-20-2025

Objective: Current ICU enteral feeding remains sub-optimal due to limited personalization and ongoing uncertainty about appropriate calorie, protein, and fluid targets--particularly in the context of rapidly changing metabolic demands and heterogeneous responses to therapeutic interventions. This study introduces DeepEN, a novel reinforcement learning (RL)-based framework designed to dynamically personalize enteral nutrition (EN) dosing for critically ill patients using electronic health record data. Methods: DeepEN was trained on data from over 11,000 ICU patients in the MIMIC-IV database to generate 4-hourly, patient-specific targets for caloric, protein, and fluid intake. The model's state space integrates demographics, comorbidities, vital signs, laboratory measurements, and recent interventions considered relevant to nutritional management. The reward function was designed with domain expertise to balance short-term physiological and nutrition-related goals with long-term survival outcomes, reflecting real-world clinical priorities. The framework employs a dueling double deep Q-network with Conservative Q-Learning regularization to ensure safe and reliable policy learning from retrospective data. Model performance was benchmarked against both clinician-derived and guideline-based policies. Results: DeepEN outperformed both clinician and guideline-based policies, achieving a 3.7 ± 0.17 percentage-point absolute reduction in estimated mortality compared with the clinician policy (18.8% vs 22.5%) and higher expected returns relative to the gold-standard guideline policy (11.89 vs 8.11). Control of key nutritional biomarkers was also improved under the learned policy.
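The abstract names three standard ingredients: a dueling Q-network, a double-DQN target, and a Conservative Q-Learning (CQL) regularizer. The sketch below shows how these pieces are commonly written for a discrete action space, using plain Python lists in place of network outputs. It is a textbook-style illustration only, not the DeepEN implementation: the function names, the discount `gamma`, and the penalty weight `alpha` are all assumptions, and the listing gives no details of the paper's architecture, action discretization, or hyperparameters.

```python
import math


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def cql_penalty(q_values, logged_action):
    """Discrete-action CQL penalty: logsumexp of Q over all actions minus the
    Q-value of the action actually recorded in the offline data."""
    m = max(q_values)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[logged_action]


def cql_loss(q_values, logged_action, td_target, alpha=1.0):
    """Squared TD error on the logged action plus the weighted CQL penalty.
    alpha is a hypothetical weight chosen for illustration."""
    td_error = q_values[logged_action] - td_target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, logged_action)
```

The penalty is never negative, and it grows when actions absent from the logged data receive high Q-values. This is the mechanism by which CQL discourages a policy learned from retrospective records from over-valuing actions that were never observed.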

enteral nutrition, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2510.0835

Country: Asia > Singapore (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.94)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.68)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Optimal control of the future via prospective learning with control

Bai, Yuxin, Acharyya, Aranyak, De Silva, Ashwin, Shen, Zeyu, Hassett, James, Vogelstein, Joshua T.

arXiv.org Machine Learning Nov-20-2025

Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). While powerful, this learning framework is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility in more realistic settings. Here, we extend supervised learning to address learning to control in non-stationary, reset-free environments. Using this framework, called "Prospective Learning with Control (PL+C)", we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control, foraging, which is a canonical task for any mobile agent, be it natural or artificial. We illustrate that modern RL algorithms fail to learn in these non-stationary reset-free environments, and even with modifications, they are orders of magnitude less efficient than our prospective foraging agents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2511.08717

Country: Europe (0.67)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Exploiting the Replay Memory Before Exploring the Environment: Enhancing Reinforcement Learning Through Empirical MDP Iteration

Neural Information Processing Systems Nov-19-2025, 23:03:16 GMT

Reinforcement learning (RL) algorithms are typically based on optimizing a Markov Decision Process (MDP) using the optimal Bellman equation.
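For readers unfamiliar with the equation the snippet refers to, here is a minimal sketch of classical value iteration, which repeatedly applies the Bellman optimality backup V(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ] to a small tabular MDP. This is the standard textbook procedure, not the paper's Empirical MDP Iteration method. The data layout (`P`, `R`) and the function name are assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular value iteration.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    Returns the approximate optimal state values as a list.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality backup for every state.
        new_V = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:  # stop once the values have stabilized
            break
    return V


# Hypothetical two-state, two-action MDP with deterministic transitions.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]
values = value_iteration(P, R, gamma=0.5)
```

With `gamma=0.5`, staying in state 1 is worth 2 / (1 - 0.5) = 4, and moving from state 0 to state 1 is worth 1 + 0.5 * 4 = 3, so `values` converges to approximately `[3.0, 4.0]`.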

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Direct Preference-based Policy Optimization without Reward Modeling

Neural Information Processing Systems Nov-19-2025, 22:18:48 GMT

Instead, we propose a PbRL algorithm that directly learns from preference without requiring any reward modeling.

large language model, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Leveraging Separated World Model for Exploration in Visually Distracted Environments

Neural Information Processing Systems Nov-19-2025, 22:12:38 GMT

Despite its prevalence in real-world environments, this challenge has received limited attention in unsupervised reinforcement learning.

information, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (0.67)
Research Report > Promising Solution (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (0.67)
Information Technology (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

918b9487f8ea4661e8ba5a02b2126658-Paper-Conference.pdf

Neural Information Processing Systems Nov-19-2025, 21:28:53 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
(2 more...)

b0ca717599b7ba84d5e4f4c8b1ef6657-Paper-Conference.pdf

Neural Information Processing Systems Nov-19-2025, 17:07:08 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Distributional Reinforcement Learning with Regularized Wasserstein Loss

Sun, Ke

Neural Information Processing Systems Nov-19-2025, 17:06:46 GMT

Empirically, we show that SinkhornDRL consistently outperforms or matches existing algorithms on the Atari games suite and particularly stands out in the multi-dimensional reward setting.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

Neural Information Processing Systems Nov-19-2025, 13:24:38 GMT

Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a long-standing challenge, primarily due to the unpredictable nature of incoming box sequences and stringent physical constraints.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: