AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
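The trial-and-error idea in this quotation can be illustrated with a minimal epsilon-greedy multi-armed bandit. This is a sketch under stated assumptions, not an example taken from the book: the arm means, the unit-variance Gaussian reward noise, `epsilon`, and the function name `epsilon_greedy_bandit` are all hypothetical. The learner is never told which arm is best. It keeps a running average reward per arm and usually picks the current best estimate, but occasionally tries a random arm.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate each arm's value purely from sampled rewards (hypothetical setup)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # Q(a): sample-average reward of each arm
    counts = [0] * n_arms       # how many times each arm was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)                            # explore: random arm
        else:
            a = max(range(n_arms), key=lambda i: estimates[i])  # exploit: best estimate
        # Hypothetical reward model: Gaussian noise around the arm's true mean.
        r = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]          # incremental mean update
    return estimates, counts
```

For example, `epsilon_greedy_bandit([0.0, 1.0, 0.5])` should end up pulling arm 1 most often, because its estimate rises above the others after a few exploratory tries.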

884d247c6f65a96a7da4d1105d584ddd-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 06:33:31 GMT

DDPG [24] extends Q-learning to continuous control based on the Deterministic Policy Gradient [31] algorithm: it learns a deterministic policy $\pi(s; \phi)$, parameterized by $\phi$, that maximizes the Q-function and thereby approximates the $\max$ operator.
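A minimal sketch of the deterministic policy gradient idea in one dimension may help. It is not the paper's implementation: the critic is a fixed, hypothetical quadratic $Q(s, a) = -(a - 2s)^2$ rather than a learned network, the policy is linear, $\pi(s; \phi) = \phi s$, and replay buffers, target networks, exploration noise, and critic training are all omitted. The actor follows the chain rule $\nabla_\phi Q(s, \pi(s; \phi)) = \partial_a Q(s, a)\big|_{a = \pi(s; \phi)} \, \nabla_\phi \pi(s; \phi)$, and the TD target uses $Q(s', \pi(s'; \phi))$ in place of $\max_a Q(s', a)$.

```python
def q_value(s, a):
    # Hypothetical fixed critic whose maximizing action is a = 2 * s.
    return -(a - 2.0 * s) ** 2

def dq_da(s, a):
    # Partial derivative of the hypothetical critic with respect to the action.
    return -2.0 * (a - 2.0 * s)

def policy(s, phi):
    # Deterministic linear policy pi(s; phi) = phi * s.
    return phi * s

def dpi_dphi(s, phi):
    # Derivative of the linear policy with respect to its parameter.
    return s

def train_actor(states, phi=0.0, lr=0.05, steps=200):
    for _ in range(steps):
        # Deterministic policy gradient, averaged over a batch of states.
        grad = sum(dq_da(s, policy(s, phi)) * dpi_dphi(s, phi) for s in states) / len(states)
        phi += lr * grad  # gradient ascent on Q
    return phi

def td_target(r, s_next, phi, gamma=0.99):
    # The actor's action stands in for argmax_a Q(s_next, a).
    return r + gamma * q_value(s_next, policy(s_next, phi))
```

With states `[-1.0, -0.5, 0.5, 1.0]`, `train_actor` drives $\phi$ toward 2, at which point $\pi(s; \phi)$ coincides with $\arg\max_a Q(s, a)$ for this toy critic.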

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

87736972ed2fb48230f1052699dedbe7-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 06:32:24 GMT

algorithm, def, reward function

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

6d7d394c9d0c886e9247542e06ebb705-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 06:31:30 GMT

algorithm, preference vector, probability

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Maryland > Baltimore (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

6d7d394c9d0c886e9247542e06ebb705-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 06:31:26 GMT

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Maryland > Baltimore (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Education (0.46)
Health & Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

8763d72bba4a7ade23f9ae1f09f4efc7-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 06:23:41 GMT

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

288b63aa98084366c4536ba0574a0f22-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 06:23:28 GMT

mp-v, psv, target test

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

288b63aa98084366c4536ba0574a0f22-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 06:23:25 GMT

mp-v, psv, target test

Neural Information Processing Systems

Genre:

Overview (0.67)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Deep Recurrent Optimal Stopping

Neural Information Processing Systems, Feb-9-2026, 06:21:59 GMT

Deep neural networks (DNNs) have recently emerged as a powerful paradigm for solving Markovian optimal stopping problems. However, a ready extension of DNN-based methods to non-Markovian settings requires significant state and parameter space expansion, manifesting the curse of dimensionality.
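A toy tabular example can make the Markovian versus non-Markovian contrast concrete. This is a sketch under stated assumptions and not the paper's DNN-based method: the process is a hypothetical ±1 random walk started at 0, and the function names are illustrative. When the payoff depends only on the current position, backward induction on $V_t(x) = \max\big(g(x), \mathbb{E}[V_{t+1}(X_{t+1}) \mid X_t = x]\big)$ needs one state per reachable $(t, x)$ pair. When the payoff depends on the whole path, the state must be expanded to the path history, which grows exponentially with the horizon.

```python
from functools import lru_cache

def optimal_stopping_value(horizon, payoff, p_up=0.5):
    """Backward induction for a Markovian stopping problem on a +/-1 random walk."""

    @lru_cache(maxsize=None)
    def value(t, x):
        stop = payoff(x)            # reward for stopping now at position x
        if t == horizon:
            return stop             # forced to stop at the horizon
        cont = p_up * value(t + 1, x + 1) + (1 - p_up) * value(t + 1, x - 1)
        return max(stop, cont)      # stop or continue, whichever is worth more

    return value(0, 0)

def state_counts(horizon):
    # Markovian state (t, x): t + 1 reachable positions at step t.
    markov = sum(t + 1 for t in range(horizon + 1))
    # Path-dependent payoff: one state per path prefix of length t.
    full_history = sum(2 ** t for t in range(horizon + 1))
    return markov, full_history
```

For instance, `state_counts(10)` gives 66 Markovian states against 2047 path prefixes, which is the kind of state-space expansion the abstract refers to.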

machine learning, reinforcement learning, trajectory

Neural Information Processing Systems

Country:

Europe > Netherlands > South Holland > Dordrecht (0.04)
North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Outcome-Driven Reinforcement Learning via Variational Inference

Neural Information Processing Systems, Feb-9-2026, 06:14:13 GMT

Standard reinforcement learning (RL) addresses reward maximization in a Markov decision process (MDP) defined by the tuple $(\mathcal{S}, \mathcal{A}, p_{S_0}, p_d, r, \gamma)$ [43, 44], where $\mathcal{S}$ and $\mathcal{A}$ denote the state and action spaces, respectively, $p_{S_0}$ denotes the initial state distribution, $p_d$ is a state transition distribution, $r$ is an immediate reward function, and $\gamma$ is a discount factor. To sample trajectories, an initial state is sampled according to $p_{S_0}$; successive states are then sampled from the state transition distribution, $S_{t+1} \sim p_d(\cdot \mid s_t, a_t)$, and actions from a policy, $A_t \sim \pi(\cdot \mid s_t)$.
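The sampling procedure just described can be sketched for a finite MDP. This is an illustration under assumptions, not code from the paper: the dictionary encodings of $p_{S_0}$, $p_d$, and $\pi$, the function name `sample_trajectory`, and the fixed horizon are hypothetical choices. The rollout draws $S_0 \sim p_{S_0}$, then alternates $A_t \sim \pi(\cdot \mid s_t)$ and $S_{t+1} \sim p_d(\cdot \mid s_t, a_t)$, accumulating the discounted return $\sum_t \gamma^t r(s_t, a_t)$.

```python
import random

def sample_trajectory(p_s0, p_d, reward, policy, gamma, horizon, rng=None):
    """Roll out one trajectory in a finite MDP (S, A, p_S0, p_d, r, gamma).

    p_s0:   dict state -> probability (initial state distribution)
    p_d:    dict (state, action) -> dict next_state -> probability
    reward: function (state, action) -> float
    policy: dict state -> dict action -> probability
    Returns a list of (s_t, a_t, r_t) tuples and the discounted return.
    """
    if rng is None:
        rng = random.Random(0)

    def draw(dist):
        # Sample one outcome from a {outcome: probability} dictionary.
        return rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]

    s = draw(p_s0)                    # S_0 ~ p_S0
    trajectory, ret = [], 0.0
    for t in range(horizon):
        a = draw(policy[s])           # A_t ~ pi(. | s_t)
        r = reward(s, a)
        trajectory.append((s, a, r))
        ret += gamma ** t * r         # discounted return
        s = draw(p_d[(s, a)])         # S_{t+1} ~ p_d(. | s_t, a_t)
    return trajectory, ret
```

In a hypothetical two-state chain where action `"go"` always moves between `"A"` and `"B"` with reward 1, three steps with $\gamma = 0.5$ give the return $1 + 0.5 + 0.25 = 1.75$.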

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
