AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

d82118376df344b0010f53909b961db3-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 10:06:32 GMT

bandit game, br 3, nulla, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Government (1.00)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

d82118376df344b0010f53909b961db3-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 10:06:28 GMT

algorithm, bandit game, stackelberg equilibrium, (12 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Asia > Middle East > Jordan (0.04)

Industry:

Government (1.00)
Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

Neural Information Processing Systems, Feb-11-2026, 10:04:48 GMT

There is growing consensus around the view that aligned and beneficial AI requires a reframing of objectives as being contingent, uncertain, and learnable via interaction with humans [35].

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)

Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Neural Information Processing Systems, Feb-11-2026, 09:57:13 GMT

This work was supported in part by the National Science Foundation under grant CCF-2149588 and Cisco, Inc.

bayesian rl, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

4a17cd29ced0443bcff689fbb0d32d5e-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 09:57:10 GMT

bayesian regret, bayesian rl, information ratio, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

d7e4cdde82a894b8f633e6d61a01ef15-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 09:55:58 GMT

algorithm, cost player, mdp, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Reward is Enough for Convex MDPs

Neural Information Processing Systems, Feb-11-2026, 09:55:54 GMT

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

d7b76edf790923bf7177f7ebba5978df-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 09:54:43 GMT

arxiv preprint arxiv, entropy, exploration, (14 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Ulsan > Ulsan (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Quark: Controllable Text Generation with Reinforced [Un]learning

Neural Information Processing Systems, Feb-11-2026, 09:47:26 GMT

Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do.

machine learning, natural language, reinforcement learning, (22 more...)

Neural Information Processing Systems

Country: