AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

82ad13ec01f9fe44c01cb91814fd7b8c-Paper-Conference.pdf

Neural Information Processing Systems, Aug-16-2025, 13:18:49 GMT

machine learning, natural language, reinforcement learning, (17 more...)

Country:

Europe > Middle East > Malta (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
(4 more...)

On Reward-Free Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems, Aug-16-2025, 13:18:40 GMT

During the exploration phase, an agent collects samples without using a pre-specified reward function.

algorithm, exploration phase, planning phase, (12 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

On Reward-Free Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems, Aug-16-2025, 13:18:32 GMT

During the exploration phase, an agent collects samples without using a pre-specified reward function.

algorithm, exploration phase, reward function, (12 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Neural Information Processing Systems, Aug-16-2025, 12:53:43 GMT

In the single-agent setting, unsupervised learning has been incorporated into RL so that the agent acquires diverse skills without extrinsic reward from the environment; this scenario is known as unsupervised reinforcement learning (URL).

agent, discrepancy, synergy pattern, (13 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

cceff8faa855336ad53b3325914caea2-Paper.pdf

Neural Information Processing Systems, Aug-16-2025, 12:26:57 GMT

actor, learning, time step, (14 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

cc9b3c69b56df284846bf2432f1cba90-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 12:18:17 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

Neural Information Processing Systems, Aug-16-2025, 12:18:09 GMT

In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.

actor-critic algorithm, algorithm, sample complexity, (11 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.30)
Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

80b7bec60081f95d900973509744a306-Paper-Conference.pdf

Neural Information Processing Systems, Aug-16-2025, 12:05:20 GMT

As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty.

agent, bamdp, information horizon, (13 more...)

Country: