AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

70a32110fff0f26d301e58ebbca9cb9f-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 03:12:22 GMT

algorithm, batch, rare policy switch model, (12 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Education (0.46)
Health & Medicine (0.33)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Neural Information Processing Systems, Aug-15-2025, 03:12:18 GMT

Real-world reinforcement learning (RL) applications often involve possibly infinite state and action spaces, and in such situations classical RL algorithms developed for the tabular setting are no longer applicable. A popular approach to overcoming this issue is to apply function approximation techniques to the underlying structures of the Markov decision processes (MDPs).

algorithm, batch, rare policy switch model, (12 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Education (0.47)
Health & Medicine (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Appendix A Proofs

Neural Information Processing Systems, Aug-15-2025, 03:08:44 GMT

The second derivative test confirms that we have a maximum, i.e. The proof for (b) can be found in the work of Goodfellow et al. In this section we present Adversarial Soft Q-Fitting (ASQF), a principled approach to Imitation Learning without Reinforcement Learning that relies exclusively on transitions. Using transitions rather than trajectories offers several practical benefits, such as the ability to handle asynchronously collected data or non-sequential expert demonstrations. We consider the GAN objective of Eq. (5) with The beginning of the proof closely follows the proof of Appendix A.1.

rl update lr null rl, transition, uniform distribution, (10 more...)

Technology: