AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction, Section 1.1. MIT Press, Cambridge, MA, 1998.
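The quoted definition stresses trial and error: the learner is never told the correct action and has to estimate which actions pay off from the rewards it samples. A minimal sketch of that idea in its simplest single-state form, an epsilon-greedy multi-armed bandit, follows. The arm reward probabilities, step count, `epsilon`, and seed are illustrative assumptions, and the function name `epsilon_greedy_bandit` is hypothetical.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate action values for a Bernoulli bandit purely from sampled rewards."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # how often each action has been tried
    values = [0.0] * n_arms  # running mean reward observed for each action
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)  # explore: try a random action
        else:
            # exploit: pick the action with the highest current estimate
            # (ties go to the lowest index)
            a = max(range(n_arms), key=lambda i: values[i])
        # The environment only returns a reward, never the "right" action.
        reward = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean update
    return values, counts

# Hypothetical arms that pay off with probability 0.2, 0.5 and 0.8.
values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=lambda i: values[i])
print(best, counts)
```

A bandit drops one part of the full problem described in the quote: here actions do not change the situation the learner faces next, so there is no state to map from. The explore/exploit trade-off, however, is already visible.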


Planning with General Objective Functions: Going Beyond Total Rewards
Ruosong Wang

Neural Information Processing Systems, Aug-15-2025, 15:17:33 GMT

This "small" difference requires the agent to change the planning strategy significantly because the
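Why can replacing the total reward with a general objective f(r_1, ..., r_H) change the plan? A brute-force sketch on a toy deterministic problem illustrates the point. The two-state environment, its rewards, and the choice of `max` as the alternative objective are all hypothetical and are not taken from the paper; the sketch only shows that the optimal plan depends on the objective, not how the paper's algorithm plans.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): states {0, 1}, start in state 0.
# Action a moves the agent to state a and yields reward REWARD[(state, a)].
REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.5, (1, 1): 0.0}
HORIZON = 3

def rollout(plan, start=0):
    """Return the reward sequence produced by an open-loop action plan."""
    state, rewards = start, []
    for action in plan:
        rewards.append(REWARD[(state, action)])
        state = action  # deterministic transition: next state equals the action
    return rewards

def best_plan(objective):
    """Exhaustively search all action sequences for the best objective value."""
    plans = product((0, 1), repeat=HORIZON)
    return max(plans, key=lambda p: objective(rollout(p)))

best_sum = best_plan(sum)  # classic total-reward objective
best_max = best_plan(max)  # a non-additive objective: the largest single reward
print(best_sum, rollout(best_sum))  # (0, 0, 0) [1.0, 1.0, 1.0]
print(best_max, rollout(best_max))  # (0, 1, 0) [1.0, 0.0, 1.5]
```

Under the total reward, the agent stays in state 0 and collects steady rewards. Under `max`, it detours through state 1 for a single larger payoff. For `max`, a Bellman recursion over the state alone no longer suffices, because the value of continuing depends on the best reward collected so far. In this example, one workaround is to carry that running maximum as part of the state.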

algorithm, objective function, reward value

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)

a6a767bbb2e3513233f942e0ff24272c-Paper.pdf

Neural Information Processing Systems, Aug-15-2025, 15:17:26 GMT

algorithm, objective function, reward value

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

Unknown-Aware Domain Adversarial Learning for Open-Set Domain Adaptation

Neural Information Processing Systems, Aug-15-2025, 14:41:11 GMT

Open-Set Domain Adaptation (OSDA) assumes that a target domain contains unknown classes, which are not discovered in a source domain.

alignment, feature alignment, uadal

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)

Deep Inverse Q-learning with Constraints (Appendix)
Gabriel Kalweit

Neural Information Processing Systems, Aug-15-2025, 14:08:07 GMT

Visualizations of the real and learned state-values of IAVI, IQL and DIQL can be found in Figure 7. Figure 7: Visualization of state-values for different numbers of trajectories in Objectworld. Table 2: Comparison between online and offline estimation of state-action visitations for the Objectworld environment, given a data set with an action distribution equivalent to the true optimal Boltzmann distribution. The pseudocode of the tabular variant of Constrained Inverse Q-learning can be found in Algorithm 4. See [4] for further details of Constrained Q-learning. Algorithm 4: Tabular Model-free Constrained Inverse Q-learning. The pseudocode of Deep Constrained Inverse Q-learning can be found in Algorithm 5. The lower row shows the EVD. For DIQL, the parameters were optimized in the range of […] Hence, it can only increase.

deep inverse q-learning, inverse q-learning, q-learning

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.07)
North America > Canada (0.05)
Europe > Sweden > Stockholm > Stockholm (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Inverse Q-learning with Constraints

Neural Information Processing Systems, Aug-15-2025, 14:08:00 GMT

Popular Maximum Entropy Inverse Reinforcement Learning approaches require the computation of expected state visitation frequencies for the optimal policy under an estimate of the reward function.
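The computation this abstract refers to is a forward pass: starting from the initial state distribution, probability mass is pushed through the policy and the transition model one step at a time, and the per-step distributions are summed. In Maximum Entropy IRL with a linear reward, the likelihood gradient is the expert's empirical feature counts minus the feature counts implied by these frequencies, so the pass has to be redone every time the reward estimate changes. The sketch below assumes a tabular MDP with a known transition array and a policy that is already given (for example, from soft value iteration under the current reward estimate). The function name and the two-state chain are hypothetical.

```python
import numpy as np

def expected_state_visitation(p0, policy, transitions, horizon):
    """Expected number of visits to each state over `horizon` steps.

    p0:          (S,) initial state distribution
    policy:      (S, A) stochastic policy pi(a | s); each row sums to 1
    transitions: (S, A, S) array with transitions[s, a, s2] = P(s2 | s, a)
    """
    d = np.asarray(p0, dtype=float)  # state distribution at step t = 0
    svf = d.copy()
    for _ in range(horizon - 1):
        # One step forward: d_{t+1}(s2) = sum_{s,a} d_t(s) * pi(a|s) * P(s2|s,a)
        d = np.einsum("s,sa,sat->t", d, policy, transitions)
        svf += d
    return svf

# Hypothetical 2-state, 1-action chain: state 0 always moves to state 1,
# and state 1 is absorbing.
P = np.zeros((2, 1, 2))
P[0, 0, 1] = 1.0
P[1, 0, 1] = 1.0
pi = np.ones((2, 1))
svf = expected_state_visitation([1.0, 0.0], pi, P, horizon=3)
print(svf)  # [1. 2.]
```

The visit counts sum to the horizon because each step contributes one unit of probability mass. As written, the pass needs the full transition model `P`.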

inverse q-learning, q-learning, reward function

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.05)
North America > Canada > British Columbia > Vancouver (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.93)

Industry: Transportation (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Iterative Amortized Policy Optimization
Joseph Marino

Neural Information Processing Systems, Aug-15-2025, 14:07:30 GMT

Given this perspective, we consider the more flexible class of iterative amortized optimizers.

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes

Neural Information Processing Systems, Aug-15-2025, 14:07:07 GMT

One key difference between reinforcement learning in simulated vs. real-world environments is that, in most simulated environments, the agent can fully observe

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: