AITopics | Reinforcement Learning

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)


Deep Reinforcement Learning from Human Preferences

Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei

Neural Information Processing Systems | Nov-21-2025, 13:06:52 GMT

It's not clear how to construct a suitable reward function, which will need to be a

artificial intelligence, machine learning, reinforcement learning, (18 more...)


Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Leisure & Entertainment (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


Dynamic-Depth Context Tree Weighting

Joao V. Messias, Shimon Whiteson

Neural Information Processing Systems | Nov-21-2025, 12:46:29 GMT

Reinforcement learning (RL) in partially observable settings is challenging because the agent's observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes dynamic-depth context tree weighting (D2-CTW), a model-learning method that addresses these limitations. D2-CTW dynamically expands a suffix tree while ensuring that the size of the model, but not its depth, remains bounded. We show that D2-CTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply D2-CTW to model-based RL, showing that, on tasks that require memory of past observations, D2-CTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
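As background for the abstract above: standard context tree weighting is usually described with a Krichevsky-Trofimov (KT) estimator at each context node, which predicts the next binary symbol from the counts seen so far in that context. The abstract does not say which per-node estimator D2-CTW uses, so the sketch below is only an illustration of that common building block, and the function names are hypothetical.

```python
def kt_next_symbol_prob(zeros, ones, symbol):
    """KT estimate of the probability that the next bit equals `symbol`,
    given that `zeros` zeros and `ones` ones were seen in this context."""
    count = ones if symbol == 1 else zeros
    return (count + 0.5) / (zeros + ones + 1.0)


def kt_sequence_prob(bits):
    """Probability the KT estimator assigns to a whole binary sequence,
    obtained by chaining the one-step predictions."""
    zeros = 0
    ones = 0
    prob = 1.0
    for bit in bits:
        prob *= kt_next_symbol_prob(zeros, ones, bit)
        if bit == 1:
            ones += 1
        else:
            zeros += 1
    return prob
```

In a context tree, each node would keep its own pair of counts for the histories that end in that node's suffix; how D2-CTW grows and bounds the tree is not covered by this sketch.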

machine learning, node, reinforcement learning, (14 more...)


Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes

Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng

Neural Information Processing Systems | Nov-21-2025, 12:41:44 GMT

This game continues until the agent reaches a final state and receives a terminal reward.

decision process, q-learning, topic model, (15 more...)


Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Workflow (0.46)

Industry: Leisure & Entertainment > Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.96)
(3 more...)


Boltzmann Exploration Done Right

Nicolò Cesa-Bianchi, Claudio Gentile, Gabor Lugosi, Gergely Neu

Neural Information Processing Systems | Nov-21-2025, 12:28:08 GMT

Despite its widespread use, there is virtually no theoretical understanding about the limitations or the actual benefits of this exploration scheme.
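For readers unfamiliar with the exploration scheme in question, Boltzmann (softmax) exploration in its textbook form picks each action with probability proportional to the exponential of its estimated value divided by a temperature. The sketch below shows that basic form only; it is not the variant proposed in the paper, and the function names and the `temperature` parameter name are illustrative choices.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax probabilities over actions for the given value estimates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the maximum keeps exp() from overflowing and
    # leaves the resulting probabilities unchanged.
    best = max(q_values)
    weights = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high temperature makes the choice close to uniform, while a temperature near zero makes it close to greedy, so the temperature setting governs how much the agent explores.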

data mining, machine learning, reinforcement learning, (20 more...)


Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)


Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Shixiang (Shane) Gu, Timothy Lillicrap, Richard E. Turner, Zoubin Ghahramani, Bernhard Schölkopf, Sergey Levine

Neural Information Processing Systems | Nov-21-2025, 12:06:59 GMT

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning.
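One simple way to picture merging two gradient estimates is a convex combination controlled by a single weight. The sketch below assumes that both estimates are already available as flat lists of floats; the function name and the weight name `nu` are hypothetical, and the abstract does not specify the paper's actual estimator, so this is an illustration of interpolation in general and not the authors' method.

```python
def interpolate_gradients(on_policy_grad, off_policy_grad, nu):
    """Blend two gradient estimates element by element.

    nu = 0.0 returns the on-policy estimate unchanged,
    nu = 1.0 returns the off-policy estimate unchanged.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    if len(on_policy_grad) != len(off_policy_grad):
        raise ValueError("gradient estimates must have the same length")
    return [
        (1.0 - nu) * g_on + nu * g_off
        for g_on, g_off in zip(on_policy_grad, off_policy_grad)
    ]
```

Intermediate values of the weight trade the stability of the on-policy estimate against the sample efficiency of the off-policy one, which is the tension the abstract describes.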

artificial intelligence, machine learning, reinforcement learning, (14 more...)


Country: