AITopics | Reinforcement Learning

A prominent issue with such methods is reward over-optimization or reward hacking, where performance as measured by the learned proxy reward model increases, but true quality plateaus or even deteriorates.

dataset, probability mass, trajectory

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Identifying Latent State-Transition Processes for Individualized Reinforcement Learning, Yuewen Sun

Neural Information Processing Systems, Oct-10-2025, 19:20:53 GMT

The application of reinforcement learning (RL) involving interactions with individuals has grown significantly in recent years.

arxiv preprint arxiv, identifiability, latent factor

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (0.67)

Industry:

Education (1.00)
Health & Medicine > Consumer Health (0.93)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

e1cadf5f02cc524b59c208728c73f91c-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 19:20:39 GMT

algorithm, hyperparameter, hyperparameter sensitivity

Neural Information Processing Systems

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

On the Curses of Future and History in Future-dependent Value Functions for OPE

Neural Information Processing Systems, Oct-10-2025, 19:20:20 GMT

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon.

assumption, belief coverage, international conference

Neural Information Processing Systems

Country: