When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Leon Lang (University of Amsterdam), Davis Foote

Oct-10-2025, 12:43:26 GMT–Neural Information Processing Systems

Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is

choice probability, return function, sequence

Country:
- North America > United States (0.14)
- Europe > Netherlands
  - North Holland > Amsterdam (0.40)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models
      - Directed Networks > Bayesian Learning (0.67)
      - Undirected Networks > Markov Models (0.45)
