Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch
Neural Information Processing Systems
Detecting and handling misspecified objectives, such as reward functions, is widely recognized as one of the central challenges in Artificial Intelligence (AI) safety research.
Oct-10-2025, 06:04:55 GMT
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Government (0.46)
- Technology: