Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin
Neural Information Processing Systems
Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data.