Robust Reinforcement Learning from Corrupted Human Feedback

Mar-27-2025, 12:02:21 GMT–Neural Information Processing Systems

Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, such as personal bias, context ambiguity, or lack of training, human annotators may give incorrect or inconsistent preference labels.

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (1.00)
  - Statistical Learning (1.00)