Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Neural Information Processing Systems
Current AI alignment methodologies rely on human-provided demonstrations or judgments, so the capabilities a system can learn are upper-bounded by the capabilities of its human supervisors. This raises a challenging research question: how can we keep improving AI systems once their capabilities surpass those of humans?