Reward Generalization in RLHF: A Topological Perspective
