Information-Theoretic Reward Decomposition for Generalizable RLHF
