Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
