Aligning LLMs with Domain Invariant Reward Models