Reward Modeling with Weak Supervision for Language Models