Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both