Bootstrapping Language Models with DPO Implicit Rewards