Reverse Preference Optimization for Complex Instruction Following