CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation