Lightweight Robust Direct Preference Optimization