Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning