BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM