Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback