Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
