Semi-supervised reward learning for offline reinforcement learning