On the Effectiveness of Offline RL for Dialogue Response Generation