Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
