Language Models Can Learn from Verbal Feedback Without Scalar Rewards
