Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization
