Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety
