Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety