A generative approach to LLM harmfulness detection with special red flag tokens