A generative approach to LLM harmfulness detection with special red flag tokens
