Cognitive Cybersecurity for Artificial Intelligence: Guardrail Engineering with CCS-7
arXiv.org Artificial Intelligence
Language models exhibit human-like cognitive vulnerabilities, such as emotional framing, that escape traditional behavioral alignment. We present CCS-7 (Cognitive Cybersecurity Suite), a taxonomy of seven vulnerabilities grounded in human cognitive security research. To establish a human benchmark, we ran a randomized controlled trial with 151 participants: a "Think First, Verify Always" (TFVA) lesson improved cognitive security by +7.9% overall. We then evaluated TFVA-style guardrails across 12,180 experiments on seven diverse language model architectures. Results reveal architecture-dependent risk patterns: some vulnerabilities (e.g., identity confusion) are almost fully mitigated, while others (e.g., source interference) exhibit escalating backfire, with error rates increasing by up to 135% in certain models. Humans, in contrast, show consistent moderate improvement. These findings reframe cognitive safety as a model-specific engineering problem: interventions effective in one architecture may fail, or actively harm, another, underscoring the need for architecture-aware cognitive safety testing before deployment.
Aug-15-2025
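The abstract reports guardrail effects as relative changes in error rate (e.g., "error rates increasing by up to 135%" for source-interference backfire). A minimal sketch of that metric, with purely illustrative numbers and a hypothetical function name (the paper's actual evaluation harness is not described in this abstract):

```python
# Hypothetical sketch: scoring a guardrail intervention by relative error-rate change.
# Function name and example counts are illustrative, not from the paper.

def relative_error_change(baseline_errors: int, guarded_errors: int, trials: int) -> float:
    """Percent change in error rate after adding the guardrail.

    Negative values indicate mitigation; positive values indicate
    "backfire" (the guardrail increases the error rate).
    """
    baseline_rate = baseline_errors / trials
    guarded_rate = guarded_errors / trials
    return (guarded_rate - baseline_rate) / baseline_rate * 100.0

# Illustrative contrast from the abstract's findings:
# near-full mitigation (identity confusion) vs. escalating backfire (source interference).
print(round(relative_error_change(40, 2, 100), 1))   # strong mitigation (negative)
print(round(relative_error_change(20, 47, 100), 1))  # backfire of +135%
```

Reporting a relative rather than absolute change is what allows a single figure like "+135%" to be compared across models with very different baseline error rates.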