Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers

Oct-7-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) have become integral to enterprise operations, powering applications that range from automated financial auditing and risk assessment in banking, to predictive diagnostics and patient interaction systems in healthcare, to real-time customer sentiment analysis in e-commerce platforms. However, deploying these models at scale introduces several distinct vulnerabilities that can lead to catastrophic failures. Prompt injection attacks, in which malicious inputs manipulate model behavior to bypass safeguards, represent a direct security threat. Strategic deception, in which models exhibit emergent behaviors that misalign with intended goals, erodes trust in agentic systems. Biased outputs, stemming from skewed training data or architectural inductive biases, perpetuate unfairness and can result in regulatory non-compliance or reputational damage. Our prior work [Ravindran, 2024] laid the groundwork by introducing adversarial activation patching, an interpretability technique that induced deception in toy neural networks at a 23.9% induction rate. This demonstrated the feasibility of using activation-level interventions to probe and expose hidden risks in safety-aligned transformers. Building on this foundation, we propose the Unified Threat Detection and Mitigation Framework (UTDMF), a comprehensive, scalable, real-time pipeline designed for enterprise environments where high-stakes decisions demand robustness, explainability, and compliance.

large language model, machine learning, natural language


Genre:
- Research Report > New Finding (1.00)
- Overview (0.94)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Issues > Social & Ethical Issues (0.94)
    - Machine Learning > Neural Networks
      - Deep Learning (0.70)
