Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models
Daniyal Ganiuly, Assel Smaiyl
arXiv.org Artificial Intelligence
Phishing emails remain a persistent threat to online communication, exploiting human trust and evading automated filters through realistic language and adaptive tactics. While large language models (LLMs) such as GPT-4 and LLaMA-3-8B achieve strong accuracy in text classification, their deployment in security systems requires assessing reliability beyond benchmark performance. To address this gap, this study introduces the Trustworthiness Calibration Framework (TCF), a reproducible methodology for evaluating phishing detectors along three dimensions: calibration, consistency, and robustness. These components are integrated into a bounded index, the Trustworthiness Calibration Index (TCI), and complemented by the Cross-Dataset Stability (CDS) metric, which quantifies how stable a model's trustworthiness is across datasets. Experiments on five corpora (SecureMail 2025, Phishing Validation 2024, CSDMC2010, Enron-Spam, and Nazario) using DeBERTa-v3-base, LLaMA-3-8B, and GPT-4 show that GPT-4 achieves the strongest overall trust profile, followed by LLaMA-3-8B and DeBERTa-v3-base. Statistical analysis confirms that reliability varies independently of raw accuracy, underscoring the importance of trust-aware evaluation for real-world deployment. The proposed framework establishes a transparent, reproducible foundation for assessing model dependability in LLM-based phishing detection.
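The abstract does not give the exact formulas behind the TCI or CDS. As a minimal sketch of the idea, assuming the three component scores are each bounded in [0, 1], the TCI could be a weighted mean of those scores, and CDS could penalize the spread of per-dataset TCI values. All names, weights, and formulas below are illustrative assumptions, not the paper's actual definitions.

```python
# Hypothetical sketch of a bounded trust index in the spirit of the TCI
# and CDS described above; the exact formulas are not given in the abstract.
from statistics import pstdev


def tci(calibration: float, consistency: float, robustness: float,
        weights: tuple[float, float, float] = (1/3, 1/3, 1/3)) -> float:
    """Weighted mean of three component scores, each assumed in [0, 1].

    With weights summing to 1, the result stays bounded in [0, 1].
    """
    scores = (calibration, consistency, robustness)
    return sum(w * s for w, s in zip(weights, scores))


def cds(tci_per_dataset: list[float]) -> float:
    """Cross-dataset stability: 1 minus the population std of per-dataset
    TCI values, so identical scores across corpora yield a CDS of 1.0."""
    return 1.0 - pstdev(tci_per_dataset)


# Example: a model that is well calibrated but less robust.
print(round(tci(0.92, 0.88, 0.70), 3))  # → 0.833
# Stability of a model evaluated on five corpora.
print(round(cds([0.83, 0.80, 0.85, 0.81, 0.84]), 3))
```

With equal weights the TCI reduces to the plain mean of the three scores; a deployment that values robustness over calibration would simply shift the weight tuple.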
Nov-10-2025