Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests

Mazelan, Muhammed El Mustaqeem; Abdul, Noor Hazlina; AlDahoul, Nouar

arXiv.org Artificial Intelligence 

Password security plays a crucial role in cybersecurity, yet traditional password strength meters, which rely on static rules such as character-type requirements, often fall short. Such rules are easily satisfied by common password patterns (e.g., 'P@ssw0rd1!'), giving users a false sense of security. To address this, we implement and evaluate a password strength scoring system by comparing four machine learning models, namely Random Forest (RF), Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and Logistic Regression, using a dataset of over 660,000 real-world passwords. Our primary contribution is a novel hybrid feature engineering approach that captures nuanced vulnerabilities missed by standard metrics. We introduce features such as leetspeak-normalized Shannon entropy to assess true randomness, pattern detection for keyboard walks and sequences, and character-level TF-IDF n-grams to identify substrings frequently reused in breached password datasets. Crucially, the interpretability of the Random Forest model allows for feature importance analysis, providing a clear pathway to security tools that give users specific, actionable feedback. This study bridges the gap between predictive accuracy and practical usability, resulting in a high-performance scoring system that not only reduces password-based vulnerabilities but also empowers users to make more informed security decisions.

Keywords: Password Security, Machine Learning, Rule-Based Attack, Brute-Force Attack, Dictionary Attack, Cybersecurity.

1. Introduction

Passwords remain a cornerstone of online security, serving as the primary means of authentication for countless systems and applications.
However, this reliance is also a critical vulnerability: according to a Google Cloud report, 86% of breaches involve stolen credentials [1]. Many users choose weak, easily guessable passwords, which threaten both user data and system security. Attackers frequently exploit this weakness in large-scale attacks, compromising user privacy and enabling financial fraud. Most traditional password strength scoring tools rely on static rules, such as requiring a mix of lowercase letters, uppercase letters, digits, and special characters (LUDS), which fail to adapt to evolving attack patterns.
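The keyboard-walk and sequence detection mentioned in the abstract can be sketched as a single scan for runs of adjacent characters. This minimal Python sketch assumes two reference layouts, the unshifted US QWERTY rows and plain alphabetical/numeric order. The names `longest_run`, `KEYBOARD` and `SEQUENCE` are hypothetical, and the paper's own pattern features may be defined differently.

```python
import string


def _layout(rows):
    """Map each character to its (row, column) position within the given rows."""
    return {c: (r, i) for r, row in enumerate(rows) for i, c in enumerate(row)}


# Assumed layouts: unshifted US QWERTY rows, and alphabetical/numeric order.
KEYBOARD = _layout(["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"])
SEQUENCE = _layout([string.digits, string.ascii_lowercase])


def longest_run(pw: str, layout: dict) -> int:
    """Length of the longest run of characters that stay on one row of `layout`
    and move by a constant step of +1 or -1 (e.g. 'qwerty', 'ytrewq', 'abc', '321')."""
    points = [layout.get(c) for c in pw.lower()]  # None for characters off the layout
    best = run = 1 if points else 0
    step = 0
    for prev, cur in zip(points, points[1:]):
        d = None
        if prev is not None and cur is not None and prev[0] == cur[0]:
            d = cur[1] - prev[1]
        if d in (1, -1):
            # Extend the run if the direction is unchanged; otherwise start a new 2-char run.
            run = run + 1 if d == step else 2
            step = d
        else:
            run, step = 1, 0
        best = max(best, run)
    return best


for pw in ["qwerty123", "Zxcvbn!", "P@ssw0rd1!"]:
    print(pw, longest_run(pw, KEYBOARD), longest_run(pw, SEQUENCE))
```

Treating both layouts as ordered rows lets one function detect forward and reversed runs ('qwerty', 'ytrewq', 'abc', '321'). Diagonal walks such as 'qaz' are not covered by this sketch.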