 keystroke dynamic


A Hybrid CAPTCHA Combining Generative AI with Keystroke Dynamics for Enhanced Bot Detection

Nia, Ayda Aghaei

arXiv.org Artificial Intelligence

Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are a foundational component of web security, yet traditional implementations suffer from a trade-off between usability and resilience against AI-powered bots. This paper introduces a novel hybrid CAPTCHA system that synergizes the cognitive challenges posed by Large Language Models (LLMs) with the behavioral biometric analysis of keystroke dynamics. Our approach generates dynamic, unpredictable questions that are trivial for humans but non-trivial for automated agents, while simultaneously analyzing the user's typing rhythm to distinguish human patterns from robotic input. We present the system's architecture, formalize the feature extraction methodology for keystroke analysis, and report on an experimental evaluation. The results indicate that our dual-layered approach achieves a high degree of accuracy in bot detection, successfully thwarting both paste-based and script-based simulation attacks, while maintaining a high usability score among human participants. This work demonstrates the potential of combining cognitive and behavioral tests to create a new generation of more secure and user-friendly CAPTCHAs.
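The abstract above refers to keystroke feature extraction without giving the formulas. The following is a minimal sketch of two timing features commonly used in keystroke dynamics, hold time and flight time, together with a simple variance check. The function names, the event format, and the `min_std_ms` threshold are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def keystroke_features(events):
    """Summarize typing rhythm.

    events: non-empty list of (key, press_ms, release_ms), ordered by press time.
    """
    # Hold time: how long each key stays pressed.
    holds = [release - press for _, press, release in events]
    # Flight time: release of one key to press of the next
    # (negative when the next key is pressed before the previous is released).
    flights = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {
        "hold_mean": mean(holds),
        "hold_std": pstdev(holds),
        "flight_mean": mean(flights) if flights else 0.0,
        "flight_std": pstdev(flights) if flights else 0.0,
    }


def looks_scripted(features, min_std_ms=5.0):
    # Hypothetical heuristic: near-constant timing suggests scripted input.
    # The threshold is an illustrative assumption, not a value from the paper.
    return features["hold_std"] < min_std_ms and features["flight_std"] < min_std_ms


human = [("h", 0, 95), ("i", 180, 260), ("!", 420, 530)]
bot = [("h", 0, 50), ("i", 100, 150), ("!", 200, 250)]
print(looks_scripted(keystroke_features(human)))
print(looks_scripted(keystroke_features(bot)))
```

A pasted answer produces no per-key events at all, so a real system would presumably treat missing keystroke data as its own signal rather than pass it through a check like this one.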


LLM-Assisted Cheating Detection in Korean Language via Keystrokes

Roh, Dong Hyun, Kumar, Rajesh, Ngo, An

arXiv.org Artificial Intelligence

This paper presents a keystroke-based framework for detecting LLM-assisted cheating in Korean, addressing key gaps in prior research regarding language coverage, cognitive context, and the granularity of LLM involvement. Our proposed dataset includes 69 participants who completed writing tasks under three conditions: bona fide writing, paraphrasing ChatGPT responses, and transcribing ChatGPT responses. Each task spans six cognitive processes defined in Bloom's Taxonomy (remember, understand, apply, analyze, evaluate, and create). We extract interpretable temporal and rhythmic features and evaluate multiple classifiers under both Cognition-Aware and Cognition-Unaware settings. Temporal features perform well under Cognition-Aware evaluation scenarios, while rhythmic features generalize better under cross-cognition scenarios. Moreover, bona fide and transcribed responses were easier to detect than paraphrased ones for both the proposed models and human evaluators, with the models significantly outperforming the humans. Our findings affirm that keystroke dynamics facilitate reliable detection of LLM-assisted writing across varying cognitive demands and writing strategies, including paraphrasing and transcribing LLM-generated responses.


Impact of Data Breadth and Depth on Performance of Siamese Neural Network Model: Experiments with Three Keystroke Dynamic Datasets

Wahab, Ahmed Anu, Hou, Daqing, Cheng, Nadia, Huntley, Parker, Devlen, Charles

arXiv.org Machine Learning

Deep learning models, such as Siamese Neural Networks (SNNs), have shown great potential in capturing the intricate patterns in behavioral data. However, the impact of dataset breadth (i.e., the number of subjects) and depth (e.g., the number of training samples per subject) on the performance of these models is often informally assumed and remains under-explored. To this end, we have conducted extensive experiments using the concepts of "feature space" and "density" to guide and deepen our understanding of the impact of dataset breadth and depth on three publicly available keystroke datasets (Aalto, CMU and Clarkson II). By varying the number of training subjects, the number of samples per subject, the amount of data in each sample, and the number of triplets used in training, we found that, when feasible, increasing dataset breadth enables the training of a well-trained model that effectively captures more inter-subject variability. In contrast, we find that the extent of depth's impact depends on the nature of the dataset. Free-text datasets are influenced by all of the depth-wise factors: inadequate samples per subject, sequence length, training triplets, or gallery sample size may each lead to an under-trained model. Fixed-text datasets are less affected by these factors, which makes it easier to create a well-trained model. These findings shed light on the importance of dataset breadth and depth in training deep learning models for behavioral biometrics and provide valuable insights for designing more effective authentication systems.
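The abstract mentions training triplets, which suggests a triplet-style objective, although the exact loss is not stated. As a rough sketch under that assumption, the standard margin-based triplet loss on embedding vectors can be written as follows; the `margin` default and the toy vectors are illustrative only.

```python
import math


def euclidean(a, b):
    # Distance between two embedding vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    # Zero once the negative (different subject) is at least `margin`
    # farther from the anchor than the positive (same subject).
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor, positive = [0.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, positive, [3.0, 4.0]))  # easy negative, far away
print(triplet_loss(anchor, positive, [0.0, 1.5]))  # hard negative, close by
```

In these terms, more subjects (breadth) yields more distinct negatives to draw from, while more samples per subject (depth) yields more anchor-positive pairs.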


TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models

Simão, Matheus, Prado, Fabiano, Wahab, Omar Abdul, Avila, Anderson

arXiv.org Artificial Intelligence

With the widespread adoption of digital environments, reliable authentication and continuous access control have become crucial. They can minimize cyber attacks and prevent fraud, especially fraud associated with identity theft. Of particular interest is keystroke dynamics (KD), which refers to the task of recognizing individuals' identity based on their unique typing style. In this work, we propose the use of pre-trained language models (PLMs) to recognize such patterns. Although PLMs have shown high performance on multiple NLP benchmarks, the use of these models on specific tasks requires customization. BERT and RoBERTa, for instance, rely on subword tokenization, and they cannot be directly applied to KD, which requires temporal-character information to recognize users. Recent character-aware PLMs are able to process both subword and character-level information and can be an alternative solution. Nevertheless, they are still not suitable to be directly fine-tuned for KD, as they are not optimized to account for a user's temporal typing information (e.g., hold time and flight time). To overcome this limitation, we propose TempCharBERT, an architecture that incorporates temporal-character information in the embedding layer of CharBERT. This allows modeling keystroke dynamics for the purpose of user identification and authentication. Our results show a significant improvement with this customization. We also show the feasibility of training TempCharBERT in a federated learning setting in order to foster data privacy.


Towards Understanding Emotions for Engaged Mental Health Conversations

Sim, Kellie Yu Hui, Fortuno, Kohleen Tijing, Choo, Kenny Tsu Wei

arXiv.org Artificial Intelligence

Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health, particularly for crisis care, we are developing a system to perform passive emotion-sensing using a combination of keystroke dynamics and sentiment analysis. Our early studies of this system suggest that the analysis of short text messages and keyboard typing patterns can provide emotion information that may be used to support both clients and responders. We use our preliminary findings to discuss the way forward for applying AI to support mental health providers in providing better care.


DEFT: A new distance-based feature set for keystroke dynamics

Kaluarachchi, Nuwan, Kandanaarachchi, Sevvandi, Moore, Kristen, Arakala, Arathi

arXiv.org Artificial Intelligence

Keystroke dynamics is a behavioural biometric utilised for user identification and authentication. We propose a new set of features based on the distance between keys on the keyboard, a concept that has not been considered before in keystroke dynamics. We combine flight times, a popular metric, with the distance between keys on the keyboard and call the result Distance Enhanced Flight Time (DEFT) features. This novel approach provides comprehensive insights into a person's typing behaviour, going beyond typing velocity alone. We build a DEFT model by combining DEFT features with other previously used keystroke dynamic features. The DEFT model is designed to be device-agnostic, allowing us to evaluate its effectiveness across three commonly used devices: desktop, mobile, and tablet. The DEFT model outperforms the existing state-of-the-art methods when we evaluate its effectiveness across two datasets. We obtain accuracy rates exceeding 99% and equal error rates below 10% on all three devices.
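The abstract does not specify how key distance and flight time are combined. The sketch below is one plausible reading, not the authors' definition: it places letter keys on an approximate QWERTY grid, computes the Euclidean distance for each consecutive key pair, and reports flight time alongside flight time per unit distance. The row offsets, the event format, and all function names are assumptions.

```python
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]  # assumed row stagger, in key widths

# Map each letter to an (x, y) position on the assumed grid.
KEY_POS = {
    ch: (col + ROW_OFFSETS[r], float(r))
    for r, row in enumerate(ROWS)
    for col, ch in enumerate(row)
}


def key_distance(a, b):
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(x2 - x1, y2 - y1)


def distance_flight_features(events):
    """events: list of (key, press_ms, release_ms), ordered by press time."""
    feats = []
    for (k1, _, r1), (k2, p2, _) in zip(events, events[1:]):
        d = key_distance(k1, k2)
        flight = p2 - r1  # release of k1 to press of k2
        feats.append({
            "pair": k1 + k2,
            "distance": d,
            "flight_ms": flight,
            # Undefined for repeated keys (distance 0).
            "flight_per_unit": flight / d if d > 0 else None,
        })
    return feats


print(distance_flight_features([("q", 0, 80), ("p", 200, 280), ("p", 400, 470)]))
```

Pairing the two quantities separates a slow transition between adjacent keys from an equally slow transition across the keyboard, which flight time alone cannot do.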


Keystroke Dynamics for User Identification

Sharma, Atharva, Jureček, Martin, Stamp, Mark

arXiv.org Artificial Intelligence

Authentication and intrusion detection are crucial aspects of online security. Conventional authentication methods, such as passwords, have limitations, and biometric systems may require additional hardware or be unsuitable for specific user groups. Recent research highlights the need for accessible and inclusive authentication systems for all users, including elderly [14, 24] and disabled individuals [26]. Keystroke dynamics are a promising means for improved user authentication and identification. By analyzing keystroke patterns, a user can be identified based on their distinctive typing style, regardless of age or physical ability. Furthermore, keystroke dynamics can aid in detecting an intruder who has gained unauthorized access to a system, making it a potentially useful tool for intrusion detection. Compared to traditional authentication methods such as passwords, keystroke dynamics offer several benefits. First, keystroke dynamics are challenging to break since people tend to have distinctive typing patterns that may be difficult to replicate or guess.


BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection

DeAlcala, Daniel, Morales, Aythami, Tolosana, Ruben, Acien, Alejandro, Fierrez, Julian, Hernandez, Santiago, Ferrer, Miguel A., Diaz, Moises

arXiv.org Artificial Intelligence

This work proposes a data-driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. These approaches are validated on the bot detection task, using the synthetic keystroke data to improve the training process of keystroke-based bot detection systems. Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects. We have analyzed the performance of the three synthesis approaches through qualitative and quantitative experiments. Different bot detectors are considered based on several supervised classifiers (Support Vector Machine, Random Forest, Gaussian Naive Bayes and a Long Short-Term Memory network) and a learning framework including human and synthetic samples. The experiments demonstrate the realism of the synthetic samples. The classification results suggest that in scenarios with large labeled data, these synthetic samples can be detected with high accuracy. However, in few-shot learning scenarios, detecting them remains an important challenge. Furthermore, these results show the great potential of the presented models.
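The abstract names Universal and User-dependent statistical models without defining them. One simple interpretation, assumed here purely for illustration, is to sample hold and flight times from Gaussian distributions, with a single parameter set shared by everyone (universal) or a separate set per user (user-dependent). The function name and all parameter values below are hypothetical.

```python
import random


def synthesize_keystrokes(text, hold_mu, hold_sigma, flight_mu, flight_sigma, seed=None):
    """Return synthetic (key, press_ms, release_ms) events for `text`."""
    rng = random.Random(seed)
    t = 0.0
    events = []
    for ch in text:
        # Clamp so a key is always held for a positive duration.
        hold = max(1.0, rng.gauss(hold_mu, hold_sigma))
        press, release = t, t + hold
        events.append((ch, press, release))
        flight = rng.gauss(flight_mu, flight_sigma)
        # Negative flight (key overlap) is allowed, but press times must increase.
        t = max(press + 1.0, release + flight)
    return events


# "Universal" style: one assumed parameter set for all users.
universal = synthesize_keystrokes("hello", 90, 15, 120, 40, seed=0)
# "User-dependent" style: parameters assumed to be estimated from one user's data.
per_user = synthesize_keystrokes("hello", 70, 8, 95, 25, seed=0)
print(len(universal), len(per_user))
```

Samples like these could serve as the synthetic class when training a bot detector, which is the role the abstract describes for its synthetic data.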


Conditional Generative Adversarial Network for keystroke presentation attack

Eizaguirre-Peral, Idoia, Segurola-Gil, Lander, Zola, Francesco

arXiv.org Artificial Intelligence

Cybersecurity is a crucial step in data protection to ensure user security and personal data privacy. In this sense, many companies have started to control and restrict access to their data using authentication systems. However, these traditional authentication methods are not enough to ensure data protection, and for this reason, behavioral biometrics have gained importance. Despite their promising results and wide range of applications, biometric systems have been shown to be vulnerable to malicious attacks, such as Presentation Attacks. For this reason, in this work, we propose to study a new approach aiming to deploy a presentation attack against a keystroke authentication system. Our idea is to use Conditional Generative Adversarial Networks (cGAN) to generate synthetic keystroke data that can be used to impersonate an authorized user. These synthetic data are generated following two different real use cases, one in which the order of the typed words is known (ordered dynamic) and the other in which this order is unknown (no-ordered dynamic). Finally, both keystroke dynamics (ordered and no-ordered) are validated using an external keystroke authentication system. Results indicate that the cGAN can effectively generate keystroke dynamics patterns that can be used to deceive keystroke authentication systems.


A novel non-linear transformation based multi-user identification algorithm for fixed text keystroke behavioral dynamics

Sahu, Chinmay, Banavar, Mahesh, Schuckers, Stephanie

arXiv.org Artificial Intelligence

In this paper, we propose a new technique to uniquely classify and identify multiple users accessing a single application using keystroke dynamics. This problem is usually encountered when multiple users have legitimate access to shared computers and accounts, where, at times, one user can inadvertently be logged in on another user's account. Since the login processes are usually bypassed at this stage, we rely on keystroke dynamics in order to tell users apart. Our algorithm uses the quantile transform and techniques from localization to classify and identify users. Specifically, we use an algorithm known as ordinal Unfolding based Localization (UNLOC), which uses only ordinal data obtained from comparing distance proxies, by "locating" users in a reduced PCA/Kernel-PCA/t-SNE space based on their typing patterns. Our results are validated with the help of benchmark keystroke datasets and show that our algorithm outperforms other methods. With increasing digital presence, securing sensitive and personal [...] In general, systems or web applications utilize one-time authentication using single sign-on for providing security. Banking and financial institutions generally use a knowledge-based mechanism to [...] In this paper, we consider both sources of keystrokes. [...] authentication [9], [12], [14], where a profile is built for only one user. The algorithms used in single-user authentication determine whether the user at the keyboard is the user in the model.
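The abstract names the quantile transform but does not say how it is applied. As a minimal sketch of the general technique, assuming it is fitted per feature on reference timings, each raw value can be mapped to its empirical quantile in [0, 1]. The function name and the sample values are illustrative assumptions.

```python
from bisect import bisect_right


def fit_quantile_transform(reference):
    """Build a function that maps a value to its empirical quantile in `reference`."""
    ref = sorted(reference)
    n = len(ref)

    def transform(x):
        # Fraction of reference values less than or equal to x.
        return bisect_right(ref, x) / n

    return transform


# Hold times in ms; 400 is an outlier (for example, a pause mid-word).
to_quantile = fit_quantile_transform([100, 120, 90, 400])
print(to_quantile(100), to_quantile(1000), to_quantile(50))
```

Because the output depends only on rank, an extreme pause cannot dominate the scale the way it would under min-max or z-score normalization, which may be one reason to use it ahead of an ordinal method.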