 Performance Analysis


Efficient IoT Intrusion Detection with an Improved Attention-Based CNN-BiLSTM Architecture

arXiv.org Artificial Intelligence

The ever-increasing security vulnerabilities in Internet-of-Things (IoT) systems require improved threat detection approaches. This paper presents a compact and efficient approach to detecting botnet attacks that integrates traffic pattern analysis, temporal support learning, and focused feature extraction. The proposed attention-based model benefits from a hybrid CNN-BiLSTM architecture and achieves 99% classification accuracy in detecting botnet attacks on the N-BaIoT dataset, while maintaining high precision and recall across various scenarios. The proposed model's performance is further validated by key metrics, such as the Matthews correlation coefficient and Cohen's kappa coefficient. The close-to-ideal results for these metrics demonstrate the proposed model's ability to detect botnet attacks accurately and efficiently in practical settings and on unseen data. The proposed model proved to be a powerful defence mechanism for IoT networks facing emerging security challenges.
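For readers unfamiliar with the two validation metrics named above, the following minimal sketch computes the Matthews correlation coefficient and Cohen's kappa from a binary confusion matrix. The counts are hypothetical and are not results from the paper; the function names are illustrative only.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def cohens_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and true labels.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical counts (NOT taken from the paper): 100 attack and 100 benign flows.
mcc = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
kappa = cohens_kappa(tp=90, tn=95, fp=5, fn=10)
print(f"MCC={mcc:.3f} kappa={kappa:.3f}")
```

Both metrics equal 1 for a perfect classifier and are near 0 for chance-level predictions, which is why values close to 1 are described as close to ideal.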


Uncertainty, bias and the institution bootstrapping problem

arXiv.org Artificial Intelligence

Institutions play a critical role in enabling communities to manage common-pool resources and avert tragedies of the commons. However, a fundamental issue arises: Individuals typically perceive participation as advantageous only after an institution is established, creating a paradox: How can institutions form if no one will join before a critical mass exists? We term this conundrum the institution bootstrapping problem and propose that misperception, specifically, agents' erroneous belief that an institution already exists, could resolve this paradox. By integrating well-documented psychological phenomena, including cognitive biases, probability distortion, and perceptual noise, into a game-theoretic framework, we demonstrate how these factors collectively mitigate the bootstrapping problem. Notably, unbiased perceptual noise (e.g., noise arising from agents' heterogeneous physical or social contexts) drastically reduces the critical mass of cooperators required for institutional emergence. This effect intensifies with greater diversity of perceptions. We explain this counter-intuitive result through asymmetric boundary conditions: proportional underestimation of low-probability sanctions produces distinct outcomes compared to equivalent overestimation. Furthermore, the type of perceptual distortion, proportional versus absolute, yields qualitatively different evolutionary pathways. These findings challenge conventional assumptions about rationality in institutional design, highlighting how "noisy" cognition can paradoxically enhance cooperation. Finally, we contextualize these insights within broader discussions of multi-agent system design and collective action. Our analysis underscores the importance of incorporating human-like cognitive constraints, not just idealized rationality, into models of institutional emergence and resilience.


MPEC: Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers

arXiv.org Artificial Intelligence

ORCID: 0000-0003-0886-7023. Abstract: Accurate classification of EEG signals is crucial for brain-computer interfaces (BCIs) and neuroprosthetic applications, yet many existing methods fail to account for the non-Euclidean, manifold structure of EEG data, resulting in suboptimal performance. Preserving this manifold information is essential to capture the true geometry of EEG signals, but traditional classification techniques largely overlook this need. To this end, we propose MPEC (Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers), which introduces two key innovations: (1) a feature engineering phase that combines covariance matrices and Radial Basis Function (RBF) kernels to capture both linear and non-linear relationships among EEG channels, and (2) a clustering phase that employs a modified K-means algorithm tailored for the Riemannian manifold space, ensuring local geometric sensitivity. By ensembling multiple clustering-based classifiers, MPEC achieves superior results, validated by significant improvements on the BCI Competition IV dataset 2a. Keywords: brain-computer interfaces (BCIs), EEG signal classification, ensemble modeling, clustering-based classification. EEG signal classification is essential in brain-computer interfaces (BCIs) and neuroprosthetics, where precise interpretation supports real-time control and cognitive applications. However, traditional techniques often overlook the non-Euclidean, manifold structure of EEG data, leading to suboptimal results [1]. We propose Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers (MPEC), a novel method that enhances classification accuracy by preserving the intrinsic manifold structure of EEG signals.
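As a rough illustration of the two feature types mentioned in the abstract, the sketch below computes a channel-by-channel sample covariance matrix (linear relationships) and an RBF kernel matrix (non-linear similarity) for one toy trial. How MPEC actually combines the two is not stated here, so the sketch only computes them separately; the function names, the `gamma` value, and the toy signals are assumptions.

```python
import math

def channel_covariance(signals):
    """Sample covariance between channels; `signals` is a list of equal-length time series."""
    n_samples = len(signals[0])
    means = [sum(ch) / n_samples for ch in signals]
    centered = [[v - m for v in ch] for ch, m in zip(signals, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n_samples - 1) for cj in centered]
        for ci in centered
    ]

def channel_rbf_kernel(signals, gamma=0.5):
    """RBF (Gaussian) kernel between channels: exp(-gamma * ||x_i - x_j||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in signals]
        for xi in signals
    ]

# Toy trial with 3 channels and 4 time samples (illustrative values, not real EEG).
trial = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
cov = channel_covariance(trial)
rbf = channel_rbf_kernel(trial, gamma=0.1)
```

Both matrices are symmetric, and covariance matrices of this kind are the objects usually treated as points on a Riemannian manifold in EEG work.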


Passive Measurement of Autonomic Arousal in Real-World Settings

arXiv.org Artificial Intelligence

The autonomic nervous system (ANS) is activated during stress, which can have negative effects on cardiovascular health, sleep, the immune system, and mental health. While there are ways to quantify ANS activity in laboratories, there is a paucity of methods that have been validated in real-world contexts. We present the Fitbit Body Response Algorithm, an approach to continuous remote measurement of ANS activation through widely available wrist-based sensors. The design was validated via two experiments, a Trier Social Stress Test (n = 45) and ecological momentary assessments (EMA) of perceived stress (n = 87), providing both controlled and ecologically valid test data. Model performance predicting perceived stress when using all available sensor modalities was consistent with expectations (accuracy = 0.85) and outperformed models with access to only a subset of the signals. We discuss and address challenges to sensing that arise in real-world settings but are not present in conventional lab environments.


Security Bug Report Prediction Within and Across Projects: A Comparative Study of BERT and Random Forest

arXiv.org Artificial Intelligence

Early detection of security bug reports (SBRs) is crucial for preventing vulnerabilities and ensuring system reliability. While machine learning models have been developed for SBR prediction, their predictive performance still has room for improvement. In this study, we conduct a comprehensive comparison between BERT and Random Forest (RF), a competitive baseline for predicting SBRs. The results show that RF outperforms BERT with a 34% higher average G-measure for within-project predictions. Adding only SBRs from various projects improves both models' average performance. However, including both security and non-security bug reports significantly reduces RF's average performance to 46%, while boosting BERT to its best average performance of 66%, surpassing RF. In cross-project SBR prediction, BERT achieves a remarkable 62% G-measure, which is substantially higher than RF.
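The abstract does not define the G-measure. The sketch below assumes the definition commonly used in security bug report prediction studies: the harmonic mean of recall (probability of detection) and one minus the false-alarm rate. The counts are hypothetical and the function name is illustrative.

```python
def g_measure(tp, fn, fp, tn):
    """Harmonic mean of recall (pd) and 1 - false-alarm rate (pf).

    Assumed definition; the abstract above does not spell it out.
    """
    recall = tp / (tp + fn)          # share of security reports that were caught
    false_alarm = fp / (fp + tn)     # share of non-security reports wrongly flagged
    specificity = 1.0 - false_alarm
    if recall + specificity == 0:
        return 0.0
    return 2 * recall * specificity / (recall + specificity)

# Hypothetical counts (NOT from the study): 10 security and 90 non-security reports.
g = g_measure(tp=8, fn=2, fp=10, tn=80)
print(f"G-measure = {g:.3f}")
```

Under this definition a model scores well only if it catches security reports without flooding reviewers with false alarms, which matters because SBRs are typically a small minority of all bug reports.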


A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages

arXiv.org Artificial Intelligence

Online disinformation poses a global challenge, placing significant demands on fact-checkers who must verify claims efficiently to prevent the spread of false information. A major issue in this process is the redundant verification of already fact-checked claims, which increases workload and delays responses to newly emerging claims. This research introduces an approach that retrieves previously fact-checked claims, evaluates their relevance to a given input, and provides supplementary information to support fact-checkers. Our method employs large language models (LLMs) to filter irrelevant fact-checks and generate concise summaries and explanations, enabling fact-checkers to assess more quickly whether a claim has been verified before. In addition, we evaluate our approach through both automatic and human assessments, where humans interact with the developed tool to review its effectiveness. Our results demonstrate that LLMs are able to filter out many irrelevant fact-checks and, therefore, reduce effort and streamline the fact-checking process.


What's Wrong with Your Synthetic Tabular Data? Using Explainable AI to Evaluate Generative Models

arXiv.org Machine Learning

Evaluating synthetic tabular data is challenging, since they can differ from the real data in so many ways. There exist numerous metrics of synthetic data quality, ranging from statistical distances to predictive performance, often providing conflicting results. Moreover, they fail to explain or pinpoint the specific weaknesses in the synthetic data. To address this, we apply explainable AI (XAI) techniques to a binary detection classifier trained to distinguish real from synthetic data. While the classifier identifies distributional differences, XAI concepts such as feature importance and feature effects, analyzed through methods like permutation feature importance, partial dependence plots, Shapley values and counterfactual explanations, reveal why synthetic data are distinguishable, highlighting inconsistencies, unrealistic dependencies, or missing patterns. This interpretability increases transparency in synthetic data evaluation and provides deeper insights beyond conventional metrics, helping diagnose and improve synthetic data quality. We apply our approach to two tabular datasets and generative models, showing that it uncovers issues overlooked by standard evaluation techniques.
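One of the XAI methods listed above, permutation feature importance, can be sketched in a few lines. In the paper the detection classifier is trained on real versus synthetic rows; here a hard-coded rule stands in for it, and the toy data, function names, and parameters are all assumptions made for illustration.

```python
import random

def detector(row):
    """Stand-in for a trained detector: flags a row as synthetic (1) if feature 0 > 0.5."""
    return 1 if row[0] > 0.5 else 0

def accuracy(rows, labels):
    return sum(detector(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, n_features, n_repeats=10, seed=0):
    """Mean drop in detector accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data: label 0 = real, 1 = synthetic. Feature 0 differs between the two
# sources, feature 1 follows the same pattern in both.
rows = [[0.0, i % 5] for i in range(50)] + [[1.0, i % 5] for i in range(50)]
labels = [0] * 50 + [1] * 50
importances = permutation_importance(rows, labels, n_features=2)
print(importances)
```

A large importance for a feature indicates that the generator reproduces that feature poorly enough for the detector to exploit it, which is the kind of pinpointing the abstract describes.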


Financial Data Analysis with Robust Federated Logistic Regression

arXiv.org Machine Learning

Financial data analysis plays a pivotal role in today's business landscape [1, 2, 3, 4, 5, 6, 7], including credit risk assessment (such as loan prediction and credit scoring), fraud detection, and cost optimization. However, when we develop solutions to address financial problems, we inevitably encounter a number of key challenges [1, 2, 3, 4, 5]. For example, financial data is often voluminous, dynamically and frequently generated in real time, and distributed across diverse locations, making it challenging to process and analyze in a centralized manner [1]; e.g., the New York Stock Exchange (NYSE) alone has billions of transactions per day. Similarly, other major exchanges, such as the Shanghai Stock Exchange (SSE) and the London Stock Exchange (LSE), also generate vast amounts of stock data. Additionally, noise and missing values unavoidably occur in financial data, which can cause results and predictions to be skewed (or even completely wrong). These challenges require firms to come up with more efficient and smarter solutions. In recent decades, machine learning has achieved remarkable success across various domains [8, 9, 10], owing to its effective generalization ability and adaptability, and has also received increasing attention in financial data analysis [11, 12], such as credit risk assessment, resource allocation, and cost optimization. However, these classical (supervised) machine learning based solutions, such as logistic regression and random forest, usually implicitly assume that 1) all the data is stored and centralized at one location, typically a single machine, and that we have full access to the entire data; 2) these algorithms expect to run on a single machine with minimal concerns for memory or disk storage limitations; and 3) the provided data is clean and free from outliers introduced by malicious adversaries, as it is stored at a single location equipped with high security protection mechanisms to prevent data corruption. Nonetheless, these assumptions do not always hold in practice.


Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers

arXiv.org Artificial Intelligence

A person downloading a pre-trained model from the web should be aware of its biases. Existing approaches for bias identification rely on datasets containing labels for the task of interest, something that a non-expert may not have access to, or may not have the necessary resources to collect: this greatly limits the number of tasks where model biases can be identified. In this work, we present Classifier-to-Bias (C2B), the first bias discovery framework that works without access to any labeled data: it only relies on a textual description of the classification task to identify biases in the target classification model. This description is fed to a large language model to generate bias proposals and corresponding captions depicting biases together with task-specific target labels. A retrieval model collects images for those captions, which are then used to assess the accuracy of the model w.r.t. the given biases. C2B is training-free, does not require any annotations, has no constraints on the list of biases, and can be applied to any pre-trained model on any classification task. Experiments on two publicly available datasets show that C2B discovers biases beyond those of the original datasets and outperforms a recent state-of-the-art bias detection baseline that relies on task-specific annotations, being a promising first step toward addressing task-agnostic unsupervised bias detection.


TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks

arXiv.org Artificial Intelligence

AI-generated synthetic media are increasingly used in real-world scenarios, often with the purpose of spreading misinformation and propaganda through social media platforms, where compression and other processing can degrade fake detection cues. Currently, many forensic tools fail to account for these in-the-wild challenges. In this work, we introduce TrueFake, a large-scale benchmarking dataset of 600,000 images that covers top-notch generative techniques and sharing via three different social networks. This dataset allows for rigorous evaluation of state-of-the-art fake image detectors under very realistic and challenging conditions. Through extensive experimentation, we analyze how social media sharing impacts detection performance and identify the currently most effective detection and training strategies. Our findings highlight the need for evaluating forensic models in conditions that mirror real-world use. In recent years, AI-generated media (such as images, videos, and audio) have increasingly become part of everyday life [3], becoming widely used in the entertainment industry, including movie production and advertising. The literature provides a broad range of AI media generators capable of producing hyper-realistic images [4], [5], videos [6], and even audio [7].