Goto

Collaborating Authors

 Performance Analysis


Solar Flare Forecast: A Comparative Analysis of Machine Learning Algorithms for Solar Flare Class Prediction

arXiv.org Artificial Intelligence

Solar flares are among the most powerful and dynamic events in the solar system, resulting from the sudden release of magnetic energy stored in the Sun's atmosphere. These energetic bursts of electromagnetic radiation can release up to 10^32 erg of energy, impacting space weather and posing risks to technological infrastructure and therefore require accurate forecasting of solar flare occurrences and intensities. This study evaluates the predictive performance of three machine learning algorithms: Random Forest, k-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost) for classifying solar flares into 4 categories (B, C, M, X). Using the dataset of 13 SHARP parameters, the effectiveness of the models was evaluated in binary and multiclass classification tasks. The analysis utilized 8 principal components (PC), capturing 95% of data variance, and 100 PCs, capturing 97.5% of variance. Our approach uniquely combines binary and multiclass classification with different levels of dimensionality reduction, an innovative methodology not previously explored in the context of solar flare prediction. Employing a 10-fold stratified cross-validation and grid search for hyperparameter tuning ensured robust model evaluation. Our findings indicate that Random Forest and XGBoost consistently demonstrate strong performance across all metrics, benefiting significantly from increased dimensionality. The insights of this study enhance future research by optimizing dimensionality reduction techniques and informing model selection for astrophysical tasks. By integrating this newly acquired knowledge into future research, more accurate space weather forecasting systems can be developed, along with a deeper understanding of solar physics.


Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis

arXiv.org Artificial Intelligence

The rise of QR code based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods predominantly focus on URL analysis, which requires the extraction of the QR code payload, and may inadvertently expose users to malicious content. Moreover, QR codes can encode various types of data beyond URLs, such as Wi-Fi credentials and payment information, making URL-based detection insufficient for broader security concerns. To address these gaps, we propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content. We generated a dataset of phishing and benign QR codes and we used it to train and evaluate multiple machine learning models, including Logistic Regression, Decision Trees, Random Forest, Naive Bayes, LightGBM, and XGBoost. Our best-performing model (XGBoost) achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual indicators of malicious intent and refine our feature set by removing non-informative pixels, improving performance to an AUC of 0.9133 with a reduced feature space. Our findings reveal that the structural features of QR code correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.


Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation

arXiv.org Artificial Intelligence

Given the increasing complexity of omics datasets, a key challenge is not only improving classification performance but also enhancing the transparency and reliability of model decisions. Effective model performance and feature selection are fundamental for explainability and reliability. In many cases, high dimensional omics datasets suffer from limited number of samples due to clinical constraints, patient conditions, phenotypes rarity and others conditions. Current omics based classification models often suffer from narrow interpretability, making it difficult to discern meaningful insights where trust and reproducibility are critical. This study presents a machine learning based classification framework that integrates feature selection with data augmentation techniques to achieve high standard classification accuracy while ensuring better interpretability. Using the publicly available dataset (E MTAB 8026), we explore a bootstrap analysis in six binary classification scenarios to evaluate the proposed model's behaviour. We show that the proposed pipeline yields cross validated perfomance on small dataset that is conserved when the trained classifier is applied to a larger test set. Our findings emphasize the fundamental balance between accuracy and feature selection, highlighting the positive effect of introducing synthetic data for better generalization, even in scenarios with very limited samples availability.


A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM)

arXiv.org Artificial Intelligence

Generative models can unintentionally memorize training data, posing significant privacy risks. This paper addresses the memorization phenomenon in time series imputation models, introducing the Loss-Based with Reference Model (LBRM) algorithm. The LBRM method leverages a reference model to enhance the accuracy of membership inference attacks, distinguishing between training and test data. Our contributions are twofold: first, we propose an innovative method to effectively extract and identify memorized training data, significantly improving detection accuracy. On average, without fine-tuning, the AUROC improved by approximately 40\%. With fine-tuning, the AUROC increased by approximately 60\%. Second, we validate our approach through membership inference attacks on two types of architectures designed for time series imputation, demonstrating the robustness and versatility of the LBRM approach in different contexts. These results highlight the significant enhancement in detection accuracy provided by the LBRM approach, addressing privacy risks in time series imputation models.


A Cognitive-Mechanistic Human Reliability Analysis Framework: A Nuclear Power Plant Case Study

arXiv.org Artificial Intelligence

Traditional human reliability analysis (HRA) methods, such as IDHEAS-ECA, rely on expert judgment and empirical rules that often overlook the cognitive underpinnings of human error. Moreover, conducting human-in-the-loop experiments for advanced nuclear power plants is increasingly impractical due to novel interfaces and limited operational data. This study proposes a cognitive-mechanistic framework (COGMIF) that enhances the IDHEAS-ECA methodology by integrating an ACT-R-based human digital twin (HDT) with TimeGAN-augmented simulation. The ACT-R model simulates operator cognition, including memory retrieval, goal-directed procedural reasoning, and perceptual-motor execution--under high-fidelity scenarios derived from a high-temperature gas-cooled reactor (HTGR) simulator. To overcome the resource constraints of large-scale cognitive modeling, TimeGAN is trained on ACT-R-generated time-series data to produce high-fidelity synthetic operator behavior datasets. These simulations are then used to drive IDHEAS-ECA assessments, enabling scalable, mechanism-informed estimation of human error probabilities (HEPs). Comparative analyses with SPAR-H and sensitivity assessments demonstrate the robustness and practical advantages of the proposed COGMIF. This work offers a credible and computationally efficient pathway to integrate cognitive theory into industrial HRA practices. Keywords: Human Reliability, Human Digital Twins, IDHEAS-ECA, TimeGAN, Bayesian 1 Introduction Human reliability analysis (HRA) plays a pivotal role in the safety assessment of complex socio-technical systems, particularly in high-risk domains such as nuclear power generation [1]. As a fundamental component of probabilistic risk assessment (PRA), HRA aims to estimate the likelihood of human error under specific operational contexts, thereby supporting risk-informed decision-making and the design of resilient safety systems. Over the past decades, a range of structured methodologies, such as the standardized plant analysis risk-human reliability analysis (SPAR-H) [2], the technique for human error rate prediction (THERP) [3], and more recently, the integrated human event analysis system for event and condition assessment (IDHEAS-ECA) [4], have been developed to quantify human error probabilities (HEPs). While these frameworks offer operational utility, they are primarily grounded in expert judgment, predefined performance shaping factors (PSFs), and empirically derived databases, often lacking a mechanistic understanding of the cognitive processes that drive operator actions and errors. Furthermore, traditional HRA approaches are highly dependent on two major data sources: (1) retrospective analysis of operational events, and (2) human-in-the-loop (HITL) simulation experiments conducted in controlled environments.


Automatic Calibration for Membership Inference Attack on Large Language Models

arXiv.org Artificial Intelligence

Membership Inference Attacks (MIAs) have recently been employed to determine whether a specific text was part of the pre-training data of Large Language Models (LLMs). However, existing methods often misinfer non-members as members, leading to a high false positive rate, or depend on additional reference models for probability calibration, which limits their practicality. To overcome these challenges, we introduce a novel framework called Automatic Calibration Membership Inference Attack (ACMIA), which utilizes a tunable temperature to calibrate output probabilities effectively. This approach is inspired by our theoretical insights into maximum likelihood estimation during the pre-training of LLMs. We introduce ACMIA in three configurations designed to accommodate different levels of model access and increase the probability gap between members and non-members, improving the reliability and robustness of membership inference. Extensive experiments on various open-source LLMs demonstrate that our proposed attack is highly effective, robust, and generalizable, surpassing state-of-the-art baselines across three widely used benchmarks. Our code is available at: Github. 1 Introduction Large Language Models (LLMs), pre-trained on massive text corpora, have shown impressive human-level language understanding, reasoning, and decision-making capabilities [4, 28, 1, 23]. However, their tendency to memorize training data also introduces significant ethical and security concerns [14, 31, 2, 21, 22].


VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis

arXiv.org Artificial Intelligence

Real-world machine learning models require rigorous evaluation before deployment, especially in safety-critical domains like autonomous driving and surveillance. The evaluation of machine learning models often focuses on data slices, which are subsets of the data that share a set of characteristics. Data slice finding automatically identifies conditions or data subgroups where models underperform, aiding developers in mitigating performance issues. Despite its popularity and effectiveness, data slicing for vision model validation faces several challenges. First, data slicing often needs additional image metadata or visual concepts, and falls short in certain computer vision tasks, such as object detection. Second, understanding data slices is a labor-intensive and mentally demanding process that heavily relies on the expert's domain knowledge. Third, data slicing lacks a human-in-the-loop solution that allows experts to form hypothesis and test them interactively. To overcome these limitations and better support the machine learning operations lifecycle, we introduce VISLIX, a novel visual analytics framework that employs state-of-the-art foundation models to help domain experts analyze slices in computer vision models. Our approach does not require image metadata or visual concepts, automatically generates natural language insights, and allows users to test data slice hypothesis interactively. We evaluate VISLIX with an expert study and three use cases, that demonstrate the effectiveness of our tool in providing comprehensive insights for validating object detection models.


Adversarial Sample Generation for Anomaly Detection in Industrial Control Systems

arXiv.org Artificial Intelligence

--Machine learning (ML)-based intrusion detection systems (IDS) are vulnerable to adversarial attacks. It is crucial for an IDS to learn to recognize adversarial examples before malicious entities exploit them. In this paper, we generated adversarial samples using the Jacobian Saliency Map Attack (JSMA). We validate the generalization and scalability of the adversarial samples to tackle a broad range of real attacks on Industrial Control Systems (ICS). We evaluated the impact by assessing multiple attacks generated using the proposed method. The model trained with adversarial samples detected attacks with 95% accuracy on real-world attack data not used during training. The study was conducted using an operational secure water treatment (SWaT) testbed. Industrial control systems (ICS) comprise a significant portion of any state or nation's critical infrastructure (CI). Examples of such systems include water treatment plants and electric power grids, where an ICS regulates the physical processes. The physical processes consist of two primary parts: monitoring and controlling. The monitoring part maintains processes and ensures they are operating properly by measuring various signals acquired from sensors.


Is AI currently capable of identifying wild oysters? A comparison of human annotators against the AI model, ODYSSEE

arXiv.org Artificial Intelligence

Oysters are ecologically and commercially important species that require frequent monitoring to track population demographics (e.g. abundance, growth, mortality). Current methods of monitoring oyster reefs often require destructive sampling methods and extensive manual effort. Therefore, they are suboptimal for small-scale or sensitive environments. A recent alternative, the ODYSSEE model, was developed to use deep learning techniques to identify live oysters using video or images taken in the field of oyster reefs to assess abundance. The validity of this model in identifying live oysters on a reef was compared to expert and non-expert annotators. In addition, we identified potential sources of prediction error. Although the model can make inferences significantly faster than expert and non-expert annotators (39.6 s, $2.34 \pm 0.61$ h, $4.50 \pm 1.46$ h, respectively), the model overpredicted the number of live oysters, achieving lower accuracy (63\%) in identifying live oysters compared to experts (74\%) and non-experts (75\%) alike. Image quality was an important factor in determining the accuracy of the model and the annotators. Better quality images improved human accuracy and worsened model accuracy. Although ODYSSEE was not sufficiently accurate, we anticipate that future training on higher-quality images, utilizing additional live imagery, and incorporating additional annotation training classes will greatly improve the model's predictive power based on the results of this analysis. Future research should address methods that improve the detection of living vs. dead oysters.


The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI

arXiv.org Artificial Intelligence

Multimodal learning, which integrates diverse data sources such as images, text, and structured data, has proven superior to unimodal counterparts in high-stakes decision-making. However, while performance gains remain the gold standard for evaluating multimodal systems, concerns around bias and robustness are frequently overlooked. In this context, this paper explores two key research questions (RQs): (i) RQ1 examines whether adding a modality con-sistently enhances performance and investigates its role in shaping fairness measures, assessing whether it mitigates or amplifies bias in multimodal models; (ii) RQ2 investigates the impact of missing modalities at inference time, analyzing how multimodal models generalize in terms of both performance and fairness. Our analysis reveals that incorporating new modalities during training consistently enhances the performance of multimodal models, while fairness trends exhibit variability across different evaluation measures and datasets. Additionally, the absence of modalities at inference degrades performance and fairness, raising concerns about its robustness in real-world deployment. We conduct extensive experiments using multimodal healthcare datasets containing images, time series, and structured information to validate our findings.