Goto

Collaborating Authors

 Performance Analysis


Hybrid GCN-GRU Model for Anomaly Detection in Cryptocurrency Transactions

arXiv.org Artificial Intelligence

Blockchain transaction networks are complex, with evolving temporal patterns and inter - node relationships. To detect illicit activi - ties, we propose a hybrid GCN - GRU model that captu res both structural and sequential features. Using real Bitcoin transaction data (2020 - 2024), our model achieved 0.9470 Accuracy and 0.9807 AUC - ROC, outperform - ing all baselines.


Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm

arXiv.org Artificial Intelligence

With the rapid development of large language models, the potential threat of their malicious use, particularly in generating phishing content, is becoming increasingly prevalent. Leveraging the capabilities of LLMs, malicious users can synthesize phishing emails that are free from spelling mistakes and other easily detectable features. Furthermore, such models can generate topic-specific phishing messages, tailoring content to the target domain and increasing the likelihood of success. Detecting such content remains a significant challenge, as LLM-generated phishing emails often lack clear or distinguishable linguistic features. As a result, most existing semantic-level detection approaches struggle to identify them reliably. While certain LLM-based detection methods have shown promise, they suffer from high computational costs and are constrained by the performance of the underlying language model, making them impractical for large-scale deployment. In this work, we aim to address this issue. We propose Paladin, which embeds trigger-tag associations into vanilla LLM using various insertion strategies, creating them into instrumented LLMs. When an instrumented LLM generates content related to phishing, it will automatically include detectable tags, enabling easier identification. Based on the design on implicit and explicit triggers and tags, we consider four distinct scenarios in our work. We evaluate our method from three key perspectives: stealthiness, effectiveness, and robustness, and compare it with existing baseline methods. Experimental results show that our method outperforms the baselines, achieving over 90% detection accuracy across all scenarios.


A transformer-based generative model for planetary systems

arXiv.org Artificial Intelligence

Numerical calculations of planetary system formation are very demanding in terms of computing power. These synthetic planetary systems can however provide access to correlations, as predicted in a given numerical framework, between the properties of planets in the same system. Such correlations can, in return, be used in order to guide and prioritize observational campaigns aiming at discovering some types of planets, as Earth-like planets. Our goal is to develop a generative model which is capable of capturing correlations and statistical relationships between planets in the same system. Such a model, trained on the Bern model, offers the possibility to generate large number of synthetic planetary systems with little computational cost, that can be used, for example, to guide observational campaigns. Our generative model is based on the transformer architecture which is well-known to efficiently capture correlations in sequences and is at the basis of all modern Large Language Models. To assess the validity of the generative model, we perform visual and statistical comparisons, as well as a machine learning driven tests. Finally, as a use case example, we consider the TOI-469 system, in which we aim at predicting the possible properties of planets c and d, based on the properties of planet b (the first that has been detected). We show using different comparison methods that the properties of systems generated by our model are very similar to the ones of the systems computed directly by the Bern model. We also show in the case of the TOI-469 system, that using the generative model allows to predict the properties of planets not yet observed, based on the properties of the already observed planet. We provide our model to the community on our website www.ai4exoplanets.com.


Benchmarking Vision Transformers and CNNs for Thermal Photovoltaic Fault Detection with Explainable AI Validation

arXiv.org Artificial Intelligence

Artificial intelligence deployment for automated photovoltaic (PV) monitoring faces interpretability barriers that limit adoption in energy infrastructure applications. While deep learning achieves high accuracy in thermal fault detection, validation that model decisions align with thermal physics principles remains lacking, creating deployment hesitancy where understanding model reasoning is critical. This study provides a systematic comparison of convolutional neural networks (ResNet-18, EfficientNet-B0) and vision transformers (ViT-Tiny, Swin-Tiny) for thermal PV fault detection, using XRAI saliency analysis to assess alignment with thermal physics principles. This represents the first systematic comparison of CNNs and vision transformers for thermal PV fault detection with physics-validated interpretability. Evaluation on 20,000 infrared images spanning normal operation and 11 fault categories shows that Swin Transformer achieves the highest performance (94% binary accuracy; 73% multiclass accuracy) compared to CNN approaches. XRAI analysis reveals that models learn physically meaningful features, such as localized hotspots for cell defects, linear thermal paths for diode failures, and thermal boundaries for vegetation shading, consistent with expected thermal signatures. However, performance varies significantly across fault types: electrical faults achieve strong detection (F1-scores >0.90) while environmental factors like soiling remain challenging (F1-scores 0.20-0.33), indicating limitations imposed by thermal imaging resolution. The thermal physics-guided interpretability approach provides methodology for validating AI decision-making in energy monitoring applications, addressing deployment barriers in renewable energy infrastructure.


Random Forest Stratified K-Fold Cross Validation on SYN DoS Attack SD-IoV

arXiv.org Artificial Intelligence

In response to the prevalent concern of TCP SYN flood attacks within the context of Software-Defined Internet of Vehicles (SD-IoV), this study addresses the significant challenge of network security in rapidly evolving vehicular communication systems. This research focuses on optimizing a Random Forest Classifier model to achieve maximum accuracy and minimal detection time, thereby enhancing vehicular network security. The methodology involves preprocessing a dataset containing SYN attack instances, employing feature scaling and label encoding techniques, and applying Stratified K-Fold cross-validation to target key metrics such as accuracy, precision, recall, and F1-score. This research achieved an average value of 0.999998 for all metrics with a SYN DoS attack detection time of 0.24 seconds. Results show that the fine-tuned Random Forest model, configured with 20 estimators and a depth of 10, effectively differentiates between normal and malicious traffic with high accuracy and minimal detection time, which is crucial for SD-IoV networks. This approach marks a significant advancement and introduces a state-of-the-art algorithm in detecting SYN flood attacks, combining high accuracy with minimal detection time. It contributes to vehicular network security by providing a robust solution against TCP SYN flood attacks while maintaining network efficiency and reliability.


The Protocol Genome A Self Supervised Learning Framework from DICOM Headers

arXiv.org Artificial Intelligence

In this paper, we introduce the Protocol Genome, a self-supervised learning system that learns correlations from DICOM headers and achieves AUROC 0.901 (vs 0.847 baseline) and ECE 0.036 (vs 0.058) on fully held-out external validation. Our method also improves calibration and robustness across modalities (CT, MRI, CXR) and vendors. Clinical imaging is funneled through PACS/DICOM, where procedure choices (scanner make/model, sequence, kernel, kVp, TR/TE, and slice thickness) have consequences for contrast, noise, and artifact. These latent confounders impede the generalization of image-only networks across sites. We consider structured DICOM headers as a label and learn protocol-aware but clinically robust image representations. Protocol Genome obtains tokenized embeddings of de-identified header fields and models them along with image features using: (1) protocol-image contrastive learning, (2) masked protocol prediction, and (3) protocol-protocol translation. With 1.26M studies (7 health systems, 31 scanners, 3 vendors; CT, MR, CR/DR), we experiment on: (A) chest CT triage for PE, (B) brain MRI glioma grading, and (C) chest radiograph cardiomegaly detection. Relative to strong SSL baselines (SimCLR, MAE) as well as ImageNet transfer, Protocol Genome (+0.046: PE, +0.058: glioma, +0.041: cardiomegaly) is associated with higher external AUROC; 25-37% calibration improvements are obtained (p < 0.01, DeLong tests). While the gains may be task-dependent, they are preserved with 10-20% of labeled data. From a clinical point of view, the technique reduces false positives at protocol borders and is applicable in a PACS (DICOM C-FIND/C-MOVE, DICOMweb QIDO/WADO). We publish a model card and deployment guide, complete with both de-identification and bias audits.


VLMs-in-the-Wild: Bridging the Gap Between Academic Benchmarks and Enterprise Reality

arXiv.org Artificial Intelligence

--Open-source Vision-Language Models show immense promise for enterprise applications, yet a critical disconnect exists between academic evaluation and enterprise deployment requirements. Current benchmarks rely heavily on multiple-choice questions and synthetic data, failing to capture the complexity of real-world business applications like social media content analysis. This paper introduces VLM-in-the-Wild (ViLD), a comprehensive framework to bridge this gap by evaluating VLMs on operational enterprise requirements. We define ten business-critical tasks: logo detection, OCR, object detection, human presence and demographic analysis, human activity and appearance analysis, scene detection, camera perspective and media quality assessment, dominant colors, comprehensive description, and NSFW detection. T o this framework, we bring an innovative BlockWeaver Algorithm that solves the challenging problem of comparing unordered, variably-grouped OCR outputs from VLMs without relying on embeddings or LLMs, achieving remarkable speed and reliability. Besides, ViLD's methodology avoids traditional bounding boxes, which are ill-suited for generative VLMs, in favour of a novel spatial-temporal grid system that captures localisation information effectively for both images and videos. T o demonstrate efficacy of ViLD, we constructed a new benchmark dataset of 7,500 diverse samples, carefully stratified from a corpus of one million real-world images and videos. ViLD provides actionable insights by combining semantic matching (both embedding-based and LLMas-a-judge approaches), traditional metrics, and novel methods to measure the completeness and faithfulness of descriptive outputs. By benchmarking leading open-source VLMs (Qwen, MIMO, and InternVL) against a powerful proprietary baseline as per ViLD framework, we provide one of the first industry-grounded, task-driven assessment of VLMs capabilities, offering actionable insights for their deployment in enterprise environments. Vision-Language Models (VLMs) have fundamentally transformed the landscape of artificial intelligence, enabling systems to understand and reason about visual content through natural language.


Frustratingly Easy Feature Reconstruction for Out-of-Distribution Detection

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection helps models identify data outside the training categories, crucial for security applications. While feature-based post-hoc methods address this by evaluating data differences in the feature space without changing network parameters, they often require access to training data, which may not be suitable for some data privacy scenarios. This may not be suitable in scenarios where data privacy protection is a concern. In this paper, we propose a simple yet effective post-hoc method, termed Classifier-based Feature Reconstruction (ClaFR), from the perspective of subspace projection. It first performs an orthogonal decomposition of the classifier's weights to extract the class-known subspace, then maps the original data features into this subspace to obtain new data representations. Subsequently, the OOD score is determined by calculating the feature reconstruction error of the data within the subspace. Compared to existing OOD detection algorithms, our method does not require access to training data while achieving leading performance on multiple OOD benchmarks. Our code is released at https://github.com/Aie0923/ClaFR.


Impact of Neuron Models on Spiking Neural Networks performance. A Complexity Based Classification Approach

arXiv.org Artificial Intelligence

This study explores how the selection of neuron models and learning rules impacts the classification performance of Spiking Neural Networks (SNNs), with a focus on applications in bio-signal processing. We compare biologically inspired neuron models, including Leaky Integrate-and-Fire (LIF), metaneurons, and probabilistic Levy-Baxter (LB) neurons, across multiple learning rules, including spike-timing-dependent plasticity (STDP), tempotron, and reward-modulated updates. A novel element of this work is the integration of a complexity-based decision mechanism into the evaluation pipeline. Using Lempel-Ziv Complexity (LZC), a measure related to entropy rate, we quantify the structural regularity of spike trains and assess classification outcomes in a consistent and interpretable manner across different SNN configurations. To investigate neural dynamics and assess algorithm performance, we employed synthetic datasets with varying temporal dependencies and stochasticity levels. These included Markov and Poisson processes, well-established models to simulate neuronal spike trains and capture the stochastic firing behavior of biological neurons.Validation of synthetic Poisson and Markov-modeled data reveals clear performance trends: classification accuracy depends on the interaction between neuron model, network size, and learning rule, with the LZC-based evaluation highlighting configurations that remain robust to weak or noisy signals. This work delivers a systematic analysis of how neuron model selection interacts with network parameters and learning strategies, supported by a novel complexity-based evaluation approach that offers a consistent benchmark for SNN performance.


Predicting Market Troughs: A Machine Learning Approach with Causal Interpretation

arXiv.org Machine Learning

This paper provides robust, new evidence on the causal drivers of market troughs. We demonstrate that conclusions about these triggers are critically sensitive to model specification, moving beyond restrictive linear models with a flexible DML average partial effect causal machine learning framework. Our robust estimates identify the volatility of options-implied risk appetite and market liquidity as key causal drivers, relationships misrepresented or obscured by simpler models. These findings provide high-frequency empirical support for intermediary asset pricing theories. This causal analysis is enabled by a high-performance nowcasting model that accurately identifies capitulation events in real-time.