AITopics | statistical feature

Collaborating Authors

statistical feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance

Neural Information Processing SystemsJun-18-2026, 14:34:33 GMT

The advancements in generative models and the real-world attack of machinegenerated text(MGT) create a demand for more robust detection methods. The existing MGT detection methods for adversarial environments primarily consist of manually designed statistical-based methods and fine-tuned classifier-based approaches. Statistical-based methods extract intrinsic features but suffer from rigid decision boundaries vulnerable to adaptive attacks, while fine-tuned classifiers achieve outstanding performance at the cost of overfitting to superficial textual feature. We argue that the key to detection in current adversarial environments lies in how to extract intrinsic invariant features and ensure that the classifier possesses dynamic adaptability. In that case, we propose OSTAR, a novel MGT detection framework designed for adversarial environments which composed of a statistical enhanced classifier and a Multi-Faceted Contrastive Learning(MFCL).

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Hybrid Feature Learning with Time Series Embeddings for Equipment Anomaly Prediction

Yasuno, Takato

arXiv.org Machine LearningFeb-18-2026

In predictive maintenance of equipment, deep learning-based time series anomaly detection has garnered significant attention; however, pure deep learning approaches often fail to achieve sufficient accuracy on real-world data. This study proposes a hybrid approach that integrates 64-dimensional time series embeddings from Granite TinyTimeMixer with 28-dimensional statistical features based on domain knowledge for HVAC equipment anomaly prediction tasks. Specifically, we combine time series embeddings extracted from a Granite TinyTimeMixer encoder fine-tuned with LoRA (Low-Rank Adaptation) and 28 types of statistical features including trend, volatility, and drawdown indicators, which are then learned using a LightGBM gradient boosting classifier. In experiments using 64 equipment units and 51,564 samples, we achieved Precision of 91--95\% and ROC-AUC of 0.995 for anomaly prediction at 30-day, 60-day, and 90-day horizons. Furthermore, we achieved production-ready performance with a false positive rate of 1.1\% or less and a detection rate of 88--94\%, demonstrating the effectiveness of the system for predictive maintenance applications. This work demonstrates that practical anomaly detection systems can be realized by leveraging the complementary strengths between deep learning's representation learning capabilities and statistical feature engineering.

artificial intelligence, machine learning, statistical feature, (11 more...)

arXiv.org Machine Learning

2602.15089

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Construction & Engineering > HVAC (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

578dbd6ddd458c08f623e15c9f1c1ce8-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-13-2026, 20:20:07 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Government (0.46)
Energy (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition

Sivaroopan, Nirhoshan, Karunarathna, Hansi, Madarasingha, Chamara, Jayasumana, Anura, Thilakarathna, Kanchana

arXiv.org Artificial IntelligenceDec-11-2025

Abstract--Human Activity Recognition (HAR) underpins applications in healthcare, rehabilitation, fitness tracking, and smart environments, yet existing deep learning approaches demand dataset-specific training, large labeled corpora, and significant computational resources. We introduce RAG-HAR, a training-free retrieval-augmented framework that leverages large language models (LLMs) for HAR. RAG-HAR computes lightweight statistical descriptors, retrieves semantically similar samples from a vector database, and uses this contextual evidence to make LLM based activity identification. We further enhance RAG-HAR by first applying prompt optimization and introducing an LLM-based activity descriptor that generates context-enriched vector databases for delivering accurate and highly relevant contextual information. Along with these mechanisms, RAG-HAR achieves state-of-the-art performance across six diverse HAR benchmarks. RAG-HAR moves beyond known behaviors, enabling the recognition and meaningful labelling of multiple unseen human activities. Human Activity Recognition (HAR) from wearable sensor data enables continuous monitoring, anomaly detection, and personalized interventions across healthcare [3], rehabilitation [31], fitness [28], and smart environments [14]. Despite wide-ranging applications, HAR remains challenging due to inter-subject variability, differences in sensor placement, device heterogeneity, and subtle distinctions between activities that exhibit similar motion patterns [39]. Those challenges create a strong need for accurate, generalizable, and cost-efficient solutions. Deep learning (DL) has become the dominant paradigm for HAR, with convolutional neural networks (CNNs) [6], [43], recurrent architectures [15], [17], and attention-based models [2] achieving state-of-the-art (SOT A) performance on benchmark datasets. However, DL-based HAR faces three critical limitations: (i) costly and time-consuming training procedures tailored to each dataset; (ii) performance degradation under domain shift across subjects, sensor placements, or devices; and (iii) heavy dependence on large labeled datasets [7], [35]. Despite advances in DL, these limitations leave HAR without a practical solution that is simultaneously training-free, generalizable, and scalable. To address this gap, this paper explores a fundamentally different paradigm: leveraging Large Language Models (LLMs) as reasoning engines for HAR.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.08984

Genre:

Research Report (1.00)
Workflow (0.68)

Industry:

Information Technology > Smart Houses & Appliances (0.68)
Health & Medicine > Consumer Health (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Attention-Guided Feature Fusion (AGFF) Model for Integrating Statistical and Semantic Features in News Text Classification

Zare, Mohammad

arXiv.org Artificial IntelligenceNov-24-2025

News text classification is a crucial task in natural language processing, essential for organizing and filtering the massive volume of digital content. Traditional methods typically rely on statistical features like term frequencies or TF-IDF values, which are effective at capturing word-level importance but often fail to reflect contextual meaning. In contrast, modern deep learning approaches utilize semantic features to understand word usage within context, yet they may overlook simple, high-impact statistical indicators. This paper introduces an Attention-Guided Feature Fusion (AGFF) model that combines statistical and semantic features in a unified framework. The model applies an attention-based mechanism to dynamically determine the relative importance of each feature type, enabling more informed classification decisions. Through evaluation on benchmark news datasets, the AGFF model demonstrates superior performance compared to both traditional statistical models and purely semantic deep learning models. The results confirm that strategic integration of diverse feature types can significantly enhance classification accuracy. Additionally, ablation studies validate the contribution of each component in the fusion process. The findings highlight the model's ability to balance and exploit the complementary strengths of statistical and semantic representations, making it a practical and effective solution for real-world news classification tasks.

classification, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.17184

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Leveraging NTPs for Efficient Hallucination Detection in VLMs

Azachi, Ofir, Eliyahu, Kfir, Ani, Eyal El, Himelstein, Rom, Reichart, Roi, Pinter, Yuval, Calderon, Nitay

arXiv.org Artificial IntelligenceNov-17-2025

Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM's next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models led to better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.20379

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

Add feedback

Wavelet-Based Feature Extraction and Unsupervised Clustering for Parity Detection: A Feature Engineering Perspective

Mutlu, Ertugrul

arXiv.org Artificial IntelligenceNov-4-2025

This paper explores a deliberately over-engineered approach to the classical problem of parity detection -- determining whether a number is odd or even -- by combining wavelet-based feature extraction with unsupervised clustering. Instead of relying on modular arithmetic, integers are transformed into wavelet-domain representations, from which multi-scale statistical features are extracted and clustered using the k-means algorithm. The resulting feature space reveals meaningful structural differences between odd and even numbers, achieving a classification accuracy of approximately 69.67% without any label supervision. These results suggest that classical signal-processing techniques, originally designed for continuous data, can uncover latent structure even in purely discrete symbolic domains. Beyond parity detection, the study provides an illustrative perspective on how feature engineering and clustering may be repurposed for unconventional machine learning problems, potentially bridging symbolic reasoning and feature-based learning.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.00071

Country: North America > United States (0.96)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study

de Jeneret, Pattravadee de Favereau, Diamantis, Ioannis

arXiv.org Machine LearningOct-23-2025

This study investigates whether Topological Data Analysis (TDA) can provide additional insights beyond traditional statistical methods in clustering currency behaviours. We focus on the foreign exchange (FX) market, which is a complex system often exhibiting non-linear and high-dimensional dynamics that classical techniques may not fully capture. We compare clustering results based on TDA-derived features versus classical statistical features using monthly logarithmic returns of 13 major currency exchange rates (all against the euro). Two widely-used clustering algorithms, \(k\)-means and Hierarchical clustering, are applied on both types of features, and cluster quality is evaluated via the Silhouette score and the Calinski-Harabasz index. Our findings show that TDA-based feature clustering produces more compact and well-separated clusters than clustering on traditional statistical features, particularly achieving substantially higher Calinski-Harabasz scores. However, all clustering approaches yield modest Silhouette scores, underscoring the inherent difficulty of grouping FX time series. The differing cluster compositions under TDA vs. classical features suggest that TDA captures structural patterns in currency co-movements that conventional methods might overlook. These results highlight TDA as a valuable complementary tool for analysing financial time series, with potential applications in risk management where understanding structural co-movements is crucial.

data mining, machine learning, pattravadee de favereau de jeneret, (15 more...)

arXiv.org Machine Learning

2510.19306

Country:

Europe (0.68)
North America > United States (0.28)
Asia > Middle East > Republic of Türkiye (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

QRïS: A Preemptive Novel Method for Quishing Detection Through Structural Features of QR

Akram, Muhammad Wahid, Sood, Keshav, Hassan, Muneeb Ul

arXiv.org Artificial IntelligenceOct-21-2025

Globally, individuals and organizations employ Quick Response (QR) codes for swift and convenient communication. Leveraging this, cybercriminals embed falsify and misleading information in QR codes to launch various phishing attacks which termed as Quishing. Many former studies have introduced defensive approaches to preclude Quishing such as by classifying the embedded content of QR codes and then label the QR codes accordingly, whereas other studies classify them using visual features (i.e., deep features, histogram density analysis features). However, these approaches mainly rely on black-box techniques which do not clearly provide interpretability and transparency to fully comprehend and reproduce the intrinsic decision process; therefore, having certain obvious limitations includes the approaches' trust, accountability, issues in bias detection, and many more. We proposed QRïS, the pioneer method to classify QR codes through the comprehensive structural analysis of a QR code which helps to identify phishing QR codes beforehand. Our classification method is clearly transparent which makes it reproducible, scalable, and easy to comprehend. First, we generated QR codes dataset (i.e. 400,000 samples) using recently published URLs datasets [1], [2]. Then, unlike black-box models, we developed a simple algorithm to extract 24 structural features from layout patterns present in QR codes. Later, we train the machine learning models on the harvested features and obtained accuracy of up to 83.18%. To further evaluate the effectiveness of our approach, we perform the comparative analysis of proposed method with relevant contemporary studies. Lastly, for real-world deployment and validation, we developed a mobile app which assures the feasibility of the proposed solution in real-world scenarios which eventually strengthen the applicability of the study.

artificial intelligence, machine learning, qr code, (18 more...)

arXiv.org Artificial Intelligence

2510.17175

Country: Oceania > Australia (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology: