Goto

Collaborating Authors

 ebruary 6


CORTEX: A Cost-Sensitive Rule and Tree Extraction Method

arXiv.org Artificial Intelligence

Tree-based and rule-based machine learning models play pivotal roles in explainable artificial intelligence (XAI) due to their unique ability to provide explanations in the form of tree or rule sets that are easily understandable and interpretable, making them essential for applications in which trust in model decisions is necessary. These transparent models are typically used in surrogate modeling, a post-hoc XAI approach for explaining the logic of black-box models, enabling users to comprehend and trust complex predictive systems while maintaining competitive performance. This study proposes the Cost-Sensitive Rule and Tree Extraction (CORTEX) method, a novel rule-based XAI algorithm grounded in the multi-class cost-sensitive decision tree (CSDT) method. The original version of the CSDT is extended to classification problems with more than two classes by inducing the concept of an n-dimensional class-dependent cost matrix. The performance of CORTEX as a rule-extractor XAI method is compared to other post-hoc tree and rule extraction methods across several datasets with different numbers of classes. Several quantitative evaluation metrics are employed to assess the explainability of generated rule sets. Our findings demonstrate that CORTEX is competitive with other tree-based methods and can be superior to other rule-based methods across different datasets. The extracted rule sets suggest the advantages of using the CORTEX method over other methods by producing smaller rule sets with shorter rules on average across datasets with a diverse number of classes. Overall, the results underscore the potential of CORTEX as a powerful XAI tool for scenarios that require the generation of clear, human-understandable rules while maintaining good predictive performance.


DeepIFSAC: Deep Imputation of Missing Values Using Feature and Sample Attention within Contrastive Framework

arXiv.org Machine Learning

Missing values of varying patterns and rates in real-world tabular data pose a significant challenge in developing reliable data-driven models. Existing missing value imputation methods use statistical and traditional machine learning and are ineffective when the missing rate is high and not at random. This paper explores row and column attention in tabular data as between-feature and between-sample attention in a novel framework to reconstruct missing values. The proposed method uses the CutMix data augmentation within a contrastive learning framework to improve the uncertainty of missing value estimation. The performance and generalizability of trained imputation models are evaluated on set-aside test data folds with missing values. The proposed framework outperforms nine state-of-the-art imputation methods across several missing value types and rates (10\%-50\%) on a diverse selection of twelve tabular data sets. We evaluate the quality of imputed data using real-world electronic health records with missing values, demonstrating our proposed framework's superiority to state-of-the-art statistical, machine learning, and deep imputation methods. This paper highlights the heterogeneity of tabular data sets to recommend imputation methods based on missing value types and data characteristics.


General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data

arXiv.org Artificial Intelligence

Universal knowledge representation is a central problem for multivariate time series(MTS) foundation models and yet remains open. This paper investigates this problem from the first principle and it makes four folds of contributions. First, a new empirical finding is revealed: time series with different time granularities (or corresponding frequency resolutions) exhibit distinct joint distributions in the frequency domain. This implies a crucial aspect of learning universal knowledge, one that has been overlooked by previous studies. Second, a novel Fourier knowledge attention mechanism is proposed to enable learning time granularity-aware representations from both the temporal and frequency domains. Third, an autoregressive blank infilling pre-training framework is incorporated to time series analysis for the first time, leading to a generative tasks agnostic pre-training strategy. To this end, we develop the General Time-series Model (GTM), a unified MTS foundation model that addresses the limitation of contemporary time series models, which often require token, pre-training, or model-level customizations for downstream tasks adaption. Fourth, extensive experiments show that GTM outperforms state-of-the-art (SOTA) methods across all generative tasks, including long-term forecasting, anomaly detection, and imputation.


Reproduction Research of FSA-Benchmark

arXiv.org Artificial Intelligence

Fail-slow disks pose a distinct challenge due to their subtle yet insidious nature. They exhibit performance degradation that may not be immediately visible but can lead to significant slowdowns and reliability issues within large-scale storage systems. Traditional redundancy and fail-over mechanisms are designed to address outright disk failures but are less effective at detecting and mitigating the gradual performance decline associated with fail-slow disks. The two primary symptoms of fail-slow disks--consistently higher latency compared to peer disks and recurrent abnormal spikes--make it difficult to establish fixed thresholds for alerts or accurately track performance trends. In light of these challenges, there is an urgent need for advanced detection mechanisms that can proactively identify and address fail-slow conditions.


Causal bandits with backdoor adjustment on unknown Gaussian DAGs

arXiv.org Artificial Intelligence

The causal bandit problem aims to sequentially learn the intervention that maximizes the expectation of a reward variable within a system governed by a causal graph. Most existing approaches assume prior knowledge of the graph structure, or impose unrealistically restrictive conditions on the graph. In this paper, we assume a Gaussian linear directed acyclic graph (DAG) over arms and the reward variable, and study the causal bandit problem when the graph structure is unknown. We identify backdoor adjustment sets for each arm using sequentially generated experimental and observational data during the decision process, which allows us to estimate causal effects and construct upper confidence bounds. By integrating estimates from both data sources, we develop a novel bandit algorithm, based on modified upper confidence bounds, to sequentially determine the optimal intervention. We establish both case-dependent and case-independent upper bounds on the cumulative regret for our algorithm, which improve upon the bounds of the standard multi-armed bandit algorithms. Our empirical study demonstrates its advantage over existing methods with respect to cumulative regret and computation time.


Modified Adaptive Tree-Structured Parzen Estimator for Hyperparameter Optimization

arXiv.org Artificial Intelligence

In this paper we review hyperparameter optimization methods for machine learning models, with a particular focus on the Adaptive Tree-Structured Parzen Estimator (ATPE) algorithm. We propose several modifications to ATPE and assess their efficacy on a diverse set of standard benchmark functions. Experimental results demonstrate that the proposed modifications significantly improve the effectiveness of ATPE hyperparameter optimization on selected benchmarks, a finding that holds practical relevance for their application in real-world machine learning / optimization tasks. In machine learning, the performance of a model heavily depends on the correct choice of hyperparameters, such as the learning rate, the number of layers in a neural network, or specific regularization techniques. These hyperparameters form a multidimensional space where some dimensions are continuous (e.g., the learning rate), while others are discrete (e.g., the number of network layers). The task of Hyperparameter Optimization (HPO) aims to find the best combination of these hyperparameters by searching this space in a way that optimizes a predefined objective function. In supervised learning, this function is usually a loss function, which quantifies an error between the predictions of the model and the true values. HPO is applicable across a wide range of machine learning models, as most optimization techniques are agnostic to the underlying model type. The core requirement for any HPO algorithm is to define the hyperparameter space and the objective function. However, HPO presents specific challenges that separate it from other optimization problems, making it a unique area in the field. Each evaluation of the objective function requires training the considered machine learning model from scratch, which is often the most time-consuming part of the optimization process. As a result, when designing HPO algorithms, the focus is less on the internal computational efficiency of the optimizer but rather on minimizing the number of objective function evaluations, while still maintaining good predictive performance.


Offline Learning for Combinatorial Multi-armed Bandits

arXiv.org Artificial Intelligence

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs of online interactions and the readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, the first offline learning framework for CMAB. Central to our framework is the combinatorial lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with combinatorial solvers. To characterize the quality of offline datasets, we propose two novel data coverage conditions and prove that, under these conditions, CLCB achieves a near-optimal suboptimality gap, matching the theoretical lower bound up to a logarithmic factor. We validate Off-CMAB through practical applications, including learning to rank, large language model (LLM) caching, and social influence maximization, showing its ability to handle nonlinear reward functions, general feedback models, and out-of-distribution action samples that excludes optimal or even feasible actions. Extensive experiments on synthetic and real-world datasets further highlight the superior performance of CLCB.


Machine Learning Strategies for Parkinson Tremor Classification Using Wearable Sensor Data

arXiv.org Artificial Intelligence

Parkinson's disease (PD) is a neurological disorder requiring early and accurate diagnosis for effective management. Machine learning (ML) has emerged as a powerful tool to enhance PD classification and diagnostic accuracy, particularly by leveraging wearable sensor data. This survey comprehensively reviews current ML methodologies used in classifying Parkinsonian tremors, evaluating various tremor data acquisition methodologies, signal preprocessing techniques, and feature selection methods across time and frequency domains, highlighting practical approaches for tremor classification. The survey explores ML models utilized in existing studies, ranging from traditional methods such as Support Vector Machines (SVM) and Random Forests to advanced deep learning architectures like Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM). We assess the efficacy of these models in classifying tremor patterns associated with PD, considering their strengths and limitations. Furthermore, we discuss challenges and discrepancies in current research and broader challenges in applying ML to PD diagnosis using wearable sensor data. We also outline future research directions to advance ML applications in PD diagnostics, providing insights for researchers and practitioners.


Vision-based autonomous structural damage detection using data-driven methods

arXiv.org Artificial Intelligence

This study addresses the urgent need for efficient and accurate damage detection in wind turbine structures, a crucial component of renewable energy infrastructure. Traditional inspection methods, such as manual assessments and non-destructive testing (NDT), are often costly, time-consuming, and prone to human error. To tackle these challenges, this research investigates advanced deep learning algorithms for vision-based structural health monitoring (SHM). A dataset of wind turbine surface images, featuring various damage types and pollution, was prepared and augmented for enhanced model training. Three algorithms-YOLOv7, its lightweight variant, and Faster R-CNN- were employed to detect and classify surface damage. The models were trained and evaluated on a dataset split into training, testing, and evaluation subsets (80%-10%-10%). Results indicate that YOLOv7 outperformed the others, achieving 82.4% mAP@50 and high processing speed, making it suitable for real-time inspections. By optimizing hyperparameters like learning rate and batch size, the models' accuracy and efficiency improved further. YOLOv7 demonstrated significant advancements in detection precision and execution speed, especially for real-time applications. However, challenges such as dataset limitations and environmental variability were noted, suggesting future work on segmentation methods and larger datasets. This research underscores the potential of vision-based deep learning techniques to transform SHM practices by reducing costs, enhancing safety, and improving reliability, thus contributing to the sustainable maintenance of critical infrastructure and supporting the longevity of wind energy systems.


Assessment of the January 2025 Los Angeles County wildfires: A multi-modal analysis of impact, response, and population exposure

arXiv.org Artificial Intelligence

This study presents a comprehensive analysis of four significant California wildfires: Palisades, Eaton, Kenneth, and Hurst, examining their impacts through multiple dimensions, including land cover change, jurisdictional management, structural damage, and demographic vulnerability. Using the Chebyshev-Kolmogorov-Arnold network model applied to Sentinel-2 imagery, the extent of burned areas was mapped, ranging from 315.36 to 10,960.98 hectares. Our analysis revealed that shrubland ecosystems were consistently the most affected, comprising 57.4-75.8% of burned areas across all events. The jurisdictional assessment demonstrated varying management complexities, from singular authority (98.7% in the Palisades Fire) to distributed management across multiple agencies. A structural impact analysis revealed significant disparities between urban interface fires (Eaton: 9,869 structures; Palisades: 8,436 structures) and rural events (Kenneth: 24 structures; Hurst: 17 structures). The demographic analysis showed consistent gender distributions, with 50.9% of the population identified as female and 49.1% as male. Working-age populations made up the majority of the affected populations, ranging from 53.7% to 54.1%, with notable temporal shifts in post-fire periods. The study identified strong correlations between urban interface proximity, structural damage, and population exposure. The Palisades and Eaton fires affected over 20,000 people each, compared to fewer than 500 in rural events. These findings offer valuable insights for the development of targeted wildfire management strategies, particularly in wildland urban interface zones, and emphasize the need for age- and gender-conscious approaches in emergency response planning.