Ortega, Antonio
AutoML for Multi-Class Anomaly Compensation of Sensor Drift
Schaller, Melanie, Kruse, Mathis, Ortega, Antonio, Lindauer, Marius, Rosenhahn, Bodo
Addressing sensor drift is essential in industrial measurement systems, where precise data output is necessary for maintaining accuracy and reliability in monitoring processes, as drift progressively degrades the performance of machine learning models over time. Our findings indicate that the standard cross-validation method used in existing model training overestimates performance by inadequately accounting for drift. This is primarily because typical cross-validation techniques allow data instances to appear in both training and testing sets, thereby distorting the accuracy of the predictive evaluation. As a result, these models cannot accurately predict future drift effects, compromising their ability to generalize and adapt to evolving data conditions. This paper presents two solutions: (1) a novel sensor drift compensation learning paradigm for validating models, and (2) automated machine learning (AutoML) techniques to enhance classification performance and compensate for sensor drift. By employing strategies such as data balancing, meta-learning, automated ensemble learning, hyperparameter optimization, feature selection, and boosting, our AutoML-DC (Drift Compensation) model significantly improves classification performance in the presence of sensor drift. AutoML-DC further adapts effectively to varying drift severities.
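The validation pitfall described above can be made concrete with a minimal sketch (the helper name `drift_aware_splits` is hypothetical, not from the paper): instead of shuffling samples across folds, a drift-aware split always trains on measurement batches that precede the test batch in time, so drift between acquisition periods is never leaked into training.

```python
def drift_aware_splits(batches):
    """Yield (train, test) splits where the test batch lies strictly in the
    future of every training batch, unlike shuffled k-fold cross-validation."""
    for i in range(1, len(batches)):
        train = [s for b in batches[:i] for s in b]  # all past batches, flattened
        test = batches[i]                            # one future batch
        yield train, test

# Toy usage: three acquisition batches of sensor readings, drifting upward.
batches = [[1.0, 1.1], [1.4, 1.5], [2.0, 2.1]]
splits = list(drift_aware_splits(batches))
```

A model evaluated this way is scored only on data from periods it has never seen, which is the condition the standard shuffled folds violate.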
Towards joint graph learning and sampling set selection from data
Sridhara, Shashank N., Pavez, Eduardo, Ortega, Antonio
We explore the problem of sampling graph signals in scenarios where the graph structure is not predefined and must be inferred from data. In this setting, existing approaches rely on a two-step process, where a graph is learned first, followed by sampling. More generally, graph learning and graph signal sampling have been studied as two independent problems in the literature. This work provides a foundational step towards jointly optimizing the graph structure and sampling set. Our main contribution, Vertex Importance Sampling (VIS), is to show that the sampling set can be effectively determined from the vertex importance (node weights) obtained from graph learning. We further propose Vertex Importance Sampling with Repulsion (VISR), a greedy algorithm where spatially separated "important" nodes are selected to ensure better reconstruction. Empirical results on simulated data show that sampling using VIS and VISR leads to competitive reconstruction performance and lower complexity than the conventional two-step approach of graph learning followed by graph sampling.
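The greedy selection with repulsion can be sketched as follows. This is an illustrative toy (1-D node positions, a hypothetical `visr_greedy` helper, and a simple distance threshold standing in for the paper's repulsion mechanism), not the authors' algorithm verbatim:

```python
def visr_greedy(importance, coords, k, min_dist):
    """Greedy sketch: repeatedly take the highest-importance node that is at
    least `min_dist` away from every node already selected, until k nodes
    are chosen. Repulsion spreads the sampling set over the graph."""
    order = sorted(range(len(importance)), key=lambda i: -importance[i])
    selected = []
    for i in order:
        if len(selected) == k:
            break
        if all(abs(coords[i] - coords[j]) >= min_dist for j in selected):
            selected.append(i)
    return selected
```

With importances [0.9, 0.8, 0.3, 0.7] at positions [0.0, 0.1, 0.5, 1.0] and `min_dist=0.3`, node 1 is skipped despite its high importance because it sits too close to node 0, and the farther node 3 is taken instead.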
Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression
Gulati, Aryan, Dong, Xingjian, Hurtado, Carlos, Shekkizhar, Sarath, Swayamdipta, Swabha, Ortega, Antonio
As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regression. Our approach greatly reduces computational and space complexities (up to 11x improvement in inference time and 87% reduction in storage requirements) and outperforms existing approaches by up to 4 AUROC points on four different benchmarks. We also introduce an entropy-constrained version of our algorithm, which leads to further reductions in storage requirements (up to 97% lower than comparable approaches) while retaining competitive performance. These results highlight the potential of our soft clustering approach for detecting tail-end phenomena in extreme-scale data settings.
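To illustrate the general idea of soft-clustering-based OOD scoring (a generic sketch with hypothetical names, not the paper's NNK formulation), a test point can be softly assigned to cluster prototypes and scored by the entropy of its assignment: in-distribution points match one prototype decisively, while OOD points spread their assignment across prototypes.

```python
import math

def soft_ood_score(x, prototypes, temperature=1.0):
    """Soft-assign x to prototypes via a softmax over negative distances
    (1-D for the sketch); return the assignment entropy as an OOD score.
    Higher entropy = no prototype explains x well = more likely OOD."""
    dists = [abs(x - p) for p in prototypes]
    logits = [-d / temperature for d in dists]
    m = max(logits)                              # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A point near the prototype at 0.0 gets near-zero entropy, while a point equidistant from two prototypes scores close to log 2.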
Optimizing $k$ in $k$NN Graphs with Graph Learning Perspective
Tamaru, Asuka, Hara, Junya, Higashi, Hiroshi, Tanaka, Yuichi, Ortega, Antonio
In this paper, we propose a method, based on graph signal processing, to optimize the choice of $k$ in $k$-nearest neighbor graphs ($k$NNGs). $k$NN is one of the most popular graph construction approaches and is widely used in machine learning and signal processing. The parameter $k$ represents the number of neighbors that are connected to the target node; however, its appropriate selection is still a challenging problem. Therefore, most $k$NNGs use ad hoc selection methods for $k$. In the proposed method, we assume that a different $k$ can be chosen for each node. We formulate a discrete optimization problem to seek the best $k$ with a constraint on the sum of distances of the connected nodes. The optimal $k$ values are efficiently obtained without solving a complex optimization. Furthermore, we reveal that the proposed method is closely related to existing graph learning methods. In experiments on real datasets, we demonstrate that the $k$NNGs obtained with our method are sparse and can determine an appropriate variable number of edges per node. We validate the effectiveness of the proposed method for point cloud denoising, comparing our denoising performance with that of other graph construction methods that scale to typical point cloud sizes (e.g., thousands of nodes).
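The flavor of a per-node, distance-constrained choice of $k$ can be sketched as follows. This is a simplified toy (hypothetical `select_k_per_node` helper, a shared distance budget standing in for the paper's constraint), not the paper's discrete optimization:

```python
def select_k_per_node(sorted_dists, budget):
    """For each node, grow k while the running sum of neighbor distances
    stays within `budget`; nodes in dense regions thus receive larger k,
    nodes in sparse regions smaller k."""
    ks = []
    for dists in sorted_dists:   # each row: distances to neighbors, ascending
        total, k = 0.0, 0
        for d in dists:
            if total + d > budget:
                break
            total += d
            k += 1
        ks.append(max(k, 1))     # keep every node connected by at least one edge
    return ks
```

With the same budget, a node whose neighbors sit at distances [0.1, 0.1, 0.2] keeps all three edges, while a node with neighbors at [0.5, 0.6, 0.7] keeps only one — a variable number of edges per node, as in the abstract.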
Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs
Hurtado, Carlos, Shekkizhar, Sarath, Ruiz-Hidalgo, Javier, Ortega, Antonio
Modern machine learning systems are increasingly trained on large amounts of data embedded in high-dimensional spaces. Often this is done without analyzing the structure of the dataset. In this work, we propose a framework to study the geometric structure of the data. We make use of our recently introduced non-negative kernel (NNK) regression graphs to estimate the point density, intrinsic dimension, and the linearity of the data manifold (curvature). We further generalize the graph construction and geometric estimation to multiple scales by iteratively merging neighborhoods in the input data. Our experiments demonstrate the effectiveness of our proposed approach over other baselines in estimating the local geometry of data manifolds on synthetic and real datasets.
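As background on one of the quantities estimated above, local intrinsic dimension can be read off from how fast sorted neighbor distances grow around a point. The sketch below uses the classical Levina-Bickel maximum-likelihood estimator as a stand-in, not the paper's NNK-graph-based estimator:

```python
import math

def local_intrinsic_dim(neighbor_dists):
    """Levina-Bickel MLE sketch: estimate the intrinsic dimension around a
    point from its ascending distances to the k nearest neighbors. In d
    dimensions, r_i grows roughly like i^(1/d), so slower growth of the
    distance sequence implies higher dimension."""
    k = len(neighbor_dists)
    r_k = neighbor_dists[-1]
    s = sum(math.log(r_k / r) for r in neighbor_dists[:-1])
    return (k - 1) / s
```

Distances that double at each step (1-D-like spread) yield a lower estimate than distances growing like the square root of the rank (2-D-like spread).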
Joint Graph and Vertex Importance Learning
Girault, Benjamin, Pavez, Eduardo, Ortega, Antonio
In this paper, we explore the topic of graph learning from the perspective of the Irregularity-Aware Graph Fourier Transform, with the goal of learning the graph signal space inner product to better model data. We propose a novel method to learn a graph with smaller edge weight upper bounds compared to combinatorial Laplacian approaches. Experimentally, our approach yields much sparser graphs compared to a combinatorial Laplacian approach.
Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation
Bonet, David, Ortega, Antonio, Ruiz-Hidalgo, Javier, Shekkizhar, Sarath
State-of-the-art neural network architectures continue to scale in size and deliver impressive generalization results, although this comes at the expense of limited interpretability. In particular, a key challenge is to determine when to stop training the model, as this has a significant impact on generalization. Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on non-negative kernel regression (NNK) graphs with which we perform local polytope interpolation on low-dimensional channels. This method leads to instance-based interpretability of both the learned data representations and the relationship between channels. Motivated by our observations, we use CW-DeepNNK to propose a novel early stopping criterion that (i) does not require a validation set, (ii) is based on a task performance metric, and (iii) allows stopping to be reached at different points for each channel. Our experiments demonstrate that our proposed method has advantages compared to the standard criterion based on validation set performance.
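Criterion (iii) — stopping at different points per channel — can be sketched generically. The code below is a minimal patience-based illustration with hypothetical names; the paper's actual criterion monitors the CW-DeepNNK generalization estimate rather than an arbitrary score:

```python
def channelwise_stop(histories, patience=2):
    """Sketch: each channel stops independently once its per-epoch
    generalization estimate has not improved for `patience` epochs.
    Returns the stopping epoch index per channel."""
    stop_epochs = []
    for h in histories:              # one list of per-epoch estimates per channel
        best, best_epoch = h[0], 0
        stop = len(h) - 1            # default: run to the last epoch
        for e, v in enumerate(h):
            if v > best:
                best, best_epoch = v, e
            if e - best_epoch >= patience:
                stop = e             # patience exhausted for this channel
                break
        stop_epochs.append(stop)
    return stop_epochs
```

A channel that plateaus after epoch 1 stops at epoch 3 (patience 2), while a channel that keeps improving trains to the end.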
Spatio-Temporal Graph Scattering Transform
Pan, Chao, Chen, Siheng, Ortega, Antonio
Although spatiotemporal graph neural networks have achieved great empirical success in handling multiple correlated time series, they may be impractical in some real-world scenarios due to a lack of sufficient high-quality training data. Furthermore, spatiotemporal graph neural networks lack theoretical interpretation. To address these issues, we put forth a novel mathematically designed framework to analyze spatiotemporal data. Our proposed spatiotemporal graph scattering transform (ST-GST) extends traditional scattering transforms to the spatiotemporal domain. It performs iterative applications of spatiotemporal graph wavelets and nonlinear activation functions, which can be viewed as a forward pass of spatiotemporal graph convolutional networks without training. Since all the filter coefficients in ST-GST are mathematically designed, it is promising for real-world scenarios with limited training data, and also allows for a theoretical analysis, which shows that the proposed ST-GST is stable to small perturbations of input signals and structures. Finally, our experiments show that i) ST-GST outperforms spatiotemporal graph convolutional networks by an increase of 35% in accuracy on the MSR Action3D dataset; ii) it is better and computationally more efficient to design the transform based on separable spatiotemporal graphs than the joint ones; and iii) the nonlinearity in ST-GST is critical to empirical performance.

Processing and learning from spatiotemporal data have received increasing attention recently. Examples include: i) skeleton-based human action recognition based on a sequence of human poses (Liu et al. (2019)), which is critical to human behavior understanding (Borges et al. (2013)), and ii) multi-agent trajectory prediction (Hu et al. (2020)), which is critical to robotics and autonomous driving (Shalev-Shwartz et al. (2016)). A common pattern across these applications is that data evolves in both spatial and temporal domains.
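The "filter, nonlinearity, iterate, with no learned coefficients" structure can be sketched in a few lines. This is a heavily simplified one-branch, spatial-only toy (a fixed one-hop averaging filter standing in for the spatiotemporal graph wavelets, `abs` as the nonlinearity), not the full ST-GST:

```python
def graph_scatter(signal, adj, depth=2):
    """One-branch scattering sketch: apply a fixed graph filter (one-hop
    averaging), take a pointwise nonlinearity (abs), pool, and iterate.
    No coefficient is learned, mirroring the training-free forward pass."""
    coeffs = []
    x = signal[:]
    n = len(adj)
    for _ in range(depth):
        filtered = []
        for i in range(n):
            nbrs = [j for j in range(n) if adj[i][j]]
            vals = [x[i]] + [x[j] for j in nbrs]   # self plus neighbors
            filtered.append(sum(vals) / len(vals))
        x = [abs(v) for v in filtered]             # nonlinearity between layers
        coeffs.append(sum(x) / n)                  # pooled scattering coefficient
    return coeffs

# Toy usage: a 3-node path graph with an alternating signal.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
coeffs = graph_scatter([1.0, -1.0, 1.0], adj)
```

Each depth level contributes one pooled coefficient, and the whole representation is fixed once the graph and filter are chosen.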
DeepNNK: Explaining deep models and their generalization using polytope interpolation
Shekkizhar, Sarath, Ortega, Antonio
Modern machine learning systems based on neural networks have shown great success in learning complex data patterns while being able to make good predictions on unseen data points. However, the limited interpretability of these systems hinders further progress and application to several domains in the real world. This predicament is exemplified by time-consuming model selection and the difficulties faced in predictive explainability, especially in the presence of adversarial examples. In this paper, we take a step towards better understanding of neural networks by introducing a local polytope interpolation method. The proposed Deep Non-Negative Kernel regression (NNK) interpolation framework is non-parametric, theoretically simple, and geometrically intuitive. We demonstrate instance-based explainability for deep learning models and develop a method to identify models with good generalization properties using leave-one-out estimation. Finally, we provide a rationale for adversarial and generative examples, which are inevitable from an interpolation view of machine learning.
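The leave-one-out idea referenced above can be sketched generically. The toy below predicts each sample from its nearest other sample (a simple 1-NN stand-in for the paper's NNK polytope interpolation) and uses the fraction of correct predictions as a generalization estimate:

```python
def loo_accuracy(points, labels):
    """Leave-one-out sketch: predict each sample's label from its nearest
    *other* sample; the fraction of correct predictions estimates
    generalization without holding out a validation set."""
    correct = 0
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: abs(points[k] - p))   # nearest other sample (1-D)
        correct += labels[j] == labels[i]
    return correct / len(points)
```

On two well-separated clusters, every held-out point is recovered from its cluster mates, giving an estimate of 1.0.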
Graph Construction from Data using Non Negative Kernel regression (NNK Graphs)
Shekkizhar, Sarath, Ortega, Antonio
Data-driven graph constructions are often used in various applications, including several machine learning tasks, where the goal is to make predictions and discover patterns. However, learning an optimal graph from data is still a challenging task. Weighted $K$-nearest neighbor and $\epsilon$-neighborhood methods are among the most common graph construction methods due to their computational simplicity, but the choice of parameters such as $K$ and $\epsilon$ associated with these methods is often ad hoc and lacks a clear interpretation. We formulate graph construction as the problem of finding a sparse signal approximation in kernel space, identifying key similarities between methods in signal approximation and existing graph learning methods. We propose non-negative kernel regression~(NNK), an improved approach for graph construction with interesting geometric and theoretical properties. We show experimentally the efficiency of NNK graphs, their robustness to the choice of sparsity $K$, and their better performance over state-of-the-art graph methods in semi-supervised learning tasks on real-world data.
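The non-negative kernel regression at the core of NNK can be sketched as follows. This is a toy 1-D version with a hypothetical `nnk_weights` helper: it solves the non-negative kernel regression objective by simple projected gradient descent rather than the active-set solver a practical implementation would use. The key behavior it reproduces is that a redundant neighbor (one nearly collinear with a better neighbor in kernel space) receives weight zero, so the resulting graph is sparser than plain $K$NN:

```python
import math

def nnk_weights(x, neighbors, sigma=1.0, steps=500, lr=0.1):
    """Toy NNK sketch (1-D data, Gaussian kernel): minimize
    0.5 * w^T K w - y^T w subject to w >= 0 by projected gradient,
    where K is the neighbors' kernel Gram matrix and y the kernel
    similarities between x and its neighbors."""
    k = len(neighbors)
    kern = lambda a, b: math.exp(-(a - b) ** 2 / (2 * sigma ** 2))
    K = [[kern(a, b) for b in neighbors] for a in neighbors]  # Gram matrix
    y = [kern(x, b) for b in neighbors]                       # similarities to x
    w = [1.0 / k] * k                                         # uniform start
    for _ in range(steps):
        grad = [sum(K[i][j] * w[j] for j in range(k)) - y[i] for i in range(k)]
        w = [max(0.0, w[i] - lr * grad[i]) for i in range(k)]  # project onto w >= 0
    return w
```

For x = 0 with neighbors at 1.0, 1.1, and -1.0, the neighbor at 1.1 is redundant given the one at 1.0 and ends up with weight near zero, while the two informative neighbors keep substantial weights.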