Goto

Collaborating Authors

 Colombo, Nicolo


Enhanced Route Planning with Calibrated Uncertainty Set

arXiv.org Machine Learning

This paper investigates the application of probabilistic prediction methodologies in route planning within a road network context. Specifically, we introduce the Conformalized Quantile Regression for Graph Autoencoders (CQR-GAE), which leverages the conformal prediction technique to offer a coverage guarantee, thus improving the reliability and robustness of our predictions. By incorporating uncertainty sets derived from CQR-GAE, we substantially improve the decision-making process in route planning under a robust optimization framework. We demonstrate the effectiveness of our approach by applying the CQR-GAE model to a real-world traffic scenario. The results indicate that our model significantly outperforms baseline methods, offering a promising avenue for advancing intelligent transportation systems.


State-space models can learn in-context by gradient descent

arXiv.org Artificial Intelligence

Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning. We prove that a single structured state-space model layer, augmented with local self-attention, can reproduce the outputs of an implicit linear model with least squares loss after one step of gradient descent. Our key insight is that the diagonal linear recurrent layer can act as a gradient accumulator, which can be `applied' to the parameters of the implicit regression model. We validate our construction by training randomly initialized augmented SSMs on simple linear regression tasks. The empirically optimized parameters match the theoretical ones, obtained analytically from the implicit model construction. Extensions to multi-step linear and non-linear regression yield consistent results. The constructed SSM encompasses features of modern deep state-space models, with the potential for scalable training and effectiveness even in general tasks. The theoretical construction elucidates the role of local self-attention and multiplicative interactions in recurrent architectures as the key ingredients for enabling the expressive power typical of foundation models.


Entropy Reweighted Conformal Classification

arXiv.org Artificial Intelligence

Conformal Prediction (CP) is a powerful framework for constructing prediction sets with guaranteed coverage. However, recent studies have shown that integrating confidence calibration with CP can lead to a degradation in efficiency. In this paper, We propose an adaptive approach that considers the classifier's uncertainty and employs entropy-based reweighting to enhance the efficiency of prediction sets for conformal classification. Our experimental results demonstrate that this method significantly improves efficiency.


Normalizing Flows for Conformal Regression

arXiv.org Machine Learning

Conformal Prediction (CP) algorithms estimate the uncertainty of a prediction model by calibrating its outputs on labeled data. The same calibration scheme usually applies to any model and data without modifications. The obtained prediction intervals are valid by construction but could be inefficient, i.e. unnecessarily big, if the prediction errors are not uniformly distributed over the input space. We present a general scheme to localize the intervals by training the calibration process. The standard prediction error is replaced by an optimized distance metric that depends explicitly on the object attributes. Learning the optimal metric is equivalent to training a Normalizing Flow that acts on the joint distribution of the errors and the inputs. Unlike the Error Reweighting CP algorithm of Papadopoulos et al. (2008), the framework allows estimating the gap between nominal and empirical conditional validity. The approach is compatible with existing locally-adaptive CP strategies based on re-weighting the calibration samples and applies to any point-prediction model without retraining.


Conformal Load Prediction with Transductive Graph Autoencoders

arXiv.org Machine Learning

Graph machine learning has seen a surge in interest with the advent of complex networked systems in diverse domains. Applications include social and transportation networks and various kinds of biological systems. In most cases, the interaction between nodes is typically represented by edges with associated weights. The edge weights can embody varying characteristics, from the strength of interaction between two individuals in a social network to the traffic capacity of a route in a transportation system. The prediction of the edge weights is vital to understanding and modelling graph data. Graph Neural Networks (GNNs) have been successfully used on node classification and link prediction tasks.


On training locally adaptive CP

arXiv.org Artificial Intelligence

We address the problem of making Conformal Prediction (CP) intervals locally adaptive. Most existing methods focus on approximating the object-conditional validity of the intervals by partitioning or re-weighting the calibration set. Our strategy is new and conceptually different. Instead of re-weighting the calibration data, we redefine the conformity measure through a trainable change of variables, $A \to \phi_X(A)$, that depends explicitly on the object attributes, $X$. Under certain conditions and if $\phi_X$ is monotonic in $A$ for any $X$, the transformations produce prediction intervals that are guaranteed to be marginally valid and have $X$-dependent sizes. We describe how to parameterize and train $\phi_X$ to maximize the interval efficiency. Contrary to other CP-aware training methods, the objective function is smooth and can be minimized through standard gradient methods without approximations.


Differentiable Architecture Pruning for Transfer Learning

arXiv.org Machine Learning

Transfer learning methods aim to produce machine learning models that are trained on a given problem but perform well also on different new tasks. The interest in transfer learning comes from situations where large data sets can be used for solving a given training task but the data associated with new tasks are too small to train expressive models from scratch. The general transfer-learning strategy is to use the small available new data for adapting a large model that has been previously optimized on the training data. One option consists of keeping the structure of the pre-trained large model intact and fine-tuning its weights to solve the new task. When very few data points are available and the pre-trained network is large, however, customized regularization strategies are needed to mitigate the risk of over-fitting. Fine-tuning only a few parameters is a possible way out but can strongly limit the performance of the final model. Another option is to prune the pre-trained model to reduce its complexity, increase transferability, and prevent overfitting. Existing strategies, however, focus on optimized models and are unable to disentangle the network architecture from the attached weights. As a consequence, the pruned version of the original model can hardly be interpreted as a transferable new architecture and it is difficult to reuse it on new tasks.


Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification

arXiv.org Machine Learning

The history of deep learning has shown that human-designed problem-specific networks can greatly improve the classification performance of general neural models. In most practical cases, however, choosing the optimal architecture for a given task remains a challenging problem. Recent architecture-search methods are able to automatically build neural models with strong performance but fail to fully appreciate the interaction between neural architecture and weights. This work investigates the problem of disentangling the role of the neural structure and its edge weights, by showing that well-trained architectures may not need any link-specific fine-tuning of the weights. We compare the performance of such weight-free networks (in our case these are binary networks with {0, 1}-valued weights) with random, weight-agnostic, pruned and standard fully connected networks. To find the optimal weight-agnostic network, we use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem.More specifically, we look at the optimal task-specific architectures as the optimal configuration of binary networks with {0, 1}-valued weights, which can be found through an approximate gradient descent strategy. Theoretical convergence guarantees of the proposed algorithm are obtained by bounding the error in the gradient approximation and its practical performance is evaluated on two real-world data sets. For measuring the structural similarities between different architectures, we use a novel spectral approach that allows us to underline the intrinsic differences between real-valued networks and weight-free architectures.


Training conformal predictors

arXiv.org Machine Learning

Efficiency criteria for conformal prediction, such as \emph{observed fuzziness} (i.e., the sum of p-values associated with false labels), are commonly used to \emph{evaluate} the performance of given conformal predictors. Here, we investigate whether it is possible to exploit efficiency criteria to \emph{learn} classifiers, both conformal predictors and point classifiers, by using such criteria as training objective functions. The proposed idea is implemented for the problem of binary classification of hand-written digits. By choosing a 1-dimensional model class (with one real-valued free parameter), we can solve the optimization problems through an (approximate) exhaustive search over (a discrete version of) the parameter space. Our empirical results suggest that conformal predictors trained by minimizing their observed fuzziness perform better than conformal predictors trained in the traditional way by minimizing the \emph{prediction error} of the corresponding point classifier. They also have a reasonable performance in terms of their prediction error on the test set.


Multiple Metric Learning for Structured Data

arXiv.org Machine Learning

We address the problem of merging graph and feature-space information while learning a metric from structured data. Existing algorithms tackle the problem in an asymmetric way, by either extracting vectorized summaries of the graph structure or adding hard constraints to feature-space algorithms. Following a different path, we define a metric regression scheme where we train metric-constrained linear combinations of dissimilarity matrices. The idea is that the input matrices can be pre-computed dissimilarity measures obtained from any kind of available data (e.g. node attributes or edge structure). As the model inputs are distance measures, we do not need to assume the existence of any underlying feature space. Main challenge is that metric constraints (especially positive-definiteness and sub-additivity), are not automatically respected if, for example, the coefficients of the linear combination are allowed to be negative. Both positive and sub-additive constraints are linear inequalities, but the computational complexity of imposing them scales as O(D3), where D is the size of the input matrices (i.e. the size of the data set). This becomes quickly prohibitive, even when D is relatively small. We propose a new graph-based technique for optimizing under such constraints and show that, in some cases, our approach may reduce the original computational complexity of the optimization process by one order of magnitude. Contrarily to existing methods, our scheme applies to any (possibly non-convex) metric-constrained objective function.