AITopics | Colombo, Nicolo


Colombo, Nicolo


Enhanced Route Planning with Calibrated Uncertainty Set

Tang, Lingxuan, Luo, Rui, Zhou, Zhixin, Colombo, Nicolo

arXiv.org Machine Learning, Mar-13-2025

This paper investigates the application of probabilistic prediction methodologies in route planning within a road network context. Specifically, we introduce the Conformalized Quantile Regression for Graph Autoencoders (CQR-GAE), which leverages the conformal prediction technique to offer a coverage guarantee, thus improving the reliability and robustness of our predictions. By incorporating uncertainty sets derived from CQR-GAE, we substantially improve the decision-making process in route planning under a robust optimization framework. We demonstrate the effectiveness of our approach by applying the CQR-GAE model to a real-world traffic scenario. The results indicate that our model significantly outperforms baseline methods, offering a promising avenue for advancing intelligent transportation systems.
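The coverage guarantee mentioned in the abstract comes from a conformal calibration step. A minimal sketch of split-conformal quantile regression in plain Python is given here, assuming the lower and upper quantile predictions are already available (in the paper they would come from the graph autoencoder, which is not reproduced). The function names and the toy numbers are hypothetical, and the robust-optimization stage is left out.

```python
import math

def cqr_offset(lower, upper, y, alpha):
    """Conformal correction computed on a held-out calibration set."""
    # Score: how far each label falls outside its predicted [lower, upper] band
    # (negative when the label lies strictly inside the band).
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]

def cqr_interval(lo, hi, q):
    """Widen (or shrink, if q < 0) the predicted band by the offset q."""
    return lo - q, hi + q

# Toy calibration data (hypothetical): the same band [0, 10] for every edge.
cal_lower = [0] * 9
cal_upper = [10] * 9
cal_y = [5, 11, 12, -1, 3, 10.5, 9, 13, 2]

q = cqr_offset(cal_lower, cal_upper, cal_y, alpha=0.2)
print(q)                       # 2
print(cqr_interval(0, 10, q))  # (-2, 12)
```

The corrected band can then serve as an uncertainty set for each edge cost; how such sets enter the route-planning optimization is specific to the paper and is not sketched here.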

data mining, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

2503.10088

Country:

North America > United States (0.14)
Europe > Sweden (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.36)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


State-space models can learn in-context by gradient descent

Sushma, Neeraj Mohan, Tian, Yudou, Mestha, Harshvardhan, Colombo, Nicolo, Kappel, David, Subramoney, Anand

arXiv.org Artificial Intelligence, Oct-15-2024

Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning. We prove that a single structured state-space model layer, augmented with local self-attention, can reproduce the outputs of an implicit linear model with least squares loss after one step of gradient descent. Our key insight is that the diagonal linear recurrent layer can act as a gradient accumulator, which can be "applied" to the parameters of the implicit regression model. We validate our construction by training randomly initialized augmented SSMs on simple linear regression tasks. The empirically optimized parameters match the theoretical ones, obtained analytically from the implicit model construction. Extensions to multi-step linear and non-linear regression yield consistent results. The constructed SSM encompasses features of modern deep state-space models, with the potential for scalable training and effectiveness even in general tasks. The theoretical construction elucidates the role of local self-attention and multiplicative interactions in recurrent architectures as the key ingredients for enabling the expressive power typical of foundation models.
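The "gradient accumulator" idea can be illustrated with a simplified sketch. It assumes scalar targets, a zero initial weight vector, and a recurrence with unit decay; these choices and all function names are assumptions made for illustration, not the paper's construction. One explicit gradient step on the least squares loss and a running sum of the products y_t * x_t give the same prediction for a query point.

```python
def gd_one_step_prediction(xs, ys, x_query, lr):
    """Explicit route: one gradient step on least squares from w = 0, then predict."""
    n, d = len(xs), len(xs[0])
    # Gradient of (1 / 2n) * sum((w.x - y)^2) at w = 0 is -(1 / n) * sum(y * x).
    grad = [-sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    w = [-lr * g for g in grad]
    return sum(wj * xj for wj, xj in zip(w, x_query))

def recurrent_prediction(xs, ys, x_query, lr, decay=1.0):
    """Recurrent route: a diagonal linear recurrence accumulates y_t * x_t."""
    state = [0.0] * len(xs[0])
    for x, y in zip(xs, ys):
        # Elementwise update: state_t = decay * state_{t-1} + y_t * x_t.
        state = [decay * s + y * xj for s, xj in zip(state, x)]
    # "Apply" the accumulated gradient to the query point.
    return lr / len(xs) * sum(s * xj for s, xj in zip(state, x_query))

# Toy in-context examples (hypothetical numbers).
xs = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
ys = [1.0, 2.0, -1.0]
x_query = [1.0, 1.0]

a = gd_one_step_prediction(xs, ys, x_query, lr=0.1)
b = recurrent_prediction(xs, ys, x_query, lr=0.1)
print(abs(a - b) < 1e-12)
```

The product y_t * x_t is the multiplicative interaction between neighbouring inputs; in the abstract this role is attributed to the local self-attention that augments the recurrent layer.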

artificial intelligence, gradient descent, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2410.11687

Country: Europe (0.46)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


Entropy Reweighted Conformal Classification

Luo, Rui, Colombo, Nicolo

arXiv.org Artificial Intelligence, Jul-24-2024

Conformal Prediction (CP) is a powerful framework for constructing prediction sets with guaranteed coverage. However, recent studies have shown that integrating confidence calibration with CP can lead to a degradation in efficiency. In this paper, we propose an adaptive approach that considers the classifier's uncertainty and employs entropy-based reweighting to enhance the efficiency of prediction sets for conformal classification. Our experimental results demonstrate that this method significantly improves efficiency.

machine learning, natural language, prediction, (13 more...)

arXiv.org Artificial Intelligence

2407.17377

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)


Normalizing Flows for Conformal Regression

Colombo, Nicolo

arXiv.org Machine Learning, Jun-26-2024

Conformal Prediction (CP) algorithms estimate the uncertainty of a prediction model by calibrating its outputs on labeled data. The same calibration scheme usually applies to any model and data without modifications. The obtained prediction intervals are valid by construction but could be inefficient, i.e. unnecessarily big, if the prediction errors are not uniformly distributed over the input space. We present a general scheme to localize the intervals by training the calibration process. The standard prediction error is replaced by an optimized distance metric that depends explicitly on the object attributes. Learning the optimal metric is equivalent to training a Normalizing Flow that acts on the joint distribution of the errors and the inputs. Unlike the Error Reweighting CP algorithm of Papadopoulos et al. (2008), the framework allows estimating the gap between nominal and empirical conditional validity. The approach is compatible with existing locally-adaptive CP strategies based on re-weighting the calibration samples and applies to any point-prediction model without retraining.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

arXiv.org Machine Learning

2406.03346

Country:

Europe > United Kingdom > England (0.14)
Europe > Spain (0.14)
Europe > Finland (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Modeling & Simulation (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)


Conformal Load Prediction with Transductive Graph Autoencoders

Luo, Rui, Colombo, Nicolo

arXiv.org Machine Learning, Jun-12-2024

Graph machine learning has seen a surge in interest with the advent of complex networked systems in diverse domains. Applications include social and transportation networks and various kinds of biological systems. In most cases, the interaction between nodes is represented by edges with associated weights. The edge weights can embody varying characteristics, from the strength of interaction between two individuals in a social network to the traffic capacity of a route in a transportation system. The prediction of the edge weights is vital to understanding and modelling graph data. Graph Neural Networks (GNNs) have been successfully used on node classification and link prediction tasks.

data mining, machine learning, prediction, (15 more...)

arXiv.org Machine Learning

2406.08281

Country:

Europe (0.67)
North America > United States (0.29)
Asia > China > Hong Kong (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Infrastructure & Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


On training locally adaptive CP

Colombo, Nicolo

arXiv.org Artificial Intelligence, Jun-5-2023

We address the problem of making Conformal Prediction (CP) intervals locally adaptive. Most existing methods focus on approximating the object-conditional validity of the intervals by partitioning or re-weighting the calibration set. Our strategy is new and conceptually different. Instead of re-weighting the calibration data, we redefine the conformity measure through a trainable change of variables, $A \to \phi_X(A)$, that depends explicitly on the object attributes, $X$. Under certain conditions and if $\phi_X$ is monotonic in $A$ for any $X$, the transformations produce prediction intervals that are guaranteed to be marginally valid and have $X$-dependent sizes. We describe how to parameterize and train $\phi_X$ to maximize the interval efficiency. Contrary to other CP-aware training methods, the objective function is smooth and can be minimized through standard gradient methods without approximations.
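To make the change of variables concrete, here is a minimal sketch with one simple monotonic choice, $\phi_X(A) = A / \sigma(X)$ with $\sigma(X) > 0$. This choice, the fixed scale values, and the function names are assumptions for illustration; the paper trains $\phi_X$, which is not attempted here. The calibration quantile is taken on the transformed scores and then mapped back through the inverse of $\phi_X$, so the interval width depends on $X$.

```python
import math

def calibrate(residuals, scales, alpha):
    """Quantile of the transformed scores phi_X(A) = A / sigma(X) on calibration data."""
    transformed = sorted(a / s for a, s in zip(residuals, scales))
    n = len(transformed)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    return transformed[k - 1] if k <= n else math.inf

def interval(prediction, scale, q):
    """Invert phi_X: A / sigma(X) <= q means A <= q * sigma(X)."""
    half_width = q * scale
    return prediction - half_width, prediction + half_width

# Toy calibration set (hypothetical): absolute errors A_i and scales sigma(X_i).
residuals = [1, 2, 3, 4, 2, 6, 1, 8, 9]
scales = [1, 1, 1, 2, 2, 2, 4, 4, 4]

q = calibrate(residuals, scales, alpha=0.2)
print(q)                       # 3.0
print(interval(10.0, 1.0, q))  # (7.0, 13.0)
print(interval(10.0, 4.0, q))  # (-2.0, 22.0)
```

Because the transformation is monotonic in $A$ for every $X$, ranking the transformed scores keeps the usual marginal guarantee, while objects with a larger scale receive wider intervals.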

artificial intelligence, conformity score, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.04648

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)


Differentiable Architecture Pruning for Transfer Learning

Colombo, Nicolo, Gao, Yang

arXiv.org Machine Learning, Jul-7-2021

Transfer learning methods aim to produce machine learning models that are trained on a given problem but also perform well on different new tasks. The interest in transfer learning comes from situations where large data sets can be used for solving a given training task but the data associated with new tasks are too small to train expressive models from scratch. The general transfer-learning strategy is to use the small available new data for adapting a large model that has been previously optimized on the training data. One option consists of keeping the structure of the pre-trained large model intact and fine-tuning its weights to solve the new task. When very few data points are available and the pre-trained network is large, however, customized regularization strategies are needed to mitigate the risk of overfitting. Fine-tuning only a few parameters is a possible way out but can strongly limit the performance of the final model. Another option is to prune the pre-trained model to reduce its complexity, increase transferability, and prevent overfitting. Existing strategies, however, focus on optimized models and are unable to disentangle the network architecture from the attached weights. As a consequence, the pruned version of the original model can hardly be interpreted as a transferable new architecture and it is difficult to reuse it on new tasks.

artificial intelligence, neural network, optimization problem, (14 more...)

arXiv.org Machine Learning

2107.03375

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)


Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification

Colombo, Nicolo, Gao, Yang

arXiv.org Machine Learning, Sep-11-2020

The history of deep learning has shown that human-designed problem-specific networks can greatly improve the classification performance of general neural models. In most practical cases, however, choosing the optimal architecture for a given task remains a challenging problem. Recent architecture-search methods are able to automatically build neural models with strong performance but fail to fully appreciate the interaction between neural architecture and weights. This work investigates the problem of disentangling the role of the neural structure and its edge weights, by showing that well-trained architectures may not need any link-specific fine-tuning of the weights. We compare the performance of such weight-free networks (in our case these are binary networks with {0, 1}-valued weights) with random, weight-agnostic, pruned and standard fully connected networks. To find the optimal weight-agnostic network, we use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem. More specifically, we look at the optimal task-specific architectures as the optimal configuration of binary networks with {0, 1}-valued weights, which can be found through an approximate gradient descent strategy. Theoretical convergence guarantees of the proposed algorithm are obtained by bounding the error in the gradient approximation and its practical performance is evaluated on two real-world data sets. For measuring the structural similarities between different architectures, we use a novel spectral approach that allows us to underline the intrinsic differences between real-valued networks and weight-free architectures.

deep learning, neural network, optimization problem, (17 more...)

arXiv.org Machine Learning

2009.05346

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Training conformal predictors

Colombo, Nicolo, Vovk, Vladimir

arXiv.org Machine Learning, May-14-2020

Efficiency criteria for conformal prediction, such as observed fuzziness (i.e., the sum of p-values associated with false labels), are commonly used to evaluate the performance of given conformal predictors. Here, we investigate whether it is possible to exploit efficiency criteria to learn classifiers, both conformal predictors and point classifiers, by using such criteria as training objective functions. The proposed idea is implemented for the problem of binary classification of hand-written digits. By choosing a 1-dimensional model class (with one real-valued free parameter), we can solve the optimization problems through an (approximate) exhaustive search over (a discrete version of) the parameter space. Our empirical results suggest that conformal predictors trained by minimizing their observed fuzziness perform better than conformal predictors trained in the traditional way by minimizing the prediction error of the corresponding point classifier. They also have a reasonable performance in terms of their prediction error on the test set.
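Observed fuzziness, as defined in the abstract, can be computed in a few lines. The sketch below assumes a simple inductive (split) setting in which every candidate label is compared against one shared list of calibration nonconformity scores; that setup, the function names, and the numbers are hypothetical and may differ from the paper's experimental protocol.

```python
def p_value(cal_scores, test_score):
    """Conformal p-value: share of scores at least as nonconforming as the test one."""
    at_least = sum(1 for s in cal_scores if s >= test_score)
    # The +1 terms count the test object itself.
    return (at_least + 1) / (len(cal_scores) + 1)

def observed_fuzziness(p_values, true_label):
    """Sum of the p-values assigned to the false labels of one test object."""
    return sum(p for label, p in p_values.items() if label != true_label)

# Toy binary example (hypothetical nonconformity scores).
cal_scores = [0.1, 0.2, 0.3, 0.4]
test_scores = {0: 0.15, 1: 0.35}  # score of the test object under each candidate label

p_values = {label: p_value(cal_scores, s) for label, s in test_scores.items()}
print(p_values)                                    # {0: 0.8, 1: 0.4}
print(observed_fuzziness(p_values, true_label=0))  # 0.4
```

Smaller values are better: an efficient predictor gives small p-values to the false labels. Averaging this quantity over a data set gives a criterion of the kind the abstract proposes to minimize during training.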

artificial intelligence, optimization problem, percentile oftrain, (19 more...)

arXiv.org Machine Learning

2005.07037

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Multiple Metric Learning for Structured Data

Colombo, Nicolo

arXiv.org Machine Learning, Feb-13-2020

We address the problem of merging graph and feature-space information while learning a metric from structured data. Existing algorithms tackle the problem in an asymmetric way, by either extracting vectorized summaries of the graph structure or adding hard constraints to feature-space algorithms. Following a different path, we define a metric regression scheme where we train metric-constrained linear combinations of dissimilarity matrices. The idea is that the input matrices can be pre-computed dissimilarity measures obtained from any kind of available data (e.g. node attributes or edge structure). As the model inputs are distance measures, we do not need to assume the existence of any underlying feature space. The main challenge is that metric constraints (especially positive-definiteness and sub-additivity) are not automatically respected if, for example, the coefficients of the linear combination are allowed to be negative. Both positive and sub-additive constraints are linear inequalities, but the computational complexity of imposing them scales as $O(D^3)$, where $D$ is the size of the input matrices (i.e. the size of the data set). This quickly becomes prohibitive, even when $D$ is relatively small. We propose a new graph-based technique for optimizing under such constraints and show that, in some cases, our approach may reduce the original computational complexity of the optimization process by one order of magnitude. Contrary to existing methods, our scheme applies to any (possibly non-convex) metric-constrained objective function.
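The cubic cost and the problem with negative coefficients can be seen in a brute-force check. The sketch below is not the paper's graph-based technique; it is the naive baseline, with hypothetical function names and toy matrices, that tests non-negativity and sub-additivity (the triangle inequality) over every triple of points.

```python
def combine(matrices, coeffs):
    """Linear combination of D x D dissimilarity matrices."""
    d = len(matrices[0])
    return [[sum(c * m[i][j] for c, m in zip(coeffs, matrices)) for j in range(d)]
            for i in range(d)]

def is_metric(dist, tol=1e-12):
    """Naive check of non-negativity and sub-additivity; visits all D^3 triples."""
    d = len(dist)
    for i in range(d):
        for j in range(d):
            if dist[i][j] < -tol:
                return False  # negative dissimilarity
            for k in range(d):
                if dist[i][j] > dist[i][k] + dist[k][j] + tol:
                    return False  # triangle inequality violated
    return True

# Two valid metrics on three points (hypothetical inputs).
path = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]      # distances along a path graph
discrete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # discrete metric

print(is_metric(combine([path, discrete], [1.0, 1.0])))   # True
print(is_metric(combine([path, discrete], [1.0, -1.0])))  # False
```

With a negative coefficient the combined distance between the two end points (1) exceeds the sum of the two intermediate distances (0 + 0), which is exactly the kind of violation the metric constraints must rule out.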

algorithm, artificial intelligence, optimization problem, (18 more...)

arXiv.org Machine Learning

2002.05747

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.66)
