Collaborating Authors

Estimation, Optimization, and Parallelism when Data is Sparse

Neural Information Processing Systems

We study stochastic optimization problems when the \emph{data} is sparse, which is in a sense dual to the current understanding of high-dimensional statistical learning and optimization. We highlight both the difficulties---in terms of increased sample complexity that sparse data necessitates---and the potential benefits, in terms of allowing parallelism and asynchrony in the design of algorithms. Concretely, we derive matching upper and lower bounds on the minimax rate for optimization and learning with sparse data, and we exhibit algorithms achieving these rates. Our algorithms are adaptive: they achieve the best possible rate for the data observed. We also show how leveraging sparsity leads to (still minimax optimal) parallel and asynchronous algorithms, providing experimental evidence complementing our theoretical results on medium to large-scale learning tasks.

Optimal Function Approximation with Relu Neural Networks Machine Learning

We consider in this paper the optimal approximations of convex univariate functions with feed-forward Relu neural networks. We are interested in the following question: what is the minimal approximation error given the number of approximating linear pieces? We establish the necessary and sufficient conditions and uniqueness of optimal approximations, and give lower and upper bounds of the optimal approximation errors. Relu neural network architectures are then presented to generate these optimal approximations. Finally, we propose an algorithm to find the optimal approximations, as well as prove its convergence and validate it with experimental results.

Transfer Learning via Minimizing the Performance Gap Between Domains

Neural Information Processing Systems

We propose a new principle for transfer learning, based on a straightforward intuition: if two domains are similar to each other, the model trained on one domain should also perform well on the other domain, and vice versa. To formalize this intuition, we define the performance gap as a measure of the discrepancy between the source and target domains. We derive generalization bounds for the instance weighting approach to transfer learning, showing that the performance gap can be viewed as an algorithm-dependent regularizer, which controls the model complexity. Our theoretical analysis provides new insight into transfer learning and motivates a set of general, principled rules for designing new instance weighting schemes for transfer learning. These rules lead to gapBoost, a novel and principled boosting approach for transfer learning.

Information-theoretic analysis of generalization capability of learning algorithms

Neural Information Processing Systems

We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding of generalization in learning problems, and give theoretical guidelines for striking the right balance between data fit and generalization by controlling the input-output mutual information. We propose a number of methods for this purpose, among which are algorithms that regularize the ERM algorithm with relative entropy or with random noise. Our work extends and leads to nontrivial improvements on the recent results of Russo and Zou. Papers published at the Neural Information Processing Systems Conference.

On the Upper Bounds of the Real-Valued Predictions - Silvia Benevenuta, Piero Fariselli, 2019


Predictions are fundamental in science as they allow to test and falsify theories. Predictions are ubiquitous in bioinformatics and also help when no first principles are available. Predictions can be distinguished between classifications (when we associate a label to a given input) or regression (when a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean square error, the Pearson correlation (ρ), and the coefficient of determination (or R2). The common conception related to the last 2 indices is that the theoretical upper bound is 1; however, their upper bounds depend both on the experimental uncertainty and the distribution of target variables.