Collaborating Authors: Orabona, Francesco


Regression-tree Tuning in a Streaming Setting

Neural Information Processing Systems

We consider the problem of maintaining the data structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. We prove that it is possible to maintain such a structure in time $O(\log n)$ at any time step $n$, while achieving a nearly-optimal regression rate of $\tilde{O}(n^{-2/(2+d)})$ in terms of the unknown metric dimension $d$. Finally, we prove a new regression lower bound that is independent of a given data size, and hence is more appropriate for the streaming setting.
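
To make the data structure concrete, here is a minimal sketch of a streaming dyadic-partition regressor: each cell keeps a running mean of its labels, a cell is halved along its widest side once it has seen enough points, and an insertion only walks one root-to-leaf path. The split threshold and the prediction rule are illustrative choices for this sketch, not the tuned values from the paper.

```python
class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # axis-aligned bounds of the cell
        self.n, self.sum_y = 0, 0.0      # running count and label sum
        self.children = None             # (left, right, dim, mid) once split

    def mean(self):
        return self.sum_y / self.n if self.n else 0.0

    def route(self, x):
        left, right, d, mid = self.children
        return left if x[d] < mid else right

class StreamingTree:
    def __init__(self, dim, split_at=8):
        self.root = Cell([0.0] * dim, [1.0] * dim)  # data assumed in [0,1]^d
        self.split_at = split_at         # illustrative refinement threshold

    def _split(self, cell):
        # halve the cell along its widest dimension
        d = max(range(len(cell.lo)), key=lambda i: cell.hi[i] - cell.lo[i])
        mid = (cell.lo[d] + cell.hi[d]) / 2.0
        left = Cell(list(cell.lo), list(cell.hi)); left.hi[d] = mid
        right = Cell(list(cell.lo), list(cell.hi)); right.lo[d] = mid
        cell.children = (left, right, d, mid)

    def update(self, x, y):
        # one root-to-leaf walk: O(depth) work per arriving point
        cell = self.root
        while True:
            cell.n += 1
            cell.sum_y += y
            if cell.n < self.split_at:
                return
            if cell.children is None:
                self._split(cell)
            cell = cell.route(x)

    def predict(self, x):
        # return the mean of the deepest nonempty cell containing x
        cell, out = self.root, 0.0
        while cell.n > 0:
            out = cell.mean()
            if cell.children is None:
                break
            cell = cell.route(x)
        return out
```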


Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Neural Information Processing Systems

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, theoretical works typically rely on assumptions, for example on prior knowledge of the norm of the optimal solution, while in practice validation methods remain the only viable approach. In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune and no form of cross-validation. The algorithm builds on recent advances in online learning theory for unconstrained settings to estimate, over time and in a data-dependent way, the right amount of regularization.
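
One simple instance of the unconstrained online learning machinery alluded to above is coin betting, where the scale of the predictor is set by the accumulated wealth of a bettor rather than by a tuned learning rate or regularizer. The following sketch applies a Krichevsky-Trofimov bettor in an RKHS; it conveys the flavor of such parameter-free updates but is not the exact algorithm from the paper (the kernel, loss, and clipping are illustrative assumptions).

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    return float(np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

class KernelCoinBetting:
    """Coin betting in an RKHS: the scale of the predictor comes from the
    bettor's accumulated wealth, not from a tuned learning rate or
    regularizer. Sketch only; kernel, loss, and clipping are illustrative."""
    def __init__(self, kernel=gaussian_kernel, eps=1.0):
        self.kernel, self.eps = kernel, eps
        self.xs, self.gs = [], []        # support points and past gradients
        self.wealth, self.t = eps, 0

    def predict(self, x, t=None):
        t = self.t + 1 if t is None else t
        h = -sum(g * self.kernel(xi, x) for xi, g in zip(self.xs, self.gs))
        return self.wealth * h / t       # wealth-scaled average direction

    def update(self, x, y):
        self.t += 1
        pred = self.predict(x, self.t)
        g = max(-1.0, min(1.0, pred - y))  # squared-loss gradient, clipped
        self.wealth -= g * pred            # settle this round's bet
        self.xs.append(np.asarray(x)); self.gs.append(g)
```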


Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration

arXiv.org Machine Learning

In this paper, we consider nonparametric least-squares regression in a Reproducing Kernel Hilbert Space (RKHS). We propose a new randomized algorithm that has optimal generalization error bounds with respect to the square loss, closing a long-standing gap between upper and lower bounds. Moreover, we show that our algorithm has faster finite-time and asymptotic rates on problems where the Bayes risk with respect to the square loss is small. We state our results using standard tools from the theory of least-squares regression in RKHSs, namely, the decay of the eigenvalues of the associated integral operator and the complexity of the optimal predictor measured through the integral operator.
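
As a rough illustration of the "truncated randomized" idea, the sketch below fits standard kernel ridge regression on a random prefix of the sample. The uniform distribution of the truncation point and the fixed regularization are assumptions of this sketch; the paper specifies the actual distribution and the choice of the regularization parameter.

```python
import numpy as np

def kernel_ridge_fit(K, y, lam):
    # standard kernel ridge regression: alpha = (K + lam * m * I)^{-1} y
    m = len(y)
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def ktr3_sketch(X, y, kernel, lam, rng):
    """Fit kernel ridge regression on a random prefix of the sample.
    The uniform truncation and fixed lam are assumptions of this sketch."""
    n = len(y)
    m = int(rng.integers(1, n + 1))      # random truncation point
    Xs, ys = X[:m], y[:m]
    K = np.array([[kernel(a, b) for b in Xs] for a in Xs])
    alpha = kernel_ridge_fit(K, ys, lam)
    return lambda x: float(sum(a * kernel(xi, x) for a, xi in zip(alpha, Xs)))

# Usage: f = ktr3_sketch(X, y, kernel=lambda a, b: np.exp(-np.sum((a - b)**2)),
#                        lam=1e-3, rng=np.random.default_rng(0))
```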


Momentum-Based Variance Reduction in Non-Convex SGD

arXiv.org Machine Learning

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new variance reduction algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less tuning of hyperparameters. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $\boldsymbol{x}$ with $\mathbb{E}[\|\nabla F(\boldsymbol{x})\|]\le O(1/\sqrt{T}+\sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the optimal rate but without requiring knowledge of $\sigma$.
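
The STORM recursion is compact enough to state in code. The key points are that the same random sample is used to evaluate the gradient at both the current and the previous iterate, and that the stepsize and momentum parameter are set adaptively from the observed gradient norms. The constants k, w, c below are illustrative placeholders.

```python
import numpy as np

def storm(grad, x0, T, k=0.1, w=0.1, c=100.0, seed=0):
    """STORM sketch: grad(x, s) must return the stochastic gradient at x
    computed with randomness s; reusing s at x_t and x_{t-1} is what
    produces the variance reduction."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = grad(x, rng.integers(2**31))        # d_1: plain gradient estimate
    G2 = float(d @ d)                       # running sum of squared grad norms
    for _ in range(T):
        eta = k / (w + G2) ** (1.0 / 3.0)   # adaptive stepsize
        x_prev, x = x, x - eta * d
        a = min(1.0, c * eta ** 2)          # momentum parameter a = c * eta^2
        s = rng.integers(2**31)
        g_new, g_old = grad(x, s), grad(x_prev, s)  # same sample, both points
        d = g_new + (1.0 - a) * (d - g_old)         # variance-reduced direction
        G2 += float(g_new @ g_new)
    return x

# Toy usage on a noisy quadratic:
def noisy_grad(x, s):
    return 2.0 * x + np.random.default_rng(s).normal(scale=0.1, size=x.shape)

x_final = storm(noisy_grad, x0=np.ones(5), T=1000)
```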


Parameter-free Online Convex Optimization with Sub-Exponential Noise

arXiv.org Machine Learning

We consider the problem of unconstrained online convex optimization (OCO) with sub-exponential noise, a strictly more general problem than standard OCO. In this setting, the learner receives a subgradient of the loss functions corrupted by sub-exponential noise and strives to achieve an optimal regret guarantee without knowledge of the competitor norm, i.e., in a parameter-free way. Recently, Cutkosky and Boahen (COLT 2017) proved that, given unbounded subgradients, it is impossible to guarantee sublinear regret due to an exponential penalty. This paper shows that it is possible to circumvent the lower bound when the observed subgradients are unbounded only through stochastic noise. However, the presence of unbounded noise in unconstrained OCO is challenging: existing algorithms either do not provide near-optimal regret bounds or fail to have any guarantee at all. We therefore design a novel parameter-free OCO algorithm for Banach spaces, which we call BANCO, via a reduction to betting on noisy coins. We show that BANCO achieves the optimal regret rate in our problem. Finally, we show how our results yield a parameter-free locally private stochastic subgradient descent algorithm, and discuss the connection to the law of the iterated logarithm.
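
The "betting on noisy coins" reduction builds on the standard coin-betting scaffold shown below (a Krichevsky-Trofimov bettor in one dimension, which assumes outcomes bounded in $[-1,1]$). BANCO replaces the potential with one that tolerates sub-exponential noise in the observed subgradients; that modification is the paper's contribution and is not reproduced here.

```python
def kt_bettor(grads, eps=1.0):
    """Krichevsky-Trofimov coin betting: the standard reduction from
    unconstrained 1D online learning to gambling on coin outcomes -g_t.
    Assumes |g_t| <= 1; the wealth then stays positive."""
    wealth, s, plays = eps, 0.0, []      # s = sum of past outcomes -g_i
    for t, g in enumerate(grads, start=1):
        x = (s / t) * wealth             # bet a signed fraction of wealth
        plays.append(x)
        wealth -= g * x                  # win/lose according to the outcome
        s -= g
    return plays
```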


Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization

arXiv.org Machine Learning

Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, it requires a carefully hand-picked stepsize for fast convergence, which is notoriously tedious and time-consuming to tune. Over the last several years, a plethora of adaptive gradient-based algorithms have emerged to ameliorate this problem. They have proved efficient in reducing the labor of tuning in practice, but many of them lack theoretical guarantees even in the convex setting. In this paper, we propose new surrogate losses that cast the problem of learning the optimal stepsizes for the stochastic optimization of a non-convex smooth objective function as an online convex optimization problem. This allows the use of no-regret online algorithms to compute optimal stepsizes on the fly. In turn, this yields an SGD algorithm with self-tuned stepsizes whose convergence rates automatically adapt to the level of noise.
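
The construction can be sketched from the smoothness inequality $f(x - \eta g) \le f(x) - \eta \langle \nabla f(x), g\rangle + \frac{L\eta^2}{2}\|g\|^2$: the right-hand side is convex in $\eta$, so the stepsize itself can be learned by a no-regret algorithm. Below, plain online gradient descent tunes $\eta$; the use of two independent gradient samples to form an unbiased surrogate gradient, and the constants, are assumptions of this sketch rather than the paper's exact construction.

```python
import numpy as np

def sgd_with_learned_stepsize(grad, x0, T, L=1.0, eta0=0.1, meta_lr=0.01):
    """SGD whose stepsize is tuned online by gradient descent on the convex
    surrogate  ell_t(eta) = -eta * <g2, g1> + (L * eta^2 / 2) * ||g1||^2,
    which tracks the per-step decrease of an L-smooth objective. Two
    independent samples g1, g2 keep the surrogate gradient unbiased
    (an assumption of this sketch)."""
    x, eta = np.asarray(x0, dtype=float), eta0
    for _ in range(T):
        g1, g2 = grad(x), grad(x)        # two independent stochastic gradients
        x = x - eta * g1                 # SGD step with the current stepsize
        surr_grad = -float(g2 @ g1) + L * eta * float(g1 @ g1)
        eta = max(eta - meta_lr * surr_grad, 0.0)  # online GD on the stepsize
    return x
```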


On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

arXiv.org Machine Learning

Stochastic gradient descent is the method of choice for large-scale optimization of machine learning objective functions. Yet, its performance is highly variable and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze the use of adaptive stepsizes, like the ones in AdaGrad, in the non-convex setting. We show sufficient conditions for almost sure convergence to a stationary point when these adaptive stepsizes are used, proving the first such guarantee for AdaGrad in the non-convex setting. Moreover, we show explicit rates of convergence that automatically interpolate between $O(1/T)$ and $O(1/\sqrt{T})$ depending on the noise in the stochastic gradients, in both the convex and non-convex settings.
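
The analyzed method is ordinary SGD with a global AdaGrad-style stepsize. A point worth making explicit in code is that the stepsize at time $t$ uses only past gradients, so it is independent of the current one; this "delayed" form is what the analysis exploits, and the exponent $1/2 + \epsilon$ with $\epsilon > 0$ corresponds to the almost-sure convergence result.

```python
import numpy as np

def adagrad_norm_sgd(grad, x0, T, alpha=1.0, beta=1.0, eps=0.0):
    """SGD with the global adaptive stepsize
        eta_t = alpha / (beta + sum_{i<t} ||g_i||^2) ** (0.5 + eps).
    The sum excludes the current gradient, so eta_t is independent of g_t
    ("delayed" stepsize); eps > 0 matches the almost-sure convergence case."""
    x, G2 = np.asarray(x0, dtype=float), 0.0
    for _ in range(T):
        eta = alpha / (beta + G2) ** (0.5 + eps)
        g = grad(x)
        x = x - eta * g
        G2 += float(g @ g)
    return x
```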


Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

arXiv.org Machine Learning

We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce optimization over a constrained domain to unconstrained optimization. All of our reductions run as fast as online gradient descent. We use our new techniques to improve upon the previously best regret bounds for parameter-free learning, and do so for arbitrary norms.
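
The Banach-space-to-one-dimension reduction can be sketched directly: a one-dimensional parameter-free learner controls the magnitude of the prediction while online gradient descent on the unit ball controls its direction, and each round the scalar $\langle g_t, y_t\rangle$ is fed back to the magnitude learner. The KT bettor and the fixed learning rate below are illustrative choices, and the gradients are assumed bounded so the scalar feedback stays in $[-1,1]$.

```python
import numpy as np

class OneDKT:
    """1D parameter-free learner (KT coin betting): controls the magnitude.
    Assumes the scalar feedback it receives lies in [-1, 1]."""
    def __init__(self, eps=1.0):
        self.wealth, self.s, self.t = eps, 0.0, 0
    def play(self):
        self.t += 1
        return (self.s / self.t) * self.wealth
    def update(self, z, g):
        self.wealth -= g * z
        self.s -= g

class UnitBallOGD:
    """Online gradient descent projected onto the unit ball: the direction."""
    def __init__(self, dim, lr=0.1):
        self.y, self.lr = np.zeros(dim), lr
    def play(self):
        return self.y
    def update(self, g):
        self.y = self.y - self.lr * g
        norm = np.linalg.norm(self.y)
        if norm > 1.0:
            self.y /= norm               # project back onto the unit ball

def reduction_round(mag, direction, g_fn):
    """One round of the reduction: play x_t = z_t * y_t, then split the
    feedback between the two learners."""
    z, y = mag.play(), direction.play()
    x = z * y
    g = g_fn(x)                          # subgradient at the played point
    mag.update(z, float(g @ y))          # scalar feedback <g, y>
    direction.update(g)
    return x
```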


Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

arXiv.org Machine Learning

We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem posed by Abernethy and Rakhlin ("An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction?", COLT 2009). We test our algorithm experimentally, showing that it performs favorably against earlier algorithms.
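
For orientation, here is one round of bandit multiclass learning with epsilon-greedy exploration in the style of the Banditron, a first-order baseline: only one bit of feedback (whether the played label was correct) is observed, and the update is importance-weighted. This is an illustrative sketch of the problem setting, not the paper's second-order algorithm.

```python
import numpy as np

def banditron_round(W, x, true_label, gamma, rng):
    """One round of Banditron-style bandit multiclass learning:
    epsilon-greedy exploration plus an importance-weighted update.
    W has one row of weights per class; gamma is the exploration rate."""
    K = W.shape[0]
    y_hat = int(np.argmax(W @ x))        # greedy prediction
    probs = np.full(K, gamma / K)
    probs[y_hat] += 1.0 - gamma          # explore with total probability gamma
    y_play = int(rng.choice(K, p=probs))
    correct = (y_play == true_label)     # the only feedback observed
    U = np.zeros_like(W)
    if correct:
        U[y_play] += x / probs[y_play]   # importance weighting
    U[y_hat] -= x
    return W + U, y_play
```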


Training Deep Networks without Learning Rates Through Coin Betting

Neural Information Processing Systems

Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we neither adapt the learning rates nor make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose an optimal, learning-rate-free algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
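
A per-coordinate coin-betting optimizer in the spirit of this approach can be sketched as follows: each coordinate bets a fraction of its accumulated "wealth" on the sign of the next negative gradient, with gradients normalized by a running maximum. The betting fraction and the initial-wealth handling below are simplifications, not the paper's exact update.

```python
import numpy as np

class CoinBettingOptimizer:
    """Sketch of a per-coordinate coin-betting optimizer (simplified; the
    paper's algorithm uses a refined betting fraction and wealth handling)."""
    def __init__(self, shape, eps=1.0):
        self.w = np.zeros(shape)         # parameters, initial point 0
        self.theta = np.zeros(shape)     # running sum of -gradients
        self.G = np.zeros(shape)         # running sum of |gradient|
        self.L = np.full(shape, 1e-8)    # running max |gradient|
        self.reward = np.zeros(shape)    # wealth accumulated beyond eps * L
        self.eps = eps

    def step(self, g):
        """Consume the gradient at the current point, return the next point."""
        self.L = np.maximum(self.L, np.abs(g))
        self.reward -= g * self.w        # outcome of the previous bet
        self.theta -= g
        self.G += np.abs(g)
        wealth = self.eps * self.L + self.reward
        beta = self.theta / (self.L * (self.G + self.L))  # betting fraction
        self.w = beta * wealth
        return self.w

# Usage per training step: w = opt.step(gradient_at(w))
```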