AITopics

doi: 10.1007/s10489-019-01421-8

1902.09238

Country: Asia > China > Hubei Province > Wuhan (0.05)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

arXiv.org Machine LearningFeb-25-2019

GAN-based Projector for Faster Recovery in Compressed Sensing with Convergence Guarantees

Raj, Ankit, Li, Yuqi, Bresler, Yoram

A Generative Adversarial Network (GAN) with generator $G$ trained to model the prior of images has been shown to perform better than sparsity-based regularizers in ill-posed inverse problems. In this work, we propose a new method of deploying a GAN-based prior to solve linear inverse problems using projected gradient descent (PGD). Our method learns a network-based projector for use in the PGD algorithm, eliminating the need for expensive computation of the Jacobian of $G$. Experiments show that our approach provides a speed-up of $30\text{-}40\times$ over earlier GAN-based recovery methods for similar accuracy in compressed sensing. Our main theoretical result is that if the measurement matrix is moderately conditioned for range($G$) and the projector is $\delta$-approximate, then the algorithm is guaranteed to reach $O(\delta)$ reconstruction error in $O(log(1/\delta))$ steps in the low noise regime. Additionally, we propose a fast method to design such measurement matrices for a given $G$. Extensive experiments demonstrate the efficacy of this method by requiring $5\text{-}10\times$ fewer measurements than random Gaussian measurement matrices for comparable recovery performance.

artificial intelligence, machine learning, matrix, (15 more...)

1902.09698

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

arXiv.org Machine LearningFeb-24-2019

Training GANs with Centripetal Acceleration

Peng, Wei, Dai, Yuhong, Zhang, Hui, Cheng, Lizhi

Training generative adversarial networks (GANs) often suffers from cyclic behaviors of iterates. Based on a simple intuition that the direction of centripetal acceleration of an object moving in uniform circular motion is toward the center of the circle, we present the Simultaneous Centripetal Acceleration (SCA) method and the Alternating Centripetal Acceleration (ACA) method to alleviate the cyclic behaviors. Under suitable conditions, gradient descent methods with either SCA or ACA are shown to be linearly convergent for bilinear games. Numerical experiments are conducted by applying ACA to existing gradient-based algorithms in a GAN setup scenario, which demonstrate the superiority of ACA.

centripetal acceleration method, linearly convergent, proposition 3, (9 more...)

1902.08949

Country:

Asia > China > Hunan Province (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Mücke, Nicole, Neu, Gergely, Rosasco, Lorenzo

Beating SGD Saturation with Tail-Averaging and Minibatching

arXiv.org Machine LearningFeb-22-2019

While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning errors, hence providing practical insights. In particular, we show for the first time in the literature that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Finally, we show that a combination of tail-averaging and minibatching allows more aggressive step-size choices than using any one of said components.

assumption, gradient descent, probability, (15 more...)

1902.08668

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Finn, Chelsea, Rajeswaran, Aravind, Kakade, Sham, Levine, Sergey

Online Meta-Learning

arXiv.org Artificial IntelligenceFeb-22-2019

A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research paradigms have studied this question. Meta-learning views this problem as learning a prior over model parameters that is amenable for fast adaptation on a new task, but typically assumes the set of tasks are available together as a batch. In contrast, online (regret based) learning considers a sequential setting in which problems are revealed one after the other, but conventionally train only a single model without any task-specific adaptation. This work introduces an online meta-learning setting, which merges ideas from both the aforementioned paradigms to better capture the spirit and practice of continual lifelong learning. We propose the follow the meta leader algorithm which extends the MAML algorithm to this setting. Theoretically, this work provides an $\mathcal{O}(\log T)$ regret guarantee with only one additional higher order smoothness assumption in comparison to the standard online setting. Our experimental evaluation on three different large-scale tasks suggest that the proposed algorithm significantly outperforms alternatives based on traditional online learning approaches.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1902.08438

Country: North America > United States > California (0.28)

Genre:

Research Report (0.82)
Instructional Material (0.57)

Industry:

Education > Educational Setting > Online (0.50)
Education > Educational Setting > Continuing Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Lu, Songtao, Tsaknakis, Ioannis, Hong, Mingyi, Chen, Yongxin

Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications

arXiv.org Machine LearningFeb-21-2019

The min-max problem, also known as the saddle point problem, is a class of optimization problems in which we minimize and maximize two subsets of variables simultaneously. This class of problems can be used to formulate a wide range of signal processing and communication (SPCOM) problems. Despite its popularity, existing theory for this class has been mainly developed for problems with certain special convex-concave structure. Therefore, it cannot be used to guide the algorithm design for many interesting problems in SPCOM, where some kind of non-convexity often arises. In this work, we consider a general block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave. We propose a class of simple algorithms named Hybrid Block Successive Approximation (HiBSA), which alternatingly performs gradient descent-type steps for the minimization blocks and one gradient ascent-type step for the maximization problem. A key element in the proposed algorithm is the introduction of certain properly designed regularization and penalty terms, which are used to stabilize the algorithm and ensure convergence. For the first time, we show that such simple alternating min-max algorithms converge to first-order stationary solutions, with quantifiable global rates. To validate the efficiency of the proposed algorithms, we conduct numerical tests on a number of information processing and wireless communication problems, including the robust learning problem, the non-convex min-utility maximization problems, and certain wireless jamming problem arising in interfering channels.

algorithm, artificial intelligence, machine learning, (15 more...)

1902.08294

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.81)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Wójcik, Bartosz, Maziarka, Łukasz, Tabor, Jacek

LOSSGRAD: automatic learning rate in gradient descent

In this paper, we propose a simple, fast and easy to implement algorithm LOSSGRAD (locally optimal step-size in gradient descent), which automatically modifies the step-size in gradient descent during neural networks training. Given a function $f$, a point $x$, and the gradient $\nabla_x f$ of $f$, we aim to find the step-size $h$ which is (locally) optimal, i.e. satisfies: $$ h=arg\,min_{t \geq 0} f(x-t \nabla_x f). $$ Making use of quadratic approximation, we show that the algorithm satisfies the above assumption. We experimentally show that our method is insensitive to the choice of initial learning rate while achieving results comparable to other methods.

experiment, learning rate, lossgrad, (13 more...)

1902.07656

Country: Europe > Poland > Lesser Poland Province > Kraków (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

de Roos, Filip, Hennig, Philipp

Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization

Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also empirically show to work well.

active probabilistic inference, algorithm, matrix, (14 more...)

1902.07557

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Panama (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Kempka, Michał, Kotłowski, Wojciech, Warmuth, Manfred K.

Adaptive scale-invariant online algorithms for learning linear models

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight. Popular algorithms in this framework, such as Online Gradient Descent (OGD), have parameters (learning rates), which ideally should be tuned based on the scales of the features and the optimal comparator, but these quantities only become available at the end of the learning process. In this paper, we resolve the tuning problem by proposing online algorithms making predictions which are invariant under arbitrary rescaling of the features. The algorithms have no parameters to tune, do not require any prior knowledge on the scale of the instances or the comparator, and achieve regret bounds matching (up to a logarithmic factor) that of OGD with optimally tuned separate learning rates per dimension, while retaining comparable runtime performance.

adaptive scale-invariant online algorithm, algorithm, scinol 2, (12 more...)

1902.07528

Country:

Europe > Poland > Greater Poland Province > Poznań (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Improving SGD convergence by tracing multiple promising directions and estimating distances to their extrema

Duda, Jarek

Deep neural networks are usually trained with stochastic gradient descent (SGD), which optimizes $\theta\in\mathbb{R}^D$ parameters to minimize objective function using very rough approximations of gradient, only averaging to the real gradient. Standard approaches like momentum or ADAM only consider single direction, and do not try to model distance from extremum - neglecting valuable information from calculated gradients. It can be improved by second order methods, but they are costly, need inverse of Hessian - problematic especially in the stochastic setting. Proposed general framework should overcome these difficulties by directly evolving local second order parametrization in $d\ll D$ directions: as $\sum_{i=1}^d \lambda_{i}(\theta\cdot v_{i}-p_{i})^2$ modelling local information we are interested in, and relatively simple to update for better agreement with calculated gradients. It allows for $\theta$ update by simultaneously attracting toward modelled directional minima $(\lambda_i>0)$, and repulsing from maxima $(\lambda_i<0)$, correspondingly to distances from $p_i$ (and uncertainty), what allows to also handle problematic saddles. Calculated gradients can be used to slowly evolve this parametrization to improve agreement with local behavior of objective function, accumulating their statistical trends: 1) update $\lambda, p$ parameters for more accurate description of parabola in corresponding directions (also uncertainty), 2) rotate considered subspace toward recently statistically significant directions (replacing the less frequent ones), and 3) rotate $(v_i)$ inside the subspace to improve diagonal form of Hessian in this basis. Presented general framework leaves many customization options for optimizations to specific tasks.

approximation, gradient, parametrization, (16 more...)

1901.11457

Country:

Oceania > Tonga (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)