
Collaborating Authors: Cevher, Volkan


Convergence of adaptive algorithms for weakly convex constrained optimization

arXiv.org Machine Learning

We analyze the adaptive first-order algorithm AMSGrad for solving a constrained stochastic optimization problem with a weakly convex objective. We prove a $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope, which is the standard stationarity measure for this class of problems. It matches the known rates that adaptive algorithms enjoy for the specific case of unconstrained smooth stochastic optimization. Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains. Finally, we illustrate the applications and extensions of our results to specific problems and algorithms.
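
As a rough illustration of the kind of update analyzed above, the sketch below runs a projected AMSGrad loop with constant moment parameters and a mini-batch size of 1. The oracles stoch_grad and project are placeholders, and the projection is taken to be Euclidean for simplicity; the algorithm in the paper may instead project in the metric induced by the second-moment estimate.

    import numpy as np

    def amsgrad_projected(x0, stoch_grad, project, steps=1000,
                          lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
        """Projected AMSGrad with constant moment parameters (a sketch).

        stoch_grad(x) returns an unbiased stochastic gradient (mini-batch size 1).
        project(x) returns the Euclidean projection onto the constraint set.
        """
        x = np.asarray(x0, dtype=float)
        m = np.zeros_like(x)      # first-moment estimate
        v = np.zeros_like(x)      # second-moment estimate
        v_hat = np.zeros_like(x)  # running max of v (the AMSGrad correction)
        for _ in range(steps):
            g = stoch_grad(x)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g ** 2
            v_hat = np.maximum(v_hat, v)
            x = project(x - lr * m / (np.sqrt(v_hat) + eps))
        return x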


Preconditioned Spectral Descent for Deep Learning

Neural Information Processing Systems

Deep learning presents notorious computational challenges. These challenges include, but are not limited to, the non-convexity of learning objectives and estimating the quantities needed for optimization algorithms, such as gradients. While we do not address the non-convexity, we present an optimization solution that exploits the so-far-unused "geometry" in the objective function in order to make the best use of the estimated gradients. Previous work attempted similar goals with preconditioned methods in Euclidean space, such as L-BFGS, RMSprop, and AdaGrad. In stark contrast, our approach combines a non-Euclidean gradient method with preconditioning.
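
For intuition on the non-Euclidean flavor of the method, the sketch below takes one steepest-descent step in the spectral-norm geometry for a single weight matrix, using the sharp operator ||G||_* U V^T built from the gradient's SVD; the optional elementwise preconditioner is a stand-in, not the paper's exact construction.

    import numpy as np

    def sharp_spectral(G):
        """The '#' operator for the Schatten-infinity (spectral) norm.

        For a gradient matrix G with SVD U diag(s) V^T, the steepest-descent
        direction in this geometry is ||G||_nuclear * U V^T.
        """
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        return s.sum() * (U @ Vt)

    def spectral_descent_step(W, grad_W, lr=0.1, precond=None):
        """One (optionally preconditioned) spectral descent step (a sketch)."""
        G = grad_W if precond is None else precond * grad_W
        return W - lr * sharp_spectral(G)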


Learning with Compressible Priors

Neural Information Processing Systems

We describe probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in compressible signals. A signal is compressible when the sorted magnitudes of its coefficients exhibit a power-law decay, so that the signal can be well-approximated by a sparse signal. Since compressible signals live close to sparse signals, their intrinsic information can be stably embedded via simple non-adaptive linear projections into a much lower dimensional space whose dimension grows logarithmically with the ambient signal dimension. By using order statistics, we show that N-sample iid realizations of generalized Pareto, Student's t, log-normal, Fréchet, and log-logistic distributions are compressible, i.e., they have a constant expected decay rate, which is independent of N. In contrast, we show that the generalized Gaussian distribution with shape parameter q is compressible only in restricted cases, since the expected decay rate of its N-sample iid realizations decreases with N as 1/[q log(N/q)]. We use compressible priors as a scaffold to build new iterative sparse signal recovery algorithms based on Bayesian inference arguments.
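
A toy numerical check of the compressibility notion used here (not the paper's order-statistics argument): the sorted magnitudes of iid generalized Pareto samples decay roughly as a power law, so their rank-magnitude profile is close to linear on log-log axes.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000

    # iid generalized Pareto realizations; sort magnitudes in decreasing order.
    samples = rng.pareto(a=1.5, size=N)
    sorted_mag = np.sort(np.abs(samples))[::-1]

    # Least-squares slope on log-log axes estimates the power-law decay rate.
    ranks = np.arange(1, N + 1)
    slope = np.polyfit(np.log(ranks), np.log(sorted_mag + 1e-12), 1)[0]
    print(f"estimated decay exponent: {slope:.2f}")  # roughly -1/1.5 for this tail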


Adversarially Robust Optimization with Gaussian Processes

Neural Information Processing Systems

In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound-based algorithm, StableOpt, for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail.
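
The core selection rule can be sketched as a max-min acquisition over a discrete candidate set: pick the point whose worst-case upper confidence bound over the adversary's reachable perturbations is largest. The GP posterior (mu, sigma) and the neighborhood structure below are placeholders, not the paper's exact setup.

    import numpy as np

    def stable_select(mu, sigma, neighbors, beta=2.0):
        """Schematic max-min confidence-bound selection in the spirit of StableOpt.

        mu, sigma    : GP posterior mean and standard deviation over candidates.
        neighbors[i] : indices reachable from candidate i under the adversary's
                       allowed perturbations (a placeholder neighborhood).
        """
        ucb = mu + np.sqrt(beta) * sigma
        worst_case = np.array([ucb[nbrs].min() for nbrs in neighbors])
        return int(np.argmax(worst_case))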


Stochastic Three-Composite Convex Minimization

Neural Information Processing Systems

We propose a stochastic optimization method for the minimization of the sum of three convex functions, one of which has a Lipschitz continuous gradient as well as restricted strong convexity. Our approach is most suitable in the setting where it is computationally advantageous to process the smooth term in the decomposition with its stochastic gradient estimate and the other two functions separately with their proximal operators, as in doubly regularized empirical risk minimization problems. We prove a convergence characterization of the proposed algorithm in expectation under the standard assumptions for the stochastic gradient estimate of the smooth term. Our method operates in the primal space and can be considered as a stochastic extension of the three-operator splitting method. Finally, numerical evidence supports the effectiveness of our method on real-world problems.
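
Since the method extends three-operator splitting, a Davis-Yin-style iteration with a stochastic gradient in place of the exact one conveys the structure; constant step sizes are assumed here, and the paper's step-size schedule and averaging details are omitted.

    import numpy as np

    def stochastic_three_composite(z0, prox_g, prox_h, stoch_grad_f,
                                   steps=1000, gamma=0.1, lam=1.0):
        """Three-operator-splitting-style loop with a stochastic gradient (a sketch).

        Minimizes f(x) + g(x) + h(x): f is accessed through an unbiased stochastic
        gradient, while g and h are accessed through their proximal operators.
        """
        z = np.asarray(z0, dtype=float)
        x_g = z
        for _ in range(steps):
            x_g = prox_g(z, gamma)
            x_h = prox_h(2.0 * x_g - z - gamma * stoch_grad_f(x_g), gamma)
            z = z + lam * (x_h - x_g)
        return x_g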


Nearly Minimal Over-Parametrization of Shallow Neural Networks

arXiv.org Machine Learning

A recent line of work has shown that an over-parametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best-case scenario, the existing theory requires quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear over-parametrization is sufficient to fit the training data, using a simple variant of (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the framework developed in this work for over-parametrization is applicable to a variety of learning problems.


Fast and Provable ADMM for Learning with Generative Priors

arXiv.org Machine Learning

In this work, we propose a (linearized) Alternating Direction Method of Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such a constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising, and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures under mild assumptions. Unlike gradient descent (GD), our algorithm can efficiently handle non-smooth objectives as well as exploit efficient partial minimization procedures, and it is therefore faster in many practical scenarios.
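
A linearized-ADMM-style loop for the template min_x f(x) subject to x = G(z), with z the latent code of a generator G, might look as follows. The oracles prox_f, G, and grad_z_penalty (the gradient in z of the quadratic coupling, e.g. obtained by autodiff through the generator) are placeholders, and the constant step sizes are assumptions rather than the paper's choices.

    import numpy as np

    def linearized_admm(x0, z0, prox_f, G, grad_z_penalty,
                        steps=500, rho=1.0, alpha=0.01):
        """Sketch of a linearized ADMM loop for min f(x) s.t. x = G(z).

        prox_f(v, t)        : proximal operator of f with parameter t.
        G(z)                : forward pass of the generator network.
        grad_z_penalty(z, v): gradient in z of 0.5 * ||G(z) - v||^2.
        """
        x, z = np.asarray(x0, dtype=float), np.asarray(z0, dtype=float)
        lam = np.zeros_like(x)  # dual variable for the constraint x = G(z)
        for _ in range(steps):
            # x-step: proximal minimization of f plus the quadratic coupling.
            x = prox_f(G(z) - lam / rho, 1.0 / rho)
            # z-step: one gradient (linearized) step on the coupling term.
            z = z - alpha * rho * grad_z_penalty(z, x + lam / rho)
            # Dual ascent on the constraint residual.
            lam = lam + rho * (x - G(z))
        return x, z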


Interactive Teaching Algorithms for Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: how could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting where the teacher has full knowledge about the learner's dynamics, and a blackbox setting where the teacher has minimal knowledge. Then, we study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting. Extensive experiments with a car-driving simulator environment show that the learning progress can be sped up drastically compared to an uninformative teacher.


On Certifying Non-uniform Bounds against Adversarial Attacks

arXiv.org Machine Learning

This work studies the robustness certification problem for neural network models, which aims to find certified adversary-free regions that are as large as possible around data points. In contrast to existing approaches that seek regions bounded uniformly along all input features, we consider non-uniform bounds and use them to study the decision boundary of neural network models. We formulate our target as an optimization problem with nonlinear constraints. Then, a framework applicable to general feedforward neural networks is proposed to bound the output logits so that the relaxed problem can be solved by the augmented Lagrangian method. Our experiments show that the non-uniform bounds have larger volumes than uniform ones. Compared with normal models, robust models have even larger non-uniform bounds and better interpretability. Further, the geometric similarity of the non-uniform bounds gives a quantitative, data-agnostic metric of the robustness of input features.
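
One way to picture the optimization is a classical augmented-Lagrangian loop that grows the per-feature radii eps while keeping a certified margin non-negative; the margin and grad_margin oracles below stand in for the paper's logit-bounding relaxation and its gradient, and the constant step sizes are a generic choice, not the paper's.

    import numpy as np

    def certify_nonuniform(eps0, margin, grad_margin,
                           outer=50, inner=100, rho=10.0, lr=1e-3):
        """Augmented-Lagrangian sketch for non-uniform robustness bounds.

        Maximizes the box log-volume sum(log eps) subject to margin(eps) >= 0,
        where margin() lower-bounds the network's worst-case logit gap over the
        per-feature box of radii eps, and grad_margin() is its gradient.
        """
        eps = np.asarray(eps0, dtype=float)
        lam = 0.0  # multiplier for the inequality constraint margin(eps) >= 0
        for _ in range(outer):
            for _ in range(inner):
                m = margin(eps)
                # Ascent direction of the augmented Lagrangian in eps.
                step = 1.0 / eps + max(0.0, lam - rho * m) * grad_margin(eps)
                eps = np.maximum(eps + lr * step, 1e-8)
            lam = max(0.0, lam - rho * margin(eps))  # multiplier update
        return eps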


Stochastic Conditional Gradient Method for Composite Convex Minimization

arXiv.org Machine Learning

In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are widely applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy-to-implement, and tractable algorithm. We prove an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batch size, which can contain a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications, including optimization of stochastic SDPs.
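
The unconstrained core of such a method is a stochastic Frank-Wolfe loop with an averaged gradient estimator and a linear minimization oracle over the compact set; the sketch below uses a generic k^(-2/3) averaging weight and 2/(k+1) step size as assumptions, and omits the smoothing/homotopy treatment of the affine constraints.

    import numpy as np

    def stochastic_frank_wolfe(x0, stoch_grad, lmo, steps=1000):
        """Stochastic conditional gradient with gradient averaging (a sketch).

        stoch_grad(x) : unbiased stochastic gradient (batch size can be 1).
        lmo(d)        : linear minimization oracle, argmin over s in X of <d, s>,
                        where X is the compact convex set.
        """
        x = np.asarray(x0, dtype=float)
        d = np.zeros_like(x)  # running gradient estimate
        for k in range(1, steps + 1):
            rho = 1.0 / k ** (2.0 / 3.0)  # averaging weight
            gamma = 2.0 / (k + 1.0)       # Frank-Wolfe step size
            d = (1.0 - rho) * d + rho * stoch_grad(x)
            x = x + gamma * (lmo(d) - x)
        return x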