AITopics

In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the convergence rate and the size of this neighborhood. Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem. We show that M-GDA achieves the lower bounds in terms of noise dependence without any assumptions on the knowledge of the noise characteristics. We also show that M-GDA obtains a linear decay rate in terms of the error's dependence on the initial error, although its dependence on the condition number is suboptimal. To improve this dependence, we apply the multistage machinery to the stochastic Optimistic Gradient Descent Ascent (OGDA) algorithm and propose the M-OGDA algorithm, which also achieves the optimal linear decay rate with respect to the initial error. To the best of our knowledge, this method is the first to simultaneously achieve the best dependence on the noise characteristics as well as on the initial error and the condition number.
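To make the contrast between constant-stepsize stochastic GDA and a multistage variant concrete, here is a toy simulation on the scalar saddle problem f(x, y) = (MU/2) x^2 + B x y - (MU/2) y^2, whose unique saddle point is (0, 0), with additive Gaussian gradient noise. The constants, the starting point, and the "halve the stepsize, double the stage length" schedule are hypothetical choices for illustration; they are not necessarily the M-GDA schedule from the paper, and OGDA is not shown.

```python
import random

# Toy strongly convex-strongly concave problem (hypothetical constants):
#   f(x, y) = (MU/2) x^2 + B x y - (MU/2) y^2, unique saddle point at (0, 0).
MU, B, SIGMA = 1.0, 1.0, 1.0

def noisy_grads(x, y, rng):
    """Unbiased gradient estimates with additive N(0, SIGMA^2) noise."""
    gx = MU * x + B * y + rng.gauss(0.0, SIGMA)
    gy = B * x - MU * y + rng.gauss(0.0, SIGMA)
    return gx, gy

def gda(x, y, step, iters, rng):
    """Stochastic GDA with a constant stepsize: descend in x, ascend in y (simultaneous updates)."""
    for _ in range(iters):
        gx, gy = noisy_grads(x, y, rng)
        x, y = x - step * gx, y + step * gy
    return x, y

def multistage_gda(x, y, step0, iters0, stages, rng):
    """Hypothetical multistage schedule: halve the stepsize and double the stage length each stage."""
    step, iters = step0, iters0
    for _ in range(stages):
        x, y = gda(x, y, step, iters, rng)
        step, iters = step / 2.0, iters * 2
    return x, y

def mean_sq_error(runner, runs=30, seed=0):
    """Average squared distance to the saddle point (0, 0) over independent runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x, y = runner(5.0, -5.0, rng)
        total += x * x + y * y
    return total / runs

# Both variants use the same budget: 200 * (1 + 2 + 4 + 8 + 16) = 6200 iterations.
const_err = mean_sq_error(lambda x, y, rng: gda(x, y, 0.1, 6200, rng))
multi_err = mean_sq_error(lambda x, y, rng: multistage_gda(x, y, 0.1, 200, 5, rng))
print(f"constant step: {const_err:.4f}   multistage: {multi_err:.4f}")
```

With a constant stepsize the iterates keep fluctuating in a noise-dominated neighborhood of (0, 0) whose size scales with the stepsize, while shrinking the stepsize stage by stage lets the error keep decreasing on this toy problem.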

algorithm, assumption 2, saddle point

2002.05683

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.71)

Diakonikolas, Ilias, Kontonis, Vasilis, Tzamos, Christos, Zarifis, Nikos

Learning Halfspaces with Massart Noise Under Structured Distributions

We study the problem of learning halfspaces with Massart noise in the distribution-specific PAC model. We give the first computationally efficient algorithm for this problem with respect to a broad family of distributions, including log-concave distributions. This resolves an open question posed in a number of prior works. Our approach is extremely simple: we identify a smooth non-convex surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace. Given this structural result, we can use SGD to solve the underlying learning problem.
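A minimal sketch of the "smooth non-convex surrogate + SGD" recipe follows. The surrogate used here, a logistic function S_sigma applied to -y<w, x>/||w||, is an assumption for illustration, as are the standard Gaussian marginal, the hypothetical noise function eta(x) (always at most 0.3 < 1/2, as Massart noise requires), the value of sigma, and the step-size schedule. The sketch runs projected SGD on the unit sphere and reports the angle to the target halfspace.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def massart_sample(w_star, rng):
    """x ~ N(0, I); the clean label sign(<w*, x>) is flipped with probability eta(x) <= 0.3.
    The noise function eta(x) below is a hypothetical example of Massart noise."""
    x = [rng.gauss(0.0, 1.0) for _ in w_star]
    margin = sum(a * b for a, b in zip(w_star, x))
    y = 1 if margin >= 0 else -1
    eta = 0.3 if abs(margin) > 0.5 else 0.1
    if rng.random() < eta:
        y = -y
    return x, y

def projected_sgd(w_star, steps=40000, sigma=0.2, seed=0):
    """Projected SGD on the unit sphere for the assumed surrogate S_sigma(-y <u, x>),
    where S_sigma(t) = sigmoid(t / sigma) and ||u|| = 1."""
    rng = random.Random(seed)
    u = normalize([1.0] * len(w_star))
    for t in range(steps):
        lr = 0.02 if t < steps // 2 else 0.002      # simple two-phase step-size schedule
        x, y = massart_sample(w_star, rng)
        ux = sum(a * b for a, b in zip(u, x))
        s = sigmoid(-y * ux / sigma)
        ds = s * (1.0 - s) / sigma                   # S_sigma'(t) at t = -y <u, x>
        # Gradient of S_sigma(-y <u/||u||, x>) at ||u|| = 1: uses the tangential part of x.
        grad = [ds * (-y) * (xi - ux * ui) for xi, ui in zip(x, u)]
        u = normalize([ui - lr * gi for ui, gi in zip(u, grad)])
    return u

w_star = [1.0, 0.0, 0.0, 0.0, 0.0]
u = projected_sgd(w_star)
angle = math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(u, w_star)))))
print(f"angle to target halfspace: {angle:.3f} rad")
```

Normalizing w inside the loss makes it scale-invariant, which is why the iterate can be kept on the unit sphere after every step.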

algorithm, gradient, halfspace

2002.05632

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Hu, Shi, Pezzotti, Nicola, Mavroeidis, Dimitrios, Welling, Max

Simple and Accurate Uncertainty Quantification from Bias-Variance Decomposition

Examples include medical diagnosis and self-driving vehicles. We propose a new method that is based directly on the bias-variance decomposition, where the parameter uncertainty is given by the variance of an ensemble divided by the number of members in the ensemble, and the aleatoric uncertainty plus the squared bias is estimated by training a separate model that is regressed directly ...

(Kennedy & O'Hagan, 2001) provides a more fine-grained categorization of uncertainty into six terms. Among them, the parameter and experimental uncertainties correspond to the epistemic and aleatoric uncertainties in (Kendall & Gal, 2017), and the structural uncertainty corresponds to the missing model bias. For clarity, we use the uncertainty terminology defined in (Kennedy & O'Hagan, 2001) for the rest of this paper.
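The decomposition described above can be sketched in a few lines. The parameter term follows the text (ensemble variance divided by the number of members; the unbiased sample variance is an assumption). The abstract is truncated before saying what the separate model is regressed on, so this sketch assumes it is trained on squared residuals of the ensemble mean on held-out data, and uses a hypothetical 1-D k-nearest-neighbour regressor as a stand-in. All data are made-up toy numbers.

```python
import statistics

def parameter_uncertainty(member_preds):
    """Ensemble variance divided by the number of members (unbiased sample variance assumed)."""
    return statistics.variance(member_preds) / len(member_preds)

def fit_knn_error_model(xs, sq_errors, k=2):
    """Hypothetical stand-in for the separate regression model: predicts the squared
    error at x as the average squared error of the k nearest held-out inputs (1-D)."""
    data = list(zip(xs, sq_errors))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(e for _, e in nearest) / len(nearest)
    return predict

# Toy held-out data: inputs, targets, and predictions of M = 3 ensemble members.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.2, 1.9, 3.3]
members = [[0.0, 1.0, 2.0, 3.0],
           [0.2, 1.1, 2.1, 2.9],
           [0.1, 0.9, 1.9, 3.1]]

per_point = list(zip(*members))                  # member predictions grouped by input point
ens_mean = [statistics.mean(p) for p in per_point]
sq_err = [(y - m) ** 2 for y, m in zip(ys, ens_mean)]
error_model = fit_knn_error_model(xs, sq_err)    # estimates aleatoric + squared bias (assumed)

def total_uncertainty(i):
    """Parameter uncertainty plus the regressed (aleatoric + squared-bias) term at point i."""
    return parameter_uncertainty(per_point[i]) + error_model(xs[i])

print([round(total_uncertainty(i), 4) for i in range(len(xs))])
```

Dividing the ensemble variance by M treats the ensemble mean as the prediction, so its variance shrinks as more members are added.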

ensemble, prediction, simple and accurate uncertainty quantification

2002.05582

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Moitra, Ankur, Risteski, Andrej

Fast Convergence for Langevin Diffusion with Matrix Manifold Structure

In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above entail working with functions f that are nonconvex, for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: the existence of invariances (symmetries) in the function f, as a result of which the distribution p has manifolds of points with equal probability. We give a recipe for proving mixing time bounds for Langevin dynamics that samples from manifolds of local optima of the function f in settings where the distribution is well concentrated around them. We specialize our arguments to classic matrix-factorization-like Bayesian inference problems where we get noisy measurements A(XX^T) of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2 with X \in R^{d \times k}, and \beta is the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations (f(XO) = f(X) for any orthogonal k \times k matrix O, since XO(XO)^T = XX^T), and include problems like matrix factorization, sensing, and completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.
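The setting above can be illustrated with the unadjusted Langevin algorithm (a standard discretization of Langevin diffusion) applied to the simplest instance. The sketch assumes the measurement operator A is the identity and b = M is noise-free, so f(X) = ||XX^T - M||_F^2, with a hypothetical ground truth X* of size d = 3, k = 2. It also checks the orthogonal invariance f(XO) = f(X) numerically. The step size, beta, and iteration count are illustrative choices, not values from the paper.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def f(X, M):
    """f(X) = ||X X^T - M||_F^2, i.e. the measurement operator A is taken to be the identity."""
    R = matmul(X, transpose(X))
    return sum((R[i][j] - M[i][j]) ** 2 for i in range(len(M)) for j in range(len(M)))

def grad_f(X, M):
    """Gradient of f: 4 (X X^T - M) X, valid because M is symmetric."""
    R = matmul(X, transpose(X))
    D = [[R[i][j] - M[i][j] for j in range(len(M))] for i in range(len(M))]
    return [[4.0 * g for g in row] for row in matmul(D, X)]

def ula(X, M, beta, step, iters, rng):
    """Unadjusted Langevin algorithm targeting p(X) proportional to exp(-beta f(X)):
    X <- X - step * grad f(X) + sqrt(2 step / beta) * standard Gaussian noise."""
    scale = math.sqrt(2.0 * step / beta)
    for _ in range(iters):
        G = grad_f(X, M)
        X = [[x - step * g + scale * rng.gauss(0.0, 1.0) for x, g in zip(xr, gr)]
             for xr, gr in zip(X, G)]
    return X

# Hypothetical ground truth with d = 3, k = 2; M = X* X*^T is the low-rank matrix.
X_star = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = matmul(X_star, transpose(X_star))
rng = random.Random(0)
X0 = [[0.5 * rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(3)]
X = ula(X0, M, beta=1e4, step=0.01, iters=5000, rng=rng)

# Any 2x2 rotation O leaves f unchanged, since (X O)(X O)^T = X X^T.
theta = 0.7
O = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
print(f(X, M), f(matmul(X, O), M))
```

Because every XO has the same value of f, the target density is constant along each orbit {XO}, which is exactly the kind of manifold of equally likely points the paragraph describes.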

inequality, manifold, matrix

2002.05576

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Rothfuss, Jonas, Fortuin, Vincent, Krause, Andreas

PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees

Meta-learning can successfully acquire useful inductive biases from data, especially when a large number of meta-tasks are available. Yet, its generalization properties to unseen tasks are poorly understood. Particularly when the number of meta-tasks is small, this raises concerns about overfitting. We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning with unbounded loss functions and Bayesian base learners. Using these bounds, we develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-regularization. When instantiating our PAC-optimal hyper-posterior (PACOH) with Gaussian processes as base learners, the resulting approach consistently outperforms several popular meta-learning methods, both in terms of predictive accuracy and the quality of its uncertainty estimates.

bayes-optimal meta-learning, meta-learning, pacoh

2002.05551

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Avelar, Pedro H. C., Tavares, Anderson R., da Silveira, Thiago L. T., Jung, Cláudio R., Lamb, Luís C.

Superpixel Image Classification with Graph Attention Networks

This document reports the use of Graph Attention Networks for classifying oversegmented images, as well as a general procedure for generating oversegmented versions of image-based datasets. The code and learnt models from the experiments are available on GitHub. The experiments were run from June 2019 until December 2019. We obtained better results than the baseline models that use geometric distance-based attention by instead using self-attention in a more sparsely connected graph network.
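To show what "self-attention over a superpixel graph" computes, here is a sketch of one single-head graph attention layer in the style of the original GAT formulation (Velickovic et al., 2018): scores LeakyReLU(a . [W h_i || W h_j]) over each node's neighbours, softmax-normalized. The superpixel features (mean intensity and centroid), the adjacency, and the weights W and a are hypothetical values, and the layer omits the output nonlinearity and multi-head concatenation; it is not the report's exact architecture.

```python
import math

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def matvec(W, h):
    return [sum(w * v for w, v in zip(row, h)) for row in W]

def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer.

    features:  node id -> input feature vector h_i
    neighbors: node id -> adjacent node ids (e.g. superpixels sharing a border)
    W:         linear projection; a: attention vector over [W h_i || W h_j]
    """
    z = {i: matvec(W, h) for i, h in features.items()}
    out, attn = {}, {}
    for i in features:
        nbrs = [i] + [j for j in neighbors[i] if j != i]     # attend to self as well
        # Unnormalized score e_ij = LeakyReLU(a . [W h_i || W h_j]); list "+" concatenates.
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)                                      # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alpha))
        # New representation: attention-weighted sum of projected neighbour features.
        out[i] = [sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(len(z[i]))]
    return out, attn

# Four hypothetical superpixels: [mean intensity, centroid row, centroid col], normalized.
features = {0: [0.9, 0.1, 0.1], 1: [0.8, 0.1, 0.6], 2: [0.2, 0.7, 0.2], 3: [0.1, 0.8, 0.7]}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # region adjacency graph
W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.5]]                   # 3 -> 2 projection (arbitrary values)
a = [0.2, -0.1, 0.4, 0.3]                                   # attention vector (arbitrary values)
out, attn = gat_layer(features, neighbors, W, a)
print(attn[0])
```

Here the attention weights are computed from the node features themselves rather than from geometric distances between superpixel centroids, which is the distinction the report draws against its baselines.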

computer vision, dataset, superpixel

2002.05544

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
Europe > Italy > Tuscany > Florence (0.05)
South America > Brazil > Rio Grande do Sul (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Sensing and Signal Processing > Image Processing (0.84)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.65)

Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic

Ren, Yangang, Duan, Jingliang, Guan, Yang, Li, Shengbo Eben

Reinforcement learning (RL) has achieved remarkable performance in a variety of sequential decision-making and control tasks. However, a common problem is that the learned near-optimal policy often overfits the training environment and may not generalize to situations never encountered during training. In practical applications, the randomness of the environment can lead to rare but devastating events, which should be the focus of safety-critical systems such as autonomous driving. In this paper, we introduce a minimax formulation and a distributional framework to improve the generalization ability of RL algorithms, and develop the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm. The minimax formulation seeks an optimal policy under the most severe disturbances from the environment: the protagonist policy maximizes the action-value function while the adversary policy tries to minimize it. The distributional framework learns a state-action return distribution, from which we can explicitly model the risk of different returns and thus formulate a risk-averse protagonist policy and a risk-seeking adversarial policy. We implement our method on decision-making tasks for autonomous vehicles at intersections and test the trained policy in environments distinct from the training environment. Results demonstrate that our method can greatly improve the generalization ability of the protagonist agent to different environmental variations.

algorithm, return distribution, vehicle

2002.05502

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Transportation (0.35)
Information Technology (0.35)
Automobiles & Trucks (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Akyildiz, Ömer Deniz, Sabanis, Sotirios

Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization

This problem arises in many cases in machine learning, most notably in large-scale (mini-batch) Bayesian inference (Welling and Teh, 2011; Ahn et al., 2012) and nonconvex stochastic optimization (Raginsky et al., 2017). In the setting of Bayesian inference, one is interested in sampling from a posterior probability measure, where U corresponds to the negative of the sum of the log-likelihood and the log-prior. For nonconvex optimization, U(·) is the nonconvex cost function to be minimized. For large values of β, a sample from the target measure (1) is an approximate minimizer of the potential U (Raginsky et al., 2017). Consequently, nonasymptotic error bounds for schemes designed to sample from (1) can be used to obtain guarantees for Bayesian inference or nonconvex optimization. Sampling from a measure of the form (1) is also central in statistical physics (Binder et al., 1993), most notably in molecular dynamics (Haile, 1992).
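The paragraph refers to measure (1), which is not reproduced here; the surrounding text implies its density is proportional to exp(-βU). Under that assumption, the sketch below runs one common Euler-type SGHMC recursion, V_{n+1} = V_n - step (γ V_n + H(θ_n, X_{n+1})) + sqrt(2γ step/β) ξ_{n+1}, θ_{n+1} = θ_n + step V_n, where H is an unbiased stochastic gradient. The exact scheme analysed in the paper may differ. The toy potential U(θ) = mean_i (θ - x_i)^2 / 2 is chosen so the target is the Gaussian N(mean(x), 1/β), which makes the output easy to check.

```python
import math
import random

def sghmc(theta0, grad_estimate, data, step, gamma, beta, iters, rng):
    """Euler-type SGHMC recursion (form assumed for illustration):
        V_{n+1}     = V_n - step * (gamma * V_n + H(theta_n, X_{n+1})) + sqrt(2 gamma step / beta) * xi_{n+1}
        theta_{n+1} = theta_n + step * V_n
    where H is an unbiased estimate of grad U built from one randomly drawn data point."""
    theta, v = theta0, 0.0
    noise_scale = math.sqrt(2.0 * gamma * step / beta)
    samples = []
    for _ in range(iters):
        x = rng.choice(data)                      # one-point "mini-batch"
        h = grad_estimate(theta, x)
        theta, v = (theta + step * v,
                    v - step * (gamma * v + h) + noise_scale * rng.gauss(0.0, 1.0))
        samples.append(theta)
    return samples

# Toy potential U(theta) = mean_i (theta - x_i)^2 / 2, so exp(-beta U) is N(mean(x), 1/beta).
data = [1.0, 2.0, 3.0]

def grad_estimate(theta, x):
    return theta - x                              # unbiased for U'(theta) = theta - mean(data)

rng = random.Random(1)
samples = sghmc(0.0, grad_estimate, data, step=0.01, gamma=1.0, beta=1.0, iters=200_000, rng=rng)
kept = samples[10_000:]                           # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)
```

With β = 1 the chain should hover around θ = 2 with variance close to 1; raising β concentrates the samples around the minimizer of U, which is the mechanism that links sampling guarantees to nonconvex optimization.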

assumption 2, chau and rasonyi, theorem 2

2002.05465

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.74)

Akinwande, Victor, Cintas, Celia, Speakman, Skyler, Sridharan, Srihari

Identifying Audio Adversarial Examples via Anomalous Pattern Detection

Audio processing models based on deep neural networks are susceptible to adversarial attacks even when the adversarial audio waveform is 99.9% similar to a benign sample. Given the wide application of DNN-based audio recognition systems, detecting the presence of adversarial examples is of high practical relevance. By applying anomalous pattern detection techniques in the activation space of these models, we show that two recent state-of-the-art adversarial attacks on audio processing systems systematically lead to higher-than-expected activation at some subset of nodes, and that we can detect these attacks with an AUC of up to 0.98 with no degradation in performance on benign samples.
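The abstract does not spell out the detection procedure, so the following is a sketch of one standard nonparametric approach, assumed here for illustration: compute an upper-tail empirical p-value for each node's activation against clean background activations, then scan over node subsets with a Berk-Jones score. The synthetic activations, the 20-node layer, the shift of 5.0 on five nodes, and the alpha grid are all hypothetical.

```python
import math
import random

def empirical_pvalues(background, sample):
    """Upper-tail empirical p-value per node: (#clean activations >= observed + 1) / (n + 1)."""
    n = len(background)
    return [(sum(1 for row in background if row[j] >= a) + 1) / (n + 1)
            for j, a in enumerate(sample)]

def berk_jones(n_sig, n, alpha):
    """Berk-Jones statistic n * KL(n_sig / n || alpha), zero unless the significant fraction exceeds alpha."""
    q = n_sig / n
    if q <= alpha:
        return 0.0
    score = q * math.log(q / alpha)
    if q < 1.0:
        score += (1.0 - q) * math.log((1.0 - q) / (1.0 - alpha))
    return n * score

def scan_score(pvals, alphas=(0.01, 0.02, 0.05, 0.1, 0.2)):
    """Max over alpha of the best node subset's score. For a fixed alpha, the highest-scoring
    subset is exactly the nodes with p <= alpha (adding any other node lowers the score)."""
    best_score, best_subset = 0.0, []
    for alpha in alphas:
        subset = [j for j, p in enumerate(pvals) if p <= alpha]
        if not subset:
            continue
        score = berk_jones(len(subset), len(subset), alpha)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset

# Synthetic activations (hypothetical): 100 clean background samples, 20 nodes.
rng = random.Random(0)
n_nodes = 20
background = [[rng.gauss(0.0, 1.0) for _ in range(n_nodes)] for _ in range(100)]
clean = [rng.gauss(0.0, 1.0) for _ in range(n_nodes)]
attacked = [rng.gauss(0.0, 1.0) + (5.0 if j < 5 else 0.0) for j in range(n_nodes)]  # 5 over-active nodes

clean_score, _ = scan_score(empirical_pvalues(background, clean))
attacked_score, attacked_nodes = scan_score(empirical_pvalues(background, attacked))
print(clean_score, attacked_score, attacked_nodes)
```

Thresholding the scan score then gives a detector, and the returned subset points to the nodes whose activations are higher than expected.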

activation, adversarial example, subset

2002.05463

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Africa (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.72)
Government > Military (0.57)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

PHOTON -- A Python API for Rapid Machine Learning Model Development

Leenings, Ramona, Winter, Nils Ralf, Plagwitz, Lucas, Holstein, Vincent, Ernsting, Jan, Steenweg, Jakob, Gebker, Julian, Sarink, Kelvin, Emden, Daniel, Grotegerd, Dominik, Opel, Nils, Risse, Benjamin, Jiang, Xiaoyi, Dannlowski, Udo, Hahn, Tim

This article describes the implementation and use of PHOTON, a high-level Python API designed to simplify and accelerate the process of machine learning model development. It enables the design of both basic and advanced machine learning pipeline architectures and automates the repetitive training, optimization, and evaluation workflow. PHOTON offers easy access to established machine learning toolboxes as well as the possibility to integrate custom algorithms and solutions for any part of the model construction and evaluation process. By adding a layer of abstraction that incorporates current best practices, it offers an easy-to-use, flexible approach to implementing fast, reproducible, and unbiased machine learning solutions.

algorithm, photon, pipeline

2002.05426

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)

Genre:

Research Report (1.00)
Workflow (0.89)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)