AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Black-box Explanation of Object Detectors via Saliency Maps

Petsiuk, Vitali, Jain, Rajiv, Manjunatha, Varun, Morariu, Vlad I., Mehra, Ashutosh, Ordonez, Vicente, Saenko, Kate

arXiv.org Artificial IntelligenceJun-4-2020

We propose D-RISE, a method for generating visual explanations for the predictions of object detectors. D-RISE can be considered "black-box" in the software testing sense, it only needs access to the inputs and outputs of an object detector. Compared to gradient-based methods, D-RISE is more general and agnostic to the particular type of object detector being tested as it does not need to know about the inner workings of the model. We show that D-RISE can be easily applied to different object detectors including one-stage detectors such as YOLOv3 and two-stage detectors such as Faster-RCNN. We present a detailed analysis of the generated visual explanations to highlight the utilization of context and the possible biases learned by object detectors.

artificial intelligence, machine learning, saliency map, (15 more...)

arXiv.org Artificial Intelligence

2006.03204

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(17 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Air (0.61)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Local SGD With a Communication Overhead Depending Only on the Number of Workers

Spiridonoff, Artin, Olshevsky, Alex, Paschalidis, Ioannis Ch.

arXiv.org Machine LearningJun-3-2020

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $n$ workers, who can take SGD steps and coordinate with a central server. Unfortunately, this could require a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in the earlier literature, suggests machines should make many local steps between such communications. While the initial analysis of Local SGD showed it needs $\Omega ( \sqrt{T} )$ communications for $T$ local gradient steps in order for the error to scale proportionately to $1/(nT)$, this has been successively improved in a string of papers, with the state-of-the-art requiring $\Omega \left( n \left( \mbox{ polynomial in log } (T) \right) \right)$ communications. In this paper, we give a new analysis of Local SGD. A consequence of our analysis is that Local SGD can achieve an error that scales as $1/(nT)$ with only a fixed number of communications independent of $T$: specifically, only $\Omega(n)$ communications are required.

artificial intelligence, local sgd, machine learning, (14 more...)

arXiv.org Machine Learning

2006.02582

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Shallow Neural Hawkes: Non-parametric kernel estimation for Hawkes processes

Joseph, Sobin, Kashyap, Lekhapriya Dheeraj, Jain, Shashi

arXiv.org Machine LearningJun-3-2020

Multi-dimensional Hawkes process (MHP) is a class of self and mutually exciting point processes that find wide range of applications -- from prediction of earthquakes to modelling of order books in high frequency trading. This paper makes two major contributions, we first find an unbiased estimator for the log-likelihood estimator of the Hawkes process to enable efficient use of the stochastic gradient descent method for maximum likelihood estimation. The second contribution is, we propose a specific single hidden layered neural network for the non-parametric estimation of the underlying kernels of the MHP. We evaluate the proposed model on both synthetic and real datasets, and find the method has comparable or better performance than existing estimation methods. The use of shallow neural network ensures that we do not compromise on the interpretability of the Hawkes model, while at the same time have the flexibility to estimate any non-standard Hawkes excitation kernel.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2006.0246

Country:

Asia > India > Karnataka > Bengaluru (0.04)
Asia > Middle East > Iraq (0.04)
North America > United States > New York (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Learning with CVaR-based feedback under potentially heavy tails

Holland, Matthew J., Haress, El Mehdi

arXiv.org Machine LearningJun-2-2020

We study learning algorithms that seek to minimize the conditional value-at-risk (CVaR), when all the learner knows is that the losses incurred may be heavy-tailed. We begin by studying a general-purpose estimator of CVaR for potentially heavy-tailed random variables, which is easy to implement in practice, and requires nothing more than finite variance and a distribution function that does not change too fast or slow around just the quantile of interest. With this estimator in hand, we then derive a new learning algorithm which robustly chooses among candidates produced by stochastic gradient-driven sub-processes. For this procedure we provide high-probability excess CVaR bounds, and to complement the theory we conduct empirical tests of the underlying CVaR estimator and the learning algorithm derived from it.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2006.02001

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Acceleration of Descent-based Optimization Algorithms via Carath\'eodory's Theorem

Cosentino, Francesco, Oberhauser, Harald, Abate, Alessandro

arXiv.org Machine LearningJun-2-2020

We propose a new technique to accelerate algorithms based on Gradient Descent using Carath\'eodory's Theorem. In the case of the standard Gradient Descent algorithm, we analyse the theoretical convergence of the approach under convexity assumptions and empirically display its ameliorations. As a core contribution, we then present an application of the acceleration technique to Block Coordinate Descent methods. Experimental comparisons on least squares regression with a LASSO regularisation term show remarkably improved performance on LASSO than the ADAM and SAG algorithms.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2006.01819

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Denmark > North Jutland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Learning optimal environments using projected stochastic gradient ascent

Bolland, Adrien, Boukas, Ioannis, Cornet, François, Berger, Mathias, Ernst, Damien

arXiv.org Machine LearningJun-2-2020

In this work, we generalize the direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). The latter can be used to jointly learn a reinforcement learning (RL) environment and a policy with maximal expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametrized space of Mass-Spring-Damper (MSD) environments. Then, we use our algorithm for optimizing the size of the components and the operation of a small-scale and autonomous energy system, i.e. a solar off-grid microgrid, composed of photovoltaic panels, batteries, etc.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2006.01738

Country:

Europe > Belgium > Wallonia > Liège Province > Liège (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

Least-squares regressions via randomized Hessians

Kahale, Nabil

arXiv.org Machine LearningJun-1-2020

The recent availability of massive volumes of data fosters the need to design computationally efficient algorithms for optimization in high dimensions. In large-scale machine learning, stochastic gradient descent algorithms are among the most effective optimization methods (Bottou, Curtis and Nocedal 2018). For general smooth convex functions, averaged SGD achieves the rate of convergence of O(1/ k) after k iterations (Nemirovski, Juditsky, Lan and Shapiro 2009). For strongly-convex functions, i.e. when the smallest eigenvalue of the Hessian matrix is bounded away from 0, the convergence rate after k iterations is O(1/k) (Nemirovski, Juditsky, Lan and Shapiro 2009). Variance-reduced SGD algorithms that optimize the sum of n convex functions are described in (Schmidt, Le Roux and Bach 2017, Shalev-Shwartz and Zhang 2013, Johnson and Zhang 2013), and related accelerated methods are analysed in (Shalev-Shwartz and Zhang 2014, Nitanda 2014, Lan and Zhou 2018, Scieur, dAspremont and Bach 2018). These methods enjoy linear convergence(a convergence rate that decreases exponentially with the number of iterations) in the strongly-convex case. For general smooth convex functions, the stochastic average gradient method (SAG) of Schmidt, Le Roux and Bach (2017) yields a convergence rate of O( n/k) after k iterations. This paper focuses on the least-squares regression, which often arises in scientific computing and data analysis, and is widely used for inference and prediction. Many of the modern machine learning techniques such as the logistic and ridge regressions, the lasso method and neural networks can be considered as extensions of the least-squares regression technique.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2006.01017

Country:

North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Tree-Projected Gradient Descent for Estimating Gradient-Sparse Parameters on Graphs

Xu, Sheng, Fan, Zhou, Negahban, Sahand

arXiv.org Machine LearningMay-31-2020

We study estimation of a gradient-sparse parameter vector $\boldsymbol{\theta}^* \in \mathbb{R}^p$, having strong gradient-sparsity $s^*:=\|\nabla_G \boldsymbol{\theta}^*\|_0$ on an underlying graph $G$. Given observations $Z_1,\ldots,Z_n$ and a smooth, convex loss function $\mathcal{L}$ for which $\boldsymbol{\theta}^*$ minimizes the population risk $\mathbb{E}[\mathcal{L}(\boldsymbol{\theta};Z_1,\ldots,Z_n)]$, we propose to estimate $\boldsymbol{\theta}^*$ by a projected gradient descent algorithm that iteratively and approximately projects gradient steps onto spaces of vectors having small gradient-sparsity over low-degree spanning trees of $G$. We show that, under suitable restricted strong convexity and smoothness assumptions for the loss, the resulting estimator achieves the squared-error risk $\frac{s^*}{n} \log (1+\frac{p}{s^*})$ up to a multiplicative constant that is independent of $G$. In contrast, previous polynomial-time algorithms have only been shown to achieve this guarantee in more specialized settings, or under additional assumptions for $G$ and/or the sparsity pattern of $\nabla_G \boldsymbol{\theta}^*$. As applications of our general framework, we apply our results to the examples of linear models and generalized linear models with random design.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2006.01662

Country: Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

Add feedback

Graph Learning with Loss-Guided Training

Buchnik, Eliav, Cohen, Edith

arXiv.org Machine LearningMay-31-2020

Classically, ML models trained with stochastic gradient descent (SGD) are designed to minimize the average loss per example and use a distribution of training examples that remains {\em static} in the course of training. Research in recent years demonstrated, empirically and theoretically, that significant acceleration is possible by methods that dynamically adjust the training distribution in the course of training so that training is more focused on examples with higher loss. We explore {\em loss-guided training} in a new domain of node embedding methods pioneered by {\sc DeepWalk}. These methods work with implicit and large set of positive training examples that are generated using random walks on the input graph and therefore are not amenable for typical example selection methods. We propose computationally efficient methods that allow for loss-guided training in this framework. Our empirical evaluation on a rich collection of datasets shows significant acceleration over the baseline static methods, both in terms of total training performed and overall computation.

artificial intelligence, machine learning, selection, (17 more...)

arXiv.org Machine Learning

2006.0046

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

A New Accelerated Stochastic Gradient Method with Momentum

Liu, Liang, Luo, Xiaopeng

arXiv.org Machine LearningMay-30-2020

In this paper, we propose a novel accelerated stochastic gradient method with momentum, which momentum is the weighted average of previous gradients. The weights decays inverse proportionally with the iteration times. Stochastic gradient descent with momentum (Sgdm) use weights that decays exponentially with the iteration times to generate an momentum term. Using exponentially decaying weights, variants of Sgdm with well designed and complicated formats have been proposed to achieve better performance. The momentum update rules of our method is as simple as that of Sgdm. We provide theoretical convergence properties analyses for our method, which show both the exponentially decay weights and our inverse proportionally decay weights can limit the variance of the moving direction of parameters to be optimized to a region. Experimental results empirically show that our method works well with practical problems and outperforms Sgdm, and it outperforms Adam in convolutional neural networks.

artificial intelligence, machine learning, new accelerated stochastic gradient method, (13 more...)

arXiv.org Machine Learning

2006.00423

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback