AITopics | Gradient Descent

Minimizing the empirical risk is a popular training strategy, but for learning tasks where the data may be noisy or heavy-tailed, one may require many observations in order to generalize well. To achieve better performance under less stringent requirements, we introduce a procedure which constructs a robust approximation of the risk gradient for use in an iterative learning routine. We provide high-probability bounds on the excess risk of this algorithm, by showing that it does not deviate far from the ideal gradient-based update. Empirical tests show that in diverse settings, the proposed procedure can learn more efficiently, using less resources (iterations and observations) while generalizing better.

algorithm 1, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1706.00182

Country: Europe > Austria (0.28)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Keep it simple! How to understand Gradient Descent algorithm

@machinelearnbotMay-30-2017, 22:23:59 GMT

When I first started out learning about machine learning algorithms, it turned out to be quite a task to gain an intuition of what the algorithms are doing. Not just because it was difficult to understand all the mathematical theory and notations, but it was also plain boring. When I turned to online tutorials for answers, I could again only see equations or high level explanations without going through the detail in a majority of the cases. It was then that one of my data science colleagues introduced me to the concept of working out an algorithm in an excel sheet. And that worked wonders for me.

algorithm, artificial intelligence, machine learning, (13 more...)

@machinelearnbot

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

will wolf

@machinelearnbotMay-30-2017, 06:55:08 GMT

Roughly speaking, my machine learning journey began on Kaggle. "Regression models predict continuous-valued real numbers; classification models predict'red,' 'green,' 'blue.' Typically, the former employs the mean squared error or mean absolute error; the latter, the cross-entropy loss. Stochastic gradient descent updates the model's parameters to drive these losses down." Furthermore, to fit these models, just import sklearn. A dexterity with the above is often sufficient for -- at least from a technical stance -- both employment and impact as a data scientist. In industry, commonplace prediction and inference problems -- binary churn, credit scoring, product recommendation and A/B testing, for example -- are easily matched with an off-the-shelf algorithm plus proficient data scientist for a measurable boost to the company's bottom line. In a vacuum I think this is fine: the winning driver does not need to know how to build the car.

artificial intelligence, machine learning, probability, (18 more...)

@machinelearnbot

Country:

Africa > West Africa (0.04)
Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.04)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Add feedback

Diagonal Rescaling For Neural Networks

Lafond, Jean, Vasilache, Nicolas, Bottou, Léon

arXiv.org Machine LearningMay-25-2017

We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks in robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tricks such as fanin stepsize scaling. The second insight stresses the practical importance of dealing with fast changes of the curvature of the cost.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1705.09319

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Implicit Regularization in Matrix Factorization

Gunasekar, Suriya, Woodworth, Blake, Bhojanapalli, Srinadh, Neyshabur, Behnam, Srebro, Nathan

arXiv.org Machine LearningMay-25-2017

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.

artificial intelligence, gradient descent, machine learning, (16 more...)

arXiv.org Machine Learning

1705.0928

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

Add feedback

Adequacy of the Gradient-Descent Method for Classifier Evasion Attacks

Han, Yi, Rubinstein, Benjamin I. P.

arXiv.org Machine LearningMay-25-2017

Despite the wide use of machine learning in adversarial settings including computer security, recent studies have demonstrated vulnerabilities to evasion attacks---carefully crafted adversarial samples that closely resemble legitimate instances, but cause misclassification. In this paper, we examine the adequacy of the leading approach to generating adversarial samples---the gradient descent approach. In particular (1) we perform extensive experiments on three datasets, MNIST, USPS and Spambase, in order to analyse the effectiveness of the gradient-descent method against non-linear support vector machines, and conclude that carefully reduced kernel smoothness can significantly increase robustness to the attack; (2) we demonstrate that separated inter-class support vectors lead to more secure models, and propose a quantity similar to margin that can efficiently predict potential susceptibility to gradient-descent attacks, before the attack is launched; and (3) we design a new adversarial sample construction algorithm based on optimising the multiplicative ratio of class decision functions.

artificial intelligence, evasion attack, machine learning, (15 more...)

arXiv.org Machine Learning

1704.01704

Country: North America > United States (0.89)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.86)
Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Learn under the hood of Gradient Descent algorithm using excel

@machinelearnbotMay-24-2017, 22:55:28 GMT

When I first started out learning about machine learning algorithms, it turned out to be quite a task to gain an intuition of what the algorithms are doing. Not just because it was difficult to understand all the mathematical theory and notations, but it was also plain boring. When I turned to online tutorials for answers, I could again only see equations or high level explanations without going through the detail in a majority of the cases. It was then that one of my data science colleagues introduced me to the concept of working out an algorithm in an excel sheet. And that worked wonders for me.

algorithm, artificial intelligence, machine learning, (14 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Add feedback

Efficient Private ERM for Smooth Objectives

Zhang, Jiaqi, Zheng, Kai, Mou, Wenlong, Wang, Liwei

arXiv.org Machine LearningMay-24-2017

In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high probability form. Experiments demonstrate that our algorithm consistently outperforms existing method in both utility and running time.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1703.09947

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

From optimal transport to generative modeling: the VEGAN cookbook

Bousquet, Olivier, Gelly, Sylvain, Tolstikhin, Ilya, Simon-Gabriel, Carl-Johann, Schoelkopf, Bernhard

arXiv.org Machine LearningMay-22-2017

We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$. We show that the OT problem can be equivalently written in terms of probabilistic encoders, which are constrained to match the posterior and prior distributions over the latent space. When relaxed, this constrained optimization problem leads to a penalized optimal transport (POT) objective, which can be efficiently minimized using stochastic gradient descent by sampling from $P_X$ and $P_G$. We show that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors. We also compare POT to other popular techniques like variational auto-encoders (VAE) (Kingma and Welling, 2014). Our theoretical results include (a) a better understanding of the commonly observed blurriness of images generated by VAEs, and (b) establishing duality between Wasserstein GAN (Arjovsky and Bottou, 2017) and POT for the 1-Wasserstein distance.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

1705.07642

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Parallel Stochastic Gradient Descent with Sound Combiners

Maleki, Saeed, Musuvathi, Madanlal, Mytkowicz, Todd

arXiv.org Machine LearningMay-22-2017

Stochastic gradient descent (SGD) is a well known method for regression and classification tasks. However, it is an inherently sequential algorithm at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing linear learners using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows local models to be combined to produce the same result as what a sequential SGD would have produced. This paper evaluates SYMSGD's accuracy and performance on 6 datasets on a shared-memory machine shows upto 11x speedup over our heavily optimized sequential baseline on 16 cores and 2.2x, on average, faster than HOGWILD!.

artificial intelligence, machine learning, parallel stochastic gradient descent, (14 more...)

arXiv.org Machine Learning

1705.0803

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback