Gradient Descent
A Further Related Work on Optimization
In contrast to these gradient-based methods, we focus on gradient-free methods in this paper. We are also aware of many recent works on algorithm design for structured nonsmooth nonconvex optimization. Then, we proceed to prove the second statement. In this section, we present some technical lemmas for analyzing the convergence properties of the gradient-free method and its two-phase version. We also give the proofs of Theorems 3.2 and 3.4.
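To make the term "gradient-free method" concrete, the following is a minimal sketch of a common zeroth-order scheme: a two-point function-value estimator along a random unit direction, plugged into a plain descent loop. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper. The names `two_point_gradient_estimate`, `gradient_free_descent`, and `l1_objective`, as well as the step size, smoothing radius `delta`, and iteration count, are hypothetical choices made for this example.

```python
import numpy as np


def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate a gradient of a smoothed version of f at x from two function values.

    Assumption for this sketch: the direction u is drawn uniformly from the
    unit sphere and delta is the smoothing radius. No gradient of f is used.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalizing a Gaussian gives a uniform direction
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u


def gradient_free_descent(f, x0, step_size=0.01, delta=1e-3, num_iters=2000, seed=0):
    """Run a descent loop that only queries function values of f."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        x = x - step_size * two_point_gradient_estimate(f, x, delta, rng)
    return x


def l1_objective(x):
    """A simple nonsmooth test function (hypothetical example objective)."""
    return float(np.sum(np.abs(x)))


x_final = gradient_free_descent(l1_objective, x0=[1.0, -2.0, 0.5])
```

The test objective here is convex and only serves to exercise the loop on a nonsmooth function. The nonconvex setting discussed in the text would need the guarantees developed in the paper's lemmas, which this sketch does not reproduce.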
Analyzing the Generalization Capability of SGLD Using Properties of Gaussian Channels
Optimization is a key component of training machine learning models and has a strong impact on their generalization. In this paper, we consider a particular optimization method, the stochastic gradient Langevin dynamics (SGLD) algorithm, and investigate the generalization of models trained by SGLD.
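For readers unfamiliar with SGLD, a minimal sketch of one common form of the update is given below: a stochastic gradient step plus isotropic Gaussian noise, which is what ties the iterates to Gaussian channels. The noise scale `sqrt(2 * step_size / inverse_temperature)` is one standard convention and is an assumption here; the paper may parameterize the noise differently. The function name `sgld_step`, the synthetic least-squares problem, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def sgld_step(theta, grad, step_size, inverse_temperature, rng):
    """One SGLD update: gradient step plus Gaussian noise.

    Assumed convention: noise standard deviation sqrt(2 * step_size / inverse_temperature).
    """
    noise = rng.standard_normal(theta.shape)
    return theta - step_size * grad + np.sqrt(2.0 * step_size / inverse_temperature) * noise


# Synthetic linear regression data (hypothetical example problem).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(256)

theta = np.zeros(3)
for _ in range(2000):
    # Minibatch gradient of the loss 0.5 * mean((X @ theta - y) ** 2).
    idx = rng.choice(256, size=32, replace=False)
    residual = X[idx] @ theta - y[idx]
    grad = X[idx].T @ residual / 32
    theta = sgld_step(theta, grad, step_size=0.01, inverse_temperature=1e4, rng=rng)
```

With a large inverse temperature the injected noise is small and the iterates stay close to the least-squares solution; lowering it increases the noise added at every step.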