Optimization
We thank the reviewers for their valuable and constructive feedback. AdaReg does not explicitly force the weight matrices to be positively or negatively correlated. Therefore, our method is orthogonal to, but not in conflict with, Dropout. Inspired by this result, we explored hyperparameter learning by empirical Bayes. For BatchNorm, we do observe that a smaller batch size leads to better generalization.
Appendices for "Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems"
A Details of Implementation of Algorithms
In this section, we elaborate on the ideas behind the design of SNAP. First, we give the main motivation for the choice of update directions. Next, we give a detailed description of the line search algorithm used in SNAP.
A.2 Line Search Algorithm
To understand the algorithm, let us first define the set of inactive constraints as A
Lemma 2. If there exists an index i A (x
Therefore, the line search algorithm reduces to the classic unconstrained update. If so, then the algorithm either touches the boundary without increasing the objective, or it has already achieved sufficient descent.
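The two outcomes described above (touching the boundary without increasing the objective, or achieving sufficient descent) can be illustrated with a generic feasible backtracking line search. The following is a minimal sketch, not the line search from the paper: the constraint form A x <= b, the Armijo-style descent test, and all function names and parameters (`max_feasible_step`, `line_search`, `alpha0`, `shrink`, `c`) are assumptions made for illustration.

```python
import numpy as np

def max_feasible_step(A, b, x, d, tol=1e-12):
    """Largest alpha >= 0 with A @ (x + alpha * d) <= b (ratio test)."""
    slack = b - A @ x        # nonnegative when x is feasible
    rate = A @ d             # how fast each constraint's slack is consumed
    blocking = rate > tol    # only these constraints can become active
    if not np.any(blocking):
        return np.inf        # no constraint blocks the direction d
    return float(np.min(slack[blocking] / rate[blocking]))

def line_search(f, g, x, d, A, b, alpha0=1.0, shrink=0.5, c=1e-4, max_iter=50):
    """Hypothetical feasible line search; g is the gradient at x.

    Returns (x_new, hit_boundary).
    """
    alpha_max = max_feasible_step(A, b, x, d)
    fx = f(x)
    slope = float(g @ d)     # assumed negative (d is a descent direction)
    # Outcome 1: the boundary lies within the trial step and moving
    # onto it does not increase the objective.
    if alpha_max <= alpha0 and f(x + alpha_max * d) <= fx:
        return x + alpha_max * d, True
    # Outcome 2: backtrack inside the feasible region until the
    # Armijo sufficient-descent condition holds.
    alpha = min(alpha0, alpha_max)
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return x + alpha * d, False
        alpha *= shrink
    return x, False          # no acceptable step found

# Usage: minimize 0.5 * ||x - t||^2 subject to x[0] <= 1.
A = np.array([[1.0, 0.0]])
b = np.array([1.0])
t = np.array([2.0, 0.0])
f = lambda x: 0.5 * np.sum((x - t) ** 2)
x0 = np.zeros(2)
g0 = x0 - t
x1, hit = line_search(f, g0, x0, -g0, A, b)
```

When no constraint blocks the direction, `max_feasible_step` returns infinity and the routine behaves like an ordinary unconstrained backtracking search.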
Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems
This paper proposes two efficient algorithms for computing approximate second-order stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints. While finding (approximate) SOSPs for the class of smooth non-convex linearly constrained problems is computationally intractable, we show that generic problem instances in this class can be solved efficiently. Specifically, for a generic problem instance, we show that a certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions. Based on this condition, we design an algorithm named Successive Negative-curvature grAdient Projection (SNAP), which performs either conventional gradient projection steps or negative-curvature-based projection steps to find SOSPs.
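The alternation between gradient projection and negative-curvature steps can be illustrated on a toy problem. The sketch below is not the SNAP algorithm itself. It assumes box constraints lo <= x <= hi (a simple special case of linear constraints, where projection is coordinate-wise clipping), a fixed step size, and a negative-curvature step that moves straight to the boundary, where a practical method would use a line search. The function name `alternate_gp_nc` and all parameters are hypothetical.

```python
import numpy as np

def alternate_gp_nc(grad, hess, x, lo, hi, lr=0.1,
                    eps_g=1e-6, eps_h=1e-6, max_iter=1000):
    """Toy alternation of gradient projection and negative-curvature steps."""
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    for _ in range(max_iter):
        g = grad(x)
        x_pg = np.clip(x - lr * g, lo, hi)      # gradient projection step
        if np.linalg.norm(x_pg - x) / lr > eps_g:
            x = x_pg                            # first-order progress
            continue
        # Near first-order stationarity: inspect the Hessian restricted
        # to coordinates whose bounds are inactive.
        free = (x > lo) & (x < hi)
        if not free.any():
            break
        w, V = np.linalg.eigh(hess(x)[np.ix_(free, free)])
        if w[0] >= -eps_h:
            break                               # no negative curvature left
        d = np.zeros_like(x)
        d[free] = V[:, 0]                       # most negative curvature
        if g @ d > 0:
            d = -d                              # avoid an ascent direction
        # Largest step that keeps x + t * d inside the box.
        with np.errstate(divide="ignore", invalid="ignore"):
            t_hi = np.where(d > 0, (hi - x) / d, np.inf)
            t_lo = np.where(d < 0, (lo - x) / d, np.inf)
        t = min(t_hi.min(), t_lo.min())
        x = np.clip(x + t * d, lo, hi)
    return x

# Usage: f(x) = 0.5 * (x[0]**2 - x[1]**2) on [-1, 1]^2 has a saddle at
# the origin; the negative-curvature step escapes it along x[1].
grad = lambda x: np.array([x[0], -x[1]])
hess = lambda x: np.diag([1.0, -1.0])
x_star = alternate_gp_nc(grad, hess, [0.5, 0.0], -1.0, 1.0)
```

Plain gradient projection started at (0.5, 0) would converge to the saddle point at the origin. The curvature check is what moves the iterate to the boundary, where the second coordinate ends at either -1 or 1 depending on the sign of the computed eigenvector.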