Gradient Descent
Reviews: Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions
I think the first part of the paper has very good original contributions with correct and nicely-written proofs in the appendix. However, I have the following questions regarding the parts of the paper starting at Section 3. Sorry if these are redundant questions with obvious answers that I missed. The RGD framework is mentioned for both convex and non-convex functions (Lemma 4 doesn't require f to be convex). However, the examples provided are all convex functions, and the focus also seems to be quite heavily on convex functions (because none of the papers on nonconvex optimization are compared with). Do the authors have (1) theoretical results and comparisons with existing work and/or (2)experiments, for non-convex functions?
Review for NeurIPS paper: Reparameterizing Mirror Descent as Gradient Descent
Additional Feedback: Suggestions: Lack definition (anything that is not'common knowledge' should be defined and explained before using. Should not let readers guess.) 1. In eq(1), w and L is used without defined. Could first introduce the problem and mention L is loss or the target function, and w is the model parameter. 'coincides with' is not a commonly used, mathematically rigorous and clear expression.
Review for NeurIPS paper: Reparameterizing Mirror Descent as Gradient Descent
The theoretical result of the paper is significant in my opinion. I also agree with the authors that the topic of this paper is at the core of machine learning and thus the paper should be evaluated based on its contributions. The reviewers also adjusted their reviews based on this point. However, I should also mention that the reviewers raised the concern that some of the definitions are omitted and few parts of the paper is not rigorous enough. Therefore, I suggest that the authors take care of such ambiguities in the final version.
Review for NeurIPS paper: Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
Summary and Contributions: The paper studies a variant of online ranking problem. In the offline settings preferred items belong to different groups, and one need to generate a sequence of items so that qualitatively speaking for each group R_t of items at least k_t elements appear high in the list. More formally they formulate the problem as an online version of the known "Generalized Min-Sum Set Cover" (GMSSC) task: In this problem, given a set U {1,...,n} of n items, in each step 1. The learner selects a permutation \pi_t items in U 2. The adversary selects a request R_t \subset U with demand k_t . The goal is to minimize the multiplicative regret, that is the ratio of the total cost to the cost of selecting a fixed optimal permutation \pi* in all steps.
Reviews: Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games
I appreciate the author's taking time to respond to comments from my review. In particular I look forward to see similar techniques applied to multi-agent hidden network zero-sum games as per their response. The action space is given by finite vector spaces, and these map to a lower dimensional space via a smooth function. Payoffs are then zero-sum in this space. Although it is already known that gradient descent is difficult for games (even zero-sum games as those used), the authors show that even in this general dynamic the situation is just as dire.
Reviews: Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games
Following the author response, consensus on acceptance was easily reached, on the grounds that the paper makes important contributions to gradient dynamics in certain zero-sum games which capture a restricted class of GANs. We encourage the authors to clarify the introduction/motivation to accurately reflect the results.
Review for NeurIPS paper: A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems
The title used for the paper is misleading since the rate O(epsilon {-2}) can only be achieved for a special class of non-convex concave min-max problems. More specifically, this rate is attained when using the algorithm to minimize the pointwise maximum of finite collection of non-convex functions. For general non-convex concave the rate is O(epsilon {-4}). This is clarified in the body of the paper, but the title is clearly misleading. It is not clear in the paper when this assumption (or even its relaxed replacement in the appendix) holds in practice.