Escaping from saddle points on Riemannian manifolds

Neural Information Processing Systems

We consider minimizing a nonconvex, smooth function $f$ on a Riemannian manifold $\mathcal{M}$. We show that a perturbed version of the gradient descent algorithm converges to a second-order stationary point for this problem (and hence is able to escape saddle points on the manifold). While the unconstrained problem is well-studied, our result is the first to prove such a rate for nonconvex, manifold-constrained problems. The rate of convergence depends as $1/\epsilon^2$ on the accuracy $\epsilon$, which matches a rate known only for unconstrained smooth minimization. The convergence rate also has a polynomial dependence on the parameters denoting the curvature of the manifold and the smoothness of the function.
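The idea of perturbing gradient descent to leave a saddle point can be illustrated on a toy problem. The sketch below is not the authors' algorithm or its parameter schedule; it assumes the manifold is the unit sphere, the objective is $f(x) = \sum_i d_i x_i^2$, retraction is done by renormalization, and a small random tangent perturbation is applied whenever the Riemannian gradient is nearly zero. All names (`perturbed_rgd`, `radius`, `grad_tol`) are hypothetical.

```python
import math
import random


def project_to_tangent(x, v):
    # Remove the component of v along x: the tangent space of the unit
    # sphere at x is the set of vectors orthogonal to x.
    dot = sum(xi * vi for xi, vi in zip(x, v))
    return [vi - dot * xi for xi, vi in zip(x, v)]


def retract(x, v):
    # Step along v in the ambient space, then renormalize onto the sphere.
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(sum(yi * yi for yi in y))
    return [yi / norm for yi in y]


def objective(diag, x):
    # f(x) = sum_i d_i * x_i^2, restricted to the unit sphere.
    return sum(d * xi * xi for d, xi in zip(diag, x))


def perturbed_rgd(diag, x0, step=0.1, radius=1e-3, grad_tol=1e-6,
                  iters=500, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        egrad = [2.0 * d * xi for d, xi in zip(diag, x)]  # Euclidean gradient
        rgrad = project_to_tangent(x, egrad)              # Riemannian gradient
        gnorm = math.sqrt(sum(g * g for g in rgrad))
        if gnorm < grad_tol:
            # Near a stationary point: take a small random tangent step.
            noise = project_to_tangent(x, [rng.gauss(0.0, 1.0) for _ in x])
            nnorm = math.sqrt(sum(n * n for n in noise))
            x = retract(x, [radius * n / nnorm for n in noise])
        else:
            x = retract(x, [-step * g for g in rgrad])
    return x


diag = [3.0, 1.0, -2.0]
saddle = [0.0, 1.0, 0.0]  # stationary point with f = 1; the minimum is f = -2

stuck = perturbed_rgd(diag, saddle, radius=0.0)  # no perturbation
escaped = perturbed_rgd(diag, saddle)            # with perturbation
print(objective(diag, stuck), objective(diag, escaped))
```

With `radius=0.0` the iterate never leaves the saddle, because the Riemannian gradient there is exactly zero; with a small perturbation the iterate picks up a component along the descent direction and settles near the minimizer.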


Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

The softmax is the standard transformation used in machine learning to map real-valued vectors to categorical distributions. Unfortunately, this transform poses serious drawbacks for gradient descent (ascent) optimization. We reveal this difficulty by establishing two negative results: (1) optimizing any expectation with respect to the softmax must exhibit sensitivity to parameter initialization (``softmax gravity well''), and (2) optimizing log-probabilities under the softmax must exhibit slow convergence (``softmax damping''). Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities. To circumvent these shortcomings we investigate an alternative transformation, the \emph{escort} mapping, that demonstrates better optimization properties. The disadvantages of the softmax and the effectiveness of the escort transformation are further explained using the concept of N\L{} coefficient. In addition to proving bounds on convergence rates to firmly establish these results, we also provide experimental evidence for the superiority of the escort transformation.
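A small numerical sketch can make the initialization sensitivity concrete. It uses the standard identity that the gradient of the expected reward $\pi_\theta^\top r$ under a softmax policy has components $\pi_a (r_a - \pi_\theta^\top r)$. The abstract does not define the escort mapping; the form used below, $\pi(a) = |\theta_a|^p / \|\theta\|_p^p$, is an assumption about its usual definition, and the reward vector and function names are illustrative only.

```python
import math


def softmax(theta):
    # Numerically stable softmax over a list of floats.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def escort(theta, p=2.0):
    # Assumed form of the escort mapping: |theta_a|^p / ||theta||_p^p.
    # Requires at least one nonzero entry in theta.
    powers = [abs(t) ** p for t in theta]
    total = sum(powers)
    return [w / total for w in powers]


def softmax_expected_reward_grad(theta, reward):
    # d/d theta_a of sum_b pi_b * r_b equals pi_a * (r_a - pi . r).
    pi = softmax(theta)
    baseline = sum(p * r for p, r in zip(pi, reward))
    return [p * (r - baseline) for p, r in zip(pi, reward)]


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


reward = [0.0, 1.0, 0.5]  # action 0 is the worst action

# Two initializations that both favor the worst action, one mildly and
# one strongly.
mild = l2_norm(softmax_expected_reward_grad([1.0, 0.0, 0.0], reward))
confident = l2_norm(softmax_expected_reward_grad([5.0, 0.0, 0.0], reward))
print(mild, confident)
```

The gradient norm is much smaller for the initialization that is more strongly committed to the suboptimal action, which is the kind of behaviour the abstract attributes to the softmax. The sketch does not reproduce the paper's convergence-rate analysis.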


Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits

Neural Information Processing Systems

Variational quantum circuits have been widely employed in quantum simulation and quantum machine learning in recent years. However, quantum circuits with random structures have poor trainability due to the exponentially vanishing gradient with respect to the circuit depth and the qubit number. This result leads to a general standpoint that deep quantum circuits would not be feasible for practical tasks. In this work, we propose an initialization strategy with theoretical guarantees for the vanishing gradient problem in general deep quantum circuits. Specifically, we prove that under proper Gaussian initialized parameters, the norm of the gradient decays at most polynomially when the qubit number and the circuit depth increase. Our theoretical results hold for both the local and the global observable cases, where the latter was believed to have vanishing gradients even for very shallow circuits. Experimental results verify our theoretical findings in quantum simulation and quantum chemistry.


Review for NeurIPS paper: Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

Summary and Contributions: ##Update## The rebuttal adequately addressed my main concerns and I am consequently increasing my score to a 7. In particular I was pleased that the authors investigated the issues with the learning rate, and I would be happy if they mention this potential limitation in their revisions, and include the experimental results showing that the naive adaptive learning rate proposals I made would not be effective. It was also pleasing that they will discuss and compare with Neural Replicator Dynamics, and the additional experiment with sampled actions also looks promising. The reason I didn't increase my score further was that the current set of experiments is still rather simple, and it is difficult for me to assess whether the new method is likely to be widely used. Though, I feel that the contribution may well turn out to be much more influential.


Review for NeurIPS paper: Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

This paper proposes an alternative to two common practices in machine learning: softmax policy gradient for RL, and softmax parameterization in classification when minimizing cross-entropy loss. The limitations of softmax in these two cases are well explained, and the paper will be of interest to a wide range of the NeurIPS community.


Escaping the Gravitational Pull of Softmax

Neural Information Processing Systems

The softmax is the standard transformation used in machine learning to map real-valued vectors to categorical distributions. Unfortunately, this transform poses serious drawbacks for gradient descent (ascent) optimization. We reveal this difficulty by establishing two negative results: (1) optimizing any expectation with respect to the softmax must exhibit sensitivity to parameter initialization (``softmax gravity well''), and (2) optimizing log-probabilities under the softmax must exhibit slow convergence (``softmax damping''). Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities. To circumvent these shortcomings we investigate an alternative transformation, the \emph{escort} mapping, that demonstrates better optimization properties.


DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views

Neural Information Processing Systems

Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture spatial features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert rendered feature maps into spatial information for the generative process.