Goto

Collaborating Authors

 direct optimization



Direct Optimization through \arg \max for Discrete Variational Auto-Encoder

Neural Information Processing Systems

Reparameterization of variational auto-encoders with continuous random variables is an effective method for reducing the variance of their gradient estimates. In the discrete case, one can perform reparametrization using the Gumbel-Max trick, but the resulting objective relies on an $\arg \max$ operation and is non-differentiable. In contrast to previous works which resort to \emph{softmax}-based relaxations, we propose to optimize it directly by applying the \emph{direct loss minimization} approach. Our proposal extends naturally to structured discrete latent variable models when evaluating the $\arg \max$ operation is tractable. We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables.


Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Neural Information Processing Systems

Direct optimization (McAllester et al., 2010; Song et al., 2016) is an appealing framework that replaces integration with optimization of a random objective for approximating gradients in models with discrete random variables (Lorberbom et al., 2018). A* sampling (Maddison et al., 2014) is a framework for optimizing such random objectives over large spaces. We show how to combine these techniques to yield a reinforcement learning algorithm that approximates a policy gradient by finding trajectories that optimize a random objective. We call the resulting algorithms \emph{direct policy gradient} (DirPG) algorithms. A main benefit of DirPG algorithms is that they allow the insertion of domain knowledge in the form of upper bounds on return-to-go at training time, like is used in heuristic search, while still directly computing a policy gradient. We further analyze their properties, showing there are cases where DirPG has an exponentially larger probability of sampling informative gradients compared to REINFORCE. We also show that there is a built-in variance reduction technique and that a parameter that was previously viewed as a numerical approximation can be interpreted as controlling risk sensitivity. Empirically, we evaluate the effect of key degrees of freedom and show that the algorithm performs well in illustrative domains compared to baselines.


Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate

Neural Information Processing Systems

In design, fabrication, and control problems, we are often faced with the task of synthesis, in which we must generate an object or configuration that satisfies a set of constraints while maximizing one or more objective functions. The synthesis problem is typically characterized by a physical process in which many different realizations may achieve the goal. This many-to-one map presents challenges to the supervised learning of feed-forward synthesis, as the set of viable designs may have a complex structure. In addition, the non-differentiable nature of many physical simulations prevents efficient direct optimization. We address both of these problems with a two-stage neural network architecture that we may consider to be an autoencoder.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper is aiming to speed up optimization for the of graphical model MAP problem. The authors suggest a coarse to fine approach and clearly motivate their work. The method suggested by the authors combines label pruning for each variable with sequential optimization of the pruned, yet finer problem. As far as i can tell, the work is original and correct.



Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

Kolb, Chris, Frost, Laetitia, Bischl, Bernd, Rügamer, David

arXiv.org Machine Learning

Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum using non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.


Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Neural Information Processing Systems

Direct optimization (McAllester et al., 2010; Song et al., 2016) is an appealing framework that replaces integration with optimization of a random objective for approximating gradients in models with discrete random variables (Lorberbom et al., 2018). A* sampling (Maddison et al., 2014) is a framework for optimizing such random objectives over large spaces. We show how to combine these techniques to yield a reinforcement learning algorithm that approximates a policy gradient by finding trajectories that optimize a random objective. We call the resulting algorithms \emph{direct policy gradient} (DirPG) algorithms. A main benefit of DirPG algorithms is that they allow the insertion of domain knowledge in the form of upper bounds on return-to-go at training time, like is used in heuristic search, while still directly computing a policy gradient.


Review for NeurIPS paper: Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Neural Information Processing Systems

Additional Feedback: The motivating example could be explained more clearly. How exactly is the heuristic information incorporated into the search for a_dir? If a simulator is available, one typically wouldn't use a model-free algorithm like REINFORCE. A major benefit of REINFORCE is that it can do a Monte Carlo rollout and have an estimate of the direction to improve the policy without needing a simulator or a model of the environment. Once a simulator is added, it changes the structure of the problem such that different solution methods become available (i.e., MCTS).


Reviews: Direct Optimization through \arg \max for Discrete Variational Auto-Encoder

Neural Information Processing Systems

Originality To the best of my knowledge, the insight that the Gumbel-max trick can be combined with direct loss mimization in the VAE setting is novel. Moreover, the work that had to be done to get the ideas to play well together seems significant and original. Quality I have some reservations about the method presented, which is why I've given a slightly negative overall score. However, it seems very plausible that an author response could clarify my concerns and cause me to revise my score upward. Figure 1: -Epsilon and Tau are different variables in quite different settings.