Search
SDEs for Minimax Optimization
Compagnoni, Enea Monzio, Orvieto, Antonio, Kersting, Hans, Proske, Frank Norbert, Lucchi, Aurelien
Minimax optimization problems have attracted a lot of attention over the past few years, with applications ranging from economics to machine learning. While advanced optimization methods exist for such problems, characterizing their dynamics in stochastic scenarios remains notably challenging. In this paper, we pioneer the use of stochastic differential equations (SDEs) to analyze and compare Minimax optimizers. Our SDE models for Stochastic Gradient Descent-Ascent, Stochastic Extragradient, and Stochastic Hamiltonian Gradient Descent are provable approximations of their algorithmic counterparts, clearly showcasing the interplay between hyperparameters, implicit regularization, and implicit curvature-induced noise. This perspective also allows for a unified and simplified analysis strategy based on the principles of It\^o calculus. Finally, our approach facilitates the derivation of convergence conditions and closed-form solutions for the dynamics in simplified settings, unveiling further insights into the behavior of different optimizers.
Partial Search in a Frozen Network is Enough to Find a Strong Lottery Ticket
Otsuka, Hikari, Chijiwa, Daiki, Garcรญa-Arias, รngel Lรณpez, Okoshi, Yasuyuki, Kawamura, Kazushi, Van Chu, Thiem, Fujiki, Daichi, Takeuchi, Susumu, Motomura, Masato
Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning -- strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated theoretically and experimentally that SLTs can also be found within a randomly pruned source network, thus reducing the SLT search space. However, this limits the search to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method that reduces the SLT search space by an arbitrary ratio that is independent of the desired SLT sparsity. A random subset of the initial weights is excluded from the search space by freezing it -- i.e., by either permanently pruning them or locking them as a fixed part of the SLT. Indeed, the SLT existence in such a reduced search space is theoretically guaranteed by our subset-sum approximation with randomly frozen variables. In addition to reducing search space, the random freezing pattern can also be exploited to reduce model size in inference. Furthermore, experimental results show that the proposed method finds SLTs with better accuracy and model size trade-off than the SLTs obtained from dense or randomly pruned source networks. In particular, the SLT found in a frozen graph neural network achieves higher accuracy than its weight trained counterpart while reducing model size by $40.3\times$.
Linear bandits with polylogarithmic minimax regret
Lumbreras, Josep, Tomamichel, Marco
We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.
NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search โ, Zeyang Zhang, Wenwu Zhu
Graph neural architecture search (GraphNAS) has recently aroused considerable attention in both academia and industry. However, two key challenges seriously hinder the further research of GraphNAS. First, since there is no consensus for the experimental setting, the empirical results in different research papers are often not comparable and even not reproducible, leading to unfair comparisons. Secondly, GraphNAS often needs extensive computations, which makes it highly inefficient and inaccessible to researchers without access to large-scale computation. To solve these challenges, we propose NAS-Bench-Graph, a tailored benchmark that supports unified, reproducible, and efficient evaluations for GraphNAS.
Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
This paper uses information-theoretic techniques to determine minimax rates for estimating nonparametric sparse additive regression models under high-dimensional scaling. The first term reflects the difficulty of performing \emph{subset selection} and is independent of the Hilbert space \Hilb; the second term \LowerRateSq is an \emph{\s-dimensional estimation} term, depending only on the low dimension \s but not the ambient dimension \pdim, that captures the difficulty of estimating a sum of \s univariate functions in the Hilbert space \Hilb . The minimax rates are compared with rates achieved by an \ell_1 -penalty based approach, it can be shown that a certain \ell_1 -based approach achieves the minimax optimal rate.
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
Generalized Linear Models (GLMs) and Single Index Models (SIMs) provide powerful generalizations of linear regression, where the target variable is assumed to be a (possibly unknown) 1-dimensional function of a linear predictor. In general, these problems entail non-convex estimation procedures, and, in practice, iterative local search heuristics are often used. Kalai and Sastry (2009) provided the first provably efficient method, the \emph{Isotron} algorithm, for learning SIMs and GLMs, under the assumption that the data is in fact generated under a GLM and under certain monotonicity and Lipschitz (bounded slope) constraints. However, to obtain provable performance, the method requires a fresh sample every iteration. In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient.
Selecting Diverse Features via Spectral Regularization
We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized approximately by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and \ell_1 -regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations.
IntSat: Integer Linear Programming by Conflict-Driven Constraint-Learning
Nieuwenhuis, Robert, Oliveras, Albert, Rodriguez-Carbonell, Enric
State-of-the-art SAT solvers are nowadays able to handle huge real-world instances. The key to this success is the so-called Conflict-Driven Clause-Learning (CDCL) scheme, which encompasses a number of techniques that exploit the conflicts that are encountered during the search for a solution. In this article we extend these techniques to Integer Linear Programming (ILP), where variables may take general integer values instead of purely binary ones, constraints are more expressive than just propositional clauses, and there may be an objective function to optimise. We explain how these methods can be implemented efficiently, and discuss possible improvements. Our work is backed with a basic implementation that shows that, even in this far less mature stage, our techniques are already a useful complement to the state of the art in ILP solving.