Mathematical & Statistical Methods
Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods
Chayti, El Mahdi, Doikov, Nikita, Jaggi, Martin
We study stochastic Cubic Newton methods for solving general possibly non-convex minimization problems. We propose a new framework, which we call the helper framework, that provides a unified view of the stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees. It can also be applied to learning with auxiliary information. Our helper framework offers the algorithm designer high flexibility for constructing and analyzing the stochastic Cubic Newton methods, allowing arbitrary size batches, and the use of noisy and possibly biased estimates of the gradients and Hessians, incorporating both the variance reduction and the lazy Hessian updates. We recover the best-known complexities for the stochastic and variance-reduced Cubic Newton, under weak assumptions on the noise. A direct consequence of our theory is the new lazy stochastic second-order method, which significantly improves the arithmetic complexity for large dimension problems. We also establish complexity bounds for the classes of gradient-dominated objectives, that include convex and strongly convex problems. For Auxiliary Learning, we show that using a helper (auxiliary function) can outperform training alone if a given similarity measure is small.
First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians
Doikov, Nikita, Grapiglia, Geovani Nunes
In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference approximations of the derivatives. We use a special adaptive search procedure in our algorithms, which simultaneously fits both the regularization constant and the parameters of the finite difference approximations. It makes our schemes free from the need to know the actual Lipschitz constants. Additionally, we equip our algorithms with the lazy Hessian update that reuse a previously computed Hessian approximation matrix for several iterations. Specifically, we prove the global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for our new Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, where $n$ is the dimension of the problem and $\epsilon$ is the desired accuracy for the gradient norm. These complexity bounds significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$, for the first-order and zeroth-order non-convex optimization.
Model-agnostic network inference enhancement from noisy measurements via curriculum learning
Wu, Kai, Li, Yuanyuan, Liu, Jing
Noise is a pervasive element within real-world measurement data, significantly undermining the performance of network inference models. However, the quest for a comprehensive enhancement framework capable of bolstering noise resistance across a diverse array of network inference models has remained elusive. Here, we present an elegant and efficient framework tailored to amplify the capabilities of network inference models in the presence of noise. Leveraging curriculum learning, we mitigate the deleterious impact of noisy samples on network inference models. Our proposed framework is model-agnostic, seamlessly integrable into a plethora of model-based and model-free network inference methods. Notably, we utilize one model-based and three model-free network inference methods as the foundation. Extensive experimentation across various synthetic and real-world networks, encapsulating diverse nonlinear dynamic processes, showcases substantial performance augmentation under varied noise types, particularly thriving in scenarios enriched with clean samples. This framework's adeptness in fortifying both model-free and model-based network inference methodologies paves the avenue towards a comprehensive and unified enhancement framework, encompassing the entire spectrum of network inference models. Available Code: https://github.com/xiaoyuans/MANIE.
Locally Stationary Graph Processes
Canbolat, Abdullah, Vural, Elif
Stationary graph process models are commonly used in the analysis and inference of data sets collected on irregular network topologies. While most of the existing methods represent graph signals with a single stationary process model that is globally valid on the entire graph, in many practical problems, the characteristics of the process may be subject to local variations in different regions of the graph. In this work, we propose a locally stationary graph process (LSGP) model that aims to extend the classical concept of local stationarity to irregular graph domains. We characterize local stationarity by expressing the overall process as the combination of a set of component processes such that the extent to which the process adheres to each component varies smoothly over the graph. We propose an algorithm for computing LSGP models from realizations of the process, and also study the approximation of LSGPs locally with WSS processes. Experiments on signal interpolation problems show that the proposed process model provides accurate signal representations competitive with the state of the art.
Logarithmic Regret in Multisecretary and Online Linear Programs with Continuous Valuations
I study how the shadow prices of a linear program that allocates an endowment of $n\beta \in \mathbb{R}^{m}$ resources to $n$ customers behave as $n \rightarrow \infty$. I show the shadow prices (i) adhere to a concentration of measure, (ii) converge to a multivariate normal under central-limit-theorem scaling, and (iii) have a variance that decreases like $\Theta(1/n)$. I use these results to prove that the expected regret in \cites{Li2019b} online linear program is $\Theta(\log n)$, both when the customer variable distribution is known upfront and must be learned on the fly. I thus tighten \citeauthors{Li2019b} upper bound from $O(\log n \log \log n)$ to $O(\log n)$, and extend \cites{Lueker1995} $\Omega(\log n)$ lower bound to the multi-dimensional setting. I illustrate my new techniques with a simple analysis of \cites{Arlotto2019} multisecretary problem.
MF-OMO: An Optimization Formulation of Mean-Field Games
Guo, Xin, Hu, Anran, Zhang, Junzi
This paper proposes a new mathematical paradigm to analyze discrete-time mean-field games. It is shown that finding Nash equilibrium solutions for a general class of discrete-time mean-field games is equivalent to solving an optimization problem with bounded variables and simple convex constraints, called MF-OMO. This equivalence framework enables finding multiple (and possibly all) Nash equilibrium solutions of mean-field games by standard algorithms. For instance, projected gradient descent is shown to be capable of retrieving all possible Nash equilibrium solutions when there are finitely many of them, with proper initializations. Moreover, analyzing mean-field games with linear rewards and mean-field independent dynamics is reduced to solving a finite number of linear programs, hence solvable in finite time. This framework does not rely on the contractive and the monotone assumptions and the uniqueness of the Nash equilibrium.
On the lifting and reconstruction of nonlinear systems with multiple attractors
Pan, Shaowu, Duraisamy, Karthik
The Koopman operator provides a linear perspective on non-linear dynamics by focusing on the evolution of observables in an invariant subspace. Observables of interest are typically linearly reconstructed from the Koopman eigenfunctions. Despite the broad use of Koopman operators over the past few years, there exist some misconceptions about the applicability of Koopman operators to dynamical systems with more than one fixed point. In this work, an explanation is provided for the mechanism of lifting for the Koopman operator of nonlinear systems with multiple attractors. Considering the example of the Duffing oscillator, we show that by exploiting the inherent symmetry between the basins of attraction, a linear reconstruction with three degrees of freedom in the Koopman observable space is sufficient to globally linearize the system.
The SO(3) and SE(3) Lie Algebras of Rigid Body Rotations and Motions and their Application to Discrete Integration, Gradient Descent Optimization, and State Estimation
Classical mathematical techniques such as discrete integration, gradient descent optimization, and state estimation (exemplified by the Runge-Kutta method, Gauss-Newton minimization, and extended Kalman filter or EKF, respectively), rely on linear algebra and hence are only applicable to state vectors belonging to Euclidean spaces when implemented as described in the literature. This document discusses how to modify these methods so they can be applied to non-Euclidean state vectors, such as those containing rotations and full motions of rigid bodies. To do so, this document provides an in-depth review of the concept of manifolds or Lie groups, together with their tangent spaces or Lie algebras, their exponential and logarithmic maps, the analysis of perturbations, the treatment of uncertainty and covariance, and in particular the definitions of the Jacobians required to employ the previously mentioned calculus methods. These concepts are particularized to the specific cases of the SO(3) and SE(3) Lie groups, known as the special orthogonal and special Euclidean groups of R3, which represent the rigid body rotations and motions, describing their various possible parameterizations as well as their advantages and disadvantages.
Why Isaac Newton's laws still give physicists a lot to think about
THERE are two kinds of theoretical physicists: those who use the correct equation for calculating distances in space-time, and those who don't. Obviously, I'm being a bit tongue in cheek here, but I'm serious when I say there is a real bifurcation. The two formulations of this equation for distance – what we call the metric – are equivalent and when used correctly will give the same answers for all calculations. But each group has reasons for believing that one is more natural than the other. Particle physicists tend to use one; relativists (people trained in general relativity) tend to use the other.
Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities
Mandi, Jayanta, Kotary, James, Berden, Senne, Mulamba, Maxime, Bucarey, Victor, Guns, Tias, Fioretto, Ferdinando
Decision-focused learning (DFL) is an emerging paradigm in machine learning which trains a model to optimize decisions, integrating prediction and optimization in an end-to-end system. This paradigm holds the promise to revolutionize decision-making in many real-world applications which operate under uncertainty, where the estimation of unknown parameters within these decision models often becomes a substantial roadblock. This paper presents a comprehensive review of DFL. It provides an in-depth analysis of the various techniques devised to integrate machine learning and optimization models, introduces a taxonomy of DFL methods distinguished by their unique characteristics, and conducts an extensive empirical evaluation of these methods proposing suitable benchmark dataset and tasks for DFL. Finally, the study provides valuable insights into current and potential future avenues in DFL research.