Mathematical & Statistical Methods
Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence
Jiang, Ruichen, Mokhtari, Aryan
In this paper, we propose a quasi-Newton method for solving smooth and monotone nonlinear equations, including unconstrained minimization and minimax optimization as special cases. For the strongly monotone setting, we establish two global convergence bounds: (i) a linear convergence rate that matches the rate of the celebrated extragradient method, and (ii) an explicit global superlinear convergence rate that provably surpasses the linear convergence rate after at most ${O}(d)$ iterations, where $d$ is the problem's dimension. In addition, for the case where the operator is only monotone, we prove a global convergence rate of ${O}(\min\{{1}/{k},{\sqrt{d}}/{k^{1.25}}\})$ in terms of the duality gap. This matches the rate of the extragradient method when $k = {O}(d^2)$ and is faster when $k = \Omega(d^2)$. These results are the first global convergence results to demonstrate a provable advantage of a quasi-Newton method over the extragradient method, without querying the Jacobian of the operator. Unlike classical quasi-Newton methods, we achieve this by using the hybrid proximal extragradient framework and a novel online learning approach for updating the Jacobian approximation matrices. Specifically, guided by the convergence analysis, we formulate the Jacobian approximation update as an online convex optimization problem over non-symmetric matrices, relating the regret of the online problem to the convergence rate of our method. To facilitate efficient implementation, we further develop a tailored online learning algorithm based on an approximate separation oracle, which preserves structures such as symmetry and sparsity in the Jacobian matrices.
A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization
Diouane, Youssef, Habiboullah, Mohamed Laghdaf, Orban, Dominique
We develop R2N, a modified quasi-Newton method for minimizing the sum of a $\mathcal{C}^1$ function $f$ and a lower semi-continuous prox-bounded $h$. Both $f$ and $h$ may be nonconvex. At each iteration, our method computes a step by minimizing the sum of a quadratic model of $f$, a model of $h$, and an adaptive quadratic regularization term. A step may be computed by a variant of the proximal-gradient method. An advantage of R2N over trust-region (TR) methods is that proximal operators do not involve an extra TR indicator. We also develop the variant R2DH, in which the model Hessian is diagonal, which allows us to compute a step without relying on a subproblem solver when $h$ is separable. R2DH can be used as standalone solver, but also as subproblem solver inside R2N. We describe non-monotone variants of both R2N and R2DH. Global convergence of a first-order stationarity measure to zero holds without relying on local Lipschitz continuity of $\nabla f$, while allowing model Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton models. Under Lipschitz-continuity of $\nabla f$, we establish a tight worst-case complexity bound of $O(1 / \epsilon^{2/(1 - p)})$ to bring said measure below $\epsilon > 0$, where $0 \leq p < 1$ controls the growth of model Hessians. The latter must not diverge faster than $|\mathcal{S}_k|^p$, where $\mathcal{S}_k$ is the set of successful iterations up to iteration $k$. When $p = 1$, we establish the tight exponential complexity bound $O(\exp(c \epsilon^{-2}))$ where $c > 0$ is a constant. We describe our Julia implementation and report numerical experience on a basis-pursuit problem, image denoising, minimum-rank matrix completion, and a nonlinear support vector machine. In particular, the minimum-rank problem cannot be solved directly at this time by a TR approach as corresponding proximal operators are not known analytically.
Automated conjecturing in mathematics with \emph{TxGraffiti}
\emph{TxGraffiti} is a data-driven, heuristic-based computer program developed to automate the process of generating conjectures across various mathematical domains. Since its creation in 2017, \emph{TxGraffiti} has contributed to numerous mathematical publications, particularly in graph theory. In this paper, we present the design and core principles of \emph{TxGraffiti}, including its roots in the original \emph{Graffiti} program, which pioneered the automation of mathematical conjecturing. We describe the data collection process, the generation of plausible conjectures, and methods such as the \emph{Dalmatian} heuristic for filtering out redundant or transitive conjectures. Additionally, we highlight its contributions to the mathematical literature and introduce a new web-based interface that allows users to explore conjectures interactively. While we focus on graph theory, the techniques demonstrated extend to other areas of mathematics.
Quantum Algorithms for Drone Mission Planning
Davies, Ethan, Kalidindi, Pranav
Mission planning often involves optimising the use of ISR (Intelligence, Surveillance and Reconnaissance) assets in order to achieve a set of mission objectives within allowed parameters subject to constraints. The missions of interest here, involve routing multiple UAVs visiting multiple targets, utilising sensors to capture data relating to each target. Finding such solutions is often an NP-Hard problem and cannot be solved efficiently on classical computers. Furthermore, during the mission new constraints and objectives may arise, requiring a new solution to be computed within a short time period. To achieve this we investigate near term quantum algorithms that have the potential to offer speed-ups against current classical methods. We demonstrate how a large family of these problems can be formulated as a Mixed Integer Linear Program (MILP) and then converted to a Quadratic Unconstrained Binary Optimisation (QUBO). The formulation provided is versatile and can be adapted for many different constraints with clear qubit scaling provided. We discuss the results of solving the QUBO formulation using commercial quantum annealers and compare the solutions to current edge classical solvers. We also analyse the results from solving the QUBO using Quantum Approximate Optimisation Algorithms (QAOA) and discuss their results. Finally, we also provide efficient methods to encode to the problem into the Variational Quantum Eigensolver (VQE) formalism, where we have tailored the ansatz to the problem making efficient use of the qubits available.
Democratizing Signal Processing and Machine Learning: Math Learning Equity for Elementary and Middle School Students
Vaswani, Namrata, Selim, Mohamed Y., Gibert, Renee Serrell
Signal Processing (SP) and Machine Learning (ML) rely on good math and coding knowledge, in particular, linear algebra, probability, and complex numbers. A good grasp of these relies on scalar algebra learned in middle school. The ability to understand and use scalar algebra well, in turn, relies on a good foundation in basic arithmetic. Because of various systemic barriers, many students are not able to build a strong foundation in arithmetic in elementary school. This leads them to struggle with algebra and everything after that. Since math learning is cumulative, the gap between those without a strong early foundation and everyone else keeps increasing over the school years and becomes difficult to fill in college. In this article we discuss how SP faculty and graduate students can play an important role in starting, and participating in, university-run (or other) out-of-school math support programs to supplement students' learning. Two example programs run by the authors (CyMath at ISU and Ab7G at Purdue) are briefly described. The second goal of this article is to use our perspective as SP, and engineering, educators who have seen the long-term impact of elementary school math teaching policies, to provide some simple almost zero cost suggestions that elementary schools could adopt to improve math learning: (i) more math practice in school, (ii) send small amounts of homework (individual work is critical in math), and (iii) parent awareness (math resources, need for early math foundation, clear in-school test information and sharing of feedback from the tests). In summary, good early math support (in school and through out-of-school programs) can help make SP and ML more accessible.
Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks
Liang, Luxu, Neufeld, Ariel, Zhang, Ying
In this paper, we provide a non-asymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm to a target measure in Wasserstein-1 and Wasserstein-2 distance. Crucially, compared to the existing literature on SGHMC, we allow its stochastic gradient to be discontinuous. This allows us to provide explicit upper bounds, which can be controlled to be arbitrarily small, for the expected excess risk of non-convex stochastic optimization problems with discontinuous stochastic gradients, including, among others, the training of neural networks with ReLU activation function. To illustrate the applicability of our main results, we consider numerical experiments on quantile estimation and on several optimization problems involving ReLU neural networks relevant in finance and artificial intelligence.
Testing Dependency of Weighted Random Graphs
Oren, Mor, Paslev, Vered, Huleihel, Wasim
Consider the following decision problem. We observe two weighted random graphs that are either generated independently at random or are edge-dependent due to some latent vertex correspondence or permutation. For this basic problem, two natural questions arise: the detection problem, which concerns whether the graphs exhibit dependence, and the recovery problem, which concerns identifying the latent correspondence between vertices. Here, we address the former question, specifically, we aim to understand under what conditions, in terms of the number of vertices and the generative distributions, one can distinguish between the two hypotheses and detect whether these graphs are dependent or not, say, with high probability? The fundamental question above was first introduced and analyzed in [1], where for Gaussian-weighted and dense Erdős-Rényi random graphs on n vertices, sharp informationtheoretic thresholds were developed, revealing the exact barrier at which the asymptotic optimal detection error probability undergoes a phase transition from zero to one as n approaches infinity. For sparse Erdős-Rényi random graphs this threshold was initially determined within a constant factor in the same paper.
Rigid Body Path Planning using Mixed-Integer Linear Programming
Navigating rigid body objects through crowded environments can be challenging, especially when narrow passages are presented. Existing sampling-based planners and optimization-based methods like mixed integer linear programming (MILP) formulations, suffer from limited scalability with respect to either the size of the workspace or the number of obstacles. In order to address the scalability issue, we propose a three-stage algorithm that first generates a graph of convex polytopes in the workspace free of collision, then poses a large set of small MILPs to generate viable paths between polytopes, and finally queries a pair of start and end configurations for a feasible path online. The graph of convex polytopes serves as a decomposition of the free workspace and the number of decision variables in each MILP is limited by restricting the subproblem within two or three free polytopes rather than the entire free region. Our simulation results demonstrate shorter online computation time compared to baseline methods and scales better with the size of the environment and tunnel width than sampling-based planners in both 2D and 3D environments.
Balancing Optimality and Diversity: Human-Centered Decision Making through Generative Curation
Li, Michael Lingzhi, Zhu, Shixiang
The surge in data availability has inundated decision-makers with an overwhelming array of choices. While existing approaches focus on optimizing decisions based on quantifiable metrics, practical decision-making often requires balancing measurable quantitative criteria with unmeasurable qualitative factors embedded in the broader context. In such cases, algorithms can generate high-quality recommendations, but the final decision rests with the human, who must weigh both dimensions. We define the process of selecting the optimal set of algorithmic recommendations in this context as human-centered decision making. To address this challenge, we introduce a novel framework called generative curation, which optimizes the true desirability of decision options by integrating both quantitative and qualitative aspects. Our framework uses a Gaussian process to model unknown qualitative factors and derives a diversity metric that balances quantitative optimality with qualitative diversity. This trade-off enables the generation of a manageable subset of diverse, near-optimal actions that are robust to unknown qualitative preferences. To operationalize this framework, we propose two implementation approaches: a generative neural network architecture that produces a distribution $\pi$ to efficiently sample a diverse set of near-optimal actions, and a sequential optimization method to iteratively generates solutions that can be easily incorporated into complex optimization formulations. We validate our approach with extensive datasets, demonstrating its effectiveness in enhancing decision-making processes across a range of complex environments, with significant implications for policy and management.
HJ-sampler: A Bayesian sampler for inverse problems of a stochastic process by leveraging Hamilton-Jacobi PDEs and score-based generative models
Meng, Tingwei, Zou, Zongren, Darbon, Jérôme, Karniadakis, George Em
The interplay between stochastic processes and optimal control has been extensively explored in the literature. With the recent surge in the use of diffusion models, stochastic processes have increasingly been applied to sample generation. This paper builds on the log transform, known as the Cole-Hopf transform in Brownian motion contexts, and extends it within a more abstract framework that includes a linear operator. Within this framework, we found that the well-known relationship between the Cole-Hopf transform and optimal transport is a particular instance where the linear operator acts as the infinitesimal generator of a stochastic process. We also introduce a novel scenario where the linear operator is the adjoint of the generator, linking to Bayesian inference under specific initial and terminal conditions. Leveraging this theoretical foundation, we develop a new algorithm, named the HJ-sampler, for Bayesian inference for the inverse problem of a stochastic differential equation with given terminal observations. The HJ-sampler involves two stages: (1) solving the viscous Hamilton-Jacobi partial differential equations, and (2) sampling from the associated stochastic optimal control problem. Our proposed algorithm naturally allows for flexibility in selecting the numerical solver for viscous HJ PDEs. We introduce two variants of the solver: the Riccati-HJ-sampler, based on the Riccati method, and the SGM-HJ-sampler, which utilizes diffusion models. We demonstrate the effectiveness and flexibility of the proposed methods by applying them to solve Bayesian inverse problems involving various stochastic processes and prior distributions, including applications that address model misspecifications and quantifying model uncertainty.