Mathematical & Statistical Methods
From Target Tracking to Targeting Track -- Part III: Stochastic Process Modeling and Online Learning
Li, Tiancheng, Wang, Jingyuan, Li, Guchong, Gao, Dengwei
--This is the third part of a series of studies that model the target trajectory, which describes the target state evolution over continuous time, as a sample path of a stochastic process (SP). By adopting a deterministic-stochastic decomposition framework, we decompose the learning of the trajectory SP into two sequential stages: the first fits the deterministic trend of the trajectory using a curve function of time, while the second estimates the residual stochastic component through parametric learning of either a Gaussian process (GP) or Student's-t process (StP). This leads to a Markov-free data-driven tracking approach that produces the continuous-time trajectory with minimal prior knowledge of the target dynamics. It does not only take advantage of the smooth trend of the target but also makes use of the long-term temporal correlation of both the data noise and the model fitting error . Simulations in four maneuvering target tracking scenarios have demonstrated its effectiveness and superiority in comparison with existing approaches. ARGET tracking that involves the online estimation of the trajectory of a target has been a long-standing research topic and plays a significant role in aerospace, traffic, defense, robotics, etc. [1] In essence, target tracking is more about estimating the continuous-time trajectory of the target rather than merely a finite number of point states. The continuous-time trajectory enables the acquisition of a point estimate of the state at any time in the trajectory period. However, the converse is not true. X, defined in spatio-temporal space, where X denotes the state space. Manuscript created Feb 2025; This work was supported in part by the National Natural Science Foundation of China under Grants 62422117 and 62201316 and in part by the Fundamental Research Funds for the Central Universities.
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
Liu, Haoyang, Wang, Jie, Geng, Zijie, Li, Xijun, Zong, Yuxuan, Zhu, Fangzhou, Hao, Jianye, Wu, Feng
Leveraging machine learning (ML) to predict an initial solution for mixed-integer linear programming (MILP) has gained considerable popularity in recent years. These methods predict a solution and fix a subset of variables to reduce the problem dimension. Then, they solve the reduced problem to obtain the final solutions. However, directly fixing variable values can lead to low-quality solutions or even infeasible reduced problems if the predicted solution is not accurate enough. To address this challenge, we propose an A lternating p redictio n-correction neural sol ving framewo rk (Apollo-MILP) that can identify and select accurate and reliable predicted values to fix. In each iteration, Apollo-MILP conducts a prediction step for the unfixed variables, followed by a correction step to obtain an improved solution (called reference solution) through a trust-region search. By incorporating the predicted and reference solutions, we introduce a novel U ncertainty-based E rror upper BO und (UEBO) to evaluate the uncertainty of the predicted values and fix those with high confidence. A notable feature of Apollo-MILP is the superior ability for problem reduction while preserving optimality, leading to high-quality final solutions. Experiments on commonly used benchmarks demonstrate that our proposed Apollo-MILP significantly outperforms other ML-based approaches in terms of solution quality, achieving over a 50% reduction in the solution gap. Mixed-integer linear programming (MILP) is one of the most fundamental models for combinatorial optimization with broad applications in operations research (Bixby et al., 2004), engineering (Ma et al., 2019), and daily scheduling or planning (Li et al., 2024b). However, solving large-size MILPs remains time-consuming and computationally expensive, as many are NP-hard and have exponential expansion of search spaces as instance sizes grow. To mitigate this challenge, researchers have explored a wide suite of machine learning (ML) methods (Gasse et al., 2022). In practice, MILP instances from the same scenario often share similar patterns and structures, which ML models can capture to achieve improved performance (Bengio et al., 2021). Recently, extensive research has focused on using ML models to predict solutions for MILPs. Notable approaches include Neural Diving (ND) (Nair et al., 2020; Y oon, 2021; Paulus & Krause, 2023) and Predict-and-Search (PS) (Han et al., 2023; Huang et al., 2024), as illustrated in Figure 1. Given a MILP instance, ND and PS begin by employing an ML model to predict an initial solution. ND with SelectiveNet (Nair et al., 2020) assigns fixed values to a subset of variables based on the prediction, thereby constructing a reduced MILP problem with a reduced dimensionality of decision variables. Then, ND solves the reduced problem to obtain the final solutions.
The Robustness of Structural Features in Species Interaction Networks
Fard, Sanaz Hasanzadeh, Dolson, Emily
Species interaction networks are a powerful tool for describing ecological communities; they typically contain nodes representing species, and edges representing interactions between those species. For the purposes of drawing abstract inferences about groups of similar networks, ecologists often use graph topology metrics to summarize structural features. However, gathering the data that underlies these networks is challenging, which can lead to some interactions being missed. Thus, it is important to understand how much different structural metrics are affected by missing data. To address this question, we analyzed a database of 148 real-world bipartite networks representing four different types of species interactions (pollination, host-parasite, plant-ant, and seed-dispersal). For each network, we measured six different topological properties: number of connected components, variance in node betweenness, variance in node PageRank, largest Eigenvalue, the number of non-zero Eigenvalues, and community detection as determined by four different algorithms. We then tested how these properties change as additional edges -- representing data that may have been missed -- are added to the networks. We found substantial variation in how robust different properties were to the missing data. For example, the Clauset-Newman-Moore and Louvain community detection algorithms showed much more gradual change as edges were added than the label propagation and Girvan-Newman algorithms did, suggesting that the former are more robust. Robustness also varied for some metrics based on interaction type. These results provide a foundation for selecting network properties to use when analyzing messy ecological network data.
From Target Tracking to Targeting Track -- Part II: Regularized Polynomial Trajectory Optimization
Li, Tiancheng, Song, Yan, Li, Guchong, Li, Hao
Target tracking entails the estimation of the evolution of the target state over time, namely the target trajectory. Different from the classical state space model, our series of studies, including this paper, model the collection of the target state as a stochastic process (SP) that is further decomposed into a deterministic part which represents the trend of the trajectory and a residual SP representing the residual fitting error. Subsequently, the tracking problem is formulated as a learning problem regarding the trajectory SP for which a key part is to estimate a trajectory FoT (T-FoT) best fitting the measurements in time series. For this purpose, we consider the polynomial T-FoT and address the regularized polynomial T-FoT optimization employing two distinct regularization strategies seeking trade-off between the accuracy and simplicity. One limits the order of the polynomial and then the best choice is determined by grid searching in a narrow, bounded range while the other adopts $\ell_0$ norm regularization for which the hybrid Newton solver is employed. Simulation results obtained in both single and multiple maneuvering target scenarios demonstrate the effectiveness of our approaches.
Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton
Niu, Chengmei, Liao, Zhenyu, Ling, Zenan, Mahoney, Michael W.
A substantial body of work in machine learning (ML) and randomized numerical linear algebra (RandNLA) has exploited various sorts of random sketching methodologies, including random sampling and random projection, with much of the analysis using Johnson--Lindenstrauss and subspace embedding techniques. Recent studies have identified the issue of inversion bias -- the phenomenon that inverses of random sketches are not unbiased, despite the unbiasedness of the sketches themselves. This bias presents challenges for the use of random sketches in various ML pipelines, such as fast stochastic optimization, scalable statistical estimators, and distributed optimization. In the context of random projection, the inversion bias can be easily corrected for dense Gaussian projections (which are, however, too expensive for many applications). Recent work has shown how the inversion bias can be corrected for sparse sub-gaussian projections. In this paper, we show how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform. Using these results, we establish problem-independent local convergence rates for sub-sampled Newton methods.
Towards Environment-Sensitive Molecular Inference via Mixed Integer Linear Programming
Zhu, Jianshen, Takekida, Mao, Azam, Naveed Ahmed, Haraguchi, Kazuya, Zhao, Liang, Akutsu, Tatsuya
Traditional QSAR/QSPR and inverse QSAR/QSPR methods often assume that chemical properties are dictated by single molecules, overlooking the influence of molecular interactions and environmental factors. In this paper, we introduce a novel QSAR/QSPR framework that can capture the combined effects of multiple molecules (e.g., small molecules or polymers) and experimental conditions on property values. We design a feature function to integrate the information of multiple molecules and the environment. Specifically, for the property Flory-Huggins $\chi$-parameter, which characterizes the thermodynamic properties between the solute and the solvent, and varies in temperatures, we demonstrate through computational experimental results that our approach can achieve a competitively high learning performance compared to existing works on predicting $\chi$-parameter values, while inferring the solute polymers with up to 50 non-hydrogen atoms in their monomer forms in a relatively short time. A comparison study with the simulation software J-OCTA demonstrates that the polymers inferred by our methods are of high quality.
A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees
Zhou, Yuhao, Xu, Jintao, Bao, Chenglong, Ding, Chao, Zhu, Jun
We consider the problem of finding an $\epsilon$-stationary point of a nonconvex function with a Lipschitz continuous Hessian and propose a quadratic regularized Newton method incorporating a new class of regularizers constructed from the current and previous gradients. The method leverages a recently developed linear conjugate gradient approach with a negative curvature monitor to solve the regularized Newton equation. Notably, our algorithm is adaptive, requiring no prior knowledge of the Lipschitz constant of the Hessian, and achieves a global complexity of $O(\epsilon^{-\frac{3}{2}}) + \tilde O(1)$ in terms of the second-order oracle calls, and $\tilde O(\epsilon^{-\frac{7}{4}})$ for Hessian-vector products, respectively. Moreover, when the iterates converge to a point where the Hessian is positive definite, the method exhibits quadratic local convergence. Preliminary numerical results illustrate the competitiveness of our algorithm.
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs
In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erd\H{o}s-R\'enyi graphs $\mathcal G(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and the correlation $\rho<\sqrt{\alpha}$ lies below the Otter's threshold, solving a remaining problem in \cite{DDL23+}; (2) the detection problem between the correlated sparse stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon;s)$ and a pair of independent stochastic block models $\mathcal S(n,\tfrac{\lambda s}{n};k,\epsilon)$ when $\epsilon^2 \lambda s<1$ lies below the Kesten-Stigum (KS) threshold and $s<\sqrt{\alpha}$ lies below the Otter's threshold, solving a remaining problem in \cite{CDGL24+}. One of the main ingredient in our proof is to derive certain forms of \emph{algorithmic contiguity} between two probability measures based on bounds on their low-degree advantage. To be more precise, consider the high-dimensional hypothesis testing problem between two probability measures $\mathbb{P}$ and $\mathbb{Q}$ based on the sample $\mathsf Y$. We show that if the low-degree advantage $\mathsf{Adv}_{\leq D} \big( \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \big)=O(1)$, then (assuming the low-degree conjecture) there is no efficient algorithm $\mathcal A$ such that $\mathbb{Q}(\mathcal A(\mathsf Y)=0)=1-o(1)$ and $\mathbb{P}(\mathcal A(\mathsf Y)=1)=\Omega(1)$. This framework provides a useful tool for performing reductions between different inference tasks.
Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans
Wald, Christian, Steidl, Gabriele
Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting in samples from the latent one. This paper reviews from a mathematical point of view different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs
We define and study the problem of predicting the solution to a linear program (LP) given only partial information about its objective and constraints. This generalizes the problem of learning to predict the purchasing behavior of a rational agent who has an unknown objective function, that has been studied under the name "Learning from Revealed Preferences". We give mistake bound learning algorithms in two settings: in the first, the objective of the LP is known to the learner but there is an arbitrary, fixed set of constraints which are unknown. Each example is defined by an additional known constraint and the goal of the learner is to predict the optimal solution of the LP given the union of the known and unknown constraints. This models the problem of predicting the behavior of a rational agent whose goals are known, but whose resources are unknown.