Goto

Collaborating Authors

 Mathematical & Statistical Methods


Learning Randomized Reductions and Program Properties

arXiv.org Artificial Intelligence

The correctness of computations remains a significant challenge in computer science, with traditional approaches relying on automated testing or formal verification. Self-testing/correcting programs introduce an alternative paradigm, allowing a program to verify and correct its own outputs via randomized reductions, a concept that previously required manual derivation. In this paper, we present Bitween, a method and tool for automated learning of randomized (self)-reductions and program properties in numerical programs. Bitween combines symbolic analysis and machine learning, with a surprising finding: polynomial-time linear regression, a basic optimization method, is not only sufficient but also highly effective for deriving complex randomized self-reductions and program invariants, often outperforming sophisticated mixed-integer linear programming solvers. We establish a theoretical framework for learning these reductions and introduce RSR-Bench, a benchmark suite for evaluating Bitween's capabilities on scientific and machine learning functions. Our empirical results show that Bitween surpasses state-of-the-art tools in scalability, stability, and sample efficiency when evaluated on nonlinear invariant benchmarks like NLA-DigBench. Bitween is open-source as a Python package and accessible via a web interface that supports C language programs.


Age Optimal Sampling for Unreliable Channels under Unknown Channel Statistics

arXiv.org Artificial Intelligence

In this paper, we study a system in which a sensor forwards status updates to a receiver through an error-prone channel, while the receiver sends the transmission results back to the sensor via a reliable channel. Both channels are subject to random delays. To evaluate the timeliness of the status information at the receiver, we use the Age of Information (AoI) metric. The objective is to design a sampling policy that minimizes the expected time-average AoI, even when the channel statistics (e.g., delay distributions) are unknown. We first review the threshold structure of the optimal offline policy under known channel statistics and then reformulate the design of the online algorithm as a stochastic approximation problem. We propose a Robbins-Monro algorithm to solve this problem and demonstrate that the optimal threshold can be approximated almost surely. Moreover, we prove that the cumulative AoI regret of the online algorithm increases with rate $\mathcal{O}(\ln K)$, where $K$ is the number of successful transmissions. In addition, our algorithm is shown to be minimax order optimal, in the sense that for any online learning algorithm, the cumulative AoI regret up to the $K$-th successful transmissions grows with the rate at least $\Omega(\ln K)$ in the worst case delay distribution. Finally, we improve the stability of the proposed online learning algorithm through a momentum-based stochastic gradient descent algorithm. Simulation results validate the performance of our proposed algorithm.


Distributionally Robust Instrumental Variables Estimation

arXiv.org Machine Learning

Instrumental variables (IV) estimation, also known as IV regression, is a fundamental method in econometrics and statistics to infer causal relationships in observational data with unobserved confounding. It leverages access to additional variables (instruments) that affect the outcome exogenously and exclusively through the endogenous regressor to yield consistent causal estimates, even when the standard ordinary least squares (OLS) estimator is biased by unobserved confounding (Imbens and Angrist, 1994; Angrist et al., 1996; Imbens and Rubin, 2015). Over the years, IV estimation has become an indispensable tool for causal inference in empirical works in economics (Card and Krueger, 1994), as well as in the study of genetic and epidemiological data (Davey Smith and Ebrahim, 2003). Despite the widespread use of IV in empirical and applied works, it has important limitations and challenges, such as invalid instruments (Sargan, 1958; Murray, 2006), weak instruments (Staiger and Stock, 1997), non-compliance (Imbens and Angrist, 1994), and heteroskedasticity, especially in settings with weak instruments or highly leveraged datasets (Andrews et al., 2019; Young, 2022). These issues could significantly impact the validity and quality of estimation and inference using instrumental variables (Jiang, 2017). Many works have since been devoted to assessing and addressing these issues, such as statistical tests (Hansen, 1982; Stock and Yogo, 2002), sensitivity analysis (Rosenbaum and Rubin, 1983; Bonhomme and Weidner, 2022), and additional assumptions or structures on the data generating process (Kolesár et al., 2015; Kang et al., 2016; Guo et al., 2018b). Recently, an emerging line of works have highlighted interesting connections between causality and the concepts of invariance and robustness (Peters et al., 2016; Meinshausen, 2018; Rothenhäusler et al., 2021; Bühlmann, 2020; Jakobsen and Peters, 2022; Fan et al., 2024). Their guiding philosophy is that causal properties can be viewed as robustness against changes across heterogeneous environments, represented by a set P of data distributions.


Effective and Efficient Representation Learning for Flight Trajectories

arXiv.org Artificial Intelligence

Flight trajectory data plays a vital role in the traffic management community, especially for downstream tasks such as trajectory prediction, flight recognition, and anomaly detection. Existing works often utilize handcrafted features and design models for different tasks individually, which heavily rely on domain expertise and are hard to extend. We argue that different flight analysis tasks share the same useful features of the trajectory. Jointly learning a unified representation for flight trajectories could be beneficial for improving the performance of various tasks. However, flight trajectory representation learning (TRL) faces two primary challenges, \ie unbalanced behavior density and 3D spatial continuity, which disable recent general TRL methods. In this paper, we propose Flight2Vec , a flight-specific representation learning method to address these challenges. Specifically, a behavior-adaptive patching mechanism is used to inspire the learned representation to pay more attention to behavior-dense segments. Moreover, we introduce a motion trend learning technique that guides the model to memorize not only the precise locations, but also the motion trend to generate better representations. Extensive experimental results demonstrate that Flight2Vec significantly improves performance in downstream tasks such as flight trajectory prediction, flight recognition, and anomaly detection.


Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD

arXiv.org Machine Learning

Graphical model estimation from modern multi-omics data requires a balance between statistical estimation performance and computational scalability. We introduce a novel pseudolikelihood-based graphical model framework that reparameterizes the target precision matrix while preserving sparsity pattern and estimates it by minimizing an $\ell_1$-penalized empirical risk based on a new loss function. The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions. The associated optimization problem allows for a provably fast computation algorithm using a novel operator-splitting approach and communication-avoiding distributed matrix multiplication. A high-performance computing implementation of our framework was tested in simulated data with up to one million variables demonstrating complex dependency structures akin to biological networks. Leveraging this scalability, we estimated partial correlation network from a dual-omic liver cancer data set. The co-expression network estimated from the ultrahigh-dimensional data showed superior specificity in prioritizing key transcription factors and co-activators by excluding the impact of epigenomic regulation, demonstrating the value of computational scalability in multi-omic data analysis. %derived from the gene expression data.


Robust random graph matching in dense graphs via vector approximate message passing

arXiv.org Machine Learning

In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation is a perturbed input $(A+E,B+F)$ where $(A,B)$ is a pair of correlated Gaussian Wigner matrices and $E,F$ are adversarially chosen matrices supported on an unknown $\epsilon n * \epsilon n$ principle minor of $A,B$, respectively. We propose a vector-approximate message passing (vector-AMP) algorithm that succeeds in polynomial time as long as the correlation $\rho$ between $(A,B)$ is a non-vanishing constant and $\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$. The main methodological inputs for our result are the iterative random graph matching algorithm proposed in \cite{DL22+, DL23+} and the spectral cleaning procedure proposed in \cite{IS24+}. To the best of our knowledge, our algorithm is the first efficient random graph matching type algorithm that is robust under any adversarial perturbations of $n^{1-o(1)}$ size.


Formal Mathematical Reasoning: A New Frontier in AI

arXiv.org Artificial Intelligence

AI for Mathematics (AI4Math) is not only intriguing intellectually but also crucial for AI-driven discovery in science, engineering, and beyond. Extensive efforts on AI4Math have mirrored techniques in NLP, in particular, training large language models on carefully curated math datasets in text form. As a complementary yet less explored avenue, formal mathematical reasoning is grounded in formal systems such as proof assistants, which can verify the correctness of reasoning and provide automatic feedback. In this position paper, we advocate for formal mathematical reasoning and argue that it is indispensable for advancing AI4Math to the next level. In recent years, we have seen steady progress in using AI to perform formal reasoning, including core tasks such as theorem proving and autoformalization, as well as emerging applications such as verifiable generation of code and hardware designs. However, significant challenges remain to be solved for AI to truly master mathematics and achieve broader impact. We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success. At this inflection point for formal mathematical reasoning, we call on the research community to come together to drive transformative advancements in this field.


Time-Reversible Bridges of Data with Machine Learning

arXiv.org Machine Learning

The analysis of dynamical systems is a fundamental tool in the natural sciences and engineering. It is used to understand the evolution of systems as large as entire galaxies and as small as individual molecules. With predefined conditions on the evolution of dy-namical systems, the underlying differential equations have to fulfill specific constraints in time and space. This class of problems is known as boundary value problems. This thesis presents novel approaches to learn time-reversible deterministic and stochastic dynamics constrained by initial and final conditions. The dynamics are inferred by machine learning algorithms from observed data, which is in contrast to the traditional approach of solving differential equations by numerical integration. The work in this thesis examines a set of problems of increasing difficulty each of which is concerned with learning a different aspect of the dynamics. Initially, we consider learning deterministic dynamics from ground truth solutions which are constrained by deterministic boundary conditions. Secondly, we study a boundary value problem in discrete state spaces, where the forward dynamics follow a stochastic jump process and the boundary conditions are discrete probability distributions. In particular, the stochastic dynamics of a specific jump process, the Ehrenfest process, is considered and the reverse time dynamics are inferred with machine learning. Finally, we investigate the problem of inferring the dynamics of a continuous-time stochastic process between two probability distributions without any reference information. Here, we propose a novel criterion to learn time-reversible dynamics of two stochastic processes to solve the Schr\"odinger Bridge Problem.


Multi-task Representation Learning for Mixed Integer Linear Programming

arXiv.org Artificial Intelligence

Mixed Integer Linear Programs (MILPs) are highly flexible and powerful tools for modeling and solving complex real-world combinatorial optimization problems. Recently, machine learning (ML)-guided approaches have demonstrated significant potential in improving MILPsolving efficiency. However, these methods typically rely on separate offline data collection and training processes, which limits their scalability and adaptability. This paper introduces the first multi-task learning framework for ML-guided MILP solving. The proposed framework provides MILP embeddings helpful in guiding MILP solving across solvers (e.g., Gurobi and SCIP) and across tasks (e.g., Branching and Solver configuration). Through extensive experiments on three widely used MILP benchmarks, we demonstrate that our multi-task learning model performs similarly to specialized models within the same distribution. Moreover, it significantly outperforms them in generalization across problem sizes and tasks. Keywords: Deep Learning Mixed Integer Linear Programming Multitask Learning Graph Neural Networks.


A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD

arXiv.org Artificial Intelligence

Adaptive Moment Estimation (Adam) is a cornerstone optimization algorithm in deep learning, widely recognized for its flexibility with adaptive learning rates and efficiency in handling large-scale data. However, despite its practical success, the theoretical understanding of Adam's convergence has been constrained by stringent assumptions, such as almost surely bounded stochastic gradients or uniformly bounded gradients, which are more restrictive than those typically required for analyzing stochastic gradient descent (SGD). In this paper, we introduce a novel and comprehensive framework for analyzing the convergence properties of Adam. This framework offers a versatile approach to establishing Adam's convergence. Specifically, we prove that Adam achieves asymptotic (last iterate sense) convergence in both the almost sure sense and the \(L_1\) sense under the relaxed assumptions typically used for SGD, namely \(L\)-smoothness and the ABC inequality. Meanwhile, under the same assumptions, we show that Adam attains non-asymptotic sample complexity bounds similar to those of SGD.