definition 2
Algorithmic Contiguity from Low-Degree Heuristic II: Predicting Detection-Recovery Gaps
The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the \emph{detection-recovery gap phenomenon} that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation \cite{SW22, SW25, CGGV25+}, the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining \emph{conditional computational lower bounds} for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in \cite{Li25} with a cross-validation reduction in \cite{DHSS25} that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in \cite{SW22, SW25} via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.
- North America > United States (0.28)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Asia > Middle East > Jordan (0.04)
- (3 more...)
Towards Verified and Targeted Explanations through Formal Methods
Wang, Hanchen David, Lopez, Diego Manzanas, Robinette, Preston K., Oguz, Ipek, Johnson, Taylor T., Ma, Meiyi
As deep neural networks are deployed in safety-critical domains such as autonomous driving and medical diagnosis, stakeholders need explanations that are interpretable but also trustworthy with formal guarantees. Existing XAI methods fall short: heuristic attribution techniques (e.g., LIME, Integrated Gradients) highlight influential features but offer no mathematical guarantees about decision boundaries, while formal methods verify robustness yet remain untargeted, analyzing the nearest boundary regardless of whether it represents a critical risk. In safety-critical systems, not all misclassifications carry equal consequences; confusing a "Stop" sign for a "60 kph" sign is far more dangerous than confusing it with a "No Passing" sign. We introduce ViTaX (Verified and Targeted Explanations), a formal XAI framework that generates targeted semifactual explanations with mathematical guarantees. For a given input (class y) and a user-specified critical alternative (class t), ViTaX: (1) identifies the minimal feature subset most sensitive to the y->t transition, and (2) applies formal reachability analysis to guarantee that perturbing these features by epsilon cannot flip the classification to t. We formalize this through Targeted epsilon-Robustness, certifying whether a feature subset remains robust under perturbation toward a specific target class. ViTaX is the first method to provide formally guaranteed explanations of a model's resilience against user-identified alternatives. Evaluations on MNIST, GTSRB, EMNIST, and TaxiNet demonstrate over 30% fidelity improvement with minimal explanation cardinality.
- North America > United States > Tennessee > Davidson County > Nashville (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- (3 more...)
Robust Low-Rank Tensor Completion based on M-product with Weighted Correlated Total Variation and Sparse Regularization
Karmakar, Biswarup, Behera, Ratikanta
The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies have encountered fundamental limitations due to their reliance on uniform regularization schemes, particularly the tensor nuclear norm and $\ell_1$ norm regularization approaches, which indiscriminately apply equal shrinkage to all singular values and sparse components, thereby compromising the preservation of critical tensor structures. The proposed tensor weighted correlated total variation (TWCTV) regularizer addresses these shortcomings through an $M$-product framework that combines a weighted Schatten-$p$ norm on gradient tensors for low-rankness with smoothness enforcement and weighted sparse components for noise suppression. The proposed weighting scheme adaptively reduces the thresholding level to preserve both dominant singular values and sparse components, thus improving the reconstruction of critical structural elements and nuanced details in the recovered signal. Through a systematic algorithmic approach, we introduce an enhanced alternating direction method of multipliers (ADMM) that offers both computational efficiency and theoretical substantiation, with convergence properties comprehensively analyzed within the $M$-product framework.Comprehensive numerical evaluations across image completion, denoising, and background subtraction tasks validate the superior performance of this approach relative to established benchmark methods.
- North America > United States (0.14)
- Asia > India > Karnataka > Bengaluru (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science > Data Quality > Data Transformation (0.93)
- Information Technology > Data Science > Data Mining (0.67)
Demographic Parity Tails for Regression
Le, Naht Sinh, Denis, Christophe, Hebiri, Mohamed
Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be unnecessary for many applications, where fairness concerns are localized to specific regions of the distribution. To overcome this issue, we propose a new framework for regression under DP that focuses on the tails of target distribution across sensitive groups. Our methodology builds on optimal transport theory. By enforcing fairness constraints only over targeted regions of the distribution, our approach enables more nuanced and context-sensitive interventions. Leveraging recent advances, we develop an interpretable and flexible algorithm that leverages the geometric structure of optimal transport. We provide theoretical guarantees, including risk bounds and fairness properties, and validate the method through experiments in regression settings.
- North America > United States > California (0.04)
- Europe > France (0.04)
Sharp Concentration Inequalities: Phase Transition and Mixing of Orlicz Tails with Variance
In this work, we investigate how to develop sharp concentration inequalities for sub-Weibull random variables, including sub-Gaussian and sub-exponential distributions. Although the random variables may not be sub-Guassian, the tail probability around the origin behaves as if they were sub-Gaussian, and the tail probability decays align with the Orlicz $Ψ_α$-tail elsewhere. Specifically, for independent and identically distributed (i.i.d.) $\{X_i\}_{i=1}^n$ with finite Orlicz norm $\|X\|_{Ψ_α}$, our theory unveils that there is an interesting phase transition at $α= 2$ in that $\PPł(ł|\sum_{i=1}^n X_i \r| \geq t\r)$ with $t > 0$ is upper bounded by $2\expł(-C\maxł\{\frac{t^2}{n\|X\|_{Ψ_α}^2},\frac{t^α}{ n^{α-1} \|X\|_{Ψ_α}^α}\r\}\r)$ for $α\geq 2$, and by $2\expł(-C\minł\{\frac{t^2}{n\|X\|_{Ψ_α}^2},\frac{t^α}{ n^{α-1} \|X\|_{Ψ_α}^α}\r\}\r)$ for $1\leq α\leq 2$ with some positive constant $C$. In many scenarios, it is often necessary to distinguish the standard deviation from the Orlicz norm when the latter can exceed the former greatly. To accommodate this, we build a new theoretical analysis framework, and our sharp, flexible concentration inequalities involve the variance and a mixing of Orlicz $Ψ_α$-tails through the min and max functions. Our theory yields new, improved concentration inequalities even for the cases of sub-Gaussian and sub-exponential distributions with $α= 2$ and $1$, respectively. We further demonstrate our theory on martingales, random vectors, random matrices, and covariance matrix estimation. These sharp concentration inequalities can empower more precise non-asymptotic analyses across different statistical and machine learning applications.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Complete Causal Identification from Ancestral Graphs under Selection Bias
Many causal discovery algorithms, including the celebrated FCI algorithm, output a Partial Ancestral Graph (PAG). PAGs serve as an abstract graphical representation of the underlying causal structure, modeled by directed acyclic graphs with latent and selection variables. This paper develops a characterization of the set of extended-type conditional independence relations that are invariant across all causal models represented by a PAG. This theory allows us to formulate a general measure-theoretic version of Pearl's causal calculus and a sound and complete identification algorithm for PAGs under selection bias. Our results also apply when PAGs are learned by certain algorithms that integrate observational data with experimental data and incorporate background knowledge.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (4 more...)
Persistence-based topological optimization: a survey
Carriere, Mathieu, Ike, Yuichi, Lacombe, Théo, Nishikawa, Naoki
Computational topology provides a tool, persistent homology, to extract quantitative descriptors from structured objects (images, graphs, point clouds, etc). These descriptors can then be involved in optimization problems, typically as a way to incorporate topological priors or to regularize machine learning models. This is usually achieved by minimizing adequate, topologically-informed losses based on these descriptors, which, in turn, naturally raises theoretical and practical questions about the possibility of optimizing such loss functions using gradient-based algorithms. This has been an active research field in the topological data analysis community over the last decade, and various techniques have been developed to enable optimization of persistence-based loss functions with gradient descent schemes. This survey presents the current state of this field, covering its theoretical foundations, the algorithmic aspects, and showcasing practical uses in several applications. It includes a detailed introduction to persistence theory and, as such, aims at being accessible to mathematicians and data scientists newcomers to the field. It is accompanied by an open-source library which implements the different approaches covered in this survey, providing a convenient playground for researchers to get familiar with the field.
- North America > United States (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > Germany (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
ResNets of All Shapes and Sizes: Convergence of Training Dynamics in the Large-scale Limit
Chaintron, Louis-Pierre, Chizat, Lénaïc, Maass, Javier
We establish convergence of the training dynamics of residual neural networks (ResNets) to their joint infinite depth L, hidden width M, and embedding dimension D limit. Specifically, we consider ResNets with two-layer perceptron blocks in the maximal local feature update (MLU) regime and prove that, after a bounded number of training steps, the error between the ResNet and its large-scale limit is O(1/L + sqrt(D/(L M)) + 1/sqrt(D)). This error rate is empirically tight when measured in embedding space. For a budget of P = Theta(L M D) parameters, this yields a convergence rate O(P^(-1/6)) for the scalings of (L, M, D) that minimize the bound. Our analysis exploits in an essential way the depth-two structure of residual blocks and applies formally to a broad class of state-of-the-art architectures, including Transformers with bounded key-query dimension. From a technical viewpoint, this work completes the program initiated in the companion paper [Chi25] where it is proved that for a fixed embedding dimension D, the training dynamics converges to a Mean ODE dynamics at rate O(1/L + sqrt(D)/sqrt(L M)). Here, we study the large-D limit of this Mean ODE model and establish convergence at rate O(1/sqrt(D)), yielding the above bound by a triangle inequality. To handle the rich probabilistic structure of the limit dynamics and obtain one of the first rigorous quantitative convergence for a DMFT-type limit, we combine the cavity method with propagation of chaos arguments at a functional level on so-called skeleton maps, which express the weight updates as functions of CLT-type sums from the past.
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
Robust Automatic Differentiation of Square-Root Kalman Filters via Gramian Differentials
Square-root Kalman filters propagate state covariances in Cholesky-factor form for numerical stability, and are a natural target for gradient-based parameter learning in state-space models. Their core operation, triangularization of a matrix $M \in \mathbb{R}^{n \times m}$, is computed via a QR decomposition in practice, but naively differentiating through it causes two problems: the semi-orthogonal factor is non-unique when $m > n$, yielding undefined gradients; and the standard Jacobian formula involves inverses, which diverges when $M$ is rank-deficient. Both are resolved by the observation that all filter outputs relevant to learning depend on the input matrix only through the Gramian $MM^\top$, so the composite loss is smooth in $M$ even where the triangularization is not. We derive a closed-form chain-rule directly from the differential of this Gramian identity, prove it exact for the Kalman log-marginal likelihood and filtered moments, and extend it to rank-deficient inputs via a two-component decomposition: a column-space term based on the Moore--Penrose pseudoinverse, and a null-space correction for perturbations outside the column space of $M$.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > West Midlands > Coventry (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Brazil (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (3 more...)