Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the $1+K$ Lower Bound
Causal-discovery algorithms return a directed graph, yet provide no principled means of distinguishing edge directions identified by the data from those assigned without an identifying assumption. Under the standard Markov and faithfulness conditions, the observational distribution identifies only a Markov equivalence class; orientations within that class are not determined by the joint distribution and cannot be recovered from additional samples alone, but require either a functional restriction or an intervention. We introduce a protocol for observational causal discovery on continuous data that attaches to each candidate edge a discrete impossibility certificate: a RESOLVED code records the identifiability theorem under which the direction was committed, while an IMPOSSIBLE code records the failure mode together with the specific question a domain expert must answer to resolve it. The bivariate cascade is extended with five gated identifiability tiers (LSNM, IGCI, Stein, MDL, and PEIT), each of which abstains when its precondition test rejects. Two oracle primitives, the meta-hub query and the node-children query, jointly establish an upper bound of $1+K$ expert interactions sufficient to recover any DAG, where $K$ denotes the number of non-leaf vertices. Under an ideal-oracle assumption, the bound is met exactly on the asia, sachs, child, and alarm benchmarks.
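The $1+K$ count can be illustrated with a small simulation. This is only a sketch: the abstract does not spell out the semantics of the two queries, so the code assumes that one meta-hub query returns the set of non-leaf vertices and that one node-children query returns the full child set of a vertex. The `IdealOracle` class, the `recover_dag` function, and the toy graph are hypothetical names introduced for illustration.

```python
from typing import Dict, Set


class IdealOracle:
    """Hypothetical ideal expert that answers from a known ground-truth DAG."""

    def __init__(self, children: Dict[str, Set[str]]) -> None:
        self._children = children
        self.queries = 0  # number of expert interactions so far

    def meta_hub(self) -> Set[str]:
        # Assumed semantics: a single query returns every vertex with a child.
        self.queries += 1
        return {v for v, kids in self._children.items() if kids}

    def node_children(self, v: str) -> Set[str]:
        # Assumed semantics: a single query returns the full child set of v.
        self.queries += 1
        return set(self._children[v])


def recover_dag(oracle: IdealOracle) -> Dict[str, Set[str]]:
    """Recover all edges with one meta-hub query plus one query per non-leaf vertex."""
    hubs = oracle.meta_hub()
    return {v: oracle.node_children(v) for v in sorted(hubs)}


# Toy ground truth (an assumption): A -> B, A -> C, B -> D.
truth = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set()}
oracle = IdealOracle(truth)
recovered = recover_dag(oracle)

K = sum(1 for kids in truth.values() if kids)  # non-leaf vertices: A and B
print(oracle.queries, 1 + K)
```

Under these assumed semantics the toy graph has $K = 2$ non-leaf vertices, so the simulated expert is consulted three times and every edge is recovered.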
Memory, Roughness, and Information Persistence in Financial Markets: A Structural Approach to Volatility Forecasting
Deep, Akash, Appiah, Nicholas, Rachev, Svetlozar T.
This paper studies the joint role of long-memory dynamics, rough-volatility behavior, and persistence-based forecasting features in equity volatility modeling. We combine semiparametric long-memory estimation, rough-volatility diagnostics, and structured forecasting regressions to examine whether persistence measures contain economically meaningful forecasting information beyond conventional volatility predictors. Using a panel of 115 S&P 500 constituents from November 2001 through April 2026, we document that volatility proxies exhibit substantial long-memory behavior and locally rough dynamics. The cross-sectional mean Geweke-Porter-Hudak estimate of the memory parameter is $\hat{d} = 0.226$, while the corresponding local-Whittle estimate is $\hat{d} = 0.440$, with statistical significance observed across nearly the entire panel. Rolling estimates of persistence rise substantially during the global financial crisis and the COVID period and display a positive contemporaneous association with the VIX. We then examine whether persistence-related features improve out-of-sample volatility forecasts beyond standard HAR and HAR-X benchmarks. Incorporating cross-sectional persistence aggregates, sectoral persistence measures, and persistence-by-stress interaction terms produces moderate but statistically significant forecasting improvements, particularly at longer horizons and during stress regimes. Forecast gains are strongest during periods of elevated market volatility and in volatility-managed portfolio applications. The results suggest that persistence measures may serve as useful reduced-form indicators of the duration and propagation of uncertainty in financial markets, although the paper does not claim structural identification of the economic mechanisms generating persistence.
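For readers unfamiliar with the Geweke-Porter-Hudak estimator mentioned above, the following is a minimal sketch of the textbook log-periodogram regression, not the paper's code. The bandwidth $m = \lfloor n^{1/2} \rfloor$ is a common convention assumed here; the abstract does not state the bandwidth or the volatility proxy used, and the function name `gph_estimate` is hypothetical.

```python
import cmath
import math
import random


def gph_estimate(x, power=0.5):
    """Log-periodogram (GPH) estimate of the memory parameter d.

    Regresses log I(lambda_j) on -log(4 sin^2(lambda_j / 2)) over the first
    m = floor(n ** power) Fourier frequencies; the OLS slope estimates d.
    """
    n = len(x)
    mean = sum(x) / n
    m = int(n ** power)  # assumed bandwidth rule
    regressors, log_periodogram = [], []
    for j in range(1, m + 1):
        lam = 2 * math.pi * j / n
        dft = sum((x[t] - mean) * cmath.exp(-1j * lam * t) for t in range(n))
        regressors.append(-math.log(4 * math.sin(lam / 2) ** 2))
        log_periodogram.append(math.log(abs(dft) ** 2 / (2 * math.pi * n)))
    r_bar = sum(regressors) / m
    p_bar = sum(log_periodogram) / m
    cov = sum((r - r_bar) * (p - p_bar) for r, p in zip(regressors, log_periodogram))
    var = sum((r - r_bar) ** 2 for r in regressors)
    return cov / var


# Sanity check on simulated white noise, whose true memory parameter is 0.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
d_hat = gph_estimate(noise)
print(round(d_hat, 3))
```

On white noise the estimate should be close to zero up to sampling error; applied to a long-memory volatility proxy it would be expected to land in the range the abstract reports, although that depends on the proxy and bandwidth.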
Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos. Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked, ReLU-like activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields saturated dropout profiles under fixed budget; a rank-flow tie-breaker then selects front-loaded schedules, substantially reducing held-out test loss at no extra computational cost, with accuracy gains as a consistent secondary effect. We test the predictions in MLPs and Vision Transformers and discuss CNN/ResNet extensions.
Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks
Glasgow, Margalit, Bruna, Joan
We consider one-hidden layer neural networks trained in the feature-learning regime using gradient descent, and relate the output of the finite-width network $f_{\hat{\rho}_t^m}$ to its infinite-width counterpart $f_{\rho_t^{MF}}$, which evolves in the mean-field dynamics. While constant-time horizon bounds for $\|f_{\rho_t^{MF}} - f_{\hat{\rho}_t^m}\|$ may be obtained via standard Grönwall estimates, the long-time behavior of the fluctuation is a more delicate matter. Uniform-in-time bounds often rely on (local) strong convexity in the landscape or Logarithmic Sobolev inequalities present in noisy gradient dynamics. In this work, we establish non-asymptotic weak propagation-of-chaos that holds uniformly in time, obtained by exploiting instead the convergence rate of the mean-field deterministic Wasserstein-gradient-flow dynamics. Specifically, denoting by $L_t$ the mean-field excess MSE loss at time $t$ and $m$ the number of neurons, under standard regularity assumptions and the condition $\int_0^\infty L_t^{1/2} dt = O(\log d)$, we obtain the uniform in time bound $\|f_{\rho_t^{MF}} - f_{\hat{\rho}_t^m}\|^2 \lesssim \text{poly}(d) m^{-\min(1,c/6)}$ whenever $L_t \lesssim t^{-c}$. Our result holds in a noiseless setting and does not make any assumptions on the geometry of the landscape near the optimum, and extends seamlessly to other forms of discretization, including finite number of samples and time discretization. A key takeaway of our result is that whenever the convergence rate of the mean-field, population-loss dynamics is faster than $t^{-2}$, we can attain a loss of $\varepsilon$ with only $\text{poly}(d/\varepsilon)$ neurons, training samples, and GD steps.
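The role of the $t^{-2}$ threshold can be read off from the stated condition and bound. The short calculation below is a reader's sketch covering only the neuron count and the approximation error; it ignores constants and the $O(\log d)$ requirement, and is not taken from the paper. If $L_t \lesssim t^{-c}$, the tail of the integral in the condition behaves as

$$\int_1^\infty L_t^{1/2}\,dt \;\lesssim\; \int_1^\infty t^{-c/2}\,dt \;=\; \frac{1}{c/2 - 1} \quad \text{for } c > 2,$$

and the integral on the right diverges for $c \le 2$. Requiring the stated bound to be at most a target accuracy $\varepsilon$ gives

$$\text{poly}(d)\, m^{-\min(1,c/6)} \le \varepsilon \quad\Longleftrightarrow\quad m \ge \left(\frac{\text{poly}(d)}{\varepsilon}\right)^{\max(1,\,6/c)},$$

and since $c > 2$ implies $\max(1, 6/c) < 3$, the required width stays polynomial in $d/\varepsilon$.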
Estimating the expected output of wide random MLPs more efficiently than sampling
Wu, Wilson, Lecomte, Victor, Winer, Michael, Robinson, George, Hilton, Jacob, Christiano, Paul
By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite expansions. We show both theoretically and empirically that for sufficiently wide networks, our estimator achieves a target mean squared error using substantially fewer FLOPs than Monte Carlo sampling. We find moreover that our methods perform particularly well at estimating the probabilities of rare events, and additionally demonstrate how they can be used for model training. Together, these findings suggest a path to producing models with a greatly reduced probability of catastrophic tail risks.
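To make the contrast between sampling and sample-free estimation concrete, the sketch below treats the simplest case where an exact answer is available: a one-hidden-layer ReLU network with inputs $x \sim N(0, I)$. It relies only on the standard identity $E[\max(0, z)] = \sigma/\sqrt{2\pi}$ for $z \sim N(0, \sigma^2)$ and is not the paper's estimator, which handles deep networks via cumulants and Hermite expansions. The function names, the network sizes, and the weight initialization are assumptions made for illustration.

```python
import math
import random


def closed_form_mean(W, a):
    """Exact E[f(x)] for f(x) = sum_i a_i * relu(w_i . x) with x ~ N(0, I).

    Each pre-activation w_i . x is N(0, ||w_i||^2), and a ReLU applied to
    N(0, sigma^2) has mean sigma / sqrt(2 * pi).
    """
    total = sum(a_i * math.sqrt(sum(w * w for w in row)) for row, a_i in zip(W, a))
    return total / math.sqrt(2 * math.pi)


def monte_carlo_mean(W, a, n_samples, rng):
    """Baseline: average the network output over sampled Gaussian inputs."""
    d = len(W[0])
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += sum(
            a_i * max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row, a_i in zip(W, a)
        )
    return total / n_samples


# Hypothetical small network: input dimension 8, hidden width 16.
rng = random.Random(0)
d, width = 8, 16
W = [[rng.uniform(-1, 1) / math.sqrt(d) for _ in range(d)] for _ in range(width)]
a = [rng.uniform(-1, 1) / width for _ in range(width)]

exact = closed_form_mean(W, a)
sampled = monte_carlo_mean(W, a, 10_000, rng)
print(exact, sampled)
```

The closed form costs one pass over the weights, whereas the sampling baseline needs a forward pass per sample and still carries Monte Carlo error; the abstract's claim concerns the analogous trade-off for deep, wide MLPs.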
Approximate inference of marginals using the IBIA framework
Exact inference of marginals in probabilistic graphical models (PGMs) is known to be intractable, necessitating the use of approximate methods. Most existing variational techniques perform iterative message passing in loopy graphs, which is slow to converge for many benchmarks. In this paper, we propose a new algorithm for marginal inference that is based on the incremental build-infer-approximate (IBIA) paradigm. Our algorithm converts the PGM into a sequence of linked clique tree forests (SLCTF) with bounded clique sizes, and then uses a heuristic belief update algorithm to infer the marginals. For the special case of Bayesian networks, we show that if the incremental build step in IBIA uses the topological order of variables, then (a) the prior marginals are consistent in all CTFs in the SLCTF and (b) the posterior marginals are consistent once all evidence variables are added to the SLCTF. In our approach, the belief propagation step is non-iterative and the accuracy-complexity trade-off is controlled using user-defined clique size bounds. Results for several benchmark sets from recent UAI competitions show that our method gives accuracy better than or comparable to that of existing variational and sampling-based methods, with smaller runtimes.