Convergence theory for Hermite approximations under adaptive coordinate transformations

Saleh, Yahya

arXiv.org Machine Learning

Recent work has shown that parameterizing and optimizing coordinate transformations using normalizing flows, i.e., invertible neural networks, can significantly accelerate the convergence of spectral approximations. We present the first error estimates for approximating functions using Hermite expansions composed with adaptive coordinate transformations. Our analysis establishes an equivalence principle: approximating a function $f$ in the span of the transformed basis is equivalent to approximating the pullback of $f$ in the span of Hermite functions. This allows us to leverage the classical approximation theory of Hermite expansions to derive error estimates in transformed coordinates in terms of the regularity of the pullback. We present an example demonstrating how a nonlinear coordinate transformation can enhance the convergence of Hermite expansions. Focusing on smooth functions decaying along the real axis, we construct a monotone transport map that aligns the decay of the target function with the Hermite basis. This guarantees spectral convergence rates for the corresponding Hermite expansion. Our analysis provides theoretical insight into the convergence behavior of adaptive Hermite approximations based on normalizing flows, as recently explored in the computational quantum physics literature.
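The equivalence principle can be illustrated by a change of variables. The following is a one-dimensional sketch under assumptions not stated in the abstract: $T$ is a smooth, strictly increasing bijection of $\mathbb{R}$, $h_n$ are the orthonormal Hermite functions, and the transformed basis is taken to be $\psi_n = \sqrt{T'}\,(h_n \circ T)$. The symbols $T$, $h_n$, $\psi_n$, and $\tilde f$ are illustrative notation, and $\tilde f$ plays the role of the pullback.

```latex
\begin{aligned}
\Big\| f - \sum_{n=0}^{N} c_n \psi_n \Big\|_{L^2(\mathbb{R})}^2
  &= \int_{\mathbb{R}} \Big| f(x) - \sqrt{T'(x)} \sum_{n=0}^{N} c_n\, h_n\big(T(x)\big) \Big|^2 \, dx \\
  &= \int_{\mathbb{R}} \Big| \frac{f(x)}{\sqrt{T'(x)}} - \sum_{n=0}^{N} c_n\, h_n\big(T(x)\big) \Big|^2 \, T'(x)\, dx \\
  &= \int_{\mathbb{R}} \Big| \tilde f(y) - \sum_{n=0}^{N} c_n\, h_n(y) \Big|^2 \, dy
   = \Big\| \tilde f - \sum_{n=0}^{N} c_n h_n \Big\|_{L^2(\mathbb{R})}^2,
\qquad
\tilde f = \frac{f}{\sqrt{T'}} \circ T^{-1}.
\end{aligned}
```

Under these assumptions, the error of any $N$-term approximation of $f$ in the transformed basis equals the error of the Hermite approximation of $\tilde f$ with the same coefficients, so the convergence rate is governed by the regularity and decay of $\tilde f$.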


Causal Reconstruction of Sentiment Signals from Sparse News Data

Stan, Stefania, Lunghi, Marzio, Vargetto, Vito, Ricci, Claudio, Repetto, Rolands, Leo, Brayden, Gan, Shao-Hong

arXiv.org Machine Learning

Sentiment signals derived from sparse news are commonly used in financial analysis and technology monitoring, yet transforming raw article-level observations into reliable temporal series remains a largely unsolved engineering problem. Rather than treating this as a classification challenge, we propose to frame it as a causal signal reconstruction problem: given probabilistic sentiment outputs from a fixed classifier, recover a stable latent sentiment series that is robust to the structural pathologies of news data such as sparsity, redundancy, and classifier uncertainty. We present a modular three-stage pipeline that (i) aggregates article-level scores onto a regular temporal grid with uncertainty-aware and redundancy-aware weights, (ii) fills coverage gaps through strictly causal projection rules, and (iii) applies causal smoothing to reduce residual noise. Because ground-truth longitudinal sentiment labels are typically unavailable, we introduce a label-free evaluation framework based on signal stability diagnostics, information preservation lag proxies, and counterfactual tests for causality compliance and redundancy robustness. As a secondary external check, we evaluate the consistency of reconstructed signals against stock-price data for a multi-firm dataset of AI-related news titles (November 2024 to February 2026). The key empirical finding is a three-week lead-lag pattern between reconstructed sentiment and price that persists across all tested pipeline configurations and aggregation regimes, a structural regularity more informative than any single correlation coefficient. Overall, the results support the view that stable, deployable sentiment indicators require careful reconstruction, not only better classifiers.
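The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the confidence-weighted mean, the decaying carry-forward rule, and the exponential moving average are stand-in choices, and all function and parameter names (`aggregate`, `causal_fill`, `causal_smooth`, `decay`, `alpha`) are hypothetical. Redundancy-aware weighting is omitted for brevity.

```python
def aggregate(articles, n_bins):
    """Stage (i): confidence-weighted mean score per time bin.

    articles: iterable of (bin_index, score, confidence) tuples.
    Returns a list with None for bins that have no coverage.
    """
    num = [0.0] * n_bins
    den = [0.0] * n_bins
    for b, score, conf in articles:
        num[b] += conf * score
        den[b] += conf
    return [num[i] / den[i] if den[i] > 0 else None for i in range(n_bins)]


def causal_fill(series, decay=0.5):
    """Stage (ii): fill gaps using past values only.

    Assumed rule: carry the last value forward, shrinking it toward 0
    (neutral) by `decay` per empty bin. Leading gaps are filled with 0.
    """
    out, last = [], 0.0
    for v in series:
        last = v if v is not None else last * decay
        out.append(last)
    return out


def causal_smooth(series, alpha=0.5):
    """Stage (iii): exponential moving average over current and past values."""
    out, state = [], None
    for v in series:
        state = v if state is None else alpha * v + (1 - alpha) * state
        out.append(state)
    return out


def reconstruct(articles, n_bins, decay=0.5, alpha=0.5):
    """Run the three stages in order."""
    return causal_smooth(causal_fill(aggregate(articles, n_bins), decay), alpha)
```

Because every stage reads only the current and earlier bins, adding or removing a later article cannot change earlier outputs, which is the property a causality-compliance test would check.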


A 1/R Law for Kurtosis Contrast in Balanced Mixtures

Bi, Yuda, Xiao, Wenjun, Bai, Linhao, Calhoun, Vince D

arXiv.org Machine Learning

Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures. We also show that purification (selecting m R sign-consistent sources) restores R-independent contrast Ω(1/m), with a simple data-driven heuristic. Synthetic experiments validate the predicted decay, the T crossover, and contrast recovery. Independent Component Analysis (ICA) recovers statistically independent latent sources from linear mixtures and is identifiable whenever at most one source is Gaussian [1]. Excess kurtosis, the standardized fourth cumulant, is a central contrast function [9], and kurtosis-type nonlinearities remain standard in FastICA.
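A short Python sketch shows how excess kurtosis behaves under mixing. It relies on the standard additivity of cumulants for independent sources, which is one way to see a 1/R scaling and not necessarily the paper's own argument. The function names are hypothetical, and the sources are assumed to be independent with zero mean and unit variance.

```python
def sample_excess_kurtosis(xs):
    """Sample excess kurtosis: m4 / m2**2 - 3, using biased central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def mixture_excess_kurtosis(weights, source_kurtoses):
    """Population excess kurtosis of y = sum_i w_i * s_i.

    Assumes independent, zero-mean, unit-variance sources s_i with the
    given excess kurtoses. Fourth cumulants add with weights w_i**4,
    and the result is standardized by the squared variance of y.
    """
    variance = sum(w * w for w in weights)
    fourth_cumulant = sum(w ** 4 * k for w, k in zip(weights, source_kurtoses))
    return fourth_cumulant / variance ** 2
```

For a balanced mixture of R sources with equal weights 1/sqrt(R) and a common excess kurtosis k, `mixture_excess_kurtosis` returns k/R, so the contrast shrinks as the mixture widens.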


Deriving Neural Scaling Laws from the statistics of natural language

Cagnetta, Francesco, Raventós, Allan, Ganguli, Surya, Wyart, Matthieu

arXiv.org Machine Learning

Although experimental neural scaling laws have substantially guided empirical progress in large-scale machine learning, no existing theory can quantitatively predict the exponents of these laws for any modern LLM trained on any natural language dataset. We provide the first such theory in the case of data-limited scaling laws. We isolate two key statistical properties of language that alone can predict neural scaling exponents: (i) the decay of pairwise token correlations with time separation between token pairs, and (ii) the decay of the next-token conditional entropy with the length of the conditioning context. We further derive a simple formula in terms of these statistics that predicts data-limited neural scaling exponents from first principles without any free parameters or synthetic data models. Our theory exhibits a remarkable match with experimentally measured neural scaling laws obtained from training GPT-2-style and LLaMA-style models from scratch on two qualitatively different benchmarks, TinyStories and WikiText.