AITopics | proposition 5

Collaborating Authors

proposition 5

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Triangular-Reference Schrödinger Bridges for Time Series Generation

Bocchi, Gabriele

arXiv.org Machine Learning May-28-2026

We introduce Triangular-Reference Schrödinger Bridges for Time Series (TR-SBTS), a conservative extension of the SBTS framework in which the Brownian reference is replaced by an intervalwise frozen, possibly degenerate diffusion reference, triangular across a hierarchy of latent volatility levels. The construction is a single entropy projection on the augmented state space, with the variational constraint imposed jointly across time and the latent levels and unfolded hierarchically by the disintegration of relative entropy. The variational core of SBTS is preserved: the entropy minimiser is the h-transform of the reference, and on each frozen interval the optimal dynamics admit a logarithmic-gradient drift formula on the affine leaves of the active covariance directions, valid even when the frozen covariance is rank-deficient. We establish stability of the frozen approximation and convergence of the corresponding regularised kernel estimators. The construction is realised through a finite-dimensional conditioning map assembled from three complementary reductions of the past -- a block PCR summary, a reference-aware Mahalanobis kernel on past increments induced by the runtime frozen covariance cumulants, and a past-window WLS drift regressor under the same reference metric -- together with a coupled state-covariance bridge step in which each latent level produces a dynamic reference for the level above, summarised by a covariance descriptor; the construction is evaluated on numerical experiments.

artificial intelligence, convergence, machine learning, (17 more...)

arXiv.org Machine Learning

2605.27478

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)

Randomized Subspace Nesterov Accelerated Gradient

Omiya, Gaku, Poirion, Pierre-Louis, Takeda, Akiko

arXiv.org Machine Learning May-4-2026

Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is attractive in forward-mode automatic differentiation and communication-limited settings. While Nesterov acceleration is well understood for full-gradient and coordinate-based methods, obtaining accelerated methods for general subspace sketches that use only projected-gradient information and can improve over full-dimensional Nesterov acceleration in oracle complexity is technically nontrivial. We develop randomized-subspace Nesterov accelerated gradient methods for smooth convex and smooth strongly convex optimization under matrix smoothness and generic sketch moment assumptions. The key technical ingredient is a three-sequence formulation tailored to matrix smoothness, which recovers the corresponding classical Nesterov methods in the full-dimensional case. The resulting theory establishes accelerated oracle-complexity guarantees and makes explicit how matrix smoothness and the sketch distribution enter the complexity. It also provides a unified basis for comparing sketch families and identifying when randomized-subspace acceleration improves over full-dimensional Nesterov acceleration in oracle complexity.
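As a rough illustration of the projected-gradient idea in this abstract, the sketch below runs plain (non-accelerated) randomized-subspace gradient descent on a toy quadratic. It is not the authors' three-sequence method. The function names, the orthonormal sketch built by Gram-Schmidt, the step size 1/L, and the test problem are all assumptions made for the example.

```python
import random


def random_orthonormal_basis(dim, rank, rng):
    """Return `rank` orthonormal vectors in R^dim (Gram-Schmidt on Gaussian draws)."""
    basis = []
    while len(basis) < rank:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-12:  # skip numerically degenerate draws
            basis.append([vi / norm for vi in v])
    return basis


def subspace_gradient_descent(grad, x0, step, rank, iters, seed=0):
    """Update x using only the projected gradient P^T g in a random subspace."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        basis = random_orthonormal_basis(len(x), rank, rng)
        # Only `rank` numbers (the projected-gradient coefficients) are used.
        coeffs = [sum(bi * gi for bi, gi in zip(b, g)) for b in basis]
        for c, b in zip(coeffs, basis):
            x = [xi - step * c * bi for xi, bi in zip(x, b)]
    return x


# Hypothetical test problem: f(x) = 0.5 * sum_i CURV[i] * x_i^2.
CURV = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURV, x))


def grad_f(x):
    return [a * xi for a, xi in zip(CURV, x)]


x_start = [1.0] * len(CURV)
x_final = subspace_gradient_descent(
    grad_f, x_start, step=1.0 / max(CURV), rank=2, iters=500
)
```

Because the sketch here is an orthogonal projection and the step is 1/L, each update cannot increase the objective. General sketch distributions and matrix smoothness, which the abstract emphasizes, would need a different step-size rule.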

artificial intelligence, machine learning, sketch, (17 more...)

arXiv.org Machine Learning

2605.0074

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Transformer Approximations from ReLUs

Hu, Jerry Yao-Chieh, Lu, Mingcheng, Lee, Yi-Chen, Liu, Han

arXiv.org Machine Learning Apr-29-2026

We present a systematic recipe for translating ReLU approximation results to softmax Transformers. Given a constructive ReLU approximator for a target, we construct an explicit softmax Transformer with the same accuracy. The recipe applies to many common approximation targets and yields quantitative resource bounds beyond universal approximation statements. This matters because broad Universal Approximation Properties (UAP) still dominate Transformer approximation theory. For softmax Transformers, many universality results provide explicit constructions and quantitative resource bounds (e.g., parameters, depth, width, etc.) [Yun et al., 2020, Kajitsuka and Sato, 2023, Takakura and Suzuki, 2023, Jiang and Li, 2024, Hu et al., 2025,
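To make "constructive ReLU approximator" concrete, the sketch below builds the classical one-hidden-layer ReLU network that linearly interpolates a target on a knot grid. It only illustrates the ReLU side and does not attempt the translation to softmax Transformers described in the abstract. The function names, the target x^2, and the 11-knot grid are assumptions chosen for the example.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def relu_interpolant(target, knots):
    """Return a one-hidden-layer ReLU network interpolating `target` at `knots`.

    The network is values[0] + sum_k weights[k] * relu(x - knots[k]) and is
    intended for inputs in [knots[0], knots[-1]].
    """
    values = [target(t) for t in knots]
    slopes = [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(len(knots) - 1)
    ]
    # First weight is the first slope; later weights are the slope changes.
    weights = [slopes[0]] + [
        slopes[i] - slopes[i - 1] for i in range(1, len(slopes))
    ]
    biases = knots[:-1]

    def network(x):
        return values[0] + sum(w * relu(x - b) for w, b in zip(weights, biases))

    return network


knots = [i / 10 for i in range(11)]
net = relu_interpolant(lambda x: x * x, knots)
# Worst-case error on a fine grid of [0, 1].
max_err = max(abs(net(j / 1000) - (j / 1000) ** 2) for j in range(1001))
```

For x^2 with knot spacing h, linear interpolation has error at most h^2/4, so this grid gives an error of about 0.0025 with 10 hidden units. That is the kind of explicit accuracy-versus-width trade-off that quantitative resource bounds express.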

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2604.24878

Country:

North America > United States (0.28)
Asia > Taiwan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

When Does Dynamic Preconditioning Preserve the Polyak-Ruppert CLT? A Stabilization Threshold

An, Sunyoung, Huo, Xiaoming

arXiv.org Machine Learning Apr-28-2026

The central limit theorem (CLT) is a foundation of statistical inference: it provides the asymptotic distribution needed for confidence intervals, hypothesis tests, and efficiency comparisons [24, 42]. For iterate-averaged stochastic gradient methods, it specifies both a Gaussian limit and its sandwich covariance in a single theorem statement. This foundation now underpins inference in streaming and online settings (online A/B testing, continual monitoring of treatment effects, and streaming M-estimation, for example), where the estimator is updated one observation at a time and inference must be performed in real time. A line of recent work develops online inference procedures for averaged SGD [10, 23, 46]. In practice, one-pass stochastic optimization is routinely combined with adaptive preconditioning, which improves computational efficiency and is believed to sharpen the resulting Gaussian approximation in finite samples. If the CLT fails or the asymptotic variance is altered by the adaptive preconditioning, all downstream inference (coverage of confidence intervals, size of hypothesis tests, consistency of plug-in covariance estimators) is compromised. A rigorous understanding of when adaptive preconditioning preserves the CLT is, therefore, a prerequisite for reliable inference in these settings.
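For readers unfamiliar with iterate averaging, here is a minimal sketch of Polyak-Ruppert averaged SGD on a scalar mean-estimation problem, processed one observation at a time. It uses no preconditioning at all, so it does not reproduce the setting studied in the paper. The function name, the step schedule t^(-0.7), and the Gaussian data are assumptions for illustration.

```python
import random


def averaged_sgd_mean(samples, step_scale=1.0, step_power=0.7):
    """One-pass SGD on 0.5 * (theta - x)^2 with a running average of iterates."""
    theta = 0.0
    running_avg = 0.0
    for t, x in enumerate(samples, start=1):
        step = step_scale * t ** (-step_power)
        theta -= step * (theta - x)  # stochastic gradient step
        running_avg += (theta - running_avg) / t  # Polyak-Ruppert average
    return theta, running_avg


rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(20000)]
last_iterate, averaged = averaged_sgd_mean(data)
```

The averaged iterate is the quantity the CLT in the abstract is about. The last iterate fluctuates at the scale of the current step size, while the average typically concentrates at the 1/sqrt(n) rate.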

artificial intelligence, hypothesis, machine learning, (19 more...)

arXiv.org Machine Learning

2604.23498

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Parameter Tuning

Neural Information Processing Systems Apr-27-2026, 09:50:56 GMT

If observations from the joint distribution of (A, Y, Z, W) are available in both stages, we can tune the regularization parameters λ1, λ2 using the approach proposed in Singh et al. [30], Xu et al. [35]. Let the complete data of stage 1 and stage 2 be denoted as (ai,yi,zi,wi) and ( ai, yi, zi, wi). Then, we can use the data not used in each stage to evaluate the out-of-sample performance of the other stage. A(2), ˆV(T), u(T) are the learned parameters by Algorithm 1. In this appendix, we prove propositions given in the main text. In the following, we assume that the spaces U, A, Z, W are separable and completely metrizable topological spaces and equipped with Borel σ-algebras. In this section, we use the notation P_{A|Z=z} to express the distribution of a random variable A given another variable Z = z.

artificial intelligence, log 2, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

27e9661e033a73a6ad8cefcde965c54d-Supplemental.pdf

Neural Information Processing Systems Apr-25-2026, 04:57:07 GMT

artificial intelligence, decision tree, hypothesis, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Learning with little mixing

Neural Information Processing Systems Apr-24-2026, 23:50:23 GMT

We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long-range correlations which are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing, and present several examples for when it occurs: bounded function classes for which the L2 and L2+ε norms are equivalent, ergodic finite state Markov chains, various parametric models, and a broad family of infinite dimensional ℓ2(N) ellipsoids. By instantiating our main result to system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax optimal excess risk bound after only a polynomial burn-in time.
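A small example of least squares on dependent data: the sketch below fits the coefficient of a scalar linear autoregression from a single trajectory, where each covariate is the previous state and the noise is independent Gaussian (hence a martingale difference sequence). This linear toy model is an assumption made for illustration and is far simpler than the nonlinear settings the abstract covers. All names and constants are hypothetical.

```python
import random


def simulate_ar1(coef, length, seed=0):
    """Simulate x_{t+1} = coef * x_t + w_t with standard Gaussian noise w_t."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(length):
        xs.append(coef * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def least_squares_ar1(xs):
    """Least-squares estimate of coef from one trajectory of dependent samples."""
    num = sum(x_now * x_next for x_now, x_next in zip(xs, xs[1:]))
    den = sum(x_now * x_now for x_now in xs[:-1])
    return num / den


trajectory = simulate_ar1(coef=0.9, length=5000)
a_hat = least_squares_ar1(trajectory)
```

Consecutive samples here are strongly correlated (coefficient 0.9), yet the estimate is usually accurate to within a few hundredths at this trajectory length. That is the flavor of result the abstract formalizes.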

artificial intelligence, machine learning, theorem 4, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Melo, Gabriel, Santiago, Leonardo, Lu, Peter Y.

arXiv.org Machine Learning Apr-24-2026

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model using data-driven emulators, including neural operator architectures. For chaotic systems, the inherent sensitivity to initial conditions makes exact long-term forecasts theoretically infeasible, meaning that traditional squared-error losses often fail when trained on noisy data. Recent work has focused on training emulators to match the statistical properties of chaotic attractors by introducing regularization based on handcrafted local features and summary statistics, as well as learned statistics extracted from a diverse dataset of trajectories. In this work, we propose a family of adversarial optimal transport objectives that jointly learn high-quality summary statistics and a physically consistent emulator. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein). Our experiments across a variety of chaotic systems, including systems with high-dimensional chaotic attractors, show that emulators trained with our approach exhibit significantly improved long-term statistical fidelity.

artificial intelligence, learningtoemulatechao, machine learning, (19 more...)

arXiv.org Machine Learning

2604.21097

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Horospherical Depth and Busemann Median on Hadamard Manifolds

Jiang, Yangdi, Chang, Xiaotian, Mostajeran, Cyrus

arXiv.org Machine Learning Apr-21-2026

We introduce the horospherical depth, an intrinsic notion of statistical depth on Hadamard manifolds, and define the Busemann median as the set of its maximizers. The construction exploits the fact that the linear functionals appearing in Tukey's half-space depth are themselves limits of renormalized distance functions; on a Hadamard manifold the same limiting procedure produces Busemann functions, whose sublevel sets are horoballs, the intrinsic replacements for halfspaces. The resulting depth is parametrized by the visual boundary, is isometry-equivariant, and requires neither tangent-space linearization nor a chosen base point. For arbitrary Hadamard manifolds, we prove that the depth regions are nested and geodesically convex, that a centerpoint of depth at least $1/(d+1)$ exists, and hence that the Busemann median exists for every Borel probability measure. Under strictly negative sectional curvature and mild regularity assumptions, the depth is strictly quasi-concave and the median is unique. We also establish robustness: the depth is stable under total-variation perturbations, and under contamination escaping to infinity the limiting median depends on the escape direction but not on how far the contaminating mass has moved along the geodesic ray, in contrast with the Fréchet mean. Finally, we establish uniform consistency of the sample depth and convergence of sample depth regions and sample Busemann medians; on symmetric spaces of noncompact type, the argument proceeds through a VC analysis of upper horospherical halfspaces, while on general Hadamard manifolds it follows from a compactness argument under a mild non-atomicity assumption.
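The Euclidean notion that this abstract generalizes, Tukey's half-space depth, can be sketched in a few lines for points in the plane. The code below approximates the depth by minimizing over a finite grid of directions, so it returns an upper bound on the exact depth. It is the flat-space analogue only, not the horospherical depth, and the function name, direction count, and sample points are assumptions for the example.

```python
import math


def halfspace_depth(x, points, n_directions=360):
    """Approximate Tukey depth of `x`: the smallest fraction of `points` lying
    in a closed half-plane through `x`, minimized over a grid of directions."""
    best = 1.0
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        u = (math.cos(angle), math.sin(angle))
        inside = sum(
            1
            for p in points
            if u[0] * (p[0] - x[0]) + u[1] * (p[1] - x[1]) >= 0.0
        )
        best = min(best, inside / len(points))
    return best


cloud = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
center_depth = halfspace_depth((0.0, 0.0), cloud)  # 0.5
far_depth = halfspace_depth((10.0, 0.0), cloud)    # 0.0
```

The center of the cloud attains the largest depth and a far-away point has depth zero. On a Hadamard manifold, the abstract replaces the half-planes used here with horoballs defined by Busemann functions.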

artificial intelligence, hadamard manifold, machine learning, (17 more...)

arXiv.org Machine Learning

2604.18242

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

