AITopics | dependent data

Collaborating Authors

dependent data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OnEmpiricalRiskMinimizationwithDependent and Heavy-TailedData

Neural Information Processing SystemsFeb-8-2026, 12:56:54 GMT

The above problem requires the knowledge of the distributionπ which is typically unknown in practice.

artificial intelligence, citedonpage2, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Yolo County > Davis (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Learning with little mixing

Neural Information Processing SystemsFeb-7-2026, 19:25:40 GMT

artificial intelligence, burn-in time, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Netherlands (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Add feedback

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

Zhang, Yixuan, Xie, Qiaomin

arXiv.org Machine LearningJan-16-2026

Finite-time central limit theorem (CLT) rates play a central role in modern machine learning. In this paper, we study CLT rates for multivariate dependent data in Wasserstein-$p$ ($W_p$) distance, for general $p \geq 1$. We focus on two fundamental dependence structures that commonly arise in machine learning: locally dependent sequences and geometrically ergodic Markov chains. In both settings, we establish the first optimal $O(n^{-1/2})$ rate in $W_1$, as well as the first $W_p$ ($p\ge 2$) CLT rates under mild moment assumptions, substantially improving the best previously known bounds in these dependent-data regimes. As an application of our optimal $W_1$ rate for locally dependent sequences, we further obtain the first optimal $W_1$-CLT rate for multivariate $U$-statistics. On the technical side, we derive a tractable auxiliary bound for $W_1$ Gaussian approximation errors that is well suited for studying dependent data. For Markov chains, we further prove that the regeneration time of the split chain associated with a geometrically ergodic chain has a geometric tail without assuming strong aperiodicity or other restrictive conditions. These tools may be of independent interests and enable our optimal $W_1$ rates and underpin our $W_p$ ($p\ge 2$) results.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2601.08184

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)

Add feedback

The noise level in linear regression with dependent data

Neural Information Processing SystemsDec-27-2025, 03:25:27 GMT

We derive upper bounds for random design linear regression with dependent ($\beta$-mixing) data absent any realizability assumptions. In contrast to the strictly realizable martingale noise regime, no sharp \emph{instance-optimal} non-asymptotics are available in the literature. Up to constant factors, our analysis correctly recovers the variance term predicted by the Central Limit Theorem---the noise level of the problem---and thus exhibits graceful degradation as we introduce misspecification. Past a burn-in, our result is sharp in the moderate deviations regime, and in particular does not inflate the leading order term by mixing time factors.

linear regression, name change, noise level, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Add feedback

Learning with little mixing

Neural Information Processing SystemsDec-23-2025, 21:03:57 GMT

We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing, and present several examples for when it occurs: bounded function classes for which the $L^2$ and $L^{2+\epsilon}$ norms are equivalent, finite state irreducible and aperiodic Markov chains, various parametric models, and a broad family of infinite dimensional $\ell^2(\mathbb{N})$ ellipsoids. By instantiating our main result to system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax optimal excess risk bound after only a polynomial burn-in time.

burn-in time, learning, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Architecture-Aware Generalization Bounds for Temporal Networks: Theory and Fair Comparison Methodology

Gahtan, Barak, Bronstein, Alex M.

arXiv.org Artificial IntelligenceDec-9-2025

Deep temporal architectures such as TCNs achieve strong predictive performance on sequential data, yet theoretical understanding of their generalization remains limited. We address this gap through three contributions: introducing an evaluation methodology for temporal models, revealing surprising empirical phenomena about temporal dependence, and the first architecture-aware theoretical framework for dependent sequences. Fair-Comparison Methodology. We introduce evaluation protocols that fix effective sample size $N_{\text{eff}}$ to isolate temporal structure effects from information content. Empirical Findings. Applying this method reveals that under $N_{\text{eff}} = 2000$, strongly dependent sequences ($ρ= 0.8$) exhibit approx' $76\%$ smaller generalization gaps than weakly dependent ones ($ρ= 0.2$), challenging the conventional view that dependence universally impedes learning. However, observed convergence rates ($N_{\text{eff}}^{-1.21}$ to $N_{\text{eff}}^{-0.89}$) significantly exceed theoretical worst-case predictions ($N^{-0.5}$), revealing that temporal architectures exploit problem structure in ways current theory does not capture. Lastly, we develop the first architecture-aware generalization bounds for deep temporal models on exponentially $β$-mixing sequences. By embedding Golowich et al.'s i.i.d. class bound within a novel blocking scheme that partitions $N$ samples into approx' $B \approx N/\log N$ quasi-independent blocks, we establish polynomial sample complexity under convex Lipschitz losses. The framework achieves $\sqrt{D}$ depth scaling alongside the product of layer-wise norms $R = \prod_{\ell=1}^{D} M^{(\ell)}$, avoiding exponential dependence. While these bounds are conservative, they prove learnability and identify architectural scaling laws, providing worst-case baselines that highlight where future theory must improve.

artificial intelligence, complexity, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.06066

Country:

Europe (0.45)
North America > United States > Massachusetts (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Identification and Estimation of Causal Effects from Dependent Data

Neural Information Processing SystemsNov-20-2025, 21:41:22 GMT

The assumption that data samples are independent and identically distributed (iid) is standard in many areas of statistics and machine learning. Nevertheless, in some settings, such as social networks, infectious disease modeling, and reasoning with spatial and temporal data, this assumption is false. An extensive literature exists on making causal inferences under the iid assumption [12, 8, 21, 16], but, as pointed out in [14], causal inference in non-iid contexts is challenging due to the combination of unobserved confounding bias and data dependence. In this paper we develop a general theory describing when causal inferences are possible in such scenarios. We use segregated graphs [15], a generalization of latent projection mixed graphs [23], to represent causal models of this type and provide a complete algorithm for non-parametric identification in these models. We then demonstrate how statistical inferences may be performed on causal parameters identified by this algorithm, even in cases where parts of the model exhibit full interference, meaning only a single sample is available for parts of the model [19]. We apply these techniques to a synthetic data set which considers the adoption of fake news articles given the social network structure, articles read by each person, and baseline demographics and socioeconomic covariates.

causal effect, identification and estimation, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

The noise level in linear regression with dependent data

Neural Information Processing SystemsJan-20-2025, 01:53:15 GMT

We derive upper bounds for random design linear regression with dependent ( \beta -mixing) data absent any realizability assumptions. In contrast to the strictly realizable martingale noise regime, no sharp \emph{instance-optimal} non-asymptotics are available in the literature. Up to constant factors, our analysis correctly recovers the variance term predicted by the Central Limit Theorem---the noise level of the problem---and thus exhibits graceful degradation as we introduce misspecification. Past a burn-in, our result is sharp in the moderate deviations regime, and in particular does not inflate the leading order term by mixing time factors.

dependent data, linear regression, noise level, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Asymptotics of Linear Regression with Linearly Dependent Data

Moniri, Behrad, Hassani, Hamed

arXiv.org Machine LearningDec-7-2024

In this paper we study the asymptotics of linear regression in settings with non-Gaussian covariates where the covariates exhibit a linear dependency structure, departing from the standard assumption of independence. We model the covariates using stochastic processes with spatio-temporal covariance and analyze the performance of ridge regression in the high-dimensional proportional regime, where the number of samples and feature dimensions grow proportionally. A Gaussian universality theorem is proven, demonstrating that the asymptotics are invariant under replacing the non-Gaussian covariates with Gaussian vectors preserving mean and covariance, for which tools from random matrix theory can be used to derive precise characterizations of the estimation error. The estimation error is characterized by a fixed-point equation involving the spectral properties of the spatio-temporal covariance matrices, enabling efficient computation. We then study optimal regularization, overparameterization, and the double descent phenomenon in the context of dependent data. Simulations validate our theoretical predictions, shedding light on how dependencies influence estimation error and the choice of regularization parameters.

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

2412.03702

Country: