Dobrushin
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Tyne and Wear > Sunderland (0.04)
- Asia > Middle East > Jordan (0.04)
Transformers and Their Roles as Time Series Foundation Models
Wu, Dennis, He, Yihan, Cao, Yuan, Fan, Jianqing, Liu, Han
We give a comprehensive analysis of transformers as time series foundation models, focusing on their approximation and generalization capabilities. First, we demonstrate that there exist transformers that fit an autoregressive model on input univariate time series via gradient descent. We then analyze MOIRAI, a multivariate time series foundation model capable of handling an arbitrary number of covariates. We prove that it is capable of automatically fitting autoregressive models with an arbitrary number of covariates, offering insights into its design and empirical success. For generalization, we establish bounds for pretraining when the data satisfies Dobrushin's condition. Experiments support our theoretical findings, highlighting the efficacy of transformers as time series foundation models.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
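The first result in the abstract above concerns transformers that implicitly run gradient descent on an autoregressive objective over the input series. As a point of reference for what is being emulated, here is a minimal NumPy sketch (not the paper's construction; the AR order, step size, and noise level are arbitrary illustrative choices) of fitting an AR(p) model by gradient descent on the one-step-ahead squared error:

```python
import numpy as np

def fit_ar_by_gd(x, p=2, lr=0.05, steps=300):
    """Fit AR(p) coefficients to a univariate series x by gradient descent
    on the mean squared one-step-ahead prediction error."""
    # Lagged design matrix: row t holds (x[t-1], ..., x[t-p]).
    X = np.stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)], axis=1)
    y = x[p:]
    w = np.zeros(p)
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Simulate a stable AR(2) process with coefficients (0.6, -0.3).
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

w_hat = fit_ar_by_gd(x, p=2)
```

In the paper's setting, an analogous update is carried out by the transformer's forward pass over the context, rather than by an explicit training loop.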
Emergence of meta-stable clustering in mean-field transformer models
Bruno, Giuseppe, Pasqualotto, Federico, Agazzi, Andrea
We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere, governed by a mean-field interacting particle system, building on the framework introduced in (Geshkovski et al., 2023). Studying the corresponding mean-field Partial Differential Equation (PDE), which can be interpreted as a Wasserstein gradient flow, in this paper we provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction. More specifically, we perform a perturbative analysis of the mean-field PDE around the iid uniform initialization and prove that, in the limit of large number of tokens, the model remains close to a meta-stable manifold of solutions with a given structure (e.g., periodicity). Further, the structure characterizing the meta-stable manifold is explicitly identified, as a function of the inverse temperature parameter of the model, by the index maximizing a certain rescaling of Gegenbauer polynomials.
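The particle system described above is easy to simulate numerically. A rough sketch (an Euler discretization of an idealized single-head attention flow on the unit circle; the inverse temperature, step size, and token count are arbitrary choices, and this is an illustration rather than the paper's mean-field analysis):

```python
import numpy as np

def token_flow(n=64, beta=4.0, dt=0.05, steps=2000, seed=0):
    """Euler steps of an attention-driven particle flow on the unit circle:
    each token moves along the tangential component of the softmax-weighted
    average of all tokens, then is projected back onto the sphere."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # iid uniform init
    for _ in range(steps):
        logits = beta * (x @ x.T)                     # beta * <x_i, x_j>
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)             # attention weights
        drift = w @ x
        # keep only the component tangent to the sphere at each x_i
        drift -= np.sum(drift * x, axis=1, keepdims=True) * x
        x = x + dt * drift
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x

tokens = token_flow()
```

Started from an iid uniform configuration, the tokens coalesce into tight clusters that then persist for a long time before merging further, which is the meta-stable behavior the paper analyzes.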
Reviews: HOGWILD!-Gibbs can be PanAccurate
The authors prove theorems about the accuracy of asynchronous Gibbs sampling in graphical models with discrete variables that satisfy Dobrushin's condition. I am not familiar with this literature, so I'm taking the authors' description of the state of the literature as a given. The authors' results are as follows (let n be the number of variables in the graphical model, let t be the time index, and let tau be the maximum expected read delay in the asynchronous sampler):
- Lemma 2: the asynchronous Gibbs sampler can be coupled to a synchronous Gibbs sampler with the same initial state such that the expected Hamming distance between them is bounded by O(tau*log(n)), uniformly in t.
- Lemma 3: an analogous bound holds for the dth moment of the Hamming distance.
- If a function f is K-Lipschitz with respect to the dth power of the Hamming distance, the bias of the asynchronous Gibbs sampler for the expectation of f is bounded by log^d(n), up to additive and multiplicative constants, for sufficiently large t.
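The coupling behind Lemma 2 can be simulated directly. A toy sketch (a ferromagnetic Ising chain well inside the Dobrushin regime; the coupling strength, delay model, and sizes are arbitrary choices, not the paper's construction), comparing an exact single-site Gibbs sampler with a stale-read HOGWILD!-style sampler driven by the same site choices and uniforms:

```python
import numpy as np

def coupled_gibbs(J=0.15, n=50, tau=3, steps=20000, seed=0):
    """Run an exact Gibbs sampler and a stale-read (HOGWILD!-style) Gibbs
    sampler on an Ising chain with shared randomness, and return the final
    Hamming distance between their states."""
    rng = np.random.default_rng(seed)

    def p_plus(state, i):
        # P(spin_i = +1 | neighbours) on a chain with coupling J
        field = J * (state[i - 1] if i > 0 else 0) + \
                J * (state[i + 1] if i < n - 1 else 0)
        return 1.0 / (1.0 + np.exp(-2.0 * field))

    x = np.ones(n, dtype=int)      # exact chain
    y = np.ones(n, dtype=int)      # delayed-read chain
    hist = [y.copy()]              # recent snapshots of the async chain
    for _ in range(steps):
        i = rng.integers(n)
        u = rng.random()           # shared uniform couples the two updates
        x[i] = 1 if u < p_plus(x, i) else -1
        # async chain reads a snapshot stale by up to tau updates
        stale = hist[-1 - rng.integers(min(tau, len(hist)))]
        y[i] = 1 if u < p_plus(stale, i) else -1
        hist.append(y.copy())
        if len(hist) > tau:
            hist.pop(0)
    return int(np.sum(x != y))

dist = coupled_gibbs()
```

Consistently with the O(tau*log(n)) bound, the two states typically disagree in only a handful of coordinates even after many updates.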
On counterfactual inference with unobserved confounding
Shah, Abhin, Dwivedi, Raaz, Shah, Devavrat, Wornell, Gregory W.
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combinations of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Non asymptotic bounds in asynchronous sum-weight gossip protocols
Picard, David, Fellus, Jérôme, Garnier, Stéphane
This paper focuses on non-asymptotic diffusion time in asynchronous gossip protocols. Asynchronous gossip protocols are designed to perform distributed computation in a network of nodes by randomly exchanging messages on the associated graph. To achieve consensus among nodes, a minimal number of messages has to be exchanged. We provide a probabilistic bound on this number in the general case. We give an explicit formula for fully connected graphs, depending only on the number of nodes, and an approximation for any graph, depending on the spectrum of the graph.
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Europe > France > Île-de-France > Yvelines > Cergy-Pontoise (0.04)
- Europe > France > Île-de-France > Val-d'Oise > Cergy-Pontoise (0.04)
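The protocol whose diffusion time is bounded above can be sketched in a few lines. A toy implementation of asynchronous sum-weight gossip on a fully connected graph (the halving split and the global termination test are illustrative choices; node count and tolerance are arbitrary):

```python
import numpy as np

def sum_weight_gossip(values, eps=1e-6, seed=0):
    """Asynchronous sum-weight gossip on a fully connected graph: each node
    keeps a (sum, weight) pair; at every message a random sender keeps half
    of its pair and sends the other half to a random other node.  Every
    ratio sum/weight converges to the global average.  Returns the number
    of messages needed to reach precision eps."""
    rng = np.random.default_rng(seed)
    n = len(values)
    s = np.array(values, dtype=float)   # running sums
    w = np.ones(n)                      # running weights
    target = np.mean(values)
    messages = 0
    while np.max(np.abs(s / w - target)) > eps:
        i = rng.integers(n)
        j = rng.integers(n - 1)
        j += (j >= i)                   # uniform choice among the other nodes
        s[i] *= 0.5; w[i] *= 0.5        # sender keeps half...
        s[j] += s[i]; w[j] += w[i]      # ...and ships the other half to j
        messages += 1
    return messages

msgs = sum_weight_gossip(np.arange(10.0))
```

The message count returned here is exactly the quantity the paper bounds; on the fully connected graph it depends only on the number of nodes, while general graphs bring in the spectrum.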
Outlier-Robust Learning of Ising Models Under Dobrushin's Condition
Diakonikolas, Ilias, Kane, Daniel M., Stewart, Alistair, Sun, Yuxin
Probabilistic graphical models [KF09] provide a rich and unifying framework to model structured high-dimensional distributions in terms of the local dependencies between the input variables. The problem of inference in graphical models arises in many applications across scientific disciplines, see, e.g., [WJ08]. In this work, we study the inverse problem of learning graphical models from data. Various formalizations of this general learning problem have been studied during the past five decades, see, e.g., [CL68, Das97, AKN06, WRL06, AHHK12, SW12, LW12, BMS13, BGS14, Bre15, KM17], resulting in general theory and algorithms for various settings. In this work, we focus on learning Ising models [Isi25], the prototypical family of binary undirected graphical models with applications in computer vision, computational biology, and statistical physics [Li09, JEMF06, Fel04, Cha05].
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Learning from weakly dependent data under Dobrushin's condition
Dagan, Yuval, Daskalakis, Constantinos, Dikkala, Nishanth, Jayanti, Siddhartha
Statistical learning theory has largely focused on learning and generalization given independent and identically distributed (i.i.d.) samples. Motivated by applications involving time-series data, there has been a growing literature on learning and generalization in settings where data is sampled from an ergodic process. This work has also developed complexity measures, which appropriately extend the notion of Rademacher complexity to bound the generalization error and learning rates of hypothesis classes in this setting. Rather than time-series data, our work is motivated by settings where data is sampled on a network or a spatial domain, and thus does not fit well within the framework of prior work. We provide learning and generalization bounds for data that are complexly dependent, yet their distribution satisfies the standard Dobrushin's condition. Indeed, we show that the standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in our setting. Moreover, our generalization bounds only degrade by constant factors compared to their i.i.d. analogs, and our learnability bounds degrade by log factors in the size of the training set.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
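A concrete instance of the condition in the title: for an Ising model with symmetric coupling matrix J, a standard sufficient criterion is that the total influence on any site is below 1, where the influence of site j on site i is at most tanh(|J_ij|). A small sketch of that check (the chain example and the coupling value 0.3 are arbitrary):

```python
import numpy as np

def dobrushin_alpha(J):
    """Upper-bound the Dobrushin influence coefficient of an Ising model
    with symmetric coupling matrix J (zero diagonal): the influence of
    site j on site i is at most tanh(|J_ij|), so
        alpha <= max_i sum_j tanh(|J_ij|),
    and a value below 1 certifies Dobrushin's condition."""
    J = np.asarray(J, dtype=float)
    infl = np.tanh(np.abs(J))
    np.fill_diagonal(infl, 0.0)
    return infl.sum(axis=1).max()

# A 1-D chain with nearest-neighbour coupling 0.3:
# every interior site has two neighbours, so alpha <= 2*tanh(0.3) < 1.
n = 20
J = np.zeros((n, n))
for t in range(n - 1):
    J[t, t + 1] = J[t + 1, t] = 0.3
alpha = dobrushin_alpha(J)
```

Distributions passing this check fall in the regime where the paper's generalization bounds degrade only by constant factors relative to the i.i.d. case.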