Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data
Kratsios, Anastasis; Cheng, Tin Sum; Roy, Daniel
At its core, machine learning seeks to train models that reliably generalize beyond noisy observations; however, the theoretical vacuum in which state-of-the-art universal approximation theorems (UATs) operate isolates them from this goal, as they assume noiseless data and allow network parameters to be chosen freely, independent of algorithmic realism. This paper bridges that gap by introducing an architecture-specific randomized training algorithm that constructs a uniform approximator from $N$ noisy training samples on the $d$-dimensional cube $[0,1]^d$. Our trained neural networks attain the minimax-optimal quantity of \textit{trainable} (non-random) parameters, subject to logarithmic factors which vanish under the idealized noiseless sampling assumed in classical UATs. Additionally, our trained models replicate key behaviours of real-world neural networks, absent in standard UAT constructions, by: (1) exhibiting sub-linear parametric complexity when fine-tuning on structurally related and favourable out-of-distribution tasks, (2) exactly interpolating the training data, and (3) maintaining reasonable Lipschitz regularity (after the initial clustering attention layer). These properties bring state-of-the-art UATs closer to practical machine learning, shifting the central open question from algorithmic implementability with noisy samples to whether stochastic gradient descent can achieve comparable guarantees.
Adaptive Estimation and Learning under Temporal Distribution Shift
Baby, Dheeraj; Tang, Yifei; Nguyen, Hieu Duy; Wang, Yu-Xiang; Pyati, Rohit
In this paper, we study the problem of estimation and learning under temporal distribution shift. Consider an observation sequence of length $n$, which is a noisy realization of a time-varying ground-truth sequence. Our focus is to develop methods that estimate the ground truth at the final time step while providing sharp pointwise estimation error rates. We show that, without prior knowledge of the level of temporal shift, a wavelet soft-thresholding estimator provides an optimal estimation error bound for the ground truth. Our proposed estimation method generalizes the results of Mazzetto and Upfal (2023) by establishing a connection between the sequence's non-stationarity level and the sparsity in the wavelet-transformed domain. Our theoretical findings are validated by numerical experiments. Additionally, we apply the estimator to derive sparsity-aware excess risk bounds for binary classification under distribution shift and to develop computationally efficient training objectives. As a final contribution, we draw parallels between our results and the classical signal processing problem of total-variation denoising (Mammen and van de Geer, 1997; Tibshirani, 2014), uncovering novel optimal algorithms for that task.
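As a rough illustration of the wavelet soft-thresholding idea, here is a minimal Python sketch: it denoises the observation sequence with a level-wise soft threshold and reads off the final-time estimate. The wavelet family (Haar), the universal threshold $\sigma\sqrt{2\log n}$, and the MAD noise-scale estimate are standard textbook choices, not necessarily the paper's exact specification.

```python
# Hedged sketch: wavelet soft-thresholding for the final-time estimate.
import numpy as np
import pywt

def last_point_estimate(y, wavelet="haar"):
    """Estimate the ground truth at the final time step from noisy y."""
    n = len(y)
    coeffs = pywt.wavedec(y, wavelet)                  # multi-level DWT
    # Robust noise-scale estimate from the finest detail coefficients (MAD).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(n))          # universal threshold
    denoised = [coeffs[0]] + [
        pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]
    ]
    smooth = pywt.waverec(denoised, wavelet)[:n]
    return smooth[-1]                                  # final-time estimate

# Toy usage: piecewise-constant ground truth with one shift, Gaussian noise.
rng = np.random.default_rng(0)
truth = np.r_[np.zeros(192), np.ones(64)]
y = truth + 0.3 * rng.standard_normal(truth.size)
print(last_point_estimate(y))   # should be close to 1.0
```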
Online Matrix Completion: A Collaborative Approach with Hott Items
Baby, Dheeraj; Pal, Soumyabrata
We investigate the low-rank matrix completion problem in an online setting with ${M}$ users, ${N}$ items, ${T}$ rounds, and an unknown rank-$r$ reward matrix ${R}\in \mathbb{R}^{{M}\times {N}}$. This problem has been well studied in the literature and has several applications in practice. In each round, we recommend ${S}$ carefully chosen distinct items to every user and observe noisy rewards. In the regime where ${M},{N} \gg {T}$, we propose two distinct computationally efficient algorithms for recommending items to users and analyze them under the benign \emph{hott items} assumption. 1) First, for ${S}=1$, under additional incoherence/smoothness assumptions on ${R}$, we propose the phased algorithm \textsc{PhasedClusterElim}. Our algorithm obtains a near-optimal per-user regret of $\widetilde{O}({N}{M}^{-1}(\Delta^{-1}+\Delta_{{hott}}^{-2}))$, where $\Delta_{{hott}},\Delta$ are problem-dependent gap parameters with $\Delta_{{hott}} \gg \Delta$ almost always. 2) Second, we consider a simplified setting with ${S}=r$, where we make significantly milder assumptions on ${R}$. Here, we introduce another phased algorithm, \textsc{DeterminantElim}, and derive a regret guarantee of $\widetilde{O}({N}{M}^{-1/r}\Delta_{{det}}^{-1})$, where $\Delta_{{det}}$ is another problem-dependent gap. Both algorithms crucially use collaboration among users to jointly eliminate sub-optimal items for groups of users successively in phases, but with distinctive and novel approaches.
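As a rough sketch of the shared skeleton behind \textsc{PhasedClusterElim} and \textsc{DeterminantElim}, the following Python snippet runs a generic phased elimination for a single user: it doubles the sampling budget each phase and discards items whose empirical mean falls below the best by more than a confidence width. The clustering, determinant, and cross-user collaboration machinery of the paper are omitted; the phase schedule and Hoeffding-style width are illustrative choices.

```python
# Hedged sketch: generic phased elimination (single user, S = 1).
import numpy as np

def phased_elimination(sample_reward, n_items, n_phases, base_pulls=8, delta=0.05):
    """Keep a surviving set of items; eliminate ones that look sub-optimal."""
    alive = list(range(n_items))
    for phase in range(n_phases):
        pulls = base_pulls * 2 ** phase             # double the budget per phase
        means = {
            item: np.mean([sample_reward(item) for _ in range(pulls)])
            for item in alive
        }
        # Hoeffding width with a union bound over items and phases.
        width = np.sqrt(np.log(2 * n_items * n_phases / delta) / (2 * pulls))
        best = max(means.values())
        alive = [i for i in alive if means[i] >= best - 2 * width]
        if len(alive) == 1:
            break
    return alive

# Toy usage: Bernoulli rewards with one clearly best item.
rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.9, 0.4])
print(phased_elimination(lambda i: rng.binomial(1, true_means[i]), 4, 6))
```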
Near Optimal Heteroscedastic Regression with Symbiotic Learning
Baby, Dheeraj; Das, Aniket; Nagaraj, Dheeraj; Netrapalli, Praneeth
We consider the problem of heteroscedastic linear regression, where, given $n$ samples $(\mathbf{x}_i, y_i)$ from $y_i = \langle \mathbf{w}^{*}, \mathbf{x}_i \rangle + \epsilon_i \cdot \langle \mathbf{f}^{*}, \mathbf{x}_i \rangle$ with $\mathbf{x}_i \sim N(0,\mathbf{I})$ and $\epsilon_i \sim N(0,1)$, we aim to estimate $\mathbf{w}^{*}$. Beyond the classical applications of such models in statistics, econometrics, and time series analysis, this setting is also particularly relevant in machine learning when data is collected from multiple sources of varying but a priori unknown quality. Our work shows that we can estimate $\mathbf{w}^{*}$ in squared norm up to an error of $\tilde{O}\left(\|\mathbf{f}^{*}\|^2 \cdot \left(\frac{1}{n} + \left(\frac{d}{n}\right)^2\right)\right)$ and prove a matching lower bound (up to log factors). This represents a substantial improvement upon the previous best known upper bound of $\tilde{O}\left(\|\mathbf{f}^{*}\|^2\cdot \frac{d}{n}\right)$. Our algorithm is an alternating minimization procedure with two key subroutines: 1) an adaptation of the classical weighted least squares heuristic to estimate $\mathbf{w}^{*}$, for which we provide the first non-asymptotic guarantee, and 2) a nonconvex pseudogradient descent procedure for estimating $\mathbf{f}^{*}$, inspired by phase retrieval. As corollaries, we obtain fast non-asymptotic rates for two important problems, linear regression with multiplicative noise and phase retrieval with multiplicative noise, both of which are of independent interest. Beyond this, the proof of our lower bound, which involves a novel adaptation of Le Cam's method for handling infinite mutual information quantities (thereby preventing a direct application of standard techniques like Fano's method), could also be of broader interest for establishing lower bounds for other heteroscedastic or heavy-tailed statistical problems.
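A minimal Python sketch of this alternating-minimization structure, under stated assumptions: the $\mathbf{w}$-update is the classical weighted least squares heuristic with weights $1/\langle \mathbf{f}, \mathbf{x}_i\rangle^2$, and the $\mathbf{f}$-update takes a few pseudogradient steps on a phase-retrieval-style loss over the squared residuals. Step sizes, iteration counts, initialization, and the variance clamp are illustrative, not the paper's tuned procedure.

```python
# Hedged sketch: alternating minimization for heteroscedastic regression.
import numpy as np

def alt_min_heteroscedastic(X, y, outer=10, inner=10):
    """Alternate WLS for w with pseudogradient steps for the noise direction f."""
    n, d = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]                    # OLS warm start
    f = np.full(d, np.sqrt(max(np.var(y - X @ w), 1e-8) / d))   # crude init
    for _ in range(outer):
        # f-step: pseudogradient descent on mean((<f,x_i>^2 - r_i^2)^2),
        # a phase-retrieval-style objective for the squared residuals r_i^2.
        r2 = (y - X @ w) ** 2
        for _ in range(inner):
            z = X @ f
            step = 0.1 / max(np.mean(z ** 2), 1e-8)             # heuristic step size
            f = f - step * (X * ((z ** 2 - r2) * z)[:, None]).mean(axis=0)
        # w-step: weighted least squares with weights 1 / <f, x_i>^2,
        # with the per-sample variance clamped away from zero for stability.
        z2 = (X @ f) ** 2
        var = np.maximum(z2, 1e-2 * np.mean(z2) + 1e-12)
        w = np.linalg.lstsq(X / np.sqrt(var)[:, None], y / np.sqrt(var), rcond=None)[0]
    return w, f

# Toy usage on synthetic data drawn from the abstract's model.
rng = np.random.default_rng(2)
n, d = 5000, 5
X = rng.standard_normal((n, d))
w_star, f_star = np.arange(1.0, d + 1.0), np.ones(d)
y = X @ w_star + rng.standard_normal(n) * (X @ f_star)
w_hat, _ = alt_min_heteroscedastic(X, y)
print(np.round(w_hat, 2))   # should be close to w_star
```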
Online Label Shift: Optimal Dynamic Regret meets Practical Algorithms
Baby, Dheeraj; Garg, Saurabh; Yen, Tzu-Ching; Balakrishnan, Sivaraman; Lipton, Zachary Chase; Wang, Yu-Xiang
This paper focuses on supervised and unsupervised online label shift, where the class marginals $Q(y)$ vary but the class-conditionals $Q(x|y)$ remain invariant. In the unsupervised setting, our goal is to adapt a learner, trained on some offline labeled data, to changing label distributions given unlabeled online data. In the supervised setting, we must both learn a classifier and adapt to the dynamically evolving class marginals given only labeled online data. We develop novel algorithms that reduce the adaptation problem to online regression and guarantee optimal dynamic regret without any prior knowledge of the extent of drift in the label distribution. Our solution is based on bootstrapping the estimates of \emph{online regression oracles} that track the drifting proportions. Experiments across numerous simulated and real-world online label shift scenarios demonstrate the superior performance of our proposed approaches, often achieving a 1-3\% improvement in accuracy while being sample- and computationally efficient. Code is publicly available at https://github.com/acmi-lab/OnlineLabelShift.
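To make the unsupervised adaptation loop concrete, here is a hedged Python sketch: it de-biases the histogram of predicted labels with an offline confusion matrix (a standard BBSE-style moment inversion), tracks the drifting marginals with a simple exponential moving average standing in for the paper's online regression oracles, and reweights the classifier's posteriors. The class name, smoothing rate, and clipping constant are illustrative.

```python
# Hedged sketch: marginal tracking + posterior reweighting for label shift.
import numpy as np

class LabelShiftAdapter:
    def __init__(self, confusion, p_train, lr=0.1):
        # confusion[i, j] = P(predict i | true class j), from offline data.
        self.C_inv = np.linalg.pinv(confusion)
        self.p_train = p_train          # training class marginals P(y)
        self.q = p_train.copy()         # running estimate of current Q_t(y)
        self.lr = lr

    def update(self, pred_hist):
        # Invert the confusion matrix to de-bias the predicted-label
        # histogram, then smooth with an exponential moving average.
        q_hat = np.clip(self.C_inv @ pred_hist, 1e-6, None)
        q_hat /= q_hat.sum()
        self.q = (1 - self.lr) * self.q + self.lr * q_hat

    def adapt(self, probs):
        # Reweight the classifier's posteriors by the marginal ratio.
        w = probs * (self.q / self.p_train)
        return w / w.sum(axis=-1, keepdims=True)

# Toy usage: 3 classes, perfectly calibrated predictor (identity confusion).
adapter = LabelShiftAdapter(np.eye(3), p_train=np.full(3, 1 / 3))
adapter.update(np.array([0.7, 0.2, 0.1]))          # batch skewed toward class 0
print(adapter.adapt(np.array([0.4, 0.4, 0.2])))    # posterior tilts to class 0
```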
Multi-User Reinforcement Learning with Low Rank Rewards
Agarwal, Naman; Jain, Prateek; Kowshik, Suhas; Nagaraj, Dheeraj; Netrapalli, Praneeth
In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting, there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure -- a standard and practically successful assumption in the offline collaborative filtering setting -- the question is whether we can design algorithms with significantly lower sample complexity than those that learn the MDP individually for each user. Our main contribution is an algorithm that explores rewards collaboratively across $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends only logarithmically on the size of the state space, which represents an exponential reduction (in the state-space size) when compared to standard ``non-collaborative'' algorithms.
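The core low-rank mechanism can be illustrated in a few lines of Python: pool noisy per-user reward estimates over state-action pairs into a matrix and denoise it with a rank-$r$ truncated SVD. This omits the paper's exploration strategy and all MDP machinery; the sampling model and constants are illustrative.

```python
# Hedged sketch: collaborative reward denoising via truncated SVD.
import numpy as np

def collaborative_reward_estimate(R_noisy, r):
    """Project the noisy user x (state, action) reward matrix to rank r."""
    U, s, Vt = np.linalg.svd(R_noisy, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# Toy usage: N users sharing a rank-2 reward structure.
rng = np.random.default_rng(3)
N, SA, r = 100, 50, 2
R = rng.standard_normal((N, r)) @ rng.standard_normal((r, SA))
R_hat = collaborative_reward_estimate(R + 0.5 * rng.standard_normal((N, SA)), r)
print(np.linalg.norm(R_hat - R) / np.linalg.norm(R))  # well below the noise level
```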