Mehta, Nishant A.
No-regret incentive-compatible online learning under exact truthfulness with non-myopic experts
Komiyama, Junpei, Mehta, Nishant A., Mortazavi, Ali
We study an online forecasting setting in which, over $T$ rounds, $N$ strategic experts each report a forecast to a mechanism, the mechanism selects one forecast, and then the outcome is revealed. In any given round, each expert has a belief about the outcome, but the expert wishes to select its report so as to maximize the total number of times it is selected. The goal of the mechanism is to obtain low belief regret: the difference between its cumulative loss (based on its selected forecasts) and the cumulative loss of the best expert in hindsight (as measured by the experts' beliefs). We consider exactly truthful mechanisms for non-myopic experts, meaning that truthfully reporting its belief strictly maximizes the expert's subjective probability of being selected in any future round. Even in the full-information setting, it was an open problem to obtain a no-regret exactly truthful mechanism. We develop the first such mechanism via an online extension of the Independent-Event Lotteries Forecasting Competition Mechanism (I-ELF). By viewing this online I-ELF as a novel instance of Follow the Perturbed Leader (FPL) with noise based on random walks with loss-dependent perturbations, we obtain $\tilde{O}(\sqrt{T N})$ regret. Our results are fueled by new tail bounds that we develop for Poisson binomial random variables. We extend our results to the bandit setting, where we give an exactly truthful mechanism obtaining $\tilde{O}(T^{2/3} N^{1/3})$ regret; this is the first no-regret result even among approximately truthful mechanisms.
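To make the FPL connection concrete, here is a minimal Python/NumPy sketch of generic Follow the Perturbed Leader in which each expert's perturbation evolves as a $\pm 1$ random walk. The function name and interface are illustrative only; this is not the paper's online I-ELF mechanism, whose perturbations are loss-dependent and designed to guarantee exact truthfulness.

    import numpy as np

    def fpl_random_walk_select(losses, rng=None):
        """Schematic FPL with random-walk perturbations (illustrative only).

        losses: (T, N) array of per-round expert losses in [0, 1].
        Returns the index of the expert selected in each round.
        """
        rng = np.random.default_rng() if rng is None else rng
        T, N = losses.shape
        cum_loss = np.zeros(N)              # cumulative losses of the experts
        walk = np.zeros(N)                  # accumulated +/-1 perturbations
        picks = np.empty(T, dtype=int)
        for t in range(T):
            walk += rng.choice([-1.0, 1.0], size=N)     # fresh increment; std grows like sqrt(t)
            picks[t] = int(np.argmin(cum_loss - walk))  # follow the perturbed leader
            cum_loss += losses[t]
        return picks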
Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching
Nguyen, Quan, Ito, Shinji, Komiyama, Junpei, Mehta, Nishant A.
Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandit problems have limited adaptivity, as they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent, or have a sub-optimal $O(\sqrt{T\ln{T}})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, best-of-both-worlds, and $T$-optimal for multi-armed bandit problems. In particular, we show that real-time SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime and $O(\ln{T})$ in the stochastic regime while simultaneously being adaptive to data-dependent quantities such as sparsity, variations, and small losses. Our results are obtained by extending the SPM technique for tuning the learning rates in the follow-the-regularized-leader (FTRL) framework, which further indicates that the combination of SPM and FTRL is a promising approach for proving new adaptive bounds in online learning problems.
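As a rough illustration of stability-penalty style learning-rate tuning (not the paper's real-time SPM rule, and with the entropy regularizer standing in for Tsallis entropy), the sketch below sets the learning rate by balancing the penalty $\ln(K)/\eta$ against an accumulated stability proxy; all names and the oracle loss_fn are placeholders.

    import numpy as np

    def ftrl_adaptive_bandit(loss_fn, K, T, rng=None):
        """Schematic exponential-weights bandit with a stability-penalty style
        adaptive learning rate (illustrative only, not the paper's SPM rule).

        loss_fn(t, arm) is a placeholder oracle returning a loss in [0, 1].
        """
        rng = np.random.default_rng() if rng is None else rng
        cum_est = np.zeros(K)        # cumulative importance-weighted loss estimates
        stability = 0.0              # running stability proxy: sum_t loss_t^2 / p_t(arm_t)
        for t in range(T):
            eta = np.sqrt(np.log(K) / (1.0 + stability))    # balance penalty and stability
            w = np.exp(-eta * (cum_est - cum_est.min()))
            p = w / w.sum()
            arm = int(rng.choice(K, p=p))
            loss = loss_fn(t, arm)
            cum_est[arm] += loss / p[arm]                   # importance-weighted estimate
            stability += loss ** 2 / p[arm]
        return cum_est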
Beyond Minimax Rates in Group Distributionally Robust Optimization via a Novel Notion of Sparsity
Nguyen, Quan, Mehta, Nishant A., Guzmán, Cristóbal
The minimax sample complexity of group distributionally robust optimization (GDRO) has been determined up to a $\log(K)$ factor, where $K$ is the number of groups. In this work, we venture beyond the minimax perspective via a novel notion of sparsity that we dub $(\lambda, \beta)$-sparsity. In short, this condition means that at any parameter $\theta$, there is a set of at most $\beta$ groups whose risks at $\theta$ are all at least $\lambda$ larger than the risks of the other groups. To find an $\epsilon$-optimal $\theta$, we show via a novel algorithm and analysis that the $\epsilon$-dependent term in the sample complexity can swap a linear dependence on $K$ for a linear dependence on the potentially much smaller $\beta$. This improvement leverages recent progress in sleeping bandits, revealing a fundamental connection between the two-player zero-sum game optimization framework for GDRO and per-action regret bounds in sleeping bandits. The aforementioned result assumes that a particular $\lambda$ is given as input. Perhaps surprisingly, we next give an adaptive algorithm which, up to log factors, obtains a sample complexity that adapts to the best $(\lambda, \beta)$-sparsity condition that holds. Finally, for a particular input $\lambda$, we also show how to obtain a dimension-free sample complexity result.
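A minimal sketch of the two-player zero-sum game template for GDRO that the abstract refers to (not the paper's sleeping-bandit-based algorithm): the $\theta$-player runs SGD on the sampled group's loss while the group-player runs an importance-weighted exponentiated-gradient update; loss_oracle and grad_oracle are placeholder per-group stochastic oracles.

    import numpy as np

    def gdro_game(loss_oracle, grad_oracle, K, d, T, lr_theta=0.1, lr_q=0.1, rng=None):
        """Schematic two-player game for group DRO (illustrative only)."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.zeros(d)
        cum = np.zeros(K)                     # cumulative importance-weighted group losses
        for _ in range(T):
            logits = lr_q * (cum - cum.max())
            q = np.exp(logits)
            q /= q.sum()                      # adversary's distribution over groups
            k = int(rng.choice(K, p=q))       # sample a group
            theta -= lr_theta * grad_oracle(k, theta)     # learner: descend on group k's loss
            cum[k] += loss_oracle(k, theta) / q[k]        # adversary: weight hard groups up
        return theta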
On the price of exact truthfulness in incentive-compatible online learning with bandit feedback: A regret lower bound for WSU-UX
Mortazavi, Ali, Lin, Junhao, Mehta, Nishant A.
In one view of the classical game of prediction with expert advice with binary outcomes, in each round, each expert maintains an adversarially chosen belief and honestly reports this belief. We consider a recently introduced strategic variant of this problem with selfish (reputation-seeking) experts, where each expert reports strategically in order to maximize its expected future reputation based on its belief. In this work, our goal is to design an algorithm for the selfish experts problem that is incentive-compatible (IC, or \emph{truthful}), meaning each expert's best strategy is to report truthfully, while also ensuring the algorithm enjoys sublinear regret with respect to the expert with the best belief. Freeman et al. (2020) recently studied this problem in the full information and bandit settings and obtained truthful, no-regret algorithms by leveraging prior work on wagering mechanisms. While their results under full information match the minimax rate for the classical ("honest experts") problem, the best-known regret for their bandit algorithm WSU-UX is $O(T^{2/3})$, which does not match the minimax rate for the classical ("honest bandits") setting. It was unclear whether the higher regret was an artifact of their analysis or a limitation of WSU-UX. We show, via an explicit construction of loss sequences, that WSU-UX suffers $\Omega(T^{2/3})$ regret in the worst case. Left open is the possibility that a different IC algorithm obtains $O(\sqrt{T})$ regret. Yet WSU-UX was a natural choice for such an algorithm, owing to the limited design space for IC algorithms in this setting.
Near-optimal Per-Action Regret Bounds for Sleeping Bandits
Nguyen, Quan, Mehta, Nishant A.
We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the minimax $\Omega(\sqrt{TA})$ lower bound, this upper bound contains an extra multiplicative factor of $K\sqrt{\ln{K}}$. We address this gap by directly minimizing the per-action regret using generalized versions of EXP3, EXP3-IX, and FTRL with Tsallis entropy, thereby obtaining near-optimal bounds of order $O(\sqrt{TA\ln{K}})$ and $O(\sqrt{T\sqrt{AK}})$. We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way. This leads to new proofs for a number of existing adaptive and tracking regret bounds for standard non-sleeping bandits. Extending our results to the bandit version of experts that report their confidences leads to new bounds for the confidence regret that depend primarily on the sum of experts' confidences. We prove a lower bound showing that for any minimax optimal algorithm, there exists an action whose regret is sublinear in $T$ but linear in the number of its active rounds.
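A minimal sketch of the generalized-EXP3 idea for sleeping bandits, in which the weights are renormalized over the arms available each round and only the played arm is updated; the estimators and tuning in the paper differ, and the oracles loss_fn and available are placeholders.

    import numpy as np

    def sleeping_exp3(loss_fn, available, K, T, eta=0.05, rng=None):
        """Schematic EXP3 restricted to the awake arms (illustrative only).

        loss_fn(t, arm) -> loss in [0, 1]; available(t) -> list of awake arms.
        """
        rng = np.random.default_rng() if rng is None else rng
        cum = np.zeros(K)                         # cumulative importance-weighted loss estimates
        for t in range(T):
            awake = list(available(t))
            z = cum[awake] - cum[awake].min()
            p = np.exp(-eta * z)
            p /= p.sum()                          # exponential weights over the awake arms only
            i = int(rng.choice(len(awake), p=p))
            arm = awake[i]
            cum[arm] += loss_fn(t, arm) / p[i]    # update only the played arm
        return cum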
Near-Optimal Algorithms for Private Online Learning in a Stochastic Environment
Hu, Bingshan, Huang, Zhiming, Mehta, Nishant A.
We consider two variants of private stochastic online learning. The first variant is differentially private stochastic bandits. Previously, Sajed and Sheffet (2019) devised the DP Successive Elimination (DP-SE) algorithm that achieves the optimal $O\biggl(\sum_{1\le j \le K: \Delta_j >0} \frac{\log T}{\Delta_j} + \frac{K\log T}{\epsilon}\biggr)$ problem-dependent regret bound, where $K$ is the number of arms, $\Delta_j$ is the mean reward gap of arm $j$, $T$ is the time horizon, and $\epsilon$ is the required privacy parameter. However, like other elimination-style algorithms, it is not an anytime algorithm. Until now, it was not known whether UCB-based algorithms could achieve this optimal regret bound. We present an anytime, UCB-based algorithm that achieves optimality. Our experiments show that the UCB-based algorithm is competitive with DP-SE. The second variant is the full information version of private stochastic online learning. Specifically, for the problem of decision-theoretic online learning with stochastic rewards, we present the first algorithm that achieves an $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log(K) \min\{\log(\frac{1}{\Delta_{\min}}), \log(T)\}}{\epsilon}\right)$ regret bound, where $\Delta_{\min}$ is the minimum mean reward gap. In addition, we show an $\Omega\left(\max\left\{\frac{\log K}{\Delta_{\min}}, \frac{\log K}{\epsilon}\right\}\right)$ problem-dependent lower bound. The key idea behind our theoretical guarantees in both settings is forgetfulness: decisions are made based only on recently obtained observations rather than on all observations collected from the very beginning.
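A toy sketch of the forgetfulness idea, in which each arm's noisy UCB index is computed only from the current epoch's observations; the noise scale, confidence widths, and epoch schedule are placeholders, this is not the paper's algorithm, and no formal privacy guarantee is claimed for the sketch.

    import numpy as np

    def forgetful_noisy_ucb(pull, K, T, eps=1.0, rng=None):
        """Toy noisy UCB that forgets old data each epoch (illustrative only).

        pull(arm) -> reward in [0, 1] is a placeholder oracle.
        """
        rng = np.random.default_rng() if rng is None else rng
        epoch_sum = np.zeros(K)      # rewards observed in the current epoch only
        epoch_n = np.zeros(K)
        noise = rng.laplace(scale=1.0 / eps, size=K)    # one perturbation per epoch
        for t in range(T):
            if np.any(epoch_n == 0):
                arm = int(np.argmin(epoch_n))           # ensure each arm has data this epoch
            else:
                mean = (epoch_sum + noise) / epoch_n
                bonus = np.sqrt(np.log(t + 1) / epoch_n)
                arm = int(np.argmax(mean + bonus))
            epoch_sum[arm] += pull(arm)
            epoch_n[arm] += 1
            if (t + 1) % (4 * K) == 0:                  # crude stand-in for an epoch schedule:
                epoch_sum[:] = 0.0                      # forget old data and redraw the noise
                epoch_n[:] = 0.0
                noise = rng.laplace(scale=1.0 / eps, size=K)
        return epoch_n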
An improved regret analysis for UCB-N and TS-N
Mehta, Nishant A.
In the setting of stochastic online learning with undirected feedback graphs, Lykouris et al. (2020) previously analyzed the pseudo-regret of the upper confidence bound-based algorithm UCB-N and the Thompson Sampling-based algorithm TS-N. In this note, we show how to improve their pseudo-regret analysis. Our improvement involves refining a key lemma of the previous analysis, allowing a $\log(T)$ factor to be replaced by a factor $\log_2(\alpha) + 3$ for $\alpha$ the independence number of the feedback graph.
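A schematic of UCB-N for undirected feedback graphs: play the arm with the largest UCB index, then update the empirical means of that arm and its neighbors using the side observations. The confidence-width constants are illustrative and the oracles pull and neighbors are placeholders; see Lykouris et al. (2020) for the exact indices.

    import numpy as np

    def ucb_n(pull, neighbors, K, T):
        """Schematic UCB-N with graph side observations (illustrative only).

        pull(arm) -> reward in [0, 1]; neighbors(arm) -> arms observed when
        `arm` is played (including `arm` itself).
        """
        sums = np.zeros(K)
        counts = np.zeros(K)
        for t in range(T):
            if np.any(counts == 0):
                arm = int(np.argmin(counts))            # observe every arm at least once
            else:
                ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
                arm = int(np.argmax(ucb))
            for j in neighbors(arm):                    # side observations from the graph
                sums[j] += pull(j)
                counts[j] += 1
        return sums / np.maximum(counts, 1.0)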
Adversarial Online Multi-Task Reinforcement Learning
Nguyen, Quan, Mehta, Nishant A.
We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set $\mathcal{M}$ of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $\lambda$-separability, and we show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of $\Omega(K\sqrt{DSAH})$ on the regret of any learning algorithm and an instance-specific lower bound of $\Omega(\frac{K}{\lambda^2})$ on the sample complexity of a class of uniformly-good cluster-then-learn algorithms. We prove the instance-specific lower bound using a novel construction called the 2-JAO MDP. The lower bounds are complemented by a polynomial-time algorithm that obtains an $\tilde{O}(\frac{K}{\lambda^2})$ sample complexity guarantee for the clustering phase and an $\tilde{O}(\sqrt{MK})$ regret guarantee for the learning phase, indicating that the dependence on $K$ and $\frac{1}{\lambda^2}$ is tight.
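A toy sketch of the clustering step of a generic cluster-then-learn scheme: tasks whose estimated models lie within $\lambda/2$ of each other are grouped together, relying on $\lambda$-separability to keep the clusters unambiguous once the estimates are accurate enough. This is not the paper's algorithm; estimates is a placeholder for per-task model estimates.

    import numpy as np

    def threshold_cluster(estimates, lam):
        """Toy clustering by thresholding model-estimate distances (illustrative only).

        estimates: list of equally shaped arrays (e.g., estimated transition
        probabilities), one per task.
        """
        clusters = []                      # each cluster: (representative, [task indices])
        for i, est in enumerate(estimates):
            for rep, members in clusters:
                if np.max(np.abs(est - rep)) <= lam / 2:
                    members.append(i)      # close to an existing representative
                    break
            else:
                clusters.append((est, [i]))   # start a new cluster
        return [members for _, members in clusters]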
Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes
Grünwald, Peter D., Mehta, Nishant A.
We present new excess risk bounds for general unbounded loss functions, including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $\eta$-generalized Bayesian, MDL, and ERM estimators. When applied with log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric, as long as the learning rate $\eta$ is set correctly. For general loss functions, our bounds rely on two separate conditions: the $v$-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter $v$ in the $v$-GRIP conditions determines the achievable rate and is akin to the exponent in the well-known Tsybakov margin condition and the Bernstein condition for bounded losses, which the $v$-GRIP conditions generalize; a favorable $v$ in combination with small model complexity leads to $\tilde{O}(1/n)$ rates. The witness condition allows us to connect the excess risk to an 'annealed' version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence.
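For reference (this is the standard definition, not anything specific to this paper), the $\eta$-generalized Bayesian (Gibbs) posterior governed by the learning rate $\eta$ is, for a prior $\pi_0$ and loss $\ell$,
    \[ \pi_n(\mathrm{d}\theta) \;\propto\; \exp\!\Bigl(-\eta \sum_{i=1}^{n} \ell(\theta, z_i)\Bigr)\, \pi_0(\mathrm{d}\theta), \]
so that with log loss $\ell(\theta, z) = -\log p_\theta(z)$ and $\eta = 1$ it reduces to the usual Bayesian posterior.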
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
Grünwald, Peter D., Mehta, Nishant A.
We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of the data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity, also known as the stochastic, PAC-Bayesian, or $\mathrm{KL}(\text{posterior} \,\|\, \text{prior})$ complexity. For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e., a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who treated the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC classes and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity.
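For reference (again the standard definition rather than this paper's generalized version), for log loss over a discrete outcome space the NML/Shtarkov complexity of a model class $\{p_\theta\}$ on length-$n$ sequences is
    \[ \mathrm{COMP}_n \;=\; \log \sum_{x^n} \sup_{\theta} p_\theta(x^n), \]
which equals the minimax log-loss individual-sequence regret and is attained by the normalized maximum likelihood distribution.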