- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Appendices

Let $N(\mu, \sigma^2)$ denote a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Let $\chi^2(n)$ denote a $\chi^2$ distribution with $n$ degrees of freedom. Our analysis extensively uses the following facts about Gaussian and $\chi^2$ distributions:

Definition A.1 (Gaussian and Wigner Random Matrices). We let $G \sim N(n)$ denote an $n \times n$ random Gaussian matrix with i.i.d. entries. We let $W \sim W(n) = G + G^{T}$ denote an $n \times n$ Wigner matrix, where $G \sim N(n)$.

Fact A.1 ($\chi^2$ Tail Bound, Lemma 1 of [1]).
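The statement of Fact A.1 is cut off in the source. A standard $\chi^2$ tail bound of this form is the Laurent–Massart bound, which may be what the cited Lemma 1 of [1] states: for $X \sim \chi^2(n)$ and any $t > 0$,
$$\Pr\left[X \ge n + 2\sqrt{nt} + 2t\right] \le e^{-t}, \qquad \Pr\left[X \le n - 2\sqrt{nt}\right] \le e^{-t}.$$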
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- (3 more...)
Consider any predictor $\widehat{M}(\cdot \mid i)$ (as a function of the sample path $X$) for the $i$-th row of $M$, $i = 1, 2, 3$. In Section 6.2.2, we make the steps in (29) precise and bound the Bayes risk from below by an appropriate mutual information. In Section 6.2.3, we choose a prior distribution on the transition probabilities and prove a lower bound on the resulting mutual information, thereby completing the proof of Theorem 1, with the added bonus that the construction is restricted to irreducible and reversible chains. Let $(X_1, \ldots, X_n)$ be the trajectory of a stationary Markov chain with transition matrix $M$. We first relate the Bayes estimators of $M$ and $T$ (given the $X$ and $Y$ chains, respectively).
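Display (29) is not reproduced in this fragment. One standard identity behind this kind of reduction, assuming the risk is measured in KL (log) loss, is that the Bayes prediction risk equals a conditional mutual information:
$$\inf_{\widehat{P}} \mathbb{E}\left[\mathrm{KL}\left(M(\cdot \mid X_n) \,\middle\|\, \widehat{P}(\cdot \mid X_1, \ldots, X_n)\right)\right] = I\left(M; X_{n+1} \mid X_1, \ldots, X_n\right),$$
where the infimum ranges over predictors measurable in the trajectory; any lower bound on the mutual information then lower-bounds the Bayes risk. The paper's exact loss and the precise form of (29) may differ from this schematic version.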
Optimal Regret Bounds for Collaborative Learning in Bandits
Shidani, Amitis, Vakili, Sattar
We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order-optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
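As a concrete illustration of the mixed-reward objective above, here is a minimal sketch; the weight matrix W, the helper names, and the toy numbers are illustrative and not part of the paper:

```python
import numpy as np

# Minimal sketch of the mixed-reward objective (illustrative names/numbers).
# mu[j, a] = expected reward of arm a for agent j.
# W[i, j]  = weight agent i places on agent j's rewards (rows sum to 1).

def mixed_rewards(mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Row i of the result is agent i's mixed reward for each arm."""
    return W @ mu

def best_arm_for(agent: int, mu: np.ndarray, W: np.ndarray) -> int:
    # The optimal arm maximizes the agent's mixed (weighted-average) reward.
    return int(np.argmax(mixed_rewards(mu, W)[agent]))

# Toy example: 2 agents, 3 arms.
mu = np.array([[0.9, 0.1, 0.5],
               [0.2, 0.8, 0.5]])
W = np.array([[0.5, 0.5],
              [0.1, 0.9]])
print([best_arm_for(i, mu, W) for i in range(2)])  # -> [0, 1]
```

Because each agent's objective mixes rewards observed by other agents, no agent can identify its optimal arm from its own samples alone, which is why communication through the controller is essential.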
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
Qiao, Dan, Yin, Ming, Min, Ming, Wang, Yu-Xiang
We study the problem of reinforcement learning (RL) with low (policy) switching cost - a problem well-motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S$ and $A$ denote the numbers of states and actions in an $H$-horizon episodic Markov Decision Process model with unknown transitions, and $T$ is the number of steps. We also prove an information-theoretic lower bound showing that a switching cost of $\Omega(HSA)$ is required for any no-regret algorithm. As a byproduct, our new algorithmic techniques allow us to derive a \emph{reward-free} exploration algorithm with an optimal switching cost of $O(HSA)$.
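The $O(\log\log T)$ dependence typically comes from a stage schedule whose stage lengths grow doubly exponentially, so the policy is updated only at $O(\log\log T)$ stage boundaries. The sketch below assumes a schedule of the form $T^{1 - 2^{-i}}$, a common construction for such bounds; the paper's exact stage design may differ:

```python
# Sketch of a doubling-exponent stage schedule (assumed form T^(1 - 2^-i));
# the policy changes only at stage boundaries, so the number of policy
# switches is the number of stages, which grows like log2(log2(T)).

def stage_endpoints(T: int) -> list[int]:
    ends, i = [], 1
    while not ends or ends[-1] < T:
        e = min(T, int(T ** (1 - 2.0 ** (-i))))
        if e >= T // 2:
            e = T  # final stage runs to the horizon
        if not ends or e > ends[-1]:
            ends.append(e)
        i += 1
    return ends

for T in (10**3, 10**6, 10**9):
    ends = stage_endpoints(T)
    print(T, len(ends), ends)  # stage count grows extremely slowly in T
```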
(Almost) Free Incentivized Exploration from Decentralized Learning Agents
Shi, Chengshuai, Xu, Haifeng, Xiong, Wei, Shen, Cong
Incentivized exploration in multi-armed bandits (MAB) has witnessed increasing interest and much progress in recent years, where a principal offers bonuses to agents to explore on her behalf. However, almost all existing studies are confined to temporary myopic agents. In this work, we break this barrier and study incentivized exploration with multiple long-term strategic agents, who have more complicated behaviors that often appear in real-world applications. An important observation of this work is that strategic agents' intrinsic need to learn benefits (instead of harming) the principal's exploration by providing "free pulls". Moreover, it turns out that increasing the population of agents significantly lowers the principal's burden of incentivizing. The key and somewhat surprising insight revealed by our results is that when sufficiently many learning agents are involved, the exploration process of the principal can be (almost) free. Our main results are built upon three novel components which may be of independent interest: (1) a simple yet provably effective incentive-provision strategy; (2) a carefully crafted best arm identification algorithm for rewards aggregated under unequal confidences; (3) a high-probability finite-time lower bound for UCB algorithms. Experimental results are provided to complement the theoretical analysis.
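Component (2) concerns combining reward estimates whose confidence levels differ across agents (different agents may have pulled an arm very different numbers of times). Below is a minimal sketch of precision-weighted aggregation, one natural way to handle unequal confidences and not necessarily the paper's exact rule; all names and numbers are illustrative:

```python
import numpy as np

def aggregate_unequal_confidence(means, pulls, delta=1e-4):
    """means[j]: agent j's empirical mean for one arm; pulls[j]: its pull count.
    Weighting by pull counts is inverse-variance weighting for sub-Gaussian
    rewards; the combined estimate behaves like a single sample mean over
    all pulls, with a correspondingly tighter confidence width."""
    means, pulls = np.asarray(means, float), np.asarray(pulls, float)
    w = pulls / pulls.sum()
    combined_mean = float(w @ means)
    # Hoeffding-style width at confidence 1 - delta for the pooled estimate.
    width = float(np.sqrt(2.0 * np.log(1.0 / delta) / pulls.sum()))
    return combined_mean, width

# One well-sampled agent and two poorly-sampled ones.
print(aggregate_unequal_confidence([0.62, 0.55, 0.70], [400, 50, 10]))
```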
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
Asymptotically Optimal Information-Directed Sampling
Kirschner, Johannes, Lattimore, Tor, Vernade, Claire, Szepesvári, Csaba
We introduce a computationally efficient algorithm for finite stochastic linear bandits. The approach is based on the frequentist information-directed sampling (IDS) framework, with an information gain potential that is derived directly from the asymptotic regret lower bound. We establish frequentist regret bounds, which show that the proposed algorithm is both asymptotically optimal and worst-case rate optimal in finite time. Our analysis sheds light on how IDS trades off regret and information to incrementally solve the semi-infinite concave program that defines the optimal asymptotic regret. Along the way, we uncover interesting connections to a recently proposed two-player game approach and the Bayesian IDS algorithm.
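At its core, IDS balances estimated regret against information: it minimizes the ratio of squared expected regret to expected information gain over action distributions, and the minimizer is supported on at most two actions. The sketch below is illustrative only; the gap and gain numbers are made up, and the paper derives its information-gain potential from the asymptotic lower bound, which this sketch does not reproduce:

```python
import numpy as np

def ids_distribution(gaps: np.ndarray, gains: np.ndarray):
    """Minimize E[gap]^2 / E[gain] over action distributions by
    grid-searching pairs of actions and mixing weights (the minimizer
    of the information ratio is supported on at most two actions)."""
    best_ratio, best_mix = np.inf, None
    for a in range(len(gaps)):
        for b in range(len(gaps)):
            for p in np.linspace(0.0, 1.0, 101):
                gap = p * gaps[a] + (1 - p) * gaps[b]
                gain = p * gains[a] + (1 - p) * gains[b]
                ratio = gap * gap / max(gain, 1e-12)
                if ratio < best_ratio:
                    best_ratio, best_mix = ratio, (a, b, p)
    return best_ratio, best_mix

# Gap estimates come from confidence bounds, so even the empirically
# best action has a small nonzero estimated gap; gains are per-action.
gaps = np.array([0.05, 0.12, 0.30])
gains = np.array([0.01, 0.08, 0.25])
print(ids_distribution(gaps, gains))
```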
- North America > Canada > Alberta (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)