874f5e53d7ce44f65fbf27a7b9406983-Supplemental-Conference.pdf
Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem.
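The idea can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's algorithm): instead of an exact posterior, it maintains M regularized least-squares models whose targets are independently perturbed, samples one model uniformly each round, and acts greedily for it. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def ensemble_sampling_linear_bandit(actions, theta_star, n_rounds=500,
                                    n_models=10, noise_std=0.1, reg=1.0, seed=0):
    """Illustrative sketch of ensemble sampling on a finite-armed linear bandit.

    `actions` is an (n_arms, d) array of feature vectors; `theta_star` is the
    unknown parameter used only to simulate rewards.
    """
    rng = np.random.default_rng(seed)
    n_arms, d = actions.shape
    # Shared Gram matrix; per-model targets start from independent perturbations,
    # which stands in for drawing M models from an approximate prior.
    A = reg * np.eye(d)
    b = noise_std * rng.normal(size=(n_models, d))
    counts = np.zeros(n_arms, dtype=int)
    total_reward = 0.0
    for _ in range(n_rounds):
        theta_hat = np.linalg.solve(A, b.T).T         # (M, d) ridge estimates
        m = rng.integers(n_models)                    # sample one ensemble member...
        arm = int(np.argmax(actions @ theta_hat[m]))  # ...and act greedily for it
        x = actions[arm]
        r = float(x @ theta_star) + noise_std * rng.normal()
        counts[arm] += 1
        total_reward += r
        A += np.outer(x, x)
        # Each model is updated with its own independently perturbed copy of the
        # reward, so the ensemble spread tracks an approximate posterior.
        b += x * (r + noise_std * rng.normal(size=(n_models, 1)))
    return total_reward, counts
```

Sampling a member uniformly per round plays the role of drawing a parameter from the posterior in exact Thompson sampling; the perturbed updates keep the M estimates dispersed so that exploration does not collapse.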
7f9220f90cc85b0da693643add6618e6-Supplemental-Conference.pdf
The hope is that these predictions allow the algorithm to circumvent worst-case lower bounds when the predictions are good, and approximately match them otherwise. The precise definitions and guarantees vary with different settings, but there have been significant successes in applying this framework to many different algorithmic problems, ranging from general online problems to classical graph algorithms (see Section 1.2 for a more detailed discussion of related work, and [35] for a survey). In all of these settings it turns out to be possible to define a "prediction" where the "quality" of the algorithm (competitive ratio, running time, etc.) depends on the "error" of the prediction.
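A classic instance of this framework is ski rental with a predicted season length (in the style of learning-augmented online algorithms). The sketch below is illustrative, not taken from this paper: renting costs 1 per day, buying costs `buy_cost`, and a trust parameter `lam` trades off consistency (cost near optimal when the prediction is good) against robustness (bounded competitive ratio when it is bad). All names are assumptions.

```python
import math

def ski_rental_with_prediction(n_days, buy_cost, predicted_days, lam=0.5):
    """Illustrative learning-augmented ski-rental strategy.

    If the prediction says the season is long, buy early (around lam * buy_cost
    days in); if it says short, hedge and buy late (around buy_cost / lam days).
    Smaller lam trusts the prediction more; larger lam is more robust.
    """
    if predicted_days >= buy_cost:
        buy_day = math.ceil(lam * buy_cost)   # prediction says "long": buy early
    else:
        buy_day = math.ceil(buy_cost / lam)   # prediction says "short": buy late
    if n_days < buy_day:
        return n_days                          # season ended: rented every day
    return (buy_day - 1) + buy_cost            # rented until buy_day, then bought

def offline_opt(n_days, buy_cost):
    # Clairvoyant optimum: rent throughout, or buy on day one.
    return min(n_days, buy_cost)
```

For example, with `buy_cost=10` and `lam=0.5`, a correct "long season" prediction yields cost 14 against an optimum of 10 (ratio 1.4, within 1 + lam), while a badly wrong "short season" prediction yields cost 29 (ratio 2.9, still within 1 + 1/lam): the quality degrades gracefully with the prediction error.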
$R_n(a)$ with $R_n(a) := \sum^{n} \cdots$
In particular, Bai et al. [5] and Jin et al. [31] developed the first algorithms to beat the curse of multiple agents in two-player zero-sum MGs, while Jin et al. [31], Daskalakis et al. [23], Mao and Başar [44], and Song et al. [63] further demonstrated how to accomplish the same goal when learning other computationally tractable solution concepts (e.g., coarse correlated equilibria) in general-sum multi-player Markov games. We shall also briefly remark on the prior works that concern RL with a generative model.

A key term in the regret bound (36) is a weighted sum of the "variance-style" quantities $\{\mathrm{Var}_{\pi_k}(\ell_k)\}$. While $\mathrm{Var}_{\pi_k}(\ell_k) \le \|\ell_k\|^2$ is orderwise tight in the worst-case scenario for a given iteration $k$, exploiting the problem-specific variance-type structure across time is crucial in sharpening the horizon dependence in many RL problems (e.g., Azar et al. [3], Jin et al. [30], Li et al. [41, 40]).

C.1 Preliminaries and notation

Let us start with some preliminary facts and notation.
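For concreteness, one plausible reading of such a variance-style quantity (the subscripts and the worst-case bound here are assumptions reconstructed from the garbled extraction, not taken verbatim from the paper): with $\pi_k$ a distribution over actions and $\ell_k$ a loss vector,

```latex
\mathrm{Var}_{\pi_k}(\ell_k)
  := \mathbb{E}_{a \sim \pi_k}\!\Big[\big(\ell_k(a) - \mathbb{E}_{a' \sim \pi_k}[\ell_k(a')]\big)^2\Big]
   = \sum_a \pi_k(a)\,\ell_k(a)^2 \;-\; \Big(\sum_a \pi_k(a)\,\ell_k(a)\Big)^{\!2}
  \;\le\; \|\ell_k\|_\infty^2 .
```

The inequality is the worst-case bound referred to in the text; the point of variance-aware analyses is that summing $\mathrm{Var}_{\pi_k}(\ell_k)$ across iterations can be much smaller than summing the worst-case bound, which is what sharpens the horizon dependence.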