suboptimality gap
Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > California (0.04)
Genre:
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Country:
- North America > United States > Ohio (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Austria (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Achieving Constant Regret in Linear Markov Decision Processes
We study constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that, with high probability, incurs only finite regret over infinitely many episodes. We introduce Cert-LSVI-UCB, an algorithm for misspecified linear Markov decision processes (MDPs) in which both the transition kernel and the reward function can be approximated by linear functions up to a misspecification level ζ. At the core of Cert-LSVI-UCB is a novel certified estimator, which enables a fine-grained concentration analysis for multi-phase value-targeted regression and allows us to establish an instance-dependent regret bound that is constant with respect to the number of episodes.
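The abstract does not spell out the certified estimator, so the following is only a sketch of the standard least-squares value iteration with upper confidence bounds (LSVI-UCB) step for linear MDPs, which optimistic algorithms of this family build on: a ridge regression of value targets onto features φ(s, a), plus an elliptical exploration bonus β·sqrt(φᵀΛ⁻¹φ). The function names and the parameters lam, beta, and cap (a clip at the horizon) are hypothetical and chosen for illustration; the certified estimator and the multi-phase regression of Cert-LSVI-UCB are not implemented here.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (inputs are not modified)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def optimistic_q(features: List[Vector], targets: Vector, query: Vector,
                 lam: float = 1.0, beta: float = 1.0,
                 cap: Optional[float] = None) -> float:
    """Hypothetical LSVI-UCB-style optimistic Q estimate at one step h.

    features: past phi(s_h, a_h) vectors; targets: regression targets
    r_h + max_a Q_{h+1}(s_{h+1}, a); query: phi(s, a) to evaluate.
    """
    d = len(query)
    # Gram matrix Lambda = lam * I + sum_tau phi_tau phi_tau^T
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for phi in features:
        for i in range(d):
            for j in range(d):
                gram[i][j] += phi[i] * phi[j]
    # Ridge solution w = Lambda^{-1} sum_tau phi_tau * y_tau
    rhs = [sum(phi[i] * y for phi, y in zip(features, targets)) for i in range(d)]
    w = solve(gram, rhs)
    # Bonus beta * sqrt(phi^T Lambda^{-1} phi), computed by solving Lambda z = phi
    z = solve(gram, query)
    bonus = beta * math.sqrt(max(0.0, sum(q * zi for q, zi in zip(query, z))))
    q_value = sum(q * wi for q, wi in zip(query, w)) + bonus
    # Optionally clip at a known upper bound on values (e.g. the horizon)
    return min(q_value, cap) if cap is not None else q_value
```

For example, with features [[1, 0], [0, 1]], targets [1, 2], and lam = 1, the Gram matrix is 2I, so the ridge weights are [0.5, 1.0]; querying φ = [1, 0] with beta = 1 gives 0.5 + sqrt(0.5). Solving Λz = φ instead of forming Λ⁻¹ explicitly is a common numerical choice, not something the abstract prescribes.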
Country:
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Country:
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Texas > Brazos County > College Station (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Industry: Government > Regional Government > North America Government > United States Government (0.93)
Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Country:
- Asia > Singapore (0.04)
- North America > Canada (0.04)
Country:
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Genre:
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
Country:
- North America > United States (0.15)
- Asia > Singapore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Country:
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas (0.04)
Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)