 Christoph Dann


On Oracle-Efficient PAC RL with Rich Observations

Neural Information Processing Systems

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation--accessing policy and value function classes exclusively through standard optimization primitives--and therefore represent computationally efficient alternatives to prior algorithms that require enumeration.
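
The "oracle model of computation" is easiest to see as an interface. Below is a minimal, hypothetical sketch (names and types are mine, not from the paper) of a weighted-argmax oracle over a finite policy class; the brute-force loop inside exists only to make the toy runnable, whereas the point of the oracle model is that such a call would be served by a standard optimization routine rather than by enumerating the class.

    from typing import Callable, Iterable, Sequence

    # A policy maps an observation to an action.
    Policy = Callable[[str], int]

    def argmax_oracle(policy_class: Sequence[Policy],
                      examples: Iterable[tuple[str, int, float]]) -> Policy:
        """Return the policy with the highest empirical weighted score.

        Hypothetical oracle interface: algorithms interact with the policy
        class only through calls like this, never by inspecting its members.
        """
        data = list(examples)

        def score(pi: Policy) -> float:
            # Credit weight w whenever the policy picks the recorded action.
            return sum(w for obs, a, w in data if pi(obs) == a)

        return max(policy_class, key=score)

    if __name__ == "__main__":
        # Tiny policy class over observations "x0"/"x1" and actions {0, 1}.
        policies: list[Policy] = [
            lambda obs: 0,
            lambda obs: 1,
            lambda obs: 1 if obs == "x1" else 0,
        ]
        data = [("x0", 0, 1.0), ("x1", 1, 2.0)]
        best = argmax_oracle(policies, data)
        print(best("x0"), best("x1"))  # third policy matches both examples: 0 1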


Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Neural Information Processing Systems

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
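
As a rough formalization (my notation, not a quote from the paper): an algorithm producing policies \pi_1, \pi_2, \ldots is Uniform-PAC if, on a single high-probability event, its mistakes are bounded at every accuracy level simultaneously,

    \Pr\Big[\ \forall \varepsilon > 0:\ \big|\{\, k : V^* - V^{\pi_k} > \varepsilon \,\}\big| \le F(\varepsilon, \delta) \ \Big] \ge 1 - \delta,

where F(\varepsilon, \delta) is polynomial in 1/\varepsilon and \log(1/\delta). Classical PAC fixes a single \varepsilon in advance; the uniform quantifier over \varepsilon is what allows high-probability regret bounds to be recovered from the same guarantee.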

