AITopics | psrl algorithm

Scalar Posterior Sampling with Applications

Neural Information Processing SystemsMar-17-2026, 00:35:52 GMT

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parameterization for a large class of problems in sequential recommendations.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Add feedback

Scalar Posterior Sampling with Applications

Neural Information Processing SystemsNov-20-2025, 22:56:52 GMT

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parameterization for a large class of problems in sequential recommendations.

algorithm, name change, scalar posterior sampling, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Add feedback

Scalar Posterior Sampling with Applications

Georgios Theocharous, Zheng Wen, Yasin Abbasi Yadkori, Nikos Vlassis

Neural Information Processing SystemsNov-20-2025, 19:46:42 GMT

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Regret Analysis of the Posterior Sampling-based Learning Algorithm for Episodic POMDPs

Tang, Dengwang, Jain, Rahul, Nayyar, Ashutosh, Nuzzo, Pierluigi

arXiv.org Machine LearningOct-16-2023

Compared to Markov Decision Processes (MDPs), learning in Partially Observable Markov Decision Processes (POMDPs) can be significantly harder due to the difficulty of interpreting observations. In this paper, we consider episodic learning problems in POMDPs with unknown transition and observation models. We consider the Posterior Sampling-based Reinforcement Learning (PSRL) algorithm for POMDPs and show that its Bayesian regret scales as the square root of the number of episodes. In general, the regret scales exponentially with the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, under the condition that the POMDP is undercomplete and weakly revealing, we establish a polynomial Bayesian regret bound that improves the regret bound by a factor of $\Omega(H^2\sqrt{SA})$ over the recent result by arXiv:2204.08967.

artificial intelligence, machine learning, pomdp, (16 more...)

arXiv.org Machine Learning

2310.10107

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation

Kalagarla, Krishna C, Jain, Rahul, Nuzzo, Pierluigi

arXiv.org Artificial IntelligenceJan-27-2023

Constrained Markov decision processes (CMDPs) model scenarios of sequential decision making with multiple objectives that are increasingly important in many applications. However, the model is often unknown and must be learned online while still ensuring the constraint is met, or at least the violation is bounded with time. Some recent papers have made progress on this very challenging problem but either need unsatisfactory assumptions such as knowledge of a safe policy, or have high cumulative regret. We propose the Safe PSRL (posterior sampling-based RL) algorithm that does not need such assumptions and yet performs very well, both in terms of theoretical regret bounds as well as empirically. The algorithm achieves an efficient tradeoff between exploration and exploitation by use of the posterior sampling principle, and provably suffers only bounded constraint violation by leveraging the idea of pessimism. Our approach is based on a primal-dual approach. We establish a sub-linear $\tilde{\mathcal{ O}}\left(H^{2.5} \sqrt{|\mathcal{S}|^2 |\mathcal{A}| K} \right)$ upper bound on the Bayesian reward objective regret along with a bounded, i.e., $\tilde{\mathcal{O}}\left(1\right)$ constraint violation regret over $K$ episodes for an $|\mathcal{S}|$-state, $|\mathcal{A}|$-action and horizon $H$ CMDP.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2301.11547

Country: North America > United States > California (0.46)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Scalar Posterior Sampling with Applications

Theocharous, Georgios, Wen, Zheng, Yadkori, Yasin Abbasi, Vlassis, Nikos

Neural Information Processing SystemsFeb-14-2020, 19:56:47 GMT

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature.

algorithm, application, scalar posterior sampling, (2 more...)

Neural Information Processing Systems

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.57)

Add feedback

Scalar Posterior Sampling with Applications

Theocharous, Georgios, Wen, Zheng, Abbasi, Yasin, Vlassis, Nikos

Neural Information Processing SystemsDec-31-2018

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parameterization for a large class of problems in sequential recommendations.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Scalar Posterior Sampling with Applications

Theocharous, Georgios, Wen, Zheng, Abbasi, Yasin, Vlassis, Nikos

Neural Information Processing SystemsDec-31-2018

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parameterization for a large class of problems in sequential recommendations.

algorithm, assumption, psrl algorithm, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Posterior Sampling for Large Scale Reinforcement Learning

Theocharous, Georgios, Wen, Zheng, Abbasi-Yadkori, Yasin, Vlassis, Nikos

arXiv.org Artificial IntelligenceOct-22-2018

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parametrization for a large class of problems in sequential recommendations.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1711.07979

Country: North America (0.46)

Genre: Research Report > New Finding (0.48)

Technology: