AITopics | k-learning

Collaborating Authors

k-learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Variational Bayesian Reinforcement Learning with Regret Bounds

Neural Information Processing SystemsDec-25-2025, 04:40:33 GMT

We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent endowed with an exponential epistemic-risk-seeking utility function explores efficiently, as measured by regret. The state-action values induced by the exponential utility satisfy a Bellman recursion, so we can use dynamic programming to compute them. We call the resulting algorithm K-learning (for knowledge) and the risk-seeking utility ensures that the associated state-action values (K-values) are optimistic for the expected optimal Q-values under the posterior. The exponential utility function induces a Boltzmann exploration policy for which the'temperature' parameter is equal to the risk-seeking parameter and is carefully controlled to yield a Bayes regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed timesteps. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.

name change, regret bound, variational bayesian reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

Variational Bayesian Reinforcement Learning with Regret Bounds

Neural Information Processing SystemsAug-18-2025, 15:15:17 GMT

In reinforcement learning the Q-values summarize the expected future rewards that the agent will attain.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Variational Bayesian Reinforcement Learning with Regret Bounds

Neural Information Processing SystemsJan-19-2025, 12:11:59 GMT

We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent endowed with an exponential epistemic-risk-seeking utility function explores efficiently, as measured by regret. The state-action values induced by the exponential utility satisfy a Bellman recursion, so we can use dynamic programming to compute them. We call the resulting algorithm K-learning (for knowledge) and the risk-seeking utility ensures that the associated state-action values (K-values) are optimistic for the expected optimal Q-values under the posterior. The exponential utility function induces a Boltzmann exploration policy for which the'temperature' parameter is equal to the risk-seeking parameter and is carefully controlled to yield a Bayes regret bound of \tilde O(L {3/2} \sqrt{S A T}), where L is the time horizon, S is the number of states, A is the number of actions, and T is the total number of elapsed timesteps. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.

regret bound, state-action value, variational bayesian reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Add feedback

Stochastic matrix games with bandit feedback

O'Donoghue, Brendan, Lattimore, Tor, Osband, Ian

arXiv.org Machine LearningJun-9-2020

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each others actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix is known to the players. Despite numerous applications, this problem has received relatively little attention. Although adversarial bandit algorithms achieve low regret, they do not exploit the matrix structure and perform poorly relative to the new algorithms. The main contributions are regret analyses of variants of UCB and K-learning that hold for any opponent, e.g., even when the opponent adversarially plays the best response to the learner's mixed strategy. Along the way, we show that Thompson fails catastrophically in this setting and provide empirical comparison to existing algorithms.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2006.05145

Country: