AITopics | master algorithm

We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different type of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically.

algorithm, contextual stochastic bandit problem, model selection, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.63)

Add feedback

Playing the Player: A Heuristic Framework for Adaptive Poker AI

Paterson, Andrew, Sanders, Carl

arXiv.org Artificial IntelligenceDec-5-2025

For years, the discourse around poker AI has been dominated by the concept of solvers and the pursuit of unexploitable, machine-perfect play. This paper challenges that orthodoxy. It presents Patrick, an AI built on the contrary philosophy: that the path to victory lies not in being unexploitable, but in being maximally exploitative. Patrick's architecture is a purpose-built engine for understanding and attacking the flawed, psychological, and often irrational nature of human opponents. Through detailed analysis of its design, its novel prediction-anchored learning method, and its profitable performance in a 64,267-hand trial, this paper makes the case that the solved myth is a distraction from the real, far more interesting challenge: creating AI that can master the art of human imperfection.

artificial intelligence, machine learning, opponent, (19 more...)

arXiv.org Artificial Intelligence

2512.04714

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Supplement to " Model Selection in Contextual Stochastic Bandit Problems "

Neural Information Processing SystemsOct-3-2025, 06:29:16 GMT

In Section D we present the proofs for Section 5.1 In Section H we show the proofs of the lower bounds in Section 6. We outline briefly some other direct applications of our results. CORRAL will achieve regret O ( p | L | dT) . B.1 Original Corral The original Corral algorithm [2] is reproduced below. We reproduce the EXP3.P algorithm (Figure 3.1 in [ 's expected replay regret satisfies: Therefore total regret is bounded by 6 U ( T,) log( T) D.2 Applications of Proposition 5.1 We now show that several algorithms are ( U,, T) bounded: Lemma D.2.

algorithm, base algorithm, theorem 5, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.41)

Add feedback

751d51528afe5e6f7fe95dece4ed32ba-Paper.pdf

Neural Information Processing SystemsOct-3-2025, 06:29:09 GMT

algorithm, base algorithm, corral, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Add feedback

Adapting to Misspecification in Contextual Bandits

Neural Information Processing SystemsAug-22-2025, 00:22:07 GMT

However, they left the adapting to unknown ε as an open question.

algorithm, contextual bandit, reg sq, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Add feedback

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

Neural Information Processing SystemsAug-19-2025, 09:59:09 GMT

These algorithms combine the ideas of finite-horizon approximation [Chen et al., 2022a], special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020], adaptive confidence widening [Wei and Luo, 2021], as

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: