AITopics | bernoulli bandit

Delightful Exploration

Osband, Ian

arXiv.org Machine Learning, May-14-2026

Most exploration algorithms search broadly until uncertainty is resolved. When the action space is too large to resolve within budget, practitioners default to $\varepsilon$-greedy, which bounds disruption but spends its override blindly. We introduce \textit{Delight-gated exploration} (DE), a host--override rule that spends exploratory actions only when their prospective delight (expected improvement times surprisal) exceeds a gate price. This practical heuristic recovers a classical result: Pandora's reservation-value rule for costly search, with surprisal setting the effective inspection cost. Resolved arms exit the gate, fresh arms shut off above a prior-determined threshold, and selected linear-bandit overrides consume a finite information budget. Across Bernoulli bandits, linear bandits, and tabular MDPs, the same hyperparameters transfer without retuning, and DE shows much weaker regret growth than Thompson Sampling and $\varepsilon$-greedy in the tested unresolved regimes. Delight improves acting for the same reason it improves learning: it prices scarce resources by the product of upside and surprisal.
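The two baselines named in this abstract, Thompson Sampling and $\varepsilon$-greedy, can be sketched on a Bernoulli bandit as below. This is only an illustration of the baselines, not the DE rule or the paper's experimental setup; the function name `run_bandit`, the Beta(1, 1) prior, the smoothed greedy estimate, the arm means, and the default `epsilon=0.1` are all assumptions made for this sketch.

```python
import random


def run_bandit(policy, means, horizon, seed=0, epsilon=0.1):
    """Simulate a Bernoulli bandit and return cumulative pseudo-regret.

    policy: "thompson" or "epsilon_greedy" (hypothetical labels for this sketch).
    means:  true success probability of each arm (unknown to the policy).
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        if policy == "thompson":
            # Draw one sample per arm from its Beta(1 + s, 1 + f) posterior
            # and play the arm with the largest sample.
            samples = [
                rng.betavariate(1 + successes[a], 1 + failures[a])
                for a in range(k)
            ]
            arm = max(range(k), key=lambda a: samples[a])
        elif policy == "epsilon_greedy":
            if rng.random() < epsilon:
                # Explore: uniformly random arm.
                arm = rng.randrange(k)
            else:
                # Exploit: smoothed empirical mean (assumed here so that
                # unpulled arms start at 0.5 instead of dividing by zero).
                arm = max(
                    range(k),
                    key=lambda a: (successes[a] + 1)
                    / (successes[a] + failures[a] + 2),
                )
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Pseudo-regret: gap between the best arm's mean and the played arm's mean.
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # illustrative values only
    for name in ("thompson", "epsilon_greedy"):
        print(name, run_bandit(name, arm_means, horizon=2000, seed=1))
```

With a fixed `epsilon`, the random-exploration branch keeps paying regret at a constant rate for the whole horizon, which is the "blind" override spending the abstract refers to.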

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2605.13287

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)

We would like to thank the reviewers for the constructive reviews

Neural Information Processing Systems, Nov-18-2025, 10:48:39 GMT

Sec. 3.2 a novel contribution [...] 4) l. 139: Is the teacher's reward the same as the reward previously defined for the

algorithm, constructive review, learner, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

92fde850d824c2ba9b563cb6fa4078c3-Supplemental.pdf

Neural Information Processing Systems, Nov-15-2025, 03:21:19 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.71)

Supplement to "Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models"

Neural Information Processing Systems, Oct-9-2025, 16:39:38 GMT

Supplement to "Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models" See [11, 42] for more detailed discussions. Consider a supervised learning problem, where we have N subjects. Finally, these three models are all special cases of the following hierarchical model (a.k.a. The aforementioned statistical concepts are typically introduced for supervised learning. It is easy to see this model is a random effect model.

artificial intelligence, bandit, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

92fde850d824c2ba9b563cb6fa4078c3-Paper.pdf

Neural Information Processing Systems, Sep-25-2025, 16:12:25 GMT

artificial intelligence, bandit, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta (0.28)

Industry: Energy > Oil & Gas > Upstream (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Data Science > Data Mining > Big Data (0.50)
(2 more...)

b98a3773ecf715751d3cf0fb6dcba424-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-20-2025, 00:17:56 GMT

active learner, algorithm, learner, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

92fde850d824c2ba9b563cb6fa4078c3-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 01:23:49 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.71)

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Neural Information Processing Systems, Aug-16-2025, 01:23:45 GMT

Designing efficient exploration is central to Reinforcement Learning due to the fundamental problem posed by the exploration-exploitation dilemma.

artificial intelligence, bandit, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta (0.28)

Industry: Energy > Oil & Gas > Upstream (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Data Science > Data Mining > Big Data (0.50)
(2 more...)

Optimal Regret of Bernoulli Bandits under Global Differential Privacy

Azize, Achraf, Wu, Yulian, Honda, Junya, Orabona, Francesco, Ito, Shinji, Basu, Debabrota

arXiv.org Machine Learning, May-12-2025

As sequential learning algorithms are increasingly applied to real life, ensuring data privacy while maintaining their utilities emerges as a timely question. In this context, regret minimisation in stochastic bandits under $ε$-global Differential Privacy (DP) has been widely studied. Unlike bandits without DP, there is a significant gap between the best-known regret lower and upper bounds in this setting, though they "match" in order. Thus, we revisit the regret lower and upper bounds of $ε$-global DP algorithms for Bernoulli bandits and improve both. First, we prove a tighter regret lower bound involving a novel information-theoretic quantity characterising the hardness of $ε$-global DP in stochastic bandits. Our lower bound strictly improves on the existing ones across all $ε$ values. Then, we choose two asymptotically optimal bandit algorithms, i.e., KL-UCB and IMED, and propose their DP versions, DP-KLUCB and DP-IMED, using a unified blueprint, i.e., (a) running in arm-dependent phases, and (b) adding Laplace noise to achieve privacy. For Bernoulli bandits, we analyse the regrets of these algorithms and show that their regrets asymptotically match our lower bound up to a constant arbitrarily close to 1. This refutes the conjecture that forgetting past rewards is necessary to design optimal bandit algorithms under global DP. At the core of our algorithms lies a new concentration inequality for sums of Bernoulli variables under the Laplace mechanism, which is a new DP version of the Chernoff bound. This result is universally useful as the DP literature commonly treats the concentrations of Laplace noise and random variables separately, while we couple them to yield a tighter bound.
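Ingredient (b) of the blueprint, adding Laplace noise, can be illustrated on a single batch of Bernoulli rewards. The sketch below shows only the standard Laplace mechanism for a sum with sensitivity 1; it is not DP-KLUCB or DP-IMED, it does not implement the arm-dependent phases, and the names `laplace_noise` and `private_mean` are hypothetical. It also assumes the batch length is public and that each reward lies in [0, 1].

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # rng.expovariate(lambd) has mean 1 / lambd, so lambd = 1 / scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(rewards, epsilon, rng):
    """Release a noisy mean of one batch of rewards in [0, 1].

    Changing one reward moves the sum by at most 1 (sensitivity 1), so
    adding Laplace(1 / epsilon) noise to the sum makes this single
    release epsilon-DP with respect to that batch.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not rewards:
        raise ValueError("rewards must be non-empty")
    noisy_sum = sum(rewards) + laplace_noise(1.0 / epsilon, rng)
    # The result may fall outside [0, 1]; clipping would be post-processing.
    return noisy_sum / len(rewards)


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [1, 0, 1, 1, 0, 1, 1, 1]  # illustrative rewards from one arm
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_mean(batch, eps, rng))
```

The noise on the mean scales as 1 / (epsilon * n), so it shrinks as the batch grows; the abstract's concentration result concerns the sum of the Bernoulli fluctuation and this Laplace term taken jointly rather than bounded separately.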

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2505.05613

Country: