AITopics

Country: North America > United States > California (0.68)

Genre: Research Report (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing Systems, Dec-23-2025, 17:37:31 GMT

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

The stochastic contextual bandit problem, which models the trade-off between exploration and exploitation, has many real-world applications, including recommender systems, online advertising and clinical trials. Like many other machine learning algorithms, contextual bandit algorithms often have one or more hyper-parameters. For example, in most optimal stochastic contextual bandit algorithms, there is an unknown exploration parameter which controls the trade-off between exploration and exploitation. A proper choice of the hyper-parameters is essential for contextual bandit algorithms to perform well. However, it is infeasible to use offline tuning methods to select hyper-parameters in a contextual bandit environment, since there is no pre-collected dataset and the decisions have to be made in real time. To tackle this problem, we first propose a two-layer bandit structure for auto-tuning the exploration parameter and further generalize it to the Syndicated Bandits framework, which can learn multiple hyper-parameters dynamically in a contextual bandit environment. We derive the regret bounds of our proposed Syndicated Bandits framework and show that it avoids a regret that depends exponentially on the number of hyper-parameters to be tuned. Moreover, it achieves optimal regret bounds under certain scenarios. The Syndicated Bandits framework is general enough to handle the tuning tasks in many popular contextual bandit algorithms, such as LinUCB, LinTS, UCB-GLM, etc. Experiments on both synthetic and real datasets validate the effectiveness of our proposed framework.
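A minimal sketch of the two-layer idea described in this abstract may help make it concrete. It assumes an EXP3 learner as the top layer and LinUCB as the bottom layer; the candidate values, the true parameter `theta`, the reward rescaling and the function name `run_two_layer` are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def run_two_layer(T=2000, candidates=(0.01, 0.1, 1.0, 10.0), gamma=0.1, seed=0):
    """Top layer (EXP3) picks an exploration parameter; bottom layer (LinUCB) picks an arm."""
    rng = random.Random(seed)
    d, n_arms = 2, 5
    theta = [0.6, -0.3]               # hypothetical true parameter, unknown to the learner
    a_inv = [[1.0, 0.0], [0.0, 1.0]]  # inverse of (I + sum of x x^T)
    b = [0.0, 0.0]                    # sum of reward * x
    n_cand = len(candidates)
    weights = [1.0] * n_cand
    pulls = [0] * n_cand
    total = 0.0
    for _ in range(T):
        # Top layer: EXP3 samples which exploration parameter to use this round.
        s = sum(weights)
        probs = [(1 - gamma) * w / s + gamma / n_cand for w in weights]
        j = rng.choices(range(n_cand), weights=probs)[0]
        alpha = candidates[j]
        pulls[j] += 1
        # Bottom layer: LinUCB scores each context with the sampled parameter.
        arms = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
        theta_hat = matvec(a_inv, b)
        x = max(arms, key=lambda v: dot(theta_hat, v)
                + alpha * math.sqrt(max(dot(v, matvec(a_inv, v)), 0.0)))
        reward = dot(theta, x) + rng.gauss(0.0, 0.1)
        total += reward
        # EXP3 expects rewards in [0, 1]; this rescaling is an assumption of the sketch.
        scaled = min(1.0, max(0.0, (reward + 1.0) / 2.0))
        weights[j] *= math.exp(gamma * scaled / (probs[j] * n_cand))
        top = max(weights)
        weights = [w / top for w in weights]  # renormalize to avoid overflow
        # Sherman-Morrison rank-one update of the inverse design matrix.
        ax = matvec(a_inv, x)
        denom = 1.0 + dot(x, ax)
        for p in range(d):
            for q in range(d):
                a_inv[p][q] -= ax[p] * ax[q] / denom
            b[p] += reward * x[p]
    return total, pulls
```

Calling `run_two_layer()` returns the cumulative reward and how often each candidate was chosen; both layers learn from the same observed reward, so no pre-collected dataset is needed.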

auto tuning hyper-parameter, contextual bandit algorithm, syndicated bandit framework, (5 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.59)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Sep-24-2025, 09:23:48 GMT

082e82cae0232f45f27fdd2612c31f8a-Paper-Conference.pdf

algorithm, bandit algorithm, exploration parameter, (13 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Yolo County > Davis (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Duan, Zhihua, Wang, Jialin

Prompt-Based Monte Carlo Tree Search for Mitigating Hallucinations in Large Models

arXiv.org Artificial Intelligence, Jan-17-2025

With the rapid development of large models in artificial intelligence, enhancing their ability to handle complex problems in scientific research remains an open challenge. This study proposes an improved Monte Carlo Tree Search (MCTS) method based on prompts. In the simulation search stage, it introduces dynamic adjustment of the exploration parameter and adaptive selection strategies, which better balance exploration and exploitation and thereby reduce hallucinations. The paper uses four subsets of the SciEval dataset as test sets and compares the Glm-4-flash+Improved MCTS method with several existing models. The results show that the Improved MCTS method performs better, providing new ideas and methods for applying large models in scientific research.
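The abstract does not specify how the exploration parameter is adjusted, so the following is only a sketch of one plausible reading: a standard UCT selection rule whose exploration constant shrinks as the parent node accumulates visits. The schedule `dynamic_c`, its `decay` argument and the `(visits, total_value)` child representation are hypothetical.

```python
import math

def dynamic_c(c0, parent_visits, decay=0.01):
    """Hypothetical schedule: the exploration constant shrinks with parent visits."""
    return c0 / (1.0 + decay * parent_visits)

def uct_select(children, c0=1.4):
    """children: list of (visits, total_value). Returns the index of the child to descend into."""
    parent_visits = sum(v for v, _ in children)
    c = dynamic_c(c0, parent_visits)
    best, best_score = None, -math.inf
    for i, (visits, value) in enumerate(children):
        if visits == 0:
            return i  # always try an unvisited child first
        # Mean value (exploitation) plus a visit-count bonus (exploration).
        score = value / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

Early in the search the bonus term dominates and rarely visited children are tried; as visits accumulate, the smaller constant lets the mean value decide.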

artificial intelligence, large language model, natural language, (17 more...)

2501.13942

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > California > San Mateo County > Burlingame (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (0.70)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Sharma, Dravyansh, Suggala, Arun Sai

Offline-to-online hyperparameter transfer for stochastic bandits

arXiv.org Artificial Intelligence, Jan-6-2025

Classic algorithms for stochastic bandits typically use hyperparameters that govern their critical properties, such as the trade-off between exploration and exploitation. Tuning these hyperparameters is a problem of great practical significance. However, it is challenging and, in certain cases, information-theoretically impossible. To address this challenge, we consider a practically relevant transfer learning setting where one has access to offline data collected from several bandit problems (tasks) coming from an unknown distribution over the tasks. Our aim is to use this offline data to set the hyperparameters for a new task drawn from the unknown distribution. We provide bounds on the inter-task (number of tasks) and intra-task (number of arm pulls for each task) sample complexity for learning near-optimal hyperparameters on unseen tasks drawn from the distribution. Our results apply to several classic algorithms, including tuning the exploration parameters in UCB and LinUCB and the noise parameter in GP-UCB. Our experiments indicate the significance and effectiveness of the transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback.
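As a rough illustration of the transfer setting, the sketch below picks a UCB exploration parameter by averaging performance over a set of training tasks and then reuses it on a new task. Simulated Bernoulli tasks stand in for the offline data; the task distribution, candidate grid and function names are assumptions made for illustration, not the paper's procedure.

```python
import math
import random

def run_ucb(means, alpha, horizon, rng):
    """Run UCB with exploration parameter alpha on Bernoulli arms; return total reward."""
    k = len(means)
    counts, sums, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(alpha * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def sample_task(rng, k=5):
    """Hypothetical task distribution: arm means drawn uniformly from [0.2, 0.8]."""
    return [rng.uniform(0.2, 0.8) for _ in range(k)]

def tune_offline(candidates, n_tasks=30, horizon=300, seed=0):
    """Return the candidate with the best average reward across the training tasks."""
    rng = random.Random(seed)
    tasks = [sample_task(rng) for _ in range(n_tasks)]
    avg = {a: sum(run_ucb(m, a, horizon, rng) for m in tasks) / n_tasks
           for a in candidates}
    best = max(candidates, key=lambda a: avg[a])
    return best, avg
```

For example, `best, avg = tune_offline([0.1, 0.5, 2.0])` selects one value, which can then be passed to `run_ucb` on a fresh task drawn by `sample_task`. Here `n_tasks` plays the role of the inter-task sample size and `horizon` the intra-task one.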

artificial intelligence, data mining, machine learning, (18 more...)

2501.02926

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.89)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing Systems, Oct-9-2024, 12:21:34 GMT

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

artificial intelligence, data mining, machine learning, (8 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.41)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Painter, Michael, Baioumy, Mohamed, Hawes, Nick, Lacerda, Bruno

Monte Carlo Tree Search with Boltzmann Exploration

arXiv.org Artificial Intelligence, Apr-11-2024

Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address this limitation and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms perform consistently well across several benchmark domains, including the game of Go.
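The two ingredients named in this abstract, a Boltzmann policy and the Alias method, can be sketched in isolation. The code below is a generic implementation of a softmax over action values and of Vose's alias table construction, not the BTS or DENTS algorithms themselves; the function names are illustrative.

```python
import math
import random

def boltzmann(q_values, temperature):
    """Boltzmann (softmax) policy over action values; higher temperature is more uniform."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def build_alias(probs):
    """Build an alias table in O(n) so that each later sample costs O(1)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l       # column s is topped up by action l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:                     # leftovers fill their own column
        prob[i], alias[i] = 1.0, i
    return prob, alias

def alias_sample(prob, alias, rng):
    """Draw one action: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

A typical use is `prob, alias = build_alias(boltzmann(q, 0.5))` once per node, followed by repeated calls to `alias_sample(prob, alias, random.Random(0))`; the table must be rebuilt whenever the action values change.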

algorithm, DENTS, MENTS, (15 more...)

2404.07732

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Estonia > Tartu County > Tartu (0.04)

Genre: Research Report (0.63)

Industry: Leisure & Entertainment > Games > Go (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Neural Information Processing Systems, Mar-15-2024, 09:59:49 GMT

Multi-Bandit Best Arm Identification

We study the problem of identifying the best arm in each of the bandits in a multi-bandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.
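A gap-based index of the kind described here can be sketched as follows. The form used, negative estimated gap plus an exploration bonus `sqrt(a / pulls)`, is an assumption about the GapE index rather than something stated in the abstract, and the function names are illustrative. Every bandit is assumed to have at least two arms, each pulled at least once.

```python
import math

def gape_indices(means, counts, a):
    """means[m][k] and counts[m][k]: empirical mean and pull count of arm k in bandit m."""
    indices = []
    for mu, n in zip(means, counts):
        row = []
        for k in range(len(mu)):
            # Estimated gap: distance to the best other arm in the same bandit.
            best_other = max(mu[j] for j in range(len(mu)) if j != k)
            gap = abs(best_other - mu[k])
            # Small gaps and few pulls both raise the index.
            row.append(-gap + math.sqrt(a / n[k]))
        indices.append(row)
    return indices

def gape_next_pull(means, counts, a):
    """Return the (bandit, arm) pair with the largest index."""
    idx = gape_indices(means, counts, a)
    pairs = [(m, k) for m in range(len(idx)) for k in range(len(idx[m]))]
    return max(pairs, key=lambda mk: idx[mk[0]][mk[1]])
```

With `means = [[0.9, 0.1], [0.5, 0.45]]`, the second bandit has the smaller gap, so the budget is steered there; `a` is the exploration parameter that the abstract says must be tuned or estimated online.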

algorithm, bandit, probability, (15 more...)

Country: Europe > France > Hauts-de-France > Pas-de-Calais (0.04)

Genre: Research Report (0.69)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Brégère, Margaux, Keisler, Julie

A Bandit Approach with Evolutionary Operators for Model Selection

arXiv.org Artificial Intelligence, Feb-7-2024

This paper formulates model selection as an infinite-armed bandit problem. The models are arms, and picking an arm corresponds to a partial training of the model (resource allocation). The reward is the accuracy of the selected model after its partial training. In this best arm identification problem, regret is the gap between the expected accuracy of the optimal model and that of the model finally chosen. We first consider a straightforward generalization of UCB-E to the stochastic infinite-armed bandit problem and show that, under basic assumptions, the expected regret is of order $T^{-\alpha}$ for some $\alpha \in (0,1/5)$, where $T$ is the number of resources to allocate. Building on this vanilla algorithm, we introduce Mutant-UCB, an algorithm that incorporates operators from evolutionary algorithms. Tests carried out on three open source image classification data sets attest to the relevance of this novel combined approach, which outperforms the state-of-the-art for a fixed budget.
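For orientation, here is a minimal sketch of the fixed-budget UCB-E rule on a finite set of candidate models. It is a simplification: the paper's setting is infinite-armed and Mutant-UCB adds evolutionary operators, neither of which is reproduced here. The `pull` callback, which stands for one partial-training step returning an accuracy, is hypothetical.

```python
import math

def ucb_e(pull, n_arms, budget, a):
    """Spend `budget` pulls using the index mean + sqrt(a / pulls); recommend the best mean."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(budget):
        if t < n_arms:
            k = t  # initialise by pulling every arm once
        else:
            k = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i] + math.sqrt(a / counts[i]))
        sums[k] += pull(k)
        counts[k] += 1
    # Final recommendation: the arm (model) with the highest empirical accuracy.
    best = max(range(n_arms), key=lambda i: sums[i] / counts[i])
    return best, counts
```

For instance, `ucb_e(lambda k: [0.6, 0.7, 0.9][k], 3, 30, 0.5)` recommends the third model. The budget must be at least the number of arms so that every arm is pulled once.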

algorithm, configuration, mutant-ucb, (14 more...)

2402.05144

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.91)
Information Technology > Data Science > Data Mining > Big Data (0.88)
(3 more...)

arXiv.org Machine Learning, Jun-5-2021

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

Ding, Qin, Liu, Yi-Wei, Hsieh, Cho-Jui, Sharpnack, James

The stochastic contextual bandit problem, which models the trade-off between exploration and exploitation, has many real-world applications, including recommender systems, online advertising and clinical trials. Like many other machine learning algorithms, contextual bandit algorithms often have one or more hyper-parameters. For example, in most optimal stochastic contextual bandit algorithms, there is an unknown exploration parameter which controls the trade-off between exploration and exploitation. A proper choice of the hyper-parameters is essential for contextual bandit algorithms to perform well. However, it is infeasible to use offline tuning methods to select hyper-parameters in a contextual bandit environment, since there is no pre-collected dataset and the decisions have to be made in real time. To tackle this problem, we first propose a two-layer bandit structure for auto-tuning the exploration parameter and further generalize it to the Syndicated Bandits framework, which can learn multiple hyper-parameters dynamically in a contextual bandit environment. We show that our Syndicated Bandits framework can achieve the optimal regret upper bounds and is general enough to handle the tuning tasks in many popular contextual bandit algorithms, such as LinUCB, LinTS, UCB-GLM, etc. Experiments on both synthetic and real datasets validate the effectiveness of our proposed framework.
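To illustrate tuning several hyper-parameters at once, the sketch below assigns one EXP3 learner to each hyper-parameter of a two-dimensional LinUCB (exploration parameter and ridge regularizer) and updates every learner with the same reward. Using one EXP3 layer per hyper-parameter is an assumption about the framework's structure; the grids, the true parameter `theta` and all names are hypothetical.

```python
import math
import random

class Exp3:
    """EXP3 over n candidate values, with rewards assumed to lie in [0, 1]."""
    def __init__(self, n, gamma, rng):
        self.w, self.gamma, self.rng = [1.0] * n, gamma, rng
        self.p, self.last = None, None

    def draw(self):
        total, n = sum(self.w), len(self.w)
        self.p = [(1 - self.gamma) * w / total + self.gamma / n for w in self.w]
        self.last = self.rng.choices(range(n), weights=self.p)[0]
        return self.last

    def update(self, reward01):
        est = reward01 / self.p[self.last]  # importance-weighted reward estimate
        self.w[self.last] *= math.exp(self.gamma * est / len(self.w))
        top = max(self.w)
        self.w = [w / top for w in self.w]  # renormalize to avoid overflow

def syndicated_linucb(T=1500, alphas=(0.1, 1.0, 5.0), lams=(0.1, 1.0, 10.0), seed=1):
    rng = random.Random(seed)
    theta = (0.6, -0.3)                     # hypothetical true parameter
    layers = [Exp3(len(alphas), 0.1, rng), Exp3(len(lams), 0.1, rng)]
    s11 = s12 = s22 = b1 = b2 = 0.0         # sum of x x^T and sum of reward * x
    counts = [[0] * len(alphas), [0] * len(lams)]
    for _ in range(T):
        ia, il = layers[0].draw(), layers[1].draw()
        alpha, lam = alphas[ia], lams[il]
        counts[0][ia] += 1
        counts[1][il] += 1
        # Closed-form inverse of the 2x2 matrix lam * I + sum of x x^T.
        a11, a12, a22 = s11 + lam, s12, s22 + lam
        det = a11 * a22 - a12 * a12
        i11, i12, i22 = a22 / det, -a12 / det, a11 / det
        t1, t2 = i11 * b1 + i12 * b2, i12 * b1 + i22 * b2
        arms = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
        x = max(arms, key=lambda v: t1 * v[0] + t2 * v[1] + alpha * math.sqrt(
            max(i11 * v[0] ** 2 + 2 * i12 * v[0] * v[1] + i22 * v[1] ** 2, 0.0)))
        r = theta[0] * x[0] + theta[1] * x[1] + rng.gauss(0.0, 0.1)
        r01 = min(1.0, max(0.0, (r + 1.0) / 2.0))
        for layer in layers:
            layer.update(r01)               # every layer sees the same reward
        s11 += x[0] * x[0]; s12 += x[0] * x[1]; s22 += x[1] * x[1]
        b1 += r * x[0]; b2 += r * x[1]
    return counts
```

With three candidates per hyper-parameter, the two layers maintain 3 + 3 = 6 weights, whereas a single learner over all joint configurations would need 3 x 3 = 9.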

exploration parameter, health & medicine, upstream oil & gas, (19 more...)


2106.02979

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Energy > Oil & Gas > Upstream (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.55)
Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)