bandit process


Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control

Fu, Jing, Moran, Bill, Niño-Mora, José

arXiv.org Artificial Intelligence

We study a system with finitely many groups of multi-action bandit processes, each of which is a Markov decision process (MDP) with finite state and action spaces and potentially different transition matrices under different actions. Bandit processes in the same group share the same state and action spaces and, under the same action, the same transition matrix. All the bandit processes across the various groups are subject to multiple weakly coupled constraints over their state and action variables. Unlike past studies, which focused on the offline case, we consider the online case without assuming a priori knowledge of the transition matrices and reward functions, and propose an effective scheme that enables simultaneous learning and control. We prove the convergence of the relevant processes both over time and in the number of bandit processes, referred to as convergence in the time and magnitude dimensions. Moreover, we prove that the relevant processes converge exponentially fast in the magnitude dimension, leading to an exponentially diminishing performance deviation between the proposed online algorithms and offline optimality. Jing Fu is with the Department of Electrical and Electronic Engineering, School of Engineering, STEM College, RMIT University, Australia (e-mail: jing.fu@rmit.edu.au). Bill Moran is with the Department of Electrical and Electronic Engineering, the University of Melbourne, VIC 3010, Australia (e-mail: wmoran@unimelb.edu.au).
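To make the setting concrete, here is a minimal toy sketch (not the paper's algorithm, and with invented transition probabilities, rewards, and a simple myopic rule): several restless two-state processes share one weakly coupled constraint, namely that at most M of them may take the "active" action per step.

```python
import random

# Toy restless-bandit system: N two-state processes, one weakly coupled
# constraint (at most M active per step). All numbers below are
# illustrative assumptions, not taken from the paper.
random.seed(0)

N, M, T = 10, 3, 5000
# P[action][state] = probability of moving to state 1 (the "good" state)
P = {0: [0.1, 0.7], 1: [0.4, 0.9]}
# reward[action][state]: the active action pays more in the good state
reward = {0: [0.0, 0.5], 1: [0.0, 1.0]}

states = [0] * N
total = 0.0
for _ in range(T):
    # Simple myopic rule: spend the activation budget on the arms
    # currently in the good state (ties broken arbitrarily).
    order = sorted(range(N), key=lambda i: -states[i])
    actions = [0] * N
    for i in order[:M]:
        actions[i] = 1
    for i in range(N):
        total += reward[actions[i]][states[i]]
        states[i] = 1 if random.random() < P[actions[i]][states[i]] else 0

print(total / T)  # long-run reward per step under the myopic rule
```

The paper's contribution concerns learning such systems online and proving convergence as N grows; this sketch only shows the shape of the coupled control problem.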


A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions

Bao, Wenqing, Cai, Xiaoqiang, Wu, Xianyi

arXiv.org Machine Learning

This paper proposes a general framework of multi-armed bandit (MAB) processes by introducing a type of restriction on the switches among arms evolving in continuous time. The Gittins index process is constructed for any single arm subject to the restrictions on switches, and the optimality of the corresponding Gittins index rule is then established. The Gittins indices defined in this paper are consistent with those for MAB processes in the continuous-time, integer-time, semi-Markovian, and general discrete-time settings, so that the new theory covers the classical models as special cases and also applies to many other situations that have not yet been touched in the literature. While the proof of the optimality of Gittins index policies benefits from ideas in the existing theory of MAB processes in continuous time, new techniques are introduced which drastically simplify the proof.
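The Gittins index underlying all of these settings can be computed, in the simplest discrete-time discounted case, via Whittle's retirement-option characterization: the index of a state is (1 - β) times the lump-sum retirement reward at which stopping and continuing are equally good. A small sketch with an illustrative two-state chain (not the paper's continuous-time construction):

```python
# Gittins indices for one discounted Markov reward chain via the
# retirement formulation. The chain, rewards, and discount factor are
# illustrative assumptions for the sketch.
beta = 0.9
r = [1.0, 0.0]                      # reward collected in each state
P = [[0.5, 0.5], [0.3, 0.7]]        # transition matrix

def value(M, iters=2000):
    """Value of the chain with retirement option M, by value iteration."""
    V = [M] * len(r)
    for _ in range(iters):
        V = [max(M, r[s] + beta * sum(P[s][t] * V[t] for t in range(len(r))))
             for s in range(len(r))]
    return V

def gittins(s, tol=1e-6):
    lo, hi = 0.0, max(r) / (1 - beta)   # at hi, retiring is always optimal
    while hi - lo > tol:                # bisect on the indifference point M*
        M = (lo + hi) / 2
        if value(M)[s] > M + tol:       # continuing strictly better: raise M
            lo = M
        else:
            hi = M
    return (1 - beta) * (lo + hi) / 2

print([round(gittins(s), 3) for s in range(len(r))])
```

In this chain the index of state 0 equals its (maximal) one-step reward, while state 1 earns a strictly positive index only because continuation can reach state 0.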


Signatures of Infinity: Nonergodicity and Resource Scaling in Prediction, Complexity, and Learning

Crutchfield, James P., Marzen, Sarah

arXiv.org Machine Learning

Truly complex stochastic processes--the infinitary processes [1] whose mutual information between past and future diverges--arise in many physical and biological systems [2-5], such as those in critical states. They are implicated in many natural phenomena, from the geophysics of earthquakes [6] and physiological measurements of neural avalanches [7] to semantics in natural language [8] and cascading failure in power transmission grids [9]. Their apparent infinite memory makes empirical estimation and modeling particularly challenging. The difficulty is reflected in the computational complexity of inference [10]: the resources required to predict and model them diverge in sample size, in memory for storing model parameters, and in memory required for prediction. Resource scaling, an analog of the venerable technique of finite-size scaling in statistical mechanics, suggests that for infinitary processes we look for statistical signatures that track divergences. Since resource divergences are sensitive to a process's inherent randomness and organization, one hopes that their scaling forms are uniquely revealing indicators of process complexity and can guide the selection of appropriate models. To date, though, there are few tractable constructions with which to explore possible general relationships between prediction, complexity, and learning for infinitary processes.
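The diverging quantity here is the past-future mutual information, i.e. the excess entropy, which can be read off from block entropies: the gaps h(L) - h between the L-th entropy increment and the entropy rate sum to the excess entropy. As a finite-memory contrast case, a sketch estimating those gaps for a simulated two-state Markov chain (switching probability and sample size are illustrative assumptions), where the gaps vanish after one step and the sum stays finite; for an infinitary process the analogous sum would diverge:

```python
import math, random

# Block-entropy scaling for a two-state Markov chain with switching
# probability p. Its excess entropy is finite: the entropy-rate gaps
# h(L) - h are (in expectation) zero for L >= 2.
random.seed(1)

p = 0.1
T = 200000
x, seq = 0, []
for _ in range(T):
    if random.random() < p:
        x = 1 - x
    seq.append(x)

def block_entropy(L):
    """Plug-in estimate of the Shannon entropy of length-L blocks, in bits."""
    counts = {}
    for i in range(len(seq) - L):
        w = tuple(seq[i:i + L])
        counts[w] = counts.get(w, 0) + 1
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

h_true = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy rate
H = [0.0] + [block_entropy(L) for L in range(1, 7)]
gaps = [H[L] - H[L - 1] - h_true for L in range(1, 7)]
print([round(g, 3) for g in gaps])  # only the first gap is large
```

Resource scaling, as described above, asks how such estimates (and the memory needed to produce them) grow with L and sample size when the gaps are *not* summable.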


Local Bandit Approximation for Optimal Learning Problems

Duff, Michael O., Barto, Andrew G.

Neural Information Processing Systems

A Bayesian formulation of the problem leads to a clear concept of a solution whose computation, however, appears to entail an examination of an intractably large number of hyperstates. This paper has suggested extending the Gittins index approach (which applies with great power and elegance to the special class of multi-armed bandit processes) to general adaptive MDPs. The hope has been that if certain salient features of the value of information could be captured, even approximately, then one could be led to a reasonable method for avoiding certain defects of certainty-equivalence approaches (problems with identifiability, "metastability"). Obviously, positive evidence, in the form of empirical results from simulation experiments, would lend support to these ideas; work along these lines is underway. Local bandit approximation is but one approximate computational approach for problems of optimal learning and dual control. Most prominent in the literature of control theory is the "wide-sense" approach of [Bar-Shalom & Tse, 1976], which utilizes local quadratic approximations about nominal state/control trajectories. For certain problems, this method has demonstrated superior performance compared to a certainty-equivalence approach, but it is computationally very intensive and unwieldy, particularly for problems with controller dimension greater than one. One could revert to the view of the bandit problem, or general adaptive MDP, as simply a very large MDP defined over hyperstates, and then consider a somewhat direct approach in which one performs approximate dynamic programming with function approximation over this domain; details of function approximation, feature selection, and "training" all become important design issues.
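The hyperstate DP that is intractable in general is tractable in the classic conjugate case, which makes the contrast vivid. A minimal sketch (parameters are illustrative assumptions, not the paper's experiments): a one-armed Bernoulli bandit with a Beta posterior versus a known arm paying q per pull, where the hyperstate is the posterior (a, b) plus the remaining horizon.

```python
from functools import lru_cache

# Hyperstate dynamic program for a Bayesian one-armed Bernoulli bandit:
# unknown arm with Beta(a, b) posterior vs. a known arm paying q per
# pull, over a finite horizon n. Conjugacy keeps the hyperstate space
# small here; for general adaptive MDPs it explodes.
q = 0.6

@lru_cache(maxsize=None)
def V(a, b, n):
    if n == 0:
        return 0.0
    stay = n * q                      # commit to the known arm (optimal once chosen)
    m = a / (a + b)                   # posterior mean of the unknown arm
    explore = m * (1.0 + V(a + 1, b, n - 1)) + (1 - m) * V(a, b + 1, n - 1)
    return max(stay, explore)

# With a uniform prior (posterior mean 0.5 < q), a certainty-equivalent
# rule never tries the unknown arm, yet the Bayes-optimal value exceeds
# n * q: the first pull buys information worth paying for.
n = 20
print(V(1, 1, n), n * q)
```

This gap between the Bayes-optimal value and the certainty-equivalent one is exactly the "value of information" that local bandit approximation tries to capture for general adaptive MDPs.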

