AITopics

arXiv.org Artificial IntelligenceJan-9-2025

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.

machine learning, monte-carlo player, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2501.05407

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Backgammon (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Neural Information Processing SystemsApr-6-2023, 18:12:04 GMT

On-line Policy Improvement using Monte-Carlo Search

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo sim(cid:173) ulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP! and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algo(cid:173) rithm to the domain of backgammon.

algorithm, monte-carlo search, on-line policy improvement, (1 more...)

Industry: Leisure & Entertainment > Games > Backgammon (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.40)

AAAI ConferencesJun-13-2017

Solving Graph Optimization Problems in a Framework for Monte-Carlo Search

Edelkamp, Stefan (Universität Bremen) | Externest, Eike (Universität Bremen) | Kühl, Sebastian (Universität Bremen) | Kuske, Sabine (Universität Bremen)

In this paper we solve fundamental graph optimization problems like Maximum Clique and Minimum Coloring with recent advances of Monte-Carlo Search. The optimization problems are implemented as single-agent games in a generic state-space search framework, roughly comparable to what is encoded in PDDL for an action planner.

algorithm, graph optimization problem, monte-carlo search, (11 more...)

AAAI Conferences

Tenth Annual Symposium on Combinatorial Search

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.82)

Branavan, S.R.K. (Massachusetts Institute of Technology) | Silver, David (University College London) | Barzilay, Regina (Massachusetts Institute of Technology)

Non-Linear Monte-Carlo Search in Civilization II

AAAI ConferencesJul-19-2011

This paper presents a new Monte-Carlo search algorithm for very large sequential decision-making problems. We apply non-linear regression within Monte-Carlo search, online, to estimate a state-action value function from the outcomes of random roll-outs. This value function generalizes between related states and actions, and can therefore provide more accurate evaluations after fewer rollouts. A further significant advantage of this approach is its ability to automatically extract and leverage domain knowledge from external sources such as game manuals. We apply our algorithm to the game of Civilization II, a challenging multi-agent strategy game with an enormous state space and around 10^21 joint actions. We approximate the value function by a neural network, augmented by linguistic knowledge that is extracted automatically from the official game manual. We show that this non-linear value function is significantly more efficient than a linear value function, which is itself more efficient than Monte-Carlo tree search. Our non-linear Monte-Carlo search wins over 78% of games against the built-in AI of Civilization II.

civilization ii, roll-out, value function, (15 more...)

AAAI Conferences

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

On-line Policy Improvement using Monte-Carlo Search

Neural Information Processing SystemsDec-31-1997

Policy iteration is known to have rapid and robust convergence properties, and for Markov tasks with lookup-table state-space representations, it is guaranteed to convergence to the optimal policy. Online Policy Improvement using Monte-Carlo Search 1069 In typical uses of policy iteration, the policy improvement step is an extensive off-line procedure. For example, in dynamic programming, one performs a sweep through all states in the state space. Reinforcement learning provides another approach to policy improvement; recently, several authors have investigated using RL in conjunction with nonlinear function approximators to represent the value functions and/or policies (Tesauro, 1992; Crites and Barto, 1996; Zhang and Dietterich, 1996). These studies are based on following actual state-space trajectories rather than sweeps through the full state space, but are still too slow to compute improved policies in real time.

algorithm, base player, monte-carlo player, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On-line Policy Improvement using Monte-Carlo Search

Neural Information Processing SystemsDec-31-1997

algorithm, base player, monte-carlo player, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On-line Policy Improvement using Monte-Carlo Search

Neural Information Processing SystemsDec-31-1997

Policy iteration is known to have rapid and robust convergence properties, and for Markov tasks with lookup-table state-space representations, it is guaranteed to convergence to the optimal policy. Online Policy Improvement using Monte-Carlo Search 1069 In typical uses of policy iteration, the policy improvement step is an extensive off-line procedure. For example, in dynamic programming, one performs a sweep through all states in the state space. Reinforcement learning provides another approach topolicy improvement; recently, several authors have investigated using RL in conjunction with nonlinear function approximators to represent the value functions and/orpolicies (Tesauro, 1992; Crites and Barto, 1996; Zhang and Dietterich, 1996). These studies are based on following actual state-space trajectories rather than sweeps through the full state space, but are still too slow to compute improved policies in real time.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Industry: Leisure & Entertainment > Games > Backgammon (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)