AITopics

2006.12632

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > Germany (0.05)

Genre: Research Report (0.71)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.51)

arXiv.org Artificial IntelligenceJun-21-2020

Continuous Control for Searching and Planning with a Learned Model

Yang, Xuxi, Duvaud, Werner, Wei, Peng

Decision-making agents with planning capabilities have achieved huge success in the challenging domain like Chess, Shogi, and Go. In an effort to generalize the planning ability to the more general tasks where the environment dynamics are not available to the agent, researchers proposed the MuZero algorithm that can learn the dynamical model through the interactions with the environment. In this paper, we provide a way and the necessary theoretical results to extend the MuZero algorithm to more generalized environments with continuous action space. Through numerical results on two relatively low-dimensional MuJoCo environments, we show the proposed algorithm outperforms the soft actor-critic (SAC) algorithm, a state-of-the-art model-free deep reinforcement learning algorithm.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.0743

Country:

Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
North America > United States > Iowa (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Chess (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Yang, Xiufeng, Aasawat, Tanuj Kr, Yoshizoe, Kazuki

Practical Large-Scale Distributed Parallel Monte-Carlo Tree Search Applied to Molecular Design

arXiv.org Artificial IntelligenceJun-18-2020

It is common practice to use large computational resources to train neural networks, as is known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used for searching solutions for combinatorial optimization problems. In this paper, we propose to apply a hash function based distributed parallel Monte-Carlo Tree Search (MCTS) to a real-world problem of molecular design. By running our massively parallel MCTS combined with a simple RNN on 1024 CPU cores for 10 minutes, we achieved a score on a molecular design problem that significantly outperforms existing work. Whereas existing studies on massively scalable parallel MCTS only compare the number of rollouts, we prove the practicality of the algorithm by comparing the quality of the solutions obtained in practice. This method is generic and is expected to speed up other applications of MCTS.

artificial intelligence, machine learning, node, (20 more...)

2006.10504

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Connecticut > New Haven County > Wallingford (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Matrenin, Pavel, Manusov, Vadim

The cyclic job-shop scheduling problem: The new subclass of the job-shop problem and applying the Simulated annealing to solve it

arXiv.org Artificial IntelligenceJun-18-2020

In the paper, the new approach to the scheduling problem are described. The approach deals with the problem of planning the cyclic production and proposes to consider such scheduling problem as the cyclic job-shop problem of the order k, where k is the number of reiterations. It was found out that planning of only one iteration of the loop is less effective than planning of the entire cycle. To the experimental research, a number of test instances of the job-shop scheduling problem by Operation Research Library were used. The Simulated Annealing was applied to solve the instances. The experiments proved that the approach proposed allows increasing the efficiency of cyclic scheduling significantly.

artificial intelligence, machine learning, scheduling problem, (16 more...)

doi: 10.1109/ICIEAM.2016.7911676

2006.10938

Country:

Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.05)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Ahmad, Zaheen Farraz, Lelis, Levi H. S., Bowling, Michael

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

arXiv.org Artificial IntelligenceJun-17-2020

Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sampled-based planners.

generator, machine learning, reinforcement learning, (20 more...)

2006.06054

Country:

North America > Canada > Alberta (0.14)
North America > Canada > Quebec (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJun-16-2020

Fault Tolerant Free Gait and Footstep Planning for Hexapod Robot Based on Monte-Carlo Tree

Ding, Liang, Xu, Peng, Gao, Haibo, Wang, Zhikai, Zhou, Ruyi, Gong, Zhaopei, Liu, Guangjun

These authors contributed equally to this work. Abstract--Legged robots can pass through complex field environments by selecting gaits and discrete footholds carefully. Traditional methods plan gait and foothold separately and treat them as the single-step optimal process. However, such processing causes its poor passability in a sparse foothold environment. This paper novelly proposes a coordinative planning method for hexapod robots that regards the planning of gait and foothold as a sequence optimization problem with the consideration of dealing with the harshness of the environment as leg fault. The Monte Carlo tree search algorithm(MCTS) is used to optimize the entire sequence. Two methods, FastMCTS, and SlidingMCTS are proposed to solve some defeats of the standard MCTS applicating in the field of legged robot planning. The proposed planning algorithm combines the fault-tolerant gait method to improve the passability of the algorithm. For rule-based method, when walking in complicated terrain, which leads them to execute motor tasks a periodic gait, assuming that all footsteps are valid, legged on fields such as field rescue and planetary exploration in robots move forward in a fixed swing sequence, which is the future. The hexapod robots that have higher stability usually taken as 3+3 tripod gait, 4+2 quadruped gait or 5+1 and superior load capacity than biped robots and quadruped wave gait for hexapod robots[7]. Because these gaits are robots are widely used[1][2][3].

artificial intelligence, planning & scheduling, robot, (17 more...)

doi: 10.1109/LRA.2021.3133610

2006.0755

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre:

Workflow (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Speck, David, Biedenkapp, André, Hutter, Frank, Mattmüller, Robert, Lindauer, Marius

Learning Heuristic Selection with Dynamic Algorithm Configuration

A key challenge in satisfying planning is to use multiple heuristics within one heuristic search. An aggregation of multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be successfully applied. In addition, alternating between multiple heuristics during the search makes it possible to use all heuristics equally and improve performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most helpful heuristics for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection which takes into account the internal search dynamics of a planning system. Furthermore, we prove that this approach generalizes over existing approaches and that it can exponentially improve the performance of the heuristic search. To learn dynamic heuristic selection, we propose an approach based on reinforcement learning and show empirically that domain-wise learned policies, which take the internal search dynamics of a planning system into account, can exceed existing approaches in terms of coverage.

artificial intelligence, machine learning, selection, (18 more...)

2006.08246

Country: Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Brown, Kyle, Peltzer, Oriana, Sehr, Martin A., Schwager, Mac, Kochenderfer, Mykel J.

Optimal Sequential Task Assignment and Path Finding for Multi-Agent Robotic Assembly Planning

We study the problem of sequential task assignment and collision-free routing for large teams of robots in applications with inter-task precedence constraints (e.g., task $A$ and task $B$ must both be completed before task $C$ may begin). Such problems commonly occur in assembly planning for robotic manufacturing applications, in which sub-assemblies must be completed before they can be combined to form the final product. We propose a hierarchical algorithm for computing makespan-optimal solutions to the problem. The algorithm is evaluated on a set of randomly generated problem instances where robots must transport objects between stations in a "factory "grid world environment. In addition, we demonstrate in high-fidelity simulation that the output of our algorithm can be used to generate collision-free trajectories for non-holonomic differential-drive robots.

artificial intelligence, constraint, planning & scheduling, (17 more...)

2006.08845

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.94)

On Effective Parallelization of Monte Carlo Tree Search

Liu, Anji, Liang, Yitao, Liu, Ji, Broeck, Guy Van den, Chen, Jianshu

Despite its groundbreaking success in Go and computer games, Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree, which calls for effective parallelization. However, how to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. In this paper, we seek to lay its first theoretical foundations, by examining the potential performance loss caused by parallelization when achieving a desired speedup. In particular, we focus on studying the conditions under which the performance loss (measured in excess regret) vanishes over time. To this end, we propose a general parallel MCTS framework that can be specialized to major existing parallel MCTS algorithms. We derive two necessary conditions for the algorithms covered by the general framework to have vanishing excess regret (i.e. excess regret converges to zero as the total number of rollouts grows). We demonstrate the effectiveness of the necessary conditions by showing that, for depth-2 search trees, the recently developed WU-UCT algorithm satisfies both necessary conditions and has provable vanishing excess regret. Finally, we perform empirical studies to closely examine the necessary conditions under the general tree search setting (with arbitrary tree depth). It shows that the topological discrepancy between the search trees constructed by the parallel and the sequential MCTS algorithms is the main reason for the performance loss.

algorithm, artificial intelligence, planning & scheduling, (18 more...)

2006.08785

Country:

Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
Europe > Estonia > Tartu County > Tartu (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Goodman, James, Lucas, Simon

Does it matter how well I know what you're thinking? Opponent Modelling in an RTS game

Opponent Modelling tries to predict the future actions of opponents, and is required to perform well in multi-player games. There is a deep literature on learning an opponent model, but much less on how accurate such models must be to be useful. We investigate the sensitivity of Monte Carlo Tree Search (MCTS) and a Rolling Horizon Evolutionary Algorithm (RHEA) to the accuracy of their modelling of the opponent in a simple Real-Time Strategy game. We find that in this domain RHEA is much more sensitive to the accuracy of an opponent model than MCTS. MCTS generally does better even with an inaccurate model, while this will degrade RHEA's performance. We show that faced with an unknown opponent and a low computational budget it is better not to use any explicit model with RHEA, and to model the opponent's actions within the tree as part of the MCTS algorithm.

artificial intelligence, machine learning, planning & scheduling, (16 more...)

2006.08659

Country: North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)