Generalized Nested Rollout Policy Adaptation with Limited Repetitions
Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve GNRPA by avoiding overly deterministic policies that find the same sequence of choices again and again. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that this improves the algorithm on three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows, and the Weak Schur problem.
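The repetition limit can be illustrated on a toy problem. The sketch below is one plausible reading of the abstract, not the authors' exact procedure: it uses plain NRPA (no temperature or bias) for brevity, and the toy objective, the `(step, move)` policy code, and the `max_repeats` parameter are all assumptions made for illustration. A level stops early once the lower level has returned the current best sequence `max_repeats` times.

```python
import math
import random

DEPTH, MOVES = 6, (0, 1, 2)  # hypothetical toy problem: 6 choices among 3 moves

def score(seq):
    # Toy objective (an assumption): number of adjacent pairs that differ.
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def playout(policy, rng):
    # Sample each move with probability proportional to exp(weight).
    seq = []
    for step in range(DEPTH):
        weights = [math.exp(policy.get((step, m), 0.0)) for m in MOVES]
        seq.append(rng.choices(MOVES, weights=weights)[0])
    return score(seq), seq

def adapt(policy, seq, alpha=1.0):
    # Shift weights toward the moves of seq (standard NRPA-style update).
    new = dict(policy)
    for step, chosen in enumerate(seq):
        z = sum(math.exp(policy.get((step, m), 0.0)) for m in MOVES)
        for m in MOVES:
            p = math.exp(policy.get((step, m), 0.0)) / z
            new[(step, m)] = new.get((step, m), 0.0) - alpha * (p - (m == chosen))
    return new

def nrpa(level, policy, rng, iterations=20, max_repeats=3):
    if level == 0:
        return playout(policy, rng)
    policy = dict(policy)  # each level adapts its own copy
    best_score, best_seq, repeats = -1, None, 0
    for _ in range(iterations):
        s, seq = nrpa(level - 1, policy, rng, iterations, max_repeats)
        if seq == best_seq:
            repeats += 1
            if repeats >= max_repeats:
                break  # the policy keeps reproducing the same sequence
        elif s >= best_score:
            best_score, best_seq, repeats = s, seq, 0
        policy = adapt(policy, best_seq)
    return best_score, best_seq
```

Calling `nrpa(2, {}, random.Random(0))` returns a `(score, sequence)` pair. Lowering `max_repeats` cuts a level short sooner, so less time is spent on a policy that has already converged.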
Generalized Nested Rollout Policy Adaptation with Dynamic Bias for Vehicle Routing
Sentuc, Julien, Cazenave, Tristan, Lucas, Jean-Yves
In this paper we present an extension of the Nested Rollout Policy Adaptation algorithm (NRPA), namely Generalized Nested Rollout Policy Adaptation (GNRPA), and its use for solving some instances of the Vehicle Routing Problem. We detail results obtained on the Solomon instance set, a conventional benchmark for the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW). We show that GNRPA performs better than NRPA on all instances. On some instances, it performs better than the Google OR-Tools module dedicated to VRP.
Stabilized Nested Rollout Policy Adaptation
Cazenave, Tristan, Sevestre, Jean-Baptiste, Toulemont, Matthieu
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to modify NRPA in order to improve the stability of the algorithm. Experiments show it improves the algorithm for different application domains: SameGame, Traveling Salesman with Time Windows and Expression Discovery.
The Building Blocks of Artificial Intelligence
Machine vision is the classification and tracking of real-world objects based on visual, x-ray, laser, or other signals. Optical character recognition was an early success of machine vision, but deciphering handwritten text remains a work in progress. The quality of machine vision depends on human labeling of a large quantity of reference images. The simplest way for machines to start learning is through access to this labeled data. Within the next five years, video-based computer vision will be able to recognize actions and predict motion, for example in surveillance systems.
A Deep Q-learning/genetic Algorithms Based Novel Methodology For Optimizing Covid-19 Pandemic Government Actions
Miralles-Pechuán, Luis, Jiménez, Fernando, Ponce, Hiram, Martínez-Villaseñor, Lourdes
Whenever countries are threatened by a pandemic, as is the case with the COVID-19 virus, governments should take the right actions to safeguard public health as well as to mitigate the negative effects on the economy. In this regard, there are two completely different approaches governments can take: a restrictive one, in which drastic measures such as self-isolation can seriously damage the economy, and a more liberal one, where more relaxed restrictions may put a high percentage of the population at risk. The optimal approach could be somewhere in between, and, in order to make the right decisions, it is necessary to accurately estimate the future effects of taking one measure or another. In this paper, we use the SEIR epidemiological model (Susceptible - Exposed - Infected - Recovered) for infectious diseases to represent the evolution of the COVID-19 virus over time in the population. To optimize the sequences of actions governments can take, we propose a methodology with two approaches, one based on Deep Q-Learning and another based on Genetic Algorithms. The sequences of actions (confinement, self-isolation, two-meter distance or not taking restrictions) are evaluated according to a reward system focused on meeting two objectives: firstly, getting few people infected so that hospitals are not overwhelmed with critical patients, and secondly, avoiding taking drastic measures for too long, which can potentially cause serious damage to the economy. The conducted experiments show that our methodology is a valid tool to discover actions governments can take to reduce the negative effects of a pandemic in both senses. We also show that the approach based on Deep Q-Learning outperforms the one based on Genetic Algorithms for optimizing the sequences of actions.
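To make the SEIR model concrete, here is a minimal sketch of a discrete-time (forward Euler) SEIR simulation. It uses the standard textbook SEIR equations. The parameter values, and the idea of modelling each government action as a `contact_factor` that scales the transmission rate, are assumptions for illustration and are not taken from the paper.

```python
def seir_step(state, beta, sigma, gamma, dt=1.0):
    """Advance (S, E, I, R) by one Euler step of the standard SEIR equations."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E
    new_infected = sigma * e * dt         # E -> I
    new_recovered = gamma * i * dt        # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infected,
            i + new_infected - new_recovered,
            r + new_recovered)

def simulate(state, contact_factors, beta=0.5, sigma=0.2, gamma=0.1):
    """Run one step per entry of contact_factors; return final state and peak I.

    Each factor (hypothetical) scales beta: 1.0 means no restrictions,
    smaller values stand for stricter measures.
    """
    peak = state[2]
    for factor in contact_factors:
        state = seir_step(state, beta * factor, sigma, gamma)
        peak = max(peak, state[2])
    return state, peak

start = (9990.0, 0.0, 10.0, 0.0)
_, peak_open = simulate(start, [1.0] * 100)    # no restrictions for 100 days
_, peak_strict = simulate(start, [0.3] * 100)  # strict measures for 100 days
```

In this sketch a sequence of actions is just a list of contact factors, and a reward function of the kind described above could combine the infection peak with a penalty for how long small factors are applied.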
Generalized Nested Rollout Policy Adaptation
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to generalize NRPA with a temperature and a bias and to analyze the algorithms theoretically. The generalized algorithm is named GNRPA. Experiments show it improves on NRPA for different application domains: SameGame and the Traveling Salesman Problem with Time Windows.
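The role of the temperature and the bias can be sketched with a softmax over move weights. The formulation below is an assumption based on the usual way such a generalization is written, not a quotation from the paper: the probability of a move is proportional to exp(w / tau + beta), and the weights are adapted by a gradient-style step toward the chosen move. The function and parameter names are illustrative.

```python
import math

def move_probabilities(weights, biases, tau=1.0):
    # Softmax over w / tau + beta; subtracting the max keeps exp() stable.
    logits = [w / tau + b for w, b in zip(weights, biases)]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adapt(weights, biases, chosen, alpha=1.0, tau=1.0):
    # Raise the weight of the chosen move and lower the others in
    # proportion to their current probability.
    probs = move_probabilities(weights, biases, tau)
    return [w - (alpha / tau) * (p - (1.0 if k == chosen else 0.0))
            for k, (w, p) in enumerate(zip(weights, probs))]
```

With `tau=1.0` and all biases set to zero, this reduces to the plain softmax policy and update used in NRPA. A non-zero bias lets a problem-specific heuristic favor some moves before any learning has taken place.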
Nested Monte-Carlo Search
Cazenave, Tristan (Université Paris-Dauphine)
Many problems have a huge state space and no good heuristic to order moves so as to guide the search toward the best positions. Random games can be used to score positions and evaluate their interest. Random games can also be improved using random games to choose a move to try at each step of a game. Nested Monte-Carlo Search addresses the problem of guiding the search toward better states when there is no available heuristic. It uses nested levels of random games in order to guide the search. The algorithm is studied theoretically on simple abstract problems and applied successfully to three different games: Morpion Solitaire, SameGame and 16x16 Sudoku.
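The nesting of random games described above can be sketched as follows. This is a minimal illustration on a made-up toy problem (the objective, `DEPTH`, and `MOVES` are assumptions), and it follows the commonly described variant that memorizes the best sequence found so far. Level 0 is a random playout, and level n tries every move at each step, scores it with a level n-1 search, and follows the best sequence.

```python
import random

DEPTH = 8          # hypothetical toy problem: choose 8 moves
MOVES = (0, 1, 2)  # legal moves at every step

def score(seq):
    # Toy objective (an assumption): number of adjacent pairs that differ.
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def playout(seq, rng):
    # Level 0: complete the sequence with uniformly random moves.
    seq = list(seq)
    while len(seq) < DEPTH:
        seq.append(rng.choice(MOVES))
    return score(seq), seq

def nested(seq, level, rng):
    # Level n: evaluate each move with a level n-1 search, remember the
    # best full sequence seen so far, and play its next move.
    if level == 0:
        return playout(seq, rng)
    seq = list(seq)
    best_score, best_seq = -1, None
    while len(seq) < DEPTH:
        for m in MOVES:
            s, full = nested(seq + [m], level - 1, rng)
            if s > best_score:
                best_score, best_seq = s, full
        seq.append(best_seq[len(seq)])
    return score(seq), seq
```

For example, `nested([], 2, random.Random(0))` runs a level-2 search and returns a `(score, sequence)` pair. No heuristic is used anywhere: the only guidance comes from the scores of the lower-level searches.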