AITopics

doi: 10.1145/3396474.3396492

2006.02716

Country:

Asia > Russia (0.15)
Europe > Russia > Volga Federal District > Republic of Tatarstan (0.14)
North America > Canada > Ontario > Niagara Region > St. Catharines (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Feng, Dieqiao, Gomes, Carla P., Selman, Bart

Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

arXiv.org Artificial Intelligence, Jun-4-2020

Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.02689

Country:

North America > Canada > Alberta (0.14)
North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning, Jun-4-2020

Sample Efficient Graph-Based Optimization with Noisy Observations

Nguyen, Tan, Shameli, Ali, Abbasi-Yadkori, Yasin, Rao, Anup, Kveton, Branislav

We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations. We define a notion of convexity, and we show that a variant of best-arm identification can find a near-optimal solution after a small number of queries that is independent of the size of the graph. For functions that have local minima and are nearly convex, we show a sample complexity for the classical simulated annealing under noisy observations. We show effectiveness of the greedy algorithm with restarts and the simulated annealing on problems of graph-based nearest neighbor classification as well as a web document re-ranking application.
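The idea of descending a graph when function values can only be observed with noise can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract describes a variant of best-arm identification, whereas this sketch simply averages a fixed number of repeated queries per node. The names `noisy_greedy_descent`, `path_neighbors`, and `noisy_abs`, the sample count, and the toy path graph are all assumptions made for illustration.

```python
import random
from statistics import fmean


def noisy_greedy_descent(neighbors, noisy_f, start, samples=50, max_steps=1000):
    """Greedy descent on a graph when f is only observable with noise."""

    def estimate(v):
        # Average repeated noisy queries to shrink the observation noise.
        return fmean([noisy_f(v) for _ in range(samples)])

    current, current_val = start, estimate(start)
    for _ in range(max_steps):
        # Estimate every neighbour and keep the best-looking one.
        candidates = [(u, estimate(u)) for u in neighbors(current)]
        best, best_val = min(candidates, key=lambda pair: pair[1])
        if best_val >= current_val:
            return current  # no neighbour looks better: stop here
        current, current_val = best, best_val
    return current


# Toy instance (hypothetical): path graph 0..20 with f(v) = |v - 7| + noise.
rng = random.Random(0)


def path_neighbors(v):
    return [u for u in (v - 1, v + 1) if 0 <= u <= 20]


def noisy_abs(v):
    return abs(v - 7) + rng.gauss(0.0, 0.1)


result = noisy_greedy_descent(path_neighbors, noisy_abs, start=18)
```

On this toy function neighbouring values differ by 1 while the averaged noise has a standard deviation of about 0.014, so the descent is expected to stop at node 7. Functions with local minima would need restarts or simulated annealing, as the abstract notes.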

algorithm, artificial intelligence, machine learning, (16 more...)

2006.02672

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > New York (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Rigas, Emmanouil, Kolios, Panayiotis, Ellinas, Georgios

Extending the Multiple Traveling Salesman Problem for Scheduling a Fleet of Drones Performing Monitoring Missions

arXiv.org Artificial Intelligence, Jun-2-2020

In this paper we schedule the travel path of a set of drones across a graph where the nodes need to be visited multiple times at pre-defined points in time. This is an extension of the well-known multiple traveling salesman problem. The proposed formulation can be applied in several domains such as the monitoring of traffic flows in a transportation network, or the monitoring of remote locations to assist search and rescue missions. Aiming to find the optimal schedule, the problem is formulated as an Integer Linear Program (ILP). Given that the problem is highly combinatorial, the optimal solution scales only for small sized problems. Thus, a greedy algorithm is also proposed that uses a one-step look ahead heuristic search mechanism. In a detailed evaluation, it is observed that the greedy algorithm has near-optimal performance as it is on average at 92.06% of the optimal, while it can potentially scale up to settings with hundreds of drones and locations.

agent, algorithm, artificial intelligence, (15 more...)

2006.01473

Country:

Europe > Latvia > Riga Municipality > Riga (0.05)
Europe > Middle East > Cyprus (0.04)

Genre: Research Report (0.40)

Industry:

Transportation > Infrastructure & Services (0.48)
Transportation > Ground > Road (0.47)
Consumer Products & Services > Travel (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Cappart, Quentin, Moisan, Thierry, Rousseau, Louis-Martin, Prémont-Schwarz, Isabeau, Cire, Andre

Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

arXiv.org Artificial Intelligence, Jun-2-2020

Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge one faces with combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In recent years, deep reinforcement learning (DRL) has shown its promise for designing good heuristics dedicated to solving NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and cannot be easily extended to other problems, and (2) they only provide an approximate solution with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool for solving combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution if given a sufficiently long execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, which directs how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between both techniques. We experimentally show that our solver is efficient at solving two challenging problems: the traveling salesman problem with time windows, and the 4-moments portfolio optimization problem. The results show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.

constraint, machine learning, reinforcement learning, (19 more...)

2006.0161

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)

Georgiev, Dobrik, Liò, Pietro

Neural Bipartite Matching

arXiv.org Machine Learning, Jun-2-2020

Graph neural networks (GNNs) have found application for learning in the space of algorithms. However, the algorithms chosen by existing research (sorting, Breadth-First search, shortest path finding, etc.) usually align perfectly with a standard GNN architecture. This report describes how neural execution is applied to a complex algorithm, such as finding maximum bipartite matching by reducing it to a flow problem and using ...

Performing the reasoning is achieved via neural execution, in a similar fashion to Veličković et al. (2020). GNNs have been both empirically (Veličković et al., 2020) and theoretically (Xu et al., 2020) shown to be applicable to algorithmic tasks on graphs, strongly generalising on inputs of sizes much larger than trained on. However, these algorithms rely on a locally contained and fixed dataflow which aligns perfectly with a standard GNN architecture, making them easy to model with GNNs (c.f. ...
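For orientation, the classical (non-neural) reduction of maximum bipartite matching to a flow problem can be sketched as follows. This is a textbook construction, not the paper's neural executor: every left vertex is fed by a source, every right vertex drains into a sink, all capacities are 1, and augmenting paths are found by depth-first search in the residual graph. The function name `max_bipartite_matching` and the node labels are assumptions made for this illustration, and the edge list is assumed to contain no duplicates.

```python
from collections import defaultdict


def max_bipartite_matching(left, right, edges):
    """Maximum bipartite matching via unit-capacity max flow."""
    source, sink = ("source",), ("sink",)
    capacity = defaultdict(int)  # residual capacities, 0 if absent
    adj = defaultdict(set)

    def add_edge(u, v):
        capacity[(u, v)] = 1
        adj[u].add(v)
        adj[v].add(u)  # reverse direction, needed for the residual graph

    for u in set(left):
        add_edge(source, ("L", u))
    for v in set(right):
        add_edge(("R", v), sink)
    for u, v in set(edges):
        add_edge(("L", u), ("R", v))

    def augment(node, visited):
        # Depth-first search for a path with spare residual capacity.
        if node == sink:
            return True
        visited.add(node)
        for nxt in adj[node]:
            if nxt not in visited and capacity[(node, nxt)] > 0:
                if augment(nxt, visited):
                    capacity[(node, nxt)] -= 1  # push one unit forward
                    capacity[(nxt, node)] += 1  # allow it to be undone
                    return True
        return False

    # Each successful augmentation increases the flow (matching size) by 1.
    while augment(source, set()):
        pass

    # An original edge is matched when its single unit of capacity is used.
    return sorted(
        (u, v) for u, v in set(edges) if capacity[(("L", u), ("R", v))] == 0
    )


matching = max_bipartite_matching(
    left=[0, 1, 2],
    right=["a", "b", "c"],
    edges=[(0, "a"), (0, "b"), (1, "a"), (2, "a")],
)
```

In this example vertices 1 and 2 both compete for "a", so the maximum matching has two edges. Which of the two is matched depends on traversal order, but the size does not.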

algorithm, artificial intelligence, machine learning, (17 more...)

2005.11304

Country:

Oceania > Australia (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Yakovlev, Konstantin, Andreychuk, Anton, Stern, Roni

Revisiting Bounded-Suboptimal Safe Interval Path Planning

arXiv.org Artificial Intelligence, Jun-1-2020

Safe-interval path planning (SIPP) is a powerful algorithm for finding a path in the presence of dynamic obstacles. SIPP returns provably optimal solutions. However, in many practical applications of SIPP such as path planning for robots, one would like to trade-off optimality for shorter planning time. In this paper we explore different ways to build a bounded-suboptimal SIPP and discuss their pros and cons. We compare the different bounded-suboptimal versions of SIPP experimentally. While there is no universal winner, the results provide insights into when each method should be used.

algorithm, artificial intelligence, planning & scheduling, (17 more...)

2006.01195

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.82)

arXiv.org Machine Learning, Jun-1-2020

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

Ren, Pengzhen, Xiao, Yun, Chang, Xiaojun, Huang, Po-Yao, Li, Zhihui, Chen, Xiaojiang, Wang, Xin

Deep learning has made major breakthroughs and progress in many fields. This is due to the powerful automatic representation capabilities of deep learning. It has been proved that the design of the network architecture is crucial to the feature representation of data and the final performance. In order to obtain a good feature representation of data, researchers have designed various complex network architectures. However, the design of the network architecture relies heavily on the researchers' prior knowledge and experience. Therefore, a natural idea is to reduce human intervention as much as possible and let the algorithm automatically design the architecture of the network, thus moving further toward strong intelligence. In recent years, a large number of related algorithms for Neural Architecture Search (NAS) have emerged. They have made various improvements to the NAS algorithm, and the related research work is complicated and rich. In order to reduce the difficulty for beginners to conduct NAS-related research, a comprehensive and systematic survey on NAS is essential. Previous related surveys classify existing work mainly by the basic components of NAS: search space, search strategy and evaluation strategy. This classification method is more intuitive, but it makes it difficult for readers to grasp the challenges and the landmark work involved. Therefore, in this survey, we provide a new perspective: starting with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then giving the solutions proposed in subsequent related research work. In addition, we conduct a detailed and comprehensive analysis, comparison and summary of these works. Finally, we give possible future research directions.

artificial intelligence, deep learning, machine learning, (16 more...)

2006.02903

Country:

Oceania > Australia > New South Wales (0.04)
Asia > Singapore (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deng, Yuntian, Rush, Alexander M.

Cascaded Text Generation with Markov Transformers

arXiv.org Machine Learning, Jun-1-2020

The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.

arxiv preprint arxiv, machine learning, natural language, (17 more...)

2006.01112

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)

Soemers, Dennis J. N. J., Piette, Éric, Stephenson, Matthew, Browne, Cameron

Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration

arXiv.org Artificial Intelligence, May-30-2020

Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play. ExIt involves training a policy to mimic the search behaviour of a tree search algorithm - such as Monte-Carlo tree search - and using the trained policy to guide it. The policy and the tree search can then iteratively improve each other, through experience gathered in self-play between instances of the guided tree search algorithm. This paper outlines three different approaches for manipulating the distribution of data collected from self-play, and the procedure that samples batches for learning updates from the collected data. Firstly, samples in batches are weighted based on the durations of the episodes in which they were originally experienced. Secondly, Prioritized Experience Replay is applied within the ExIt framework, to prioritise sampling experience from which we expect to obtain valuable training signals. Thirdly, a trained exploratory policy is used to diversify the trajectories experienced in self-play. This paper summarises the effects of these manipulations on training performance evaluated in fourteen different board games. We find major improvements in early training performance in some games, and minor improvements averaged over fourteen games.
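The sampling step of Prioritized Experience Replay can be sketched in a few lines. This follows the commonly used proportional variant, in which an experience is drawn with probability proportional to its priority raised to a power and the resulting bias is corrected with importance-sampling weights. How the paper adapts this to ExIt is not stated in the abstract, so the function name `sample_prioritized`, the exponents `alpha` and `beta`, and the normalisation by the largest weight in the batch are assumptions.

```python
import random


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Draw a batch of buffer indices with probability proportional to
    priority**alpha and return them with importance-sampling weights."""
    # Priorities are assumed to be strictly positive.
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]

    # Sampling with replacement, weighted by the scaled priorities.
    n = len(priorities)
    indices = rng.choices(range(n), weights=probs, k=batch_size)

    # Importance-sampling weights compensate for non-uniform sampling;
    # dividing by the batch maximum keeps every weight in (0, 1].
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_raw = max(raw)
    weights = [w / max_raw for w in raw]
    return indices, weights


# Hypothetical buffer of four experiences, one with a much higher priority.
batch_indices, batch_weights = sample_prioritized(
    [1.0, 1.0, 1.0, 1000.0], batch_size=1000, alpha=1.0, rng=random.Random(0)
)
```

With `alpha=0` the sampling becomes uniform, and with `beta=1` the weights fully correct for the non-uniform sampling, so the two exponents control how aggressively high-priority experience is favoured.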

agent, artificial intelligence, experience buffer, (14 more...)

2006.00283

Country:

Europe > Netherlands > Limburg > Maastricht (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)