Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

arXiv.org Artificial Intelligence

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
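The Pareto-based selection mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Multi-DEX implementation: the three scores per candidate (novelty, expected return, model accuracy) are made-up numbers, and the function names `dominates` and `pareto_front` are hypothetical. It only shows how non-dominated candidates are kept when all objectives are maximized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep every point that no other point dominates (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (novelty, expected return, model accuracy) scores; assumed values.
candidates = [
    (0.9, 0.1, 0.5),
    (0.4, 0.8, 0.6),
    (0.3, 0.7, 0.5),  # dominated by the second candidate
    (0.9, 0.1, 0.4),  # dominated by the first candidate
]
front = pareto_front(candidates)
print(front)
```

A multi-objective optimizer would typically repeat this kind of filtering over successive generations of candidate policies rather than apply it once.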


Q-DeckRec: A Fast Deck Recommendation System for Collectible Card Games

arXiv.org Artificial Intelligence

Deck building is a crucial component in playing Collectible Card Games (CCGs). The goal of deck building is to choose a fixed-sized subset of cards from a large card pool, so that they work well together in-game against specific opponents. Existing methods either lack the flexibility to adapt to different opponents or require large computational resources, making them unsuitable for any real-time or large-scale application. We propose a new deck recommendation system, named Q-DeckRec, which learns a deck search policy during a training phase and uses it to solve deck building problem instances. Our experimental results demonstrate that, after a training phase, Q-DeckRec requires fewer computational resources to build winning-effective decks than several baseline methods.


Finding Optimal Solutions to Token Swapping by Conflict-based Search and Reduction to SAT

arXiv.org Artificial Intelligence

We study practical approaches to solving the token swapping (TSWAP) problem optimally in this short paper. In TSWAP, we are given an undirected graph with colored vertices. A colored token is placed in each vertex. A pair of tokens can be swapped between adjacent vertices. The goal is to perform a sequence of swaps so that token and vertex colors agree across the graph. The optimization variant of the problem asks for the minimum number of swaps. We observed similarities between the TSWAP problem and multi-agent path finding (MAPF), where instead of tokens we have multiple agents that need to be moved from their current vertices to given unique target vertices. The two problems differ in the local conditions that state transitions (swaps/moves) must satisfy. We developed two algorithms for solving TSWAP optimally by adapting two different approaches to MAPF: CBS and MDD-SAT. This constitutes the first attempt to design optimal solving algorithms for TSWAP. Experimental evaluation on various types of graphs shows that the reduction to SAT scales better than CBS in optimal TSWAP solving.
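To make the problem statement above concrete, here is a minimal sketch that finds the minimum number of swaps by plain breadth-first search over token configurations. This is a brute-force baseline for tiny instances only, not the CBS or MDD-SAT algorithms from the paper; the function name `min_swaps` and the example graph are assumptions made for illustration.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def min_swaps(edges: List[Tuple[int, int]],
              vertex_colors: Sequence[str],
              token_colors: Sequence[str]) -> Optional[int]:
    """Minimum number of swaps along edges until token colors match vertex colors."""
    goal = tuple(vertex_colors)
    start = tuple(token_colors)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        for u, v in edges:
            if state[u] == state[v]:
                continue  # swapping equally colored tokens changes nothing
            nxt = list(state)
            nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt_state = tuple(nxt)
            if nxt_state == goal:
                return depth + 1
            if nxt_state not in seen:
                seen.add(nxt_state)
                queue.append((nxt_state, depth + 1))
    return None  # goal unreachable, e.g. the color multisets differ


# Path graph 0 - 1 - 2 with the tokens in reversed order.
path_edges = [(0, 1), (1, 2)]
swaps = min_swaps(path_edges, ("r", "g", "b"), ("b", "g", "r"))
print(swaps)
```

The state space grows factorially with the number of distinctly colored tokens, which is one reason dedicated optimal solvers are of interest.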


Machine taught itself to solve Rubik's Cube without human help, UC Irvine researchers say

#artificialintelligence

Two algorithms, collectively called Deep Cube, typically can solve the 3-D combination puzzle within 30 moves, which is less than or equal to systems that use human knowledge, according to the research paper. Less than 5.8% of the world's population can solve the Rubik's Cube, according to the Rubik's website.


An Improved Generic Bet-and-Run Strategy for Speeding Up Stochastic Local Search

arXiv.org Artificial Intelligence

A commonly used strategy for improving optimization algorithms is to restart the algorithm when it is believed to be trapped in an inferior part of the search space. Building on the recent success of Bet-and-Run approaches for restarted local search solvers, we introduce an improved generic Bet-and-Run strategy. The goal is to obtain the best possible results within a given time budget t using a given black-box optimization algorithm. If no prior knowledge about problem features and algorithm behavior is available, the question arises of how to use the time budget most efficiently. We propose to first start k>=1 independent runs of the algorithm during an initialization budget t1<t, then apply a decision maker D to choose some of these runs (consuming t2>=0 time units in doing so), and then continue the chosen runs for the remaining t3=t-t1-t2 time units. In previous Bet-and-Run strategies, the decision maker D=currentBest would simply select the run with the best-so-far results at negligible time. We propose using more advanced methods to discriminate between "good" and "bad" sample runs, with the goal of increasing the correlation of the chosen run with the a-posteriori best one. We test several different approaches, including neural networks trained or polynomials fitted on the current trace of the algorithm to predict which run may yield the best results if granted the remaining budget. We show with extensive experiments that this approach can yield better results than the previous methods, but also find that the currentBest method is a very reliable and robust baseline approach.
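The baseline Bet-and-Run scheme with the currentBest decision maker can be sketched as follows. This is a toy illustration under several assumptions: the budget is counted in solver steps rather than wall-clock time, the decision is treated as free (t2 = 0), a single run is continued, and `HillClimber` is a hypothetical stand-in for the black-box optimizer, minimizing f(x) = x^2.

```python
import random
from typing import Tuple


class HillClimber:
    """Toy resumable stochastic local search that minimizes f(x) = x^2."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.x = self.rng.uniform(-10.0, 10.0)
        self.best = self.x ** 2

    def step(self) -> None:
        candidate = self.x + self.rng.gauss(0.0, 1.0)
        if candidate ** 2 < self.best:  # accept improving moves only
            self.x, self.best = candidate, candidate ** 2


def bet_and_run(k: int, t: int, t1: int) -> Tuple[float, float]:
    """Run k solvers for t1 steps in total, then give the rest of t to the best one."""
    runs = [HillClimber(seed) for seed in range(k)]
    steps_each = t1 // k
    for run in runs:  # initialization phase
        for _ in range(steps_each):
            run.step()
    chosen = min(runs, key=lambda run: run.best)  # decision maker: currentBest
    best_after_init = chosen.best
    for _ in range(t - steps_each * k):  # remaining budget goes to the chosen run
        chosen.step()
    return best_after_init, chosen.best


best_after_init, final_best = bet_and_run(k=4, t=400, t1=80)
print(best_after_init, final_best)
```

A more advanced decision maker, as the abstract proposes, would replace the `min(...)` line with a predictor fitted on each run's progress trace.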


Semantic Indexing: Google's Big Data Trick For Multilingual Search Results

#artificialintelligence

Google has perfected its ability to execute web search results for its users all over the world. In the early days of the Internet, the search engine was primarily suited for displaying search results for English users. Non-English-speaking users have complained that search results are often displayed in the wrong language entirely. However, Google is becoming more proficient at providing search results in other languages as well. A lot of factors can play a role, but one of the biggest is its use of deep learning to understand semantic references--enter semantic indexing. This can now be accomplished in any language that Google serves.


Bayesian Optimization of Combinatorial Structures

arXiv.org Machine Learning

The optimization of expensive-to-evaluate black-box functions over combinatorial structures is a ubiquitous task in machine learning, engineering and the natural sciences. The combinatorial explosion of the search space and costly evaluations pose challenges for current techniques in discrete optimization and machine learning, and critically require new algorithmic ideas (NIPS BayesOpt 2017). This article proposes, to the best of our knowledge, the first algorithm to overcome these challenges, based on an adaptive, scalable model that identifies useful combinatorial structure even when data is scarce. Our acquisition function pioneers the use of semidefinite programming to achieve efficiency and scalability. Experimental evaluations demonstrate that this algorithm consistently outperforms other methods from combinatorial and Bayesian optimization.


AD*-Cut: A Search-Tree Cutting Anytime Dynamic A* Algorithm

AAAI Conferences

This paper presents a new anytime incremental search algorithm, AD*-Cut. AD*-Cut is based on two algorithms, namely, Anytime Repairing A* (ARA*) and the novel incremental search algorithm, D* Extra Lite. D* Extra Lite reinitializes (cuts) entire search-tree branches that have been affected by changes in an environment, which appears to be quicker than the reinitialization during the search used by the popular incremental search algorithm, D* Lite. The search-tree branch cutting is a simple and robust technique that can be easily applied to ARA*. Consequently, AD*-Cut extends D* Extra Lite in the same manner as the state-of-the-art Anytime D* (AD*) algorithm extends D* Lite. The benchmark results suggest that AD*-Cut is quicker and achieves shorter paths than AD* when used for path planning on 3D state-lattices (a 2D position with rotation).


Local Search for Flowshops with Setup Times and Blocking Constraints

AAAI Conferences

The permutation flowshop scheduling problem (PFSP) is a classical combinatorial optimisation problem. There exist variants of PFSP to capture different realistic scenarios, but significant modelling gaps still remain with respect to real-world industrial applications such as the cider production line. In this paper, we propose a new PFSP variant that adequately models both overlappable sequence-dependent setup times (SDST) and mixed blocking constraints. We propose a computational model for makespan minimisation of the new PFSP variant and show that the problem is NP-hard. We then develop a constraint-guided local search algorithm that uses a new intensifying restart technique along with variable neighbourhood descent and greedy selection. The experimental study indicates that the proposed algorithm, on a set of well-known benchmark instances, significantly outperforms the state-of-the-art search algorithms for PFSP.
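For orientation, the makespan of a job permutation in the classical PFSP (without the setup times and blocking constraints that the paper adds) follows a simple recurrence: a job can start on a machine once the previous job has left that machine and the job itself has finished on the previous machine. The sketch below assumes a hypothetical processing-time table `proc[job][machine]` with made-up values.

```python
from typing import List, Sequence


def makespan(proc: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Completion time of the last job on the last machine for a job order."""
    n_machines = len(proc[0])
    # finish[m] holds the completion time of the most recent job on machine m.
    finish: List[int] = [0] * n_machines
    for job in order:
        for m in range(n_machines):
            ready = finish[m - 1] if m > 0 else 0  # job done on previous machine
            finish[m] = max(finish[m], ready) + proc[job][m]
    return finish[-1]


# Two jobs on two machines; proc[job][machine] are illustrative values.
proc = [[3, 2],
        [1, 4]]
span_01 = makespan(proc, [0, 1])
span_10 = makespan(proc, [1, 0])
print(span_01, span_10)
```

A local search over permutations would call such an evaluation function repeatedly, which is why variants with setup times and blocking need their own, more involved computational model.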


Symbolic Planning with Edge-Valued Multi-Valued Decision Diagrams

AAAI Conferences

Symbolic representations have attracted significant attention in optimal planning. Binary Decision Diagrams (BDDs) form the basis for symbolic search algorithms. Closely related are Algebraic Decision Diagrams (ADDs), used to represent heuristic functions. Progress has also been made in dealing with models that take state-dependent action costs into account. Here, costs are represented as Edge-valued Multi-valued Decision Diagrams (EVMDDs), which can be exponentially more compact than the corresponding ADD representation. However, they have not yet been considered for symbolic planning. In this work, we study EVMDD-based symbolic search for optimal planning. We define EVMDD-based representations of state sets and transition relations, and show how to compute the operations required for EVMDD-A*. This EVMDD-based version of symbolic A* generalizes its BDD variant and allows solving planning tasks with state-dependent action costs. We prove theoretically that our approach is sound, complete and optimal. Additionally, we present an empirical analysis comparing EVMDD-A* to BDD-A* and explicit A* search. Our results underscore the usefulness of symbolic approaches and the feasibility of dealing with models that go beyond unit costs.