Search
Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with
Ecoffet, Paul, Fontbonne, Nicolas, André, Jean-Baptiste, Bredeche, Nicolas
This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
Adapting User Interfaces with Model-based Reinforcement Learning
Todi, Kashyap, Bailly, Gilles, Leiva, Luis A., Oulasvirta, Antti
Adapting an interface requires taking into account both the positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs on the user -- for example, due to surprise or relearning effort -- or "trap" the process in a suboptimal design prematurely. However, effects on users are hard to predict as they depend on factors that are latent and evolve over the course of interaction. We propose a novel approach for adaptive user interfaces that yields a conservative adaptation policy: it finds beneficial changes when there are such and avoids changes when there are none. Our model-based reinforcement learning method plans sequences of adaptations and consults predictive HCI models to estimate their effects. We present empirical and simulation results from the case of adaptive menus, showing that the method outperforms both a non-adaptive and a frequency-based policy.
Monte Carlo Tree Search: A Review of Recent Modifications and Applications
Świechowski, Maciej, Godlewski, Konrad, Sawicki, Bartosz, Mańdziuk, Jacek
Monte Carlo Tree Search (MCTS) is a decision-making algorithm that consists in searching large combinatorial spaces represented by trees. In such trees, nodes denote states, also referred to as configurations of the problem, whereas edges denote transitions (actions) from one state to another. MCTS was originally proposed in the work by Kocsis and Szepesvári (2006) and by Coulom (2006), as an algorithm for making computer players in Go. It was quickly called a major breakthrough (Gelly et al., 2012) as it allowed for a leap from 14 kyu, which is an average amateur level, to 5 dan, which is considered an advanced level but not professional yet. Before MCTS, bots for combinatorial games had been using various modifications of the min-max alpha-beta pruning algorithm (Junghanns, 1998) such as MTD(f) (Plaat, 2014) and hand-crafted heuristics. In contrast to them, the MCTS algorithm is at its core aheuristic, which means that no additional knowledge is required other than just the rules of a game (or a problem, generally speaking). However, it is possible to take advantage of heuristics and include them in the MCTS approach to make it more efficient and improve its convergence. Moreover, the given problem often tends to be so complex, from the combinatorial point of view, that some form of external help, e.g.
A Classical Search Game in Discrete Locations
Clarkson, Jake, Lin, Kyle Y., Glazebrook, Kevin D.
Consider a two-person zero-sum search game between a hider and a searcher. The hider hides among $n$ discrete locations, and the searcher successively visits individual locations until finding the hider. Known to both players, a search at location $i$ takes $t_i$ time units and detects the hider -- if hidden there -- independently with probability $q_i$, for $i=1,\ldots,n$. The hider aims to maximize the expected time until detection, while the searcher aims to minimize it. We prove the existence of an optimal strategy for each player. In particular, the hider's optimal mixed strategy hides in each location with a nonzero probability, and the searcher's optimal mixed strategy can be constructed with up to $n$ simple search sequences. We develop an algorithm to compute an optimal strategy for each player, and compare the optimal hiding strategy with the simple hiding strategy which gives the searcher no location preference at the beginning of the search.
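As a rough illustration of the payoff in this game, the sketch below evaluates the expected time until detection for one fixed hiding location and one fixed search sequence. The function name `expected_detection_time`, the `max_searches` cutoff, and the round-robin search sequence are assumptions made for this example; they are not the paper's algorithm and the round-robin order is not claimed to be optimal. Because the sequence is cut off after a finite number of searches, the returned value slightly underestimates the true expectation when $q_i < 1$.

```python
from itertools import cycle, islice


def expected_detection_time(sequence, hider_location, t, q, max_searches=10_000):
    """Expected time until detection for a hider fixed at `hider_location`.

    sequence       -- iterable of location indices the searcher visits in order
    t[i], q[i]     -- search time and detection probability at location i
    max_searches   -- hypothetical cutoff so infinite sequences terminate
    """
    elapsed = 0.0        # total time spent after the current search
    p_undetected = 1.0   # probability the hider survived all earlier searches
    expected = 0.0
    for loc in islice(sequence, max_searches):
        elapsed += t[loc]
        if loc == hider_location:
            # Detection happens at this search with probability
            # p_undetected * q[loc], at total elapsed time `elapsed`.
            expected += elapsed * p_undetected * q[loc]
            p_undetected *= 1.0 - q[loc]
    return expected


if __name__ == "__main__":
    t = [1.0, 2.0]   # search times t_i
    q = [0.5, 0.8]   # detection probabilities q_i
    for hide in range(len(t)):
        value = expected_detection_time(cycle(range(len(t))), hide, t, q)
        print(f"hider at {hide}: expected detection time ~ {value:.4f}")
```

A mixed hiding strategy would weight these per-location values by the hiding probabilities; the searcher's side of the game, minimizing over sequences, is the hard part and is not attempted here.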
Sparsification for Fast Optimal Multi-Robot Path Planning in Lazy Compilation Schemes
Path planning for multiple robots (MRPP) represents a task of finding non-colliding paths for robots through which they can navigate from their initial positions to specified goal positions. The problem is usually modeled using undirected graphs where robots move between vertices across edges. Contemporary optimal solving algorithms include dedicated search-based methods that solve the problem directly, and compilation-based algorithms that reduce MRPP to a different formalism for which an efficient solver exists, such as constraint programming (CP), mixed integer programming (MIP), or Boolean satisfiability (SAT). In this paper, we enhance an existing SAT-based algorithm for MRPP via sparsification of the set of candidate paths for each robot from which the target Boolean encoding is derived. The suggested sparsification of the set of paths led to smaller target Boolean formulae that can be constructed and solved faster, while the optimality guarantees of the approach are preserved.
Top 8 Approaches For Tuning Hyperparameters Of ML Models
Hyperparameter tuning is one of the fundamental steps in the machine learning routine. Also known as hyperparameter optimisation, the method entails searching for the best configuration of hyperparameters to enable optimal performance. Machine learning algorithms require these user-defined inputs to achieve a balance between accuracy and generalisability. There are various tools and approaches available to tune hyperparameters.
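To make "searching for the best configuration" concrete, here is a minimal sketch of one common approach, random search, written in plain Python. The excerpt above does not say which approaches the article covers, so treat the choice of random search as an assumption; the names `random_search`, `toy_objective`, `learning_rate`, and `reg` are likewise hypothetical, and the toy objective merely stands in for a real validation score.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample configurations uniformly from `space` and keep the best one.

    objective -- callable mapping a config dict to a score (higher is better)
    space     -- dict of hyperparameter name -> (low, high) bounds
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    # Stand-in for a validation score; it peaks at
    # learning_rate = 0.1 and reg = 0.01 (values chosen arbitrarily).
    return (-(config["learning_rate"] - 0.1) ** 2
            - (config["reg"] - 0.01) ** 2)


if __name__ == "__main__":
    space = {"learning_rate": (0.0, 1.0), "reg": (0.0, 0.1)}
    config, score = random_search(toy_objective, space, n_trials=200)
    print(config, score)
```

In practice the objective would train a model and return a cross-validated metric, which is why the number of trials, not the sampling itself, dominates the cost.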
Approximation Algorithms for Active Sequential Hypothesis Testing
Gan, Kyra, Jia, Su, Li, Andrew
In the problem of active sequential hypothesis testing (ASHT), a learner seeks to identify the true hypothesis $h^*$ from among a set of hypotheses $H$. The learner is given a set of actions and knows the outcome distribution of any action under any true hypothesis. While repeatedly playing the entire set of actions suffices to identify $h^*$, a cost is incurred with each action. Thus, given a target error $\delta>0$, the goal is to find the minimal-cost policy for sequentially selecting actions that identify $h^*$ with probability at least $1 - \delta$. This paper provides the first approximation algorithms for ASHT, under two types of adaptivity. First, a policy is partially adaptive if it fixes a sequence of actions in advance and adaptively decides when to terminate and what hypothesis to return. Under partial adaptivity, we provide an $O\big(s^{-1}(1+\log_{1/\delta}|H|)\log (s^{-1}|H| \log |H|)\big)$-approximation algorithm, where $s$ is a natural separation parameter between the hypotheses. Second, a policy is fully adaptive if action selection is allowed to depend on previous outcomes. Under full adaptivity, we provide an $O(s^{-1}\log (|H|/\delta)\log |H|)$-approximation algorithm. We numerically investigate the performance of our algorithms using both synthetic and real-world data, showing that our algorithms outperform a previously proposed heuristic policy.
Team formation techniques in education
Collaborative learning is gaining acceptance as one of the most successful educational approaches to learning. The basic idea is to organise learners in groups to work together and solve problems or complete tasks. There is ample evidence that when learners actively engage in discussions, listen to different viewpoints, and defend their positions, they better understand new concepts and learn faster. A particular case of collaborative learning is co-operative learning, where each student is responsible for at least one specific aspect or competence needed to solve the problem jointly. The student is improving her understanding through collaboration with others and is also responsible for the group's success concerning the aspect she is responsible for.
Learning to Schedule DAG Tasks
Hua, Zhigang, Qi, Feng, Liu, Gan, Yang, Shuang
Scheduling computational tasks represented by directed acyclic graphs (DAGs) is challenging because of its complexity. Conventional scheduling algorithms rely heavily on simple heuristics such as shortest job first (SJF) and critical path (CP), and are often lacking in scheduling quality. In this paper, we present a novel learning-based approach to scheduling DAG tasks. The algorithm employs a reinforcement learning agent to iteratively add directed edges to the DAG, one at a time, to enforce ordering (i.e., priorities of execution and resource allocation) of "tricky" job nodes. By doing so, the original DAG scheduling problem is dramatically reduced to a much simpler proxy problem, on which heuristic scheduling algorithms such as SJF and CP can be efficiently improved. Our approach can be easily applied to any existing heuristic scheduling algorithm. On the benchmark dataset of TPC-H, we show that our learning-based approach can significantly improve over popular heuristic algorithms and consistently achieves the best performance among several methods under a variety of settings.