Reinforcement Learning


A Generalized Reinforcement Learning Algorithm for Online 3D Bin-Packing

arXiv.org Artificial Intelligence

We propose a Deep Reinforcement Learning (Deep RL) algorithm for solving the online 3D bin packing problem for an arbitrary number of bins and any bin size. The focus is on producing decisions that can be physically implemented by a robotic loading arm, a laboratory prototype used for testing the concept. The problem considered in this paper is novel in two ways. First, unlike the traditional 3D bin packing problem, we assume that the entire set of objects to be packed is not known a priori. Instead, a fixed number of upcoming objects is visible to the loading system, and they must be loaded in the order of arrival. Second, the goal is not to move objects from one point to another via a feasible path, but to find a location and orientation for each object that maximises the overall packing efficiency of the bin(s). Finally, the learnt model is designed to work with problem instances of arbitrary size without retraining. Simulation results show that the RL-based method outperforms state-of-the-art online bin packing heuristics in terms of empirical competitive ratio and volume efficiency.
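
The abstract reports volume efficiency without defining it. As a minimal sketch, assuming it means the packed volume divided by the total volume of the bins used (an assumed reading, not the paper's stated formula), it could be computed as follows. The function name and argument layout are hypothetical.

```python
def volume_efficiency(boxes, bin_dims, bins_used):
    """Fraction of the opened bin volume that is filled by packed boxes.

    Assumed definition for illustration only; the paper may define the
    metric differently.
    boxes: iterable of (length, width, height) tuples for packed boxes.
    bin_dims: (length, width, height) of one bin; all bins are identical.
    bins_used: number of bins that were opened.
    """
    packed = sum(l * w * h for l, w, h in boxes)
    bin_l, bin_w, bin_h = bin_dims
    return packed / (bins_used * bin_l * bin_w * bin_h)


# Three boxes with a total volume of 12 packed into one 4 x 2 x 2 bin.
efficiency = volume_efficiency([(2, 2, 2), (1, 1, 1), (3, 1, 1)], (4, 2, 2), 1)
```

Under this reading, a value of 1.0 would mean every opened bin is completely full.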


Exploring Exploration: Comparing Children with RL Agents in Unified Environments

arXiv.org Artificial Intelligence

Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn. In turn, this early learning supports more robust generalization and intelligent behavior later in life. While much work has gone into developing methods for exploration in machine learning, artificial agents have not yet reached the high standard set by their human counterparts. In this work we propose using DeepMind Lab (Beattie et al., 2016) as a platform to directly compare child and agent behaviors and to develop new exploration techniques. We outline two ongoing experiments to demonstrate the effectiveness of a direct comparison, and outline a number of open research questions that we believe can be tested using this methodology.


Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location

arXiv.org Machine Learning

Accurate and reliable prediction of hospital admission location is important because of resource constraints and limited space availability in a clinical setting, particularly when dealing with patients who come from the emergency department. In this work we propose a student-teacher network via reinforcement learning to deal with this specific problem. A representation of the weights of the student network is treated as the state and is fed as an input to the teacher network. The teacher network's action is to select the most appropriate batch of data to train the student network on from a training set sorted according to entropy. By validating on three datasets, we show not only that our approach outperforms state-of-the-art methods on tabular data and performs competitively on image recognition, but also that novel curricula are learned by the teacher network. We demonstrate experimentally that the teacher network can actively learn about the student network and guide it to achieve better performance than if trained alone.
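
The teacher's action space described above can be sketched as follows. The abstract does not say which entropy is used for sorting, so this sketch assumes the Shannon entropy of a per-example predicted class distribution; the helper names are hypothetical and the teacher policy itself is omitted.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_sorted_batches(pred_probs, batch_size):
    """Sort example indices by entropy (low to high) and cut them into batches.

    pred_probs: one predicted class distribution per training example
    (an assumption; the paper may compute entropy from something else).
    Returns a list of batches, each a list of example indices. A teacher
    action would then be the index of the batch to train the student on.
    """
    order = sorted(range(len(pred_probs)), key=lambda i: entropy(pred_probs[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


preds = [[0.5, 0.5], [1.0, 0.0], [0.9, 0.1], [0.6, 0.4]]
batches = entropy_sorted_batches(preds, batch_size=2)
```

With this layout, choosing an early batch corresponds to training on low-entropy examples and a late batch to high-entropy ones, which is one way a learned curriculum could be expressed.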


Bandit Linear Control

arXiv.org Machine Learning

Reinforcement learning studies sequential decision-making problems where a learning agent repeatedly interacts with an environment and aims to improve her strategy over time based on the received feedback. One of the most fundamental tradeoffs in reinforcement learning theory is the exploration vs. exploitation tradeoff, which arises whenever the learner observes only partial feedback after each of her decisions and must therefore balance exploring new strategies against exploiting those already known to perform well. The most basic and well-studied form of partial feedback is the so-called "bandit" feedback, where the learner only observes the cost of her chosen action on each decision round, while obtaining no information about the performance of other actions. Traditionally, the environment dynamics in reinforcement learning are modeled as a Markov Decision Process (MDP) with a finite number of possible states and actions. The MDP model has been studied and analyzed in numerous different settings and under various assumptions on the transition parameters, the nature of the reward functions, and the feedback model. Recently, a particular focus has been given to continuous state-action MDPs, and in particular, to a specific family of models in classic control where the state transition function is linear.
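
To make bandit feedback concrete, here is a minimal sketch of a generic epsilon-greedy learner over a finite set of actions. It is not the linear-control algorithm from the paper; the cost values, noise level, and function name are all illustrative assumptions. The key point is that on each round only the cost of the chosen action is observed.

```python
import random


def run_bandit(mean_costs, rounds=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner under bandit feedback (illustrative only).

    mean_costs: hypothetical mean cost of each action, unknown to the learner.
    Returns (counts, estimates): how often each action was played and the
    learner's running estimate of each action's mean cost.
    """
    rng = random.Random(seed)
    n = len(mean_costs)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(rounds):
        if 0 in counts or rng.random() < epsilon:
            # Explore: pick a uniformly random action.
            action = rng.randrange(n)
        else:
            # Exploit: pick the action with the lowest estimated cost.
            action = min(range(n), key=lambda a: estimates[a])
        # Bandit feedback: only the chosen action's noisy cost is revealed.
        observed = mean_costs[action] + rng.gauss(0.0, 0.1)
        counts[action] += 1
        estimates[action] += (observed - estimates[action]) / counts[action]
    return counts, estimates


counts, estimates = run_bandit([1.0, 0.2, 0.8])
```

Setting `epsilon` to 0 removes the random exploration that follows the initial try-every-action phase, while larger values spend more rounds on actions already estimated to be costly; that is the tradeoff the passage describes.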


Sequential Transfer in Reinforcement Learning with a Generative Model

arXiv.org Machine Learning

We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones. The availability of solutions to related problems poses a fundamental trade-off: whether to seek policies that are expected to achieve high (yet sub-optimal) performance in the new task immediately or whether to seek information to quickly identify an optimal solution, potentially at the cost of poor initial behavior. In this work, we focus on the second objective when the agent has access to a generative model of state-action pairs. First, given a set of solved tasks containing an approximation of the target one, we design an algorithm that quickly identifies an accurate solution by seeking the state-action pairs that are most informative for this purpose. We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge. Then, we show how to learn these approximate tasks sequentially by reducing our transfer setting to a hidden Markov model and employing spectral methods to recover its parameters. Finally, we empirically verify our theoretical findings in simple simulated domains.


Interaction-limited Inverse Reinforcement Learning

arXiv.org Machine Learning

Learning from Demonstrations (LfD) is an active research area that addresses the problem of learning how to perform a task by observing demonstrations provided by an expert. This approach plays an important role in many real-life learning settings, including human-to-robot interaction [1, 2, 3, 4, 5]. Two popular approaches to LfD are (i) behavioral cloning, which directly mimics the expert behavior without understanding the objective [6], and (ii) inverse reinforcement learning (IRL), which infers the reward function (i.e., the objective of the task) that explains the expert behavior [7]. In this work, we focus on the IRL approach to LfD. Typically, the IRL learner assumes that the demonstrated expert behavior is optimal with respect to some reward function, even if that reward function cannot be specified explicitly as in typical reinforcement learning (RL).


Fundamental Limits of Adversarial Learning

arXiv.org Machine Learning

Robustness of machine learning methods is essential for modern practical applications. Given the arms race between attack and defense methods, one may be curious regarding the fundamental limits of any defense mechanism. In this work, we focus on the problem of learning from noise-injected data, where the existing literature falls short by either assuming a specific attack method or by over-specifying the learning problem. We shed light on the information-theoretic limits of adversarial learning without assuming a particular learning process or attacker. Finally, we apply our general bounds to a canonical set of non-trivial learning problems and provide examples of common types of attacks.


Reinforcement Learning: Scaling Personalized Marketing

#artificialintelligence

Personalized marketing for retail consumers and account-based marketing for B2B customers now have proven value. Online interactions with customers generate large volumes of data that support granular learning about consumer behavior and the customization of product recommendations, messages, and content. The missing piece is a scalable, just-in-time way to gauge customer preferences and make product recommendations while visitors engage with websites. Deep reinforcement learning algorithms have now been trained to the point where the conversion rates they achieve begin to match the costs of data analysis. The hallmark of reinforcement learning (RL) is that it experiments with multiple pathways to reach an objective, whether that is acquiring customers or any other goal.


The ingredients of real world robotic reinforcement learning

AIHub

Robots have been useful in environments that can be carefully controlled, such as those commonly found in industrial settings (e.g. assembly lines). However, in unstructured settings like the home, we need robotic systems that are adaptive to the diversity of the real world. Learning-based algorithms have the potential to enable robots to acquire complex behaviors adaptively in unstructured environments, by leveraging data collected from the environment. In particular, with reinforcement learning, robots learn novel behaviors through trial and error interactions. This is particularly important as we deploy robots in scenarios where the environment may not be known.


Reinforcement Learning: A Brief Introduction to Rules and Applications

#artificialintelligence

The brain of a human child is remarkable. Even in a previously unknown situation, the brain makes a decision based on its primal knowledge. Depending on the outcome, it learns and remembers the best choices to make in that particular scenario. At a high level, this process of learning can be understood as a "trial and error" process, where the brain tries to maximise the occurrence of positive outcomes.