 Reinforcement Learning


Show, Attend and Interact: Perceivable Human-Robot Social Interaction through Neural Attention Q-Network

arXiv.org Machine Learning

For safe, natural and effective human-robot social interaction, it is essential to develop a system that allows a robot to demonstrate perceivable responsive behaviors to complex human behaviors. We introduce the Multimodal Deep Attention Recurrent Q-Network (MDARQN), with which the robot exhibits human-like social interaction skills after 14 days of interacting with people in an uncontrolled real-world setting. On each of the 14 days, the system gathered robot interaction experiences with people through a trial-and-error method and then trained the MDARQN on these experiences using an end-to-end reinforcement learning approach. The results of interaction-based learning indicate that the robot has learned to respond to complex human behaviors in a perceivable and socially acceptable manner.


Asynchronous n-steps Q-learning

#artificialintelligence

Q-learning is the most famous Temporal Difference algorithm. The original Q-learning algorithm tries to determine the state-action value function that minimizes the error below. We will use an optimizer (the simplest one, gradient descent) to compute the values of the state-action function. First, we need to compute the gradient of the loss function. Gradient descent finds a minimum of a function by repeatedly subtracting the gradient, taken with respect to the function's parameters and scaled by a step size, from those parameters.
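The error the post refers to is not reproduced in this excerpt, so the following is only a minimal sketch that assumes the usual one-step squared temporal-difference error, 0.5 * (r + gamma * max_a' Q(s', a') - Q(s, a))^2, with a tabular Q and the target treated as a constant. The function name `q_learning_step` and its parameters are illustrative, not taken from the post, and the asynchronous n-step variant named in the title is not covered.

```python
def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One gradient-descent step on 0.5 * (target - Q[s][a]) ** 2.

    Q is a list of lists indexed as Q[state][action] (an assumed tabular layout).
    The bootstrap target is held fixed, so the gradient w.r.t. Q[s][a]
    is simply -(target - Q[s][a]).
    """
    # Bootstrap target: immediate reward plus discounted best next value.
    target = r if done else r + gamma * max(Q[s_next])
    td_error = target - Q[s][a]
    grad = -td_error              # d/dQ[s][a] of the squared error
    Q[s][a] -= alpha * grad       # gradient descent: subtract the scaled gradient
    return td_error


# Usage: two states, two actions, all values start at zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_step(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(Q[0][1])  # 0.5
```

Subtracting `alpha * grad` is the same as adding `alpha * td_error`, which is the form in which the tabular Q-learning update is usually written.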


Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

arXiv.org Artificial Intelligence

Stochastic approximation algorithms are sequential nonparametric methods for finding a zero or minimum of a function when only noisy observations of the function values are available. Two timescale stochastic approximation algorithms represent one of the most general subclasses of stochastic approximation methods. These algorithms consist of two coupled recursions that are updated with different step sizes, one considerably smaller than the other, which in turn facilitates convergence. Two timescale stochastic approximation algorithms [19] have been applied successfully to several complex problems in reinforcement learning, signal processing and admission control in communication networks. In many reinforcement learning applications (precisely those where a parameterization of the value function is used), non-additive Markov noise is present in one or both iterates, so the current two timescale framework must be extended to include Markov noise (for example, [13, p. 5] mentions that generalizing the analysis to Markov noise requires the theory of two timescale stochastic approximation to include it).
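As a rough illustration of two coupled recursions with different step sizes, the sketch below runs a slow iterate x and a fast iterate y on a toy problem. Everything in it is an assumption made for illustration: the drift functions, the step-size schedules, the fixed point at 2, and the use of i.i.d. Gaussian noise rather than the Markov noise the paper studies.

```python
import random


def two_timescale(num_steps=20000, noise_std=0.1, seed=0):
    """Toy two timescale stochastic approximation (illustrative only).

    Fast iterate y tracks the slow iterate x; slow iterate x is driven
    toward the point where the tracked value equals 2. Both recursions
    only see noisy observations of their drift.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for n in range(1, num_steps + 1):
        a_n = 1.0 / (n + 1)           # slow step size
        b_n = 1.0 / (n + 1) ** 0.6    # fast step size; a_n / b_n -> 0
        # Both updates use the previous (x, y), so the recursions are coupled.
        x_new = x + a_n * ((2.0 - y) + rng.gauss(0.0, noise_std))
        y_new = y + b_n * ((x - y) + rng.gauss(0.0, noise_std))
        x, y = x_new, y_new
    return x, y


x_final, y_final = two_timescale()
print(x_final, y_final)  # both should end up close to 2
```

Because the ratio of the slow to the fast step size goes to zero, y sees x as nearly constant and settles near it, while x sees y as already equilibrated. That separation is the intuition behind analysing the two recursions on different timescales.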


MDP and Reinforcement Learning

#artificialintelligence

In this first post, I will write about the basics of Markov Decision Process (MDP) and Reinforcement Learning (RL). A Markov Decision Process is a mathematical framework for modeling decision-making. The basic problem in an MDP is to find a policy for the decision maker, which is defined as π(s) = P(a | s). That means that the policy is a function of the state s. Our goal is to find the optimal policy.
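To make "finding the optimal policy" concrete, here is a minimal sketch using value iteration on a two-state MDP. The post does not say which method it uses, so value iteration is a swapped-in standard technique, and the transition table `P`, the rewards and the discount factor are all made up for illustration. The result is a deterministic policy, which is the special case of P(a | s) that puts all probability on one action per state.

```python
def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state, reward) triples."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy: in each state, pick the action with the highest value.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical MDP: action 0 = stay, action 1 = move to the other state.
# Only staying in state 1 pays a reward of 1.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
V, policy = value_iteration(P)
print(policy)  # {0: 1, 1: 0}
```

In this toy problem the values converge to roughly 9 for state 0 and 10 for state 1, so the greedy policy moves out of state 0 and then stays in state 1.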


Video Friday: Giant Robot Bug, SpaceX Rocket Landing, and Flamethrower Drone

IEEE Spectrum Robotics

Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We'll also be posting a weekly calendar of upcoming robotics events for the next two months; here's what we have so far (send us your events!): Let us know if you have suggestions for next week, and enjoy today's videos. It moves just a little too quickly for my general level of comfort around robots that look like giant bugs. I don't think it's recognized widely enough that SpaceX is building giant rockety robots with a whole bunch of very sophisticated autonomy going on: At some point, someone in a meeting said, "The best way to solve this problem is by putting a flamethrower on a drone."


Changing Model Behavior at Test-Time Using Reinforcement Learning

arXiv.org Machine Learning

A computer vision model operating on an embedded device may need to perform real-time inference; a translation model operating on a cell phone may wish to bound its average compute time in order to be power-efficient. In these cases, there is often a tension between satisfying the constraint and achieving acceptable model performance. These constraints need not be restricted to speed and accuracy, but can reflect preferences for model simplicity or other desiderata. One way to deal with constraints is to build them into models explicitly at training time. This has two major disadvantages: First, it requires manually designing and retraining a new model for each use case.


Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs

arXiv.org Machine Learning

This paper is concerned with the problem of stochastic control of gene regulatory networks (GRNs) observed indirectly through noisy measurements and with uncertainty in the intervention inputs. The partial observability of the gene states and uncertainty in the intervention process are accounted for by modeling GRNs using the partially-observed Boolean dynamical system (POBDS) signal model with noisy gene expression measurements. The optimal infinite-horizon control strategy for this problem is not attainable in general, so we apply reinforcement learning and Gaussian process techniques to find a near-optimal solution. The POBDS is first transformed to a directly-observed Markov Decision Process in a continuous belief space, and the Gaussian process is used for modeling the cost function over the belief and intervention spaces. Reinforcement learning is then used to learn the cost function from the available gene expression data. In addition, we employ sparsification, which enables the control of large partially-observed GRNs. The performance of the resulting algorithm is studied through a comprehensive set of numerical experiments using synthetic gene expression data generated from a melanoma gene regulatory network.


Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning

arXiv.org Machine Learning

Human-robot interaction (HRI) is an emerging field of research with the aim to integrate robots into human social environments. One of the biggest challenges in the development of social robots is to understand human social norms [1]. It is therefore essential for social robots to possess deep models of social cognition, and be able to learn and adapt in accordance with their shared experiences with human partners. Most of the social robots to date are either preprogrammed, or are controlled by teleoperation or semiautonomous teleoperation [2], and do not possess the ability to learn and update themselves. Designing an adaptable and autonomous sociable robot is particularly challenging, as the robot needs to correctly interpret human behaviors as well as respond appropriately to them.


Consistent On-Line Off-Policy Evaluation

arXiv.org Machine Learning

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme. However, most Temporal Difference (TD) based solutions ignore the discrepancy between the stationary distributions of the behavior and target policies and its effect on the convergence limit when function approximation is applied. In this paper we propose the Consistent Off-Policy Temporal Difference (COP-TD($\lambda$, $\beta$)) algorithm, which addresses this issue and reduces this bias at some computational expense. We show that COP-TD($\lambda$, $\beta$) can be designed to converge to the same value that would have been obtained by using on-policy TD($\lambda$) with the target policy. Subsequently, the proposed scheme leads to a related and promising heuristic we call log-COP-TD($\lambda$, $\beta$). Both algorithms show favorable empirical results compared with current state-of-the-art on-line OPE algorithms. Finally, our formulation sheds some new light on the recently proposed Emphatic TD learning.


Deep learning boosted AI. Now the next big thing in machine intelligence is coming

#artificialintelligence

Inside a simple computer simulation, a group of self-driving cars are performing a crazy-looking maneuver on a four-lane virtual highway. Half are trying to move from the right-hand lanes just as the other half try to merge from the left. It seems like just the sort of tricky thing that might flummox a robot vehicle, but they manage it with precision. I'm watching the driving simulation at the biggest artificial-intelligence conference of the year, held in Barcelona this past December. What's most amazing is that the software governing the cars' behavior wasn't programmed in the conventional sense at all.