 Reinforcement Learning


Sequential Bayesian optimal experimental design via approximate dynamic programming

arXiv.org Machine Learning

The design of multiple experiments is commonly undertaken via suboptimal strategies, such as batch (open-loop) design that omits feedback or greedy (myopic) design that does not account for future effects. This paper introduces new strategies for the optimal design of sequential experiments. First, we rigorously formulate the general sequential optimal experimental design (sOED) problem as a dynamic program. Batch and greedy designs are shown to result from special cases of this formulation. We then focus on sOED for parameter inference, adopting a Bayesian formulation with an information theoretic design objective. To make the problem tractable, we develop new numerical approaches for nonlinear design with continuous parameter, design, and observation spaces. We approximate the optimal policy by using backward induction with regression to construct and refine value function approximations in the dynamic program. The proposed algorithm iteratively generates trajectories via exploration and exploitation to improve approximation accuracy in frequently visited regions of the state space. Numerical results are verified against analytical solutions in a linear-Gaussian setting. Advantages over batch and greedy design are then demonstrated on a nonlinear source inversion problem where we seek an optimal policy for sequential sensing.
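
To make the dynamic-programming view concrete, here is a minimal sketch of backward induction on a toy problem with finite state, design, and observation sets. It is not the paper's algorithm, which handles continuous spaces by regressing value function approximations; the function name, the two-design toy problem, and all numbers below are illustrative assumptions.

```python
def backward_induction(n_stages, states, designs, outcomes,
                       prob, transition, stage_reward, terminal_reward):
    """Exact finite-horizon dynamic programming on finite sets.

    Returns the stage-0 value function and one policy table per stage.
    """
    # Start from the terminal reward and sweep backward in time.
    value = {x: terminal_reward(x) for x in states}
    policy = []
    for k in reversed(range(n_stages)):
        value_k, policy_k = {}, {}
        for x in states:
            best_design, best_value = None, float("-inf")
            for d in designs:
                # Expected stage reward plus value of the next state.
                expected = sum(
                    prob(y, x, d)
                    * (stage_reward(k, x, d, y) + value[transition(x, d, y)])
                    for y in outcomes
                )
                if expected > best_value:
                    best_design, best_value = d, expected
            value_k[x], policy_k[x] = best_value, best_design
        value = value_k
        policy.insert(0, policy_k)
    return value, policy


# Hypothetical toy problem: the state counts successful experiments (capped at 2).
SUCCESS_PROB = {"cheap": 0.5, "precise": 0.9}  # assumed values
COST = {"cheap": 0.0, "precise": 0.3}          # assumed values

J0, policy = backward_induction(
    n_stages=2,
    states=[0, 1, 2],
    designs=["cheap", "precise"],
    outcomes=[0, 1],
    prob=lambda y, x, d: SUCCESS_PROB[d] if y == 1 else 1.0 - SUCCESS_PROB[d],
    transition=lambda x, d, y: min(x + y, 2),
    stage_reward=lambda k, x, d, y: -COST[d],
    terminal_reward=lambda x: float(x),
)
```

In this toy problem the design chosen at the second stage depends on the state reached after the first experiment, which is the feedback that a batch (open-loop) design omits.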


Good Robot! Elon Musk's AI Nonprofit Shows Where AI Is Going

#artificialintelligence

The next big trend in AI looks likely to be computers and robots that teach themselves through trial and error. Elon Musk and Sam Altman (of Y Combinator) caused a stir last December by luring several high-profile researchers to join OpenAI, a billion-dollar nonprofit dedicated to releasing cutting-edge artificial intelligence research for free. Today the nonprofit released the first fruits of its work, and it suggests that kind of learning will be important for the future of AI. The nonprofit has released a tool called OpenAI Gym for developing and comparing different so-called reinforcement learning algorithms, which provide a way for a machine to learn through positive and negative feedback. This week OpenAI also announced two new recruits, including Pieter Abbeel, an associate professor at Berkeley and a leading expert on applying reinforcement learning to robots. OpenAI Gym includes code and examples to help others get started with reinforcement learning.


NIPS 2015 Review

#artificialintelligence

NIPS 2015 was bigger than ever, literally: at circa 3700 attendees this was roughly twice as many attendees as last year, which in turn was roughly twice as many as the previous year. This is clearly unsustainable, but given the frenzied level of vendor and recruiting activities, perhaps there is room to grow. The main conference is single track, however, and already 3 days long: so even more action is moving to the poster sessions, which along with the workshops creates the feel of a diverse collection of smaller conferences. Obviously, my view of the action will be highly incomplete and biased towards my own interests. Reinforcement learning continues to ascend, extending the enthusiasm and energy from ICML.


CS 294 Deep Reinforcement Learning, Fall 2015

@machinelearnbot

This course will assume some familiarity with reinforcement learning, numerical optimization and machine learning. Students who are not familiar with the concepts below are encouraged to brush up using the references provided right below this list. We'll review this material in class, but it will be rather cursory. The assignments will be provided as Jupyter (formerly called IPython) notebooks and will use NumPy with Python 2.7. You may find the following tutorial helpful (from Stanford CS231n): Python/Numpy.


rllab/rllab

@machinelearnbot

Documentation is available online: https://rllab.readthedocs.org/en/latest/.


yenchenlin1994/DeepLearningFlappyBird

#artificialintelligence

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game can make it converge faster. The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer.
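
As a rough illustration of what "a value function estimating future rewards" is trained against, the sketch below computes standard one-step Q-learning targets for a small batch of transitions. It is not taken from the repository: the function name, the batch layout, and the discount factor are assumptions made for the example.

```python
def q_learning_targets(rewards, next_q_values, terminals, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q(s', a').

    rewards:       list of floats, one per transition
    next_q_values: list of lists, Q-values of every action in the next state
    terminals:     list of bools, True if the transition ended the episode
    gamma:         discount factor (assumed value, not from the source)
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, terminals):
        if done:
            # No future reward after a terminal state.
            targets.append(reward)
        else:
            # Bootstrap from the best action value in the next state.
            targets.append(reward + gamma * max(q_next))
    return targets


# Two hypothetical transitions: one ongoing, one that ends the episode.
example_targets = q_learning_targets(
    rewards=[1.0, -1.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    terminals=[False, True],
    gamma=0.5,
)
```

The network would then be fitted so that its predicted value for the action actually taken moves toward the corresponding target.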


General Artificial Intelligence Trading Algorithm

#artificialintelligence

The DeepFund Agent will make trading decisions directly from raw market data using Deep Learning, Deep Reinforcement Learning and Unsupervised Learning. The Agent was ordered to maximize the value of our bank account... The DeepFund Agent learns to trade from its experience and improves itself to a superhuman level.


Affective Personalization of a Social Robot Tutor for Children's Second Language Skills

AAAI Conferences

Though substantial research has been dedicated towards using technology to improve education, no current methods are as effective as one-on-one tutoring. A critical, though relatively understudied, aspect of effective tutoring is modulating the student's affective state throughout the tutoring session in order to maximize long-term learning gains. We developed an integrated experimental paradigm in which children play a second-language learning game on a tablet, in collaboration with a fully autonomous social robotic learning companion. As part of the system, we measured children's valence and engagement via an automatic facial expression analysis system. These signals were combined into a reward signal that fed into the robot's affective reinforcement learning algorithm. Over several sessions, the robot played the game and personalized its motivational strategies (using verbal and non-verbal actions) to each student. We evaluated this system with 34 children in preschool classrooms for a duration of two months. We saw that (1) children learned new words from the repeated tutoring sessions, (2) the affective policy personalized to students over the duration of the study, and (3) students who interacted with a robot that personalized its affective feedback strategy showed a significant increase in valence, as compared to students who interacted with a non-personalizing robot. This integrated system of tablet-based educational content, affective sensing, affective policy learning, and an autonomous social robot holds great promise for a more comprehensive approach to personalized tutoring.


Incremental Stochastic Factorization for Online Reinforcement Learning

AAAI Conferences

A construct that has been receiving attention recently in reinforcement learning is stochastic factorization (SF), a particular case of non-negative matrix factorization (NMF) in which the matrices involved are stochastic. The idea is to use SF to approximate the transition matrices of a Markov decision process (MDP). This is useful for two reasons. First, learning the factors of the SF instead of the transition matrices can significantly reduce the number of parameters to be estimated. Second, it has been shown that SF can be used to reduce the number of operations needed to compute an MDP's value function. Recently, an algorithm called expectation-maximization SF (EMSF) has been proposed to compute a SF directly from transitions sampled from an MDP. In this paper we take a closer look at EMSF. First, by exploiting the assumptions underlying the algorithm, we show that it is possible to reduce it to simple multiplicative update rules similar to the ones that helped popularize NMF. Second, we analyze the optimization process underlying EMSF and find that it minimizes a modified version of the Kullback-Leibler divergence that is particularly well-suited for learning a SF from data sampled from an arbitrary distribution. Third, we build on this improved understanding of EMSF to draw an interesting connection with NMF and probabilistic latent semantic analysis. We also exploit the simplified update rules to introduce a new version of EMSF that generalizes and significantly improves its precursor. This new algorithm provides a practical mechanism to control the trade-off between memory usage and computing time, essentially freeing the space complexity of EMSF from its dependency on the number of sample transitions. The algorithm can also compute its approximation incrementally, which makes it possible to use it concomitantly with the collection of data. This feature makes the new version of EMSF particularly suitable for online reinforcement learning. Empirical results support the utility of the proposed algorithm.
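
The parameter-saving argument can be illustrated with a small sketch. It does not implement EMSF; it only shows that the product of two row-stochastic factors is itself a row-stochastic matrix, and compares parameter counts. The factor names D and K, the helper names, and the example sizes are assumptions made for the illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def parameter_counts(n_states, n_factors):
    """Entries in the two factors versus entries in the full transition matrix."""
    factored = 2 * n_states * n_factors  # D is n x m, K is m x n
    full = n_states * n_states           # P is n x n
    return factored, full


# Hypothetical row-stochastic factors for a 3-state chain with 2 latent factors.
D = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
K = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]

# Every row of P sums to 1, so P is a valid transition matrix.
P = matmul(D, K)
```

The factored form only saves parameters when the number of factors is less than half the number of states; with the assumed sizes of 1000 states and 10 factors, `parameter_counts(1000, 10)` compares 20,000 entries against 1,000,000.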


Robust Learning from Demonstration Techniques and Tools

AAAI Conferences

Large state spaces and the curse of dimensionality contribute to the complexity of a task. Learning from demonstration techniques can be combined with reinforcement learning to narrow the exploration space of an agent, but require consistent and accurate demonstrations, as well as the state-action pairs for an entire demonstration. Individuals with severe motor disabilities are often slow and prone to human errors in demonstrations while teaching. My dissertation develops tools to allow persons with severe motor disabilities, and individuals in general, to train these systems. To handle these large state spaces as well as human error, we developed Dimensionality Reduced Reinforcement Learning. To accommodate slower feedback, we will develop a movie-reel style learning from demonstration interface.