 Reinforcement Learning


Employing Batch Reinforcement Learning to Control Gene Regulation Without Explicitly Constructing Gene Regulatory Networks

AAAI Conferences

The goal of controlling a gene regulatory network (GRN) is to generate an intervention strategy, i.e., a control policy, such that by applying the policy the system avoids undesirable states. In this work, we propose a method to control GRNs by using Batch Mode Reinforcement Learning (Batch RL). Our idea is based on the fact that time series gene expression data can be interpreted as a sequence of experience tuples collected from the environment. Existing studies on this control task infer a model from gene expression data and then calculate a control policy over the constructed model. In contrast, we propose a method that uses the available gene expression data directly to obtain an approximate control policy for gene regulation, avoiding the time-consuming model-building phase. Results show that we can obtain policies for gene regulation systems of several thousand genes in just a few seconds, while existing solutions get stuck even for tens of genes. Interestingly, the reported results also show that our method produces policies that are almost as good as the ones generated by existing model-dependent methods.
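The idea of reading a gene expression time series as experience tuples can be pictured with the minimal Python sketch below. It is not the authors' implementation: the binary state encoding, the `reward_fn` penalty, the action labels (0 = no intervention, 1 = intervene), and the tabular batch Q-iteration are all assumptions made for illustration.

```python
from collections import defaultdict

def to_experience_tuples(states, actions, reward_fn):
    """Turn a time series into (s, a, r, s_next) tuples, one per transition."""
    return [
        (states[t], actions[t], reward_fn(states[t + 1]), states[t + 1])
        for t in range(len(states) - 1)
    ]

def batch_q_iteration(tuples, n_actions, gamma=0.9, sweeps=50):
    """Tabular Q-iteration over a fixed batch; no model of the network is built."""
    q = defaultdict(float)
    for _ in range(sweeps):
        new_q = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in tuples:
            target = r + gamma * max(q[(s_next, b)] for b in range(n_actions))
            counts[(s, a)] += 1
            # Running average of the targets observed for this (s, a) pair.
            new_q[(s, a)] += (target - new_q[(s, a)]) / counts[(s, a)]
        q = new_q
    return q

def greedy_action(q, state, n_actions):
    return max(range(n_actions), key=lambda a: q[(state, a)])

# Hypothetical one-gene series: state (1,) means the undesirable gene is on.
states = [(1,), (1,), (0,), (1,), (0,)]
actions = [0, 1, 0, 1]  # assumed labels: 0 = no intervention, 1 = intervene
reward_fn = lambda s: -1.0 if s[0] == 1 else 0.0

tuples = to_experience_tuples(states, actions, reward_fn)
q = batch_q_iteration(tuples, n_actions=2)
best = greedy_action(q, (1,), 2)
```

In this toy batch, intervening while the gene is on is the only action ever followed by the desirable state, so the greedy policy picks it. Pairs that never occur in the data keep a default value of zero, which a practical batch method would need to treat more carefully.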


Linear Bayesian Reinforcement Learning

AAAI Conferences

This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, particularly when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic programming on the sampled model. This form of approximate Thompson sampling results in good exploration in unknown environments. The approach can also be seen as a Bayesian generalisation of least-squares policy iteration, where the empirical transition matrix is replaced with a sample from the posterior.
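The sample-then-plan loop described above can be illustrated with a deliberately small Python sketch. It assumes a scalar system `next_state = state + theta * u + noise` with an unknown gain `theta`, a conjugate Gaussian prior, and known noise variance, and it swaps approximate dynamic programming for a one-step greedy planner. All names and constants are hypothetical and are not taken from the paper.

```python
import random

def posterior_update(mean, precision, u, y, noise_var):
    """Conjugate Gaussian update for the observation y = theta * u + noise."""
    new_precision = precision + u * u / noise_var
    new_mean = (precision * mean + u * y / noise_var) / new_precision
    return new_mean, new_precision

def thompson_control(state, mean, precision, rng, u_max=1.0):
    """Sample theta from the posterior, then plan one step on the sampled model."""
    theta = rng.gauss(mean, precision ** -0.5)
    if abs(theta) < 1e-6:      # avoid dividing by a near-zero sample
        theta = 1e-6
    u = -state / theta         # would drive the sampled model to state 0
    return max(-u_max, min(u_max, u))

rng = random.Random(0)
true_theta, noise_std = 2.0, 0.1   # hypothetical ground truth for the demo
mean, precision = 0.0, 1.0         # N(0, 1) prior on theta
state = 5.0
for _ in range(200):
    u = thompson_control(state, mean, precision, rng)
    next_state = state + true_theta * u + rng.gauss(0.0, noise_std)
    mean, precision = posterior_update(
        mean, precision, u, next_state - state, noise_std ** 2
    )
    state = next_state
```

Because each control is chosen under a fresh posterior sample, early actions are exploratory and later ones concentrate as the posterior sharpens around the true gain.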


Online Expectation Maximization for Reinforcement Learning in POMDPs

AAAI Conferences

We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only in the current learning episode, discarding the episode after the evaluation and memorizing the sufficient statistic, from which the policy is computed in closed-form. As a result, the online algorithm has a time complexity $O(n)$ and a memory complexity $O(1)$, compared to $O(n^2)$ and $O(n)$ for the corresponding batch-mode algorithm, where $n$ is the number of learning episodes. The online algorithm, which has a provable convergence, is demonstrated on five benchmark POMDP problems.
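The memory argument above, keeping a sufficient statistic and discarding each episode, can be illustrated with a simplified Python sketch. It is not the nested EM algorithm of the paper: it assumes a memoryless policy over observations, non-negative episode returns, and a reward-weighted count as the statistic, all chosen here for illustration.

```python
from collections import defaultdict

class OnlinePolicyStats:
    """Keeps only per-(observation, action) totals, never the episodes."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.stats = defaultdict(float)

    def absorb_episode(self, episode, episode_return):
        # episode: list of (observation, action) pairs; return assumed >= 0.
        for obs, action in episode:
            self.stats[(obs, action)] += episode_return
        # The episode itself is not stored, so memory does not grow with n.

    def policy(self, obs):
        # Closed-form policy: normalise the accumulated statistic.
        weights = [self.stats.get((obs, a), 0.0) for a in self.actions]
        total = sum(weights)
        if total <= 0.0:
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: w / total for a, w in zip(self.actions, weights)}

learner = OnlinePolicyStats(["left", "right"])
learner.absorb_episode([("o1", "left"), ("o1", "right")], 1.0)
learner.absorb_episode([("o1", "right")], 3.0)
pi = learner.policy("o1")
```

The stored table depends on the number of distinct observation-action pairs, not on the number of episodes processed, which is the sense in which memory stays constant in $n$.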


A General Framework for Interacting Bayes-Optimally with Self-Interested Agents using Arbitrary Parametric Model and Model Prior

AAAI Conferences

Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be either generalized across different states or specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.


Bayesian Nonparametric Feature Construction for Inverse Reinforcement Learning

AAAI Conferences

Most of the algorithms for inverse reinforcement learning (IRL) assume that the reward function is a linear function of the pre-defined state and action features. However, it is often difficult to manually specify the set of features that can make the true reward function representable as a linear function. We propose a Bayesian nonparametric approach to identifying useful composite features for learning the reward function. The composite features are assumed to be the logical conjunctions of the predefined atomic features so that we can represent the reward function as a linear function of the composite features. We empirically show that our approach is able to learn composite features that capture important aspects of the reward function on synthetic domains, and predict taxi drivers’ behaviour with high accuracy on a real GPS trace dataset.


Regret Bounds for Reinforcement Learning with Policy Advice

arXiv.org Machine Learning

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of $\tilde{O}(\sqrt{T})$ relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.


Probabilistic inverse reinforcement learning in unknown environments

arXiv.org Machine Learning

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.


Computational Model of Music Sight Reading: A Reinforcement Learning Approach

arXiv.org Artificial Intelligence

Although the Music Sight Reading process has been studied from the viewpoint of cognitive psychology, computational learning methods such as Reinforcement Learning have not yet been used to model such processes. In this paper, taking into account the essential properties of our specific problem, we consider the value function concept and show that the optimal policy can be obtained by the method we offer without computing complex value functions. We also offer a normative behavioral model for the interaction of the agent with the musical pitch environment and, using a slightly different version of partially observable Markov decision processes, show that our method leads to faster learning of state-action pairs in our implemented agents.


An Ensemble of Linearly Combined Reinforcement-Learning Agents

AAAI Conferences

Reinforcement-learning (RL) algorithms are often tweaked and tuned to specific environments when applied, calling into question whether learning can truly be considered autonomous in these cases. In this work, we show how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning. Our approach learns a weighted linear combination of Q-values from multiple independent learning algorithms. In our evaluations in generalized RL environments, we find that the algorithm compares favorably to the best tuned algorithm. Our work provides a promising basis for further study into the use of ensemble methods in RL.
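A weighted linear combination of Q-values can be sketched in a few lines of Python. The abstract does not say how the weights are learned, so the semi-gradient TD update below is an assumption, as are the function names, the step size, and the toy Q-values.

```python
def ensemble_q(weights, q_values):
    """Combined estimate: sum_i w_i * Q_i(s, a) over the base learners."""
    return sum(w * q for w, q in zip(weights, q_values))

def greedy_action(weights, q_by_action):
    # q_by_action maps each action to a list of Q-estimates, one per learner.
    return max(q_by_action, key=lambda a: ensemble_q(weights, q_by_action[a]))

def td_weight_update(weights, q_sa, reward, next_value, gamma=0.9, lr=0.1):
    """One semi-gradient TD step on the combination weights (assumed rule)."""
    td_error = reward + gamma * next_value - ensemble_q(weights, q_sa)
    return [w + lr * td_error * q for w, q in zip(weights, q_sa)]

# Hypothetical Q-estimates from two base learners for two actions.
weights = [0.5, 0.5]
q_by_action = {"a0": [1.0, 3.0], "a1": [2.0, 0.5]}

best = greedy_action(weights, q_by_action)
new_weights = td_weight_update(
    weights, q_by_action[best], reward=1.0, next_value=0.0
)
```

Here the combined estimate for `a0` overshoots the observed reward, so the update shrinks both weights, and shrinks the weight of the learner with the larger estimate by more.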


Smart Charging of Electric Vehicles using Reinforcement Learning

AAAI Conferences

The introduction of Electric Vehicles (EVs) in the existing Energy Grid raises many issues regarding Grid stability and charging behavior. Uncontrolled charging on the customer’s side may increase the already high peaks in the energy demand, leading to a corresponding increase in energy prices. We propose a novel smart charging algorithm that maximizes individual welfare and reduces individual energy expenses. We use Reinforcement Learning trained on real-world data to learn the individual household consumption behavior and propose a charging algorithm with respect to an individual welfare maximization objective. Furthermore, we use statistical customer models to simulate the EV customer behavior. We show that individual customers, represented by intelligent agents, reduce their energy expenses when using the proposed charging algorithm. Additionally, we show that the average energy prices, on an aggregated level, are reduced as a result of smarter use of the energy available. Finally, we prove that the presented algorithm achieves significant peak reduction and reshaping of the energy demand curve.