AITopics

constructing gene regulatory network, control gene regulation, employing batch reinforcement learning

The goal of controlling a gene regulatory network (GRN) is to generate an intervention strategy, i.e., a control policy, such that applying the policy keeps the system out of undesirable states. In this work, we propose a method to control GRNs using Batch Mode Reinforcement Learning (Batch RL). Our idea is based on the observation that time series gene expression data can be interpreted as a sequence of experience tuples collected from the environment. Existing studies on this control task first infer a model from gene expression data and then compute a control policy over the constructed model. In contrast, our method uses the available gene expression data directly to obtain an approximate control policy for gene regulation, avoiding the time-consuming model-building phase. Results show that we can obtain policies for gene regulation systems of several thousand genes in just several seconds, whereas existing solutions stall at even tens of genes. Interestingly, the reported results also show that our method produces policies that are almost as good as those generated by existing model-dependent methods.
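The central observation, that consecutive time points in expression data already form experience tuples, can be illustrated with a minimal sketch. It assumes (hypothetically) binarized expression values, treats the value of one designated control gene at time t as the action, and assigns a reward of -1 whenever a designated target gene is on at time t+1. The policy is then computed with tabular batch Q-iteration, i.e. fitted Q iteration with a lookup-table regressor; this is a simple stand-in chosen for illustration, not necessarily the regressor or reward used in the work, and all function names are illustrative.

```python
from collections import defaultdict

def time_series_to_tuples(series, control_gene, target_gene):
    """Turn a binarized expression time series into (s, a, r, s') tuples.

    Hypothetical encoding: each row is a tuple of 0/1 gene states; the
    control gene's value at time t is the action, the state is every other
    gene, and the reward is -1 if the target gene is on at time t + 1.
    """
    def state_of(row):
        return tuple(v for i, v in enumerate(row) if i != control_gene)

    tuples = []
    for x_t, x_next in zip(series, series[1:]):
        reward = -1.0 if x_next[target_gene] == 1 else 0.0
        tuples.append((state_of(x_t), x_t[control_gene], reward, state_of(x_next)))
    return tuples

def batch_q_iteration(tuples, actions=(0, 1), gamma=0.9, n_iter=50):
    """Tabular fitted Q iteration: regress Q(s, a) onto r + gamma * max_b Q(s', b).

    With a lookup-table regressor, the least-squares fit for each (s, a)
    is simply the mean of its targets over the batch.
    """
    q = defaultdict(float)
    for _ in range(n_iter):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in tuples:
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        q = defaultdict(float, {key: sums[key] / counts[key] for key in sums})
    return q

def greedy_action(q, state, actions=(0, 1)):
    # Ties are broken in favor of the first action listed.
    return max(actions, key=lambda a: q[(state, a)])
```

Because `q` is a `defaultdict`, state-action pairs never seen in the data default to 0, so the sketch treats them as neutral; a realistic application would need a regressor that generalizes across states.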

Twenty-Third International Joint Conference on Artificial Intelligence

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tziortziotis, Nikolaos (University of Ioannina) | Dimitrakakis, Christos (École Polytechnique Fédérale de Lausanne) | Blekas, Konstantinos (University of Ioannina)

Linear Bayesian Reinforcement Learning

linear bayesian reinforcement learning

This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, particularly when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior and then performing approximate dynamic programming on the sampled model. This form of approximate Thompson sampling results in good exploration in unknown environments. The approach can also be seen as a Bayesian generalisation of least-squares policy iteration, where the empirical transition matrix is replaced with a sample from the posterior.
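The posterior-sampling step can be sketched with a standard Bayesian linear regression of next-state features on current state-action features. For simplicity this sketch assumes independent, isotropic Gaussian noise with known variance and an isotropic Gaussian prior on the weights; it does not model the correlated noise the abstract highlights. `prior_var`, `noise_var`, and the function names are illustrative assumptions, and the approximate dynamic programming step on the sampled model is omitted.

```python
import numpy as np

def linear_gaussian_posterior(phi, y, prior_var=10.0, noise_var=0.1):
    """Posterior over W in y = phi @ W + noise.

    Assumed prior: each column W[:, j] ~ N(0, prior_var * I); assumed noise:
    N(0, noise_var), independent per output dimension, so all k columns
    share one posterior covariance.
    phi: (n, d) features of (state, action); y: (n, k) next-state features.
    """
    d = phi.shape[1]
    precision = phi.T @ phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ phi.T @ y / noise_var          # (d, k) posterior mean
    return mean, cov

def sample_model(mean, cov, rng):
    """Draw one transition model W from the posterior (the Thompson-sampling step)."""
    chol = np.linalg.cholesky(cov)              # cov = chol @ chol.T
    z = rng.standard_normal(mean.shape)         # independent N(0, 1) entries
    return mean + chol @ z                      # column j ~ N(mean[:, j], cov)
```

Each call to `sample_model` yields a different plausible dynamics model; planning against a fresh sample is what drives exploration in Thompson sampling.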

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Online Expectation Maximization for Reinforcement Learning in POMDPs

Liu, Miao (Duke University) | Liao, Xuejun (Duke University) | Carin, Lawrence (Duke University)

We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only in the current learning episode, discarding the episode after the evaluation and memorizing the sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has a time complexity of $O(n)$ and a memory complexity of $O(1)$, compared to $O(n^2)$ and $O(n)$ for the corresponding batch-mode algorithm, where $n$ is the number of learning episodes. The online algorithm, which is provably convergent, is demonstrated on five benchmark POMDP problems.
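The memory argument can be illustrated with a deliberately simplified sketch: each episode adds its contribution to a running sufficient statistic and is then discarded, and the policy is recovered in closed form by normalization. Here the statistic is assumed (hypothetically) to be reward-weighted counts of observation-action pairs for a tabular stochastic policy $\pi(a \mid o)$ with nonnegative episode returns, as in simple EM-style policy search. This is not the paper's nested EM algorithm, and the class and parameter names are illustrative.

```python
import numpy as np

class OnlineRewardWeightedStats:
    """Running sufficient statistic for a tabular stochastic policy pi(a | o)."""

    def __init__(self, n_obs, n_actions, prior_count=1.0):
        # A small prior count keeps every action's probability positive.
        self.stats = np.full((n_obs, n_actions), prior_count)

    def update(self, episode, episode_return):
        """Fold one episode into the statistic; episode_return is assumed >= 0.

        episode: list of (observation, action) pairs. The episode itself is
        not stored, so memory does not grow with the number of episodes.
        """
        for o, a in episode:
            self.stats[o, a] += episode_return

    def policy(self):
        # Closed-form M-step: normalize the reward-weighted counts per observation.
        return self.stats / self.stats.sum(axis=1, keepdims=True)
```

Memory depends only on the numbers of observations and actions, and each update costs time proportional to the episode length, so processing $n$ fixed-length episodes takes $O(n)$ time and $O(1)$ memory in $n$.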

online expectation maximization, pomdp, reinforcement learning

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Hoang, Trong Nghia (National University of Singapore) | Low, Kian Hsiang (National University of Singapore)

A General Framework for Interacting Bayes-Optimally with Self-Interested Agents using Arbitrary Parametric Model and Model Prior

Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM allows the other agent's behavior neither to be generalized across different states nor to be specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL that integrates the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.
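The generalization limitation of FDM can be seen in a few lines. Under FDM, the other agent's action distribution in each state has its own independent Dirichlet prior, so observations in one state leave predictions in every other state unchanged. The sketch contrasts this with one hypothetical, very simple parametric alternative in which a single action distribution is shared by all states; richer parametric models and the Bayes-optimal planning built on them are beyond this sketch, and all names are illustrative.

```python
import numpy as np

class FlatDirichletMultinomial:
    """Independent Dirichlet(alpha, ..., alpha) prior over the other agent's
    actions in every state (the FDM independence assumption)."""

    def __init__(self, n_states, n_actions, alpha=1.0):
        self.counts = np.zeros((n_states, n_actions))
        self.alpha = alpha

    def observe(self, state, action):
        self.counts[state, action] += 1

    def predictive(self, state):
        # Dirichlet-multinomial posterior predictive: (n_a + alpha) / (N + K * alpha)
        c = self.counts[state] + self.alpha
        return c / c.sum()

class SharedDirichlet:
    """Hypothetical parametric alternative: one action distribution tied across states."""

    def __init__(self, n_actions, alpha=1.0):
        self.counts = np.zeros(n_actions)
        self.alpha = alpha

    def observe(self, state, action):
        self.counts[action] += 1  # state is ignored: parameters are shared

    def predictive(self, state):
        c = self.counts + self.alpha
        return c / c.sum()
```

After eight observations of action 1 in state 0, FDM still predicts a uniform distribution in an unvisited state, while the shared model transfers what it saw.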

arbitrary parametric model, interacting bayes-optimally, self-interested agent

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Choi, Jaedeug (Korea Advanced Institute of Science and Technology (KAIST)) | Kim, Kee-Eung (Korea Advanced Institute of Science and Technology (KAIST))

Bayesian Nonparametric Feature Construction for Inverse Reinforcement Learning

bayesian nonparametric feature construction, inverse reinforcement learning

Most of the algorithms for inverse reinforcement learning (IRL) assume that the reward function is a linear function of the pre-defined state and action features. However, it is often difficult to manually specify the set of features that can make the true reward function representable as a linear function. We propose a Bayesian nonparametric approach to identifying useful composite features for learning the reward function. The composite features are assumed to be the logical conjunctions of the predefined atomic features so that we can represent the reward function as a linear function of the composite features. We empirically show that our approach is able to learn composite features that capture important aspects of the reward function on synthetic domains, and predict taxi drivers’ behaviour with high accuracy on a real GPS trace dataset.
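A minimal sketch of the representation described above: composite features are logical conjunctions of binary atomic features, and the reward is linear in the composites. The atomic features, conjunctions, and weights in the usage are hypothetical, and the Bayesian nonparametric inference that selects the conjunctions is not shown. A conjunction such as $x_0 \land x_1$ equals the product $x_0 x_1$, which no function $c + b_0 x_0 + b_1 x_1$ can match on all four binary inputs, so composites extend what a linear reward can express.

```python
import numpy as np

def composite_features(atomic, conjunctions):
    """atomic: (n, m) binary matrix; conjunctions: list of lists of atomic indices.

    Returns an (n, k) 0/1 matrix whose column j is the logical AND of the
    atomic features listed in conjunctions[j].
    """
    atomic = np.asarray(atomic, dtype=bool)
    columns = [atomic[:, idx].all(axis=1) for idx in conjunctions]
    return np.stack(columns, axis=1).astype(float)

def linear_reward(atomic, conjunctions, weights):
    # The reward is linear in the composite features, not in the atomic ones.
    return composite_features(atomic, conjunctions) @ np.asarray(weights, dtype=float)

# Hypothetical atoms: 0 = "on highway", 1 = "heavy traffic", 2 = "near destination".
conjunctions = [[0, 1], [2]]   # (highway AND heavy traffic), (near destination)
weights = [-1.0, 2.0]
```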

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Azar, Mohammad Gheshlaghi, Lazaric, Alessandro, Brunskill, Emma

Regret Bounds for Reinforcement Learning with Policy Advice

arXiv.org Machine Learning, Jul-17-2013

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of $\tilde{O}(\sqrt{T})$ relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action spaces. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.
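To make the regret statement concrete, one standard way to write regret against the best input policy is shown below, assuming an average-reward setting. The notation ($\Pi$ for the input set, $\mu(\pi)$ for the average reward of input policy $\pi$, $r_t$ for the reward at step $t$) is ours rather than the paper's. Sub-linear regret means the per-step gap to the best input policy vanishes:

```latex
\begin{aligned}
\Delta(T) &= T\,\mu^{*} - \sum_{t=1}^{T} r_t,
  \qquad \mu^{*} = \max_{\pi \in \Pi} \mu(\pi), \\
\Delta(T) = \tilde{O}\big(\sqrt{T}\big)
  &\;\Longrightarrow\;
  \frac{\Delta(T)}{T} = \tilde{O}\!\left(\frac{1}{\sqrt{T}}\right) \xrightarrow[T \to \infty]{} 0 .
\end{aligned}
```

In words, the agent's average reward approaches that of the best policy in the input set.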

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

1305.1027

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Tossou, Aristide C. Y., Dimitrakakis, Christos

Probabilistic inverse reinforcement learning in unknown environments

arXiv.org Machine Learning, Jul-14-2013

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.
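To illustrate why maximum a posteriori estimation under a flat prior can lead to a convex problem, consider a deliberately simplified demonstrator model (a hypothetical stand-in, not either of the paper's two models): in each demonstrated state the demonstrator picks action $a$ with probability proportional to $\exp(\theta^\top f(s,a))$ for known feature vectors $f$. With a flat prior the MAP estimate coincides with maximum likelihood, and the negative log-likelihood of this multinomial logit model is convex in $\theta$, so plain gradient ascent suffices. Function names and step sizes are illustrative.

```python
import numpy as np

def log_likelihood_and_grad(theta, features, choices):
    """features: (n, k, d) feature vectors of the k candidate actions in each of
    n demonstrated states; choices: (n,) index of the action actually taken."""
    utils = features @ theta                              # (n, k) linear utilities
    utils = utils - utils.max(axis=1, keepdims=True)      # numerical stability only
    log_probs = utils - np.log(np.exp(utils).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)
    rows = np.arange(len(choices))
    ll = log_probs[rows, choices].sum()
    chosen = features[rows, choices]                      # (n, d)
    expected = (probs[:, :, None] * features).sum(axis=1)  # (n, d) model expectation
    grad = (chosen - expected).sum(axis=0)                # gradient of log-softmax
    return ll, grad

def map_estimate(features, choices, lr=0.1, n_steps=500):
    # Flat prior: no prior term, so this is gradient ascent on the log-likelihood.
    theta = np.zeros(features.shape[2])
    for _ in range(n_steps):
        _, grad = log_likelihood_and_grad(theta, features, choices)
        theta += lr * grad
    return theta
```

If the demonstrations are perfectly separable, the likelihood has no finite maximizer under a flat prior; a Gaussian prior $\mathcal{N}(0, \sigma^2 I)$, which adds $-\theta/\sigma^2$ to the gradient, is a common fix.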

algorithm, demonstration, reward function

arXiv.org Machine Learning

1307.3785

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)

Yahya, Keyvan, Fard, Pouyan Rafiei

Computational Model of Music Sight Reading: A Reinforcement Learning Approach

arXiv.org Artificial Intelligence, Jul-13-2013

Although the music sight-reading process has been studied from the viewpoint of cognitive psychology, computational learning methods such as reinforcement learning have not yet been used to model it. In this paper, taking into account the essential properties of our specific problem, we consider the value function concept and show that the optimal policy can be obtained by our method without computing the complex value functions. We also offer a normative behavioral model for the interaction of the agent with the musical pitch environment and, using a slightly modified version of partially observable Markov decision processes, show that our method enables faster learning of state-action pairs in our implemented agents.

artificial intelligence, machine learning, reinforcement learning approach

arXiv.org Artificial Intelligence

1007.0546

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

AAAI Conferences, Jul-9-2013

An Ensemble of Linearly Combined Reinforcement-Learning Agents

Marivate, Vukosi Ntsakisi (Rutgers University) | Littman, Michael (Brown University)

Reinforcement-learning (RL) algorithms are often tweaked and tuned to specific environments when applied, calling into question whether learning can truly be considered autonomous in these cases. In this work, we show how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning. Our approach learns a weighted linear combination of Q-values from multiple independent learning algorithms. In our evaluations in generalized RL environments, we find that the algorithm compares favorably to the best tuned algorithm. Our work provides a promising basis for further study into the use of ensemble methods in RL.
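A minimal sketch of the combination step: each learner supplies Q-values, the ensemble acts greedily on their weighted sum, and the weights are fit by least squares. The abstract does not say how the weights are learned; fitting them with ordinary least squares, and treating observed returns as the regression targets, are assumptions made here for illustration, as are the function names.

```python
import numpy as np

def fit_ensemble_weights(q_predictions, targets):
    """q_predictions: (n, m) Q-value estimates from m learners for n (s, a) samples;
    targets: (n,) target values (assumed here to be observed returns)."""
    weights, *_ = np.linalg.lstsq(q_predictions, targets, rcond=None)
    return weights

def ensemble_greedy_action(q_per_learner, weights):
    """q_per_learner: (m, n_actions) Q-values of each learner in the current state."""
    combined = weights @ q_per_learner    # weighted linear combination, (n_actions,)
    return int(np.argmax(combined))
```

A linear combination keeps the ensemble cheap to evaluate and lets a poorly tuned learner be down-weighted rather than chosen or discarded outright.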

ensemble, reinforcement-learning agent

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AAAI Conferences, Jul-9-2013

Smart Charging of Electric Vehicles using Reinforcement Learning

Valogianni, Konstantina (Erasmus University) | Ketter, Wolfgang (Erasmus University) | Collins, John (University of Minnesota)

The introduction of Electric Vehicles (EVs) into the existing energy grid raises many issues regarding grid stability and charging behavior. Uncontrolled charging on the customer’s side may increase the already high peaks in energy demand, which in turn raise energy prices. We propose a novel smart charging algorithm that maximizes individual welfare and reduces individual energy expenses. We use Reinforcement Learning trained on real-world data to learn individual household consumption behavior and propose a charging algorithm that targets individual welfare maximization. Furthermore, we use statistical customer models to simulate EV customer behavior. We show that individual customers, represented by intelligent agents, reduce their energy expenses by using the proposed charging algorithm. Additionally, we show that average energy prices, at the aggregate level, are reduced as a result of smarter use of the available energy. Finally, we prove that the presented algorithm achieves significant peak reduction and reshapes the energy demand curve.
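The following toy sketch shows how an RL agent can learn to shift charging into cheap hours. Everything in it is hypothetical: six hourly prices, a battery that needs three units by the deadline, a fixed penalty per missing unit, and plain tabular Q-learning with uniformly random exploration. It is not the authors' algorithm, which learns household consumption behavior from real-world data.

```python
import numpy as np

PRICES = [5.0, 4.0, 1.0, 1.0, 1.0, 5.0]  # hypothetical price per unit in each hour
CAPACITY = 3                              # hypothetical units needed by the deadline
SHORTFALL_PENALTY = 100.0                 # hypothetical cost per missing unit

def step(hour, level, action):
    """Deterministic toy environment: action 1 buys one unit if the battery is not full."""
    cost = 0.0
    if action == 1 and level < CAPACITY:
        level += 1
        cost = PRICES[hour]
    hour += 1
    done = hour == len(PRICES)
    if done:
        cost += SHORTFALL_PENALTY * (CAPACITY - level)
    return hour, level, -cost, done

def q_learning(n_episodes=2000, alpha=1.0, seed=0):
    """Undiscounted tabular Q-learning; alpha=1 suffices because transitions are deterministic."""
    rng = np.random.default_rng(seed)
    q = np.zeros((len(PRICES), CAPACITY + 1, 2))
    for _ in range(n_episodes):
        hour, level, done = 0, 0, False
        while not done:
            action = int(rng.integers(2))   # uniform exploration; Q-learning is off-policy
            nh, nl, reward, done = step(hour, level, action)
            target = reward if done else reward + q[nh, nl].max()
            q[hour, level, action] += alpha * (target - q[hour, level, action])
            hour, level = nh, nl
    return q

def greedy_schedule(q):
    """Roll out the greedy policy from an empty battery; returns the 0/1 action per hour."""
    hour, level, done, plan = 0, 0, False, []
    while not done:
        action = int(np.argmax(q[hour, level]))
        plan.append(action)
        hour, level, _, done = step(hour, level, action)
    return plan
```

With these prices, the learned greedy schedule skips the two expensive early hours and charges during the three cheap hours.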

artificial intelligence, machine learning, reinforcement learning

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)