 Undirected Networks


Safety-Aware Apprenticeship Learning

arXiv.org Artificial Intelligence

Apprenticeship learning (AL) is a Learning from Demonstration technique in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining the performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
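To make the linear-reward setting concrete, here is a minimal sketch of the quantity AL methods of this kind typically compare between expert and learner: the discounted feature expectations of a set of demonstrations. This is not the paper's counterexample-guided algorithm. The names `feature_expectations`, `linear_reward`, `phi`, and `demos`, the two-state toy data, and the discount factor are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def feature_expectations(
    trajectories: Sequence[Sequence[int]],
    phi: Callable[[int], List[float]],
    gamma: float,
) -> List[float]:
    """Average discounted sum of state features over the demonstrations."""
    dim = len(phi(trajectories[0][0]))
    mu = [0.0] * dim
    for traj in trajectories:
        for t, state in enumerate(traj):
            discount = gamma ** t
            for i, value in enumerate(phi(state)):
                mu[i] += discount * value
    return [m / len(trajectories) for m in mu]


def linear_reward(w: Sequence[float], features: Sequence[float]) -> float:
    """Reward assumed to be a linear combination of state features."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical toy problem: two states, one indicator feature per state.
def phi(state: int) -> List[float]:
    return [1.0 if state == 0 else 0.0, 1.0 if state == 1 else 0.0]


demos = [[0, 1], [0, 0]]  # two short expert trajectories (assumed data)
mu_expert = feature_expectations(demos, phi, gamma=0.5)
```

Under a weight vector `w`, the expected discounted return of the demonstrations is `linear_reward(w, mu_expert)`, which is why matching feature expectations is enough to match performance when the reward really is linear in the features.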


Learning from multivariate discrete sequential data using a restricted Boltzmann machine model

arXiv.org Machine Learning

A restricted Boltzmann machine (RBM) is a generative neural-network model with many novel applications such as collaborative filtering and acoustic modeling. An RBM lacks the capacity to retain memory, making it inappropriate for dynamic data modeling as in time-series analysis. In this paper we address this issue by proposing the p-RBM model, a generalization of the regular RBM model, capable of retaining memory of p past states. We further show how to train the p-RBM model using contrastive divergence and test our model on the problem of predicting the stock market direction considering 100 stocks of the NASDAQ-100 index. The results obtained show that the p-RBM offers promising prediction potential.


Introduction to Learning to Trade with Reinforcement Learning

@machinelearnbot

The academic Deep Learning research community has largely stayed away from the financial markets. Maybe that's because the finance industry has a bad reputation, the problem doesn't seem interesting from a research perspective, or because data is difficult and expensive to obtain. In this post, I'm going to argue that training Reinforcement Learning agents to trade in the financial (and cryptocurrency) markets can be an extremely interesting research problem. I believe that it has not received enough attention from the research community but has the potential to push the state of the art of many related fields. It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field. This is not a "price prediction using Deep Learning" post. So, if you're looking for example code and models you may be disappointed. Instead, I want to talk at a higher level about why learning to trade using Machine Learning is difficult, what some of the challenges are, and where I think Reinforcement Learning fits in. If there's enough interest in this area I may follow up with another post that includes concrete examples. I expect most readers to have no background in trading, just like I didn't, so I will start out by covering some of the basics.


A Guide to Sequence Prediction using Compact Prediction Tree (with codes in Python)

#artificialintelligence

Sequence prediction is one of the hottest applications of Deep Learning these days. From building recommendation systems to speech recognition and natural language processing, its potential is seemingly endless. This is enabling previously unimagined solutions to emerge in the industry and is driving innovation. There are many different ways to perform sequence prediction, such as using Markov models, Directed Graphs etc. from the Machine Learning domain and RNNs/LSTMs from the Deep Learning domain. In this article, we will see how we can perform sequence prediction using a relatively unknown algorithm called Compact Prediction Tree (CPT).


How does the AI understand what's going on

arXiv.org Artificial Intelligence

The standard approach in AI is to take a set of positive examples and a set of negative examples. We seek a function that says "YES" for the given positive examples and "NO" for the given negative examples. Using the function found, we then predict the answer for examples that we do not know to be positive or negative. In essence, the standard approach in AI is an approximation: what is sought is an approximating function, and it is usually sought within a given set of functions.
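The approach described above can be sketched in a few lines, assuming the "given set of functions" is a small family of threshold rules on a single number. The function name `fit_threshold`, the candidate thresholds, and the example data are hypothetical and chosen only to illustrate the idea.

```python
from typing import Sequence


def fit_threshold(
    positives: Sequence[float],
    negatives: Sequence[float],
    candidates: Sequence[float],
) -> float:
    """Pick the candidate t whose rule "x >= t means YES" makes the fewest mistakes."""

    def errors(t: float) -> int:
        missed = sum(1 for x in positives if x < t)         # positives answered NO
        false_alarms = sum(1 for x in negatives if x >= t)  # negatives answered YES
        return missed + false_alarms

    return min(candidates, key=errors)


# Assumed toy data: the labelled examples we are given.
positives = [5.0, 6.5, 8.0]
negatives = [1.0, 2.5, 4.0]

# The "given set of functions": four threshold rules.
threshold = fit_threshold(positives, negatives, candidates=[0.0, 3.0, 4.5, 7.0])


def predict(x: float) -> str:
    """Apply the chosen function to an example whose label is unknown."""
    return "YES" if x >= threshold else "NO"
```

Here the search selects the threshold 4.5, the only candidate that separates the two sets without error, and `predict` then extrapolates to unseen inputs.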


Scalable Bilinear $\pi$ Learning Using State and Action Features

arXiv.org Machine Learning

Approximate linear programming (ALP) is one of the major algorithmic families for solving large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts (bi)linear models to represent the high-dimensional value function and state-action distributions, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, and thus has a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.


Computational Approaches for Stochastic Shortest Path on Succinct MDPs

arXiv.org Artificial Intelligence

We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a) for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b) for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature.
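For readers new to the SSP objective, the following is a minimal sketch of what is being computed: the minimum expected cost to reach a goal state. It uses plain value iteration on a tiny explicit-state MDP, which is a standard textbook technique and not the paper's method for succinct or infinite-state MDPs. The function name, the MDP encoding, and the two-action example are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed encoding: state -> action -> (cost, [(next_state, probability), ...])
MDP = Dict[str, Dict[str, Tuple[float, List[Tuple[str, float]]]]]


def ssp_value_iteration(mdp: MDP, goal: str, sweeps: int = 200) -> Dict[str, float]:
    """Approximate the minimum expected cost to reach `goal` from each state."""
    values = {state: 0.0 for state in mdp}
    for _ in range(sweeps):
        for state, actions in mdp.items():
            if state == goal:
                continue  # reaching the goal costs nothing further
            values[state] = min(
                cost + sum(p * values[nxt] for nxt, p in outcomes)
                for cost, outcomes in actions.values()
            )
    return values


# Hypothetical example: "try" costs 1 and succeeds half the time,
# "slow" costs 3 and always succeeds.
toy_mdp: MDP = {
    "a": {
        "try": (1.0, [("goal", 0.5), ("a", 0.5)]),
        "slow": (3.0, [("goal", 1.0)]),
    },
    "goal": {},
}

values = ssp_value_iteration(toy_mdp, goal="goal")
```

In this toy case repeating "try" has expected cost 2, which beats the guaranteed cost of 3, so the value of state `a` converges to 2. Explicit enumeration like this is exactly what becomes infeasible when the state space is described implicitly by variables and update rules.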


Shared autonomy via deep reinforcement learning

Robohub

Imagine a drone pilot remotely flying a quadrotor, using an onboard camera to navigate and land. Unfamiliar flight dynamics, terrain, and network latency can make this system challenging for a human to control. One approach to this problem is to train an autonomous agent to perform tasks like patrolling and mapping without human intervention. This strategy works well when the task is clearly specified and the agent can observe all the information it needs to succeed. Unfortunately, many real-world applications that involve human users do not satisfy these conditions: the user's intent is often private information that the agent cannot directly access, and the task may be too complicated for the user to precisely define.


Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

arXiv.org Artificial Intelligence

We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $\epsilon$ and $\gamma$ we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. (ii) Alternatively, for all $\epsilon$ and $\gamma$ there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e. there are MDPs for which stronger combinations of the guarantees cannot be ensured.


Personalizing Dialogue Agents: I have a dog, do you have pets too?

arXiv.org Artificial Intelligence

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to condition on (i) their given profile information and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown, our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.