Dimitrakakis, Christos
Preference elicitation and inverse reinforcement learning
Rothkopf, Constantin, Dimitrakakis, Christos
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent's preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent's policy is sub-optimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent's preferences are obtained, compared to both other methods and to the performance of the demonstrated policy.
Context models on sequences of covers
Dimitrakakis, Christos
We present a class of models that, via a simple construction, enables exact, incremental, non-parametric, polynomial-time, Bayesian inference of conditional measures. The approach relies upon creating a sequence of covers on the conditioning variable and maintaining a different model for each set within a cover. Inference remains tractable by specifying the probabilistic model in terms of a random walk within the sequence of covers. We demonstrate the approach on problems of conditional density estimation, which, to our knowledge is the first closed-form, non-parametric Bayesian approach to this problem.
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
Dimitrakakis, Christos
There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to obtain stochastic lower and upper bounds on the value of each tree node. This enables us to use stochastic branch and bound algorithms to search the tree efficiently. This paper proposes two such algorithms and examines their complexity in this setting.
Statistical Decision Making for Authentication and Intrusion Detection
Dimitrakakis, Christos, Mitrokotsa, Aikaterini
User authentication and intrusion detection differ from standard classification problems in that while we have data generated from legitimate users, impostor or intrusion data is scarce or non-existent. We review existing techniques for dealing with this problem and propose a novel alternative based on a principled statistical decision-making view point. We examine the technique on a toy problem and validate it on complex real-world data from an RFID based access control system. The results indicate that it can significantly outperform the classical world model approach. The method could be more generally useful in other decision-making scenarios where there is a lack of adversary data.
Rollout Sampling Approximate Policy Iteration
Dimitrakakis, Christos, Lagoudakis, Michail G.
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers comparable performance to the previous algorithm achieved, however, with significantly less computational effort. An order of magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.