Technische Universität Darmstadt


Inverse Reinforcement Learning via Nonparametric Subgoal Modeling

AAAI Conferences

Recent advances in the field of inverse reinforcement learning (IRL) have yielded sophisticated frameworks which relax the original modeling assumption that the behavior of an observed agent reflects only a single intention. Instead, the demonstration data is separated into parts to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework which provides spatially smooth policy estimates that are consistent with the expert's plan and significantly outperform existing IRL solutions. In addition, the framework can be naturally extended to handle scenarios with time-varying expert intentions.


A Poisson Gamma Probabilistic Model for Latent Node-Group Memberships in Dynamic Networks

AAAI Conferences

We present a probabilistic model for learning from dynamic relational data, wherein the observed interactions among networked nodes are modeled via the Bernoulli-Poisson link function, and the underlying network structure is characterized by nonnegative latent node-group memberships, which are assumed to be gamma distributed. The latent memberships evolve according to Markov processes. The optimal number of latent groups can be determined by the data itself. The computational complexity of our method scales with the number of non-zero links, which makes it scalable to large sparse dynamic relational data. We present batch and online Gibbs sampling algorithms to perform model inference. Finally, we demonstrate the model's performance on both synthetic and real-world datasets compared to state-of-the-art methods.
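The Bernoulli-Poisson construction mentioned above can be sketched in a few lines: each node gets a nonnegative, gamma-distributed membership vector, and a link between two nodes appears with probability 1 - exp(-φ_i·φ_j). A minimal, illustrative sketch (sizes and hyperparameters are arbitrary choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 30, 4  # nodes, latent groups (illustrative sizes)

# Nonnegative node-group memberships, gamma distributed as in the model.
phi = rng.gamma(shape=0.5, scale=1.0, size=(N, K))

# Bernoulli-Poisson link: a latent Poisson count with rate phi_i . phi_j
# is thresholded at 1, so P(link between i and j) = 1 - exp(-phi_i . phi_j).
rate = phi @ phi.T
p_link = 1.0 - np.exp(-rate)

# Sample a symmetric adjacency matrix (upper triangle, no self-loops).
upper = np.triu(rng.random((N, N)) < p_link, k=1)
A = upper | upper.T
```

Because exp(-rate) decays quickly, most node pairs with small membership overlap get near-zero link probability, which is what makes the likelihood depend mainly on the non-zero links.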


The Kernel Kalman Rule — Efficient Nonparametric Inference with Recursive Least Squares

AAAI Conferences

Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems. Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes' rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples, and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR. The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update. We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF). We show on a nonlinear state estimation task with high-dimensional observations that our approach provides significantly improved estimation accuracy while substantially decreasing the computational demands.
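The flavor of the innovation-style update can be illustrated with its finite-dimensional counterpart; the KKR applies the same structure to RKHS embeddings of the belief state. The following is a generic Kalman innovation update, not the paper's exact operators:

```python
import numpy as np

def innovation_update(mu, Sigma, y, C, R):
    """One Kalman-style innovation update: mu is the belief mean, Sigma its
    covariance, y the observation, C the observation operator, R the
    observation noise covariance. All finite-dimensional here; the KKR
    applies the same recursive structure to kernel embeddings."""
    S = C @ Sigma @ C.T + R              # innovation covariance
    Q = Sigma @ C.T @ np.linalg.inv(S)   # gain
    mu_new = mu + Q @ (y - C @ mu)       # correct mean by the innovation
    Sigma_new = Sigma - Q @ C @ Sigma    # uncertainty shrinks
    return mu_new, Sigma_new

# Toy usage: a 3-d state observed through its first two coordinates.
mu0, Sigma0 = np.zeros(3), np.eye(3)
C = np.array([[1., 0., 0.], [0., 1., 0.]])
R = 0.1 * np.eye(2)
mu1, Sigma1 = innovation_update(mu0, Sigma0, np.array([1., 2.]), C, R)
```

The recursive least squares view means each new observation refines the previous estimate without reprocessing old samples, which is the source of the computational savings claimed over the KBR.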


Stochastic Search In Changing Situations

AAAI Conferences

Stochastic search algorithms are black-box optimizers of an objective function. They have recently gained a lot of attention in operations research, machine learning, and policy search of robot motor skills due to their ease of use and their generality. However, when the task or objective function changes slightly, many stochastic search algorithms require complete re-learning in order to adapt the solution to the new objective function or the new context. We therefore consider the contextual stochastic search paradigm: we want to find good parameter vectors for multiple related tasks, where each task is described by a continuous context vector. Hence, the objective function might change slightly for each parameter vector evaluation. In this paper, we investigate a contextual stochastic search algorithm known as Contextual Relative Entropy Policy Search (CREPS), an information-theoretic algorithm that can learn from multiple tasks simultaneously. We show the application of CREPS to simulated robotic tasks.
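The information-theoretic update behind (C)REPS amounts to exponentially reweighting sampled parameter vectors by their reward and refitting the search distribution to the weighted samples. A toy sketch with a fixed temperature eta (normally eta is obtained by minimizing the REPS dual under a KL bound, and CREPS additionally subtracts a context-dependent baseline, both skipped here):

```python
import numpy as np

def reps_weights(rewards, eta):
    """Exponential reweighting used by (C)REPS: higher-reward samples get
    exponentially larger weight; eta trades off greediness against staying
    close to the old search distribution."""
    adv = rewards - np.max(rewards)   # shift for numerical stability
    w = np.exp(adv / eta)
    return w / np.sum(w)

# Weighted maximum-likelihood update of a Gaussian search distribution.
rng = np.random.default_rng(1)
theta = rng.normal(size=(100, 2))               # sampled parameter vectors
rewards = -np.sum((theta - 1.0) ** 2, axis=1)   # toy objective, optimum at (1, 1)
w = reps_weights(rewards, eta=2.0)
mean_new = w @ theta                            # new search mean
```

A small eta makes the update greedy (close to picking the best sample); a large eta keeps the new distribution near the old one, which is how the relative-entropy bound shows up in practice.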


Model-Free Preference-Based Reinforcement Learning

AAAI Conferences

Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from a human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action spaces require a known or estimated model, which is often not available and hard to learn. In this paper, we integrate preference-based estimation of the reward function into a model-free reinforcement learning (RL) algorithm, resulting in a model-free PBRL algorithm. Our new algorithm is based on Relative Entropy Policy Search (REPS), enabling us to utilize stochastic policies and to directly control the greediness of the policy update. REPS decreases exploration of the policy slowly by limiting the relative entropy of the policy update, which ensures that the algorithm is provided with a versatile set of trajectories, and consequently with informative preferences. The preference-based estimation is computed using a sample-based Bayesian method, which can also estimate the uncertainty of the utility. Additionally, we compare to a linearly solvable approximation based on inverse RL. We show that both approaches compare favourably with the current state of the art. The overall result is an algorithm that can learn non-parametric continuous-action policies from a small number of preferences.
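The pairwise-comparison model underlying such a preference-based utility estimate can be sketched with a logistic (Bradley-Terry style) likelihood over trajectory features. The paper uses a sample-based Bayesian method; this toy version just runs maximum-likelihood gradient ascent, and all names, sizes, and the linear utility are illustrative assumptions:

```python
import numpy as np

def pref_loglik(w, F1, F2):
    """Log-likelihood of observed preferences tau1 > tau2 under a linear
    utility U(tau) = w . f(tau) and a logistic comparison model:
    P(tau1 > tau2) = sigmoid(U(tau1) - U(tau2))."""
    d = (F1 - F2) @ w
    return np.sum(-np.log1p(np.exp(-d)))

# Toy data: preferences generated by a true utility w* = (1, -1).
rng = np.random.default_rng(2)
F1 = rng.normal(size=(200, 2))
F2 = rng.normal(size=(200, 2))
w_true = np.array([1.0, -1.0])
flip = (F1 - F2) @ w_true < 0
F1[flip], F2[flip] = F2[flip].copy(), F1[flip].copy()  # ensure F1 is preferred

# Simple gradient ascent on the log-likelihood.
w = np.zeros(2)
for _ in range(200):
    d = (F1 - F2) @ w
    w += 0.05 * (F1 - F2).T @ (1.0 / (1.0 + np.exp(d)))
```

A Bayesian treatment would put a prior on w and sample the posterior, which is what allows the method in the abstract to quantify the uncertainty of the utility.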


All-in Text: Learning Document, Label, and Word Representations Jointly

AAAI Conferences

Conventional multi-label classification algorithms treat the target labels of the classification task as mere symbols that are void of an inherent semantics. However, in many cases textual descriptions of these labels are available or can be easily constructed from public document sources such as Wikipedia. In this paper, we investigate an approach for embedding documents and labels into a joint space while sharing word representations between documents and labels. For finding such embeddings, we rely on the text of documents as well as descriptions for the labels. The use of such label descriptions not only promises increased performance on conventional multi-label text classification tasks, but also allows making predictions for labels that have not been seen during the training phase. The potential of our method is demonstrated on the multi-label classification task of assigning keywords from the Medical Subject Headings (MeSH) to publications in biomedical research, both in a conventional and in a zero-shot learning setting.
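The core idea of sharing word representations can be sketched as follows: both a document and a label description are embedded via the same word vectors (here by simple averaging), so any label with a textual description can be scored against a document, including labels never seen in training. The random vectors below merely stand in for the jointly learned embeddings of the paper; vocabulary and scoring are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["heart", "disease", "gene", "expression", "cancer", "therapy"]
word_vec = {w: rng.normal(size=8) for w in vocab}  # shared word embeddings

def embed(tokens):
    """Average the shared word vectors: documents and label descriptions
    end up in the same space, which is the key point."""
    vs = [word_vec[t] for t in tokens if t in word_vec]
    return np.mean(vs, axis=0)

def score(doc_tokens, label_tokens):
    d, l = embed(doc_tokens), embed(label_tokens)
    return d @ l / (np.linalg.norm(d) * np.linalg.norm(l))

# A label absent from training can still be scored via its description
# (zero-shot), because only word vectors are needed.
doc = ["heart", "disease", "therapy"]
unseen_label = ["gene", "expression"]
s_unseen = score(doc, unseen_label)
```

In the actual method the embeddings are trained so that documents score highly against their true labels; the sketch only shows why a textual description is sufficient to place a new label in the space.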


Event-Based Clustering for Reducing Labeling Costs of Event-related Microposts

AAAI Conferences

Automatically identifying the event type of event-related information in the sheer amount of social media data makes machine learning indispensable. However, this is highly dependent on (1) the number of correctly labeled instances and (2) labeling costs. Active learning has been proposed to reduce the number of instances to label. Although the thematic dimension is already used, other metadata such as spatial and temporal information that is helpful for achieving a more fine-grained clustering is currently not taken into account. In this paper, we present a novel event-based clustering strategy that makes use of temporal, spatial, and thematic metadata to determine instances to label. An evaluation on incident-related tweets shows that our selection strategy for active learning outperforms current state-of-the-art approaches even with few labeled instances.
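One simple way to combine temporal, spatial, and thematic metadata into a single dissimilarity, as such a clustering strategy requires, is a weighted sum of per-dimension distances. Field names and weights below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def event_distance(a, b, w_time=1.0, w_space=1.0, w_topic=1.0):
    """Combined dissimilarity of two microposts, each a dict with a 'time'
    scalar, a 'loc' (lat, lon) pair, and a 'topic' vector. The weights
    would be tuned on data; here they are arbitrary."""
    dt = abs(a["time"] - b["time"])
    ds = np.linalg.norm(np.asarray(a["loc"]) - np.asarray(b["loc"]))
    ta, tb = np.asarray(a["topic"], float), np.asarray(b["topic"], float)
    dth = 1.0 - ta @ tb / (np.linalg.norm(ta) * np.linalg.norm(tb))
    return w_time * dt + w_space * ds + w_topic * dth

# Two posts about the same incident vs. one far away in time, space, topic.
p1 = {"time": 0.0, "loc": (49.87, 8.65), "topic": [1.0, 0.0]}
p2 = {"time": 0.1, "loc": (49.88, 8.66), "topic": [1.0, 0.1]}
p3 = {"time": 5.0, "loc": (52.52, 13.40), "topic": [0.0, 1.0]}
```

Clusters formed under such a metric group posts that are close in all three dimensions, so labeling one representative per cluster covers many instances at once.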


RoboCup Soccer Leagues

AI Magazine

RoboCup was created in 1996 by a group of Japanese, American, and European artificial intelligence and robotics researchers with a formidable, visionary long-term challenge: By 2050 a team of robot soccer players will beat the human World Cup champion team. In this article, we focus on RoboCup robot soccer, and present its five current leagues, which address complementary scientific challenges through different robot and physical setups. Full details on the status of the RoboCup soccer leagues, including league history and past results, upcoming competitions, and detailed rules and specifications are available from the league homepages and wikis.


Data-Efficient Generalization of Robot Skills with Contextual Policy Search

AAAI Conferences

In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach for policy learning: a lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level policies are based on model-free policy search, which requires an excessive number of interactions of the robot with its environment. More data-efficient policy search approaches are model-based but, thus far, lack the capability to learn hierarchical policies. We propose a new model-based policy search approach that can also learn contextual upper-level policies. Our approach is based on learning probabilistic forward models for long-term predictions. Based on these predictions, we use information-theoretic insights to improve the upper-level policy. Our method achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks.
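A common parameterization of such an upper-level policy, and one plausible reading of the abstract, is a Gaussian whose mean is linear in the context; the sampled parameter vector then configures the lower-level controller. A minimal sketch (the gain, offset, and covariance values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(4)

# Upper-level policy pi(theta | s): a Gaussian whose mean is linear in the
# context s -- one standard choice in contextual policy search.
K = np.array([[0.5, 0.0], [0.0, -1.0]])  # context-to-parameter gain
k = np.array([0.1, 0.2])                 # offset
Sigma = 0.01 * np.eye(2)                 # exploration covariance

def upper_level_policy(s):
    """Sample lower-level controller parameters theta for context s."""
    return rng.multivariate_normal(K @ s + k, Sigma)

# For a given context, the sampled theta parameterizes the lower-level
# policy that actually executes the task.
s = np.array([1.0, 2.0])
theta = upper_level_policy(s)
```

In the model-based setting of the abstract, the learned forward models predict long-term outcomes of candidate theta values, so K, k, and Sigma can be improved without additional robot interactions.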