This top-down planning approach decides what a good subgoal is before planning to achieve it." "For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is mis- specified whenever, the representation cannot express any policy with acceptable performance.
We introduce a framework for Compositional Imitation Learning and Execution (CompILE) of hierarchically-structured behavior. CompILE learns reusable, variable-length segments of behavior from demonstration data using a novel unsupervised, fully-differentiable sequence segmentation module. These learned behaviors can then be re-composed and executed to perform new tasks. At training time, CompILE auto-encodes observed behavior into a sequence of latent codes, each corresponding to a variable-length segment in the input sequence. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate our model in a challenging 2D multi-task environment and show that CompILE can find correct task boundaries and event encodings in an unsupervised manner without requiring annotated demonstration data. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our agent could learn given only sparse rewards, where agents without task-specific policies struggle.
We describe a framework of hybrid cognition by formulating a hybrid cognitive agent that performs hierarchical active inference across a human and a machine part. We suggest that, in addition to enhancing human cognitive functions with an intelligent and adaptive interface, integrated cognitive processing could accelerate emergent properties within artificial intelligence. To establish this, a machine learning part learns to integrate into human cognition by explaining away multi-modal sensory measurements from the environment and physiology simultaneously with the brain signal. With ongoing training, the amount of predictable brain signal increases. This lends the agent the ability to self-supervise on increasingly high levels of cognitive processing in order to further minimize surprise in predicting the brain signal. Furthermore, with increasing level of integration, the access to sensory information about environment and physiology is substituted with access to their representation in the brain. While integrating into a joint embodiment of human and machine, human action and perception are treated as the machine's own. The framework can be implemented with invasive as well as non-invasive sensors for environment, body and brain interfacing. Online and offline training with different machine learning approaches are thinkable. Building on previous research on shared representation learning, we suggest a first implementation leading towards hybrid active inference with non-invasive brain interfacing and state of the art probabilistic deep learning methods. We further discuss how implementation might have effect on the meta-cognitive abilities of the described agent and suggest that with adequate implementation the machine part can continue to execute and build upon the learned cognitive processes autonomously.
In recent years, deep generative models have been shown to 'imagine' convincing high-dimensional observations such as images, audio, and even video, learning directly from raw data. In this work, we ask how to imagine goal-directed visual plans -- a plausible sequence of observations that transition a dynamical system from its current configuration to a desired goal state, which can later be used as a reference trajectory for control. We focus on systems with high-dimensional observations, such as images, and propose an approach that naturally combines representation learning and planning. Our framework learns a generative model of sequential observations, where the generative process is induced by a transition in a low-dimensional planning model, and an additional noise. By maximizing the mutual information between the generated observations and the transition in the planning model, we obtain a low-dimensional representation that best explains the causal nature of the data. We structure the planning model to be compatible with efficient planning algorithms, and we propose several such models based on either discrete or continuous states. Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations. We demonstrate our method on imagining plausible visual plans of rope manipulation.
Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes. Despite its perceived utility, it has not yet been successfully applied in automotive applications. Motivated by the successful demonstrations of learning of Atari games and Go by Google DeepMind, we propose a framework for autonomous driving using deep reinforcement learning. This is of particular relevance as it is difficult to pose autonomous driving as a supervised learning problem due to strong interactions with the environment including other vehicles, pedestrians and roadworks. As it is a relatively new area of research for autonomous driving, we provide a short overview of deep reinforcement learning and then describe our proposed framework. It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios. It also integrates the recent work on attention models to focus on relevant information, thereby reducing the computational complexity for deployment on embedded hardware. The framework was tested in an open source 3D car racing simulator called TORCS. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction of other vehicles.