Recent advances in Reinforcement Learning have highlighted the difficulties in learning within complex high dimensional domains. We argue that one of the main reasons that current approaches do not perform well, is that the information is represented sub-optimally. A natural way to describe what we observe, is through natural language. In this paper, we implement a natural language state representation to learn and complete tasks. Our experiments suggest that natural language based agents are more robust, converge faster and perform better than vision based agents, showing the benefit of using natural language representations for Reinforcement Learning.
The naturalness of qualitative reasoning suggests that qualitative representations might be an important component of the semantics of natural language. Prior work showed that frame-based representations of qualitative process theory constructs could indeed be extracted from natural language texts. That technique relied on the parser recognizing specific syntactic constructions, which had limited coverage. This paper describes a new approach, using narrative function to represent the higher-order relationships between the constituents of a sentence and between sentences in a discourse. We outline how narrative function combined with query-driven abduction enables the same kinds of information to be extracted from natural language texts. Moreover, we also show how the same technique can be used to extract type-level qualitative representations from text, and used to improve performance in playing a strategy game.
Language and vision provide complementary information. Integrating both modalities in a single multimodal representation is an unsolved problem with wide-reaching applications to both natural language processing and computer vision. In this paper, we present a simple and effective method that learns a language-to-vision mapping and uses its output visual predictions to build multimodal representations. In this sense, our method provides a cognitively plausible way of building representations, consistent with the inherently re-constructive and associative nature of human memory. Using seven benchmark concept similarity tests we show that the mapped (or imagined) vectors not only help to fuse multimodal information, but also outperform strong unimodal baselines and state-of-the-art multimodal methods, thus exhibiting more human-like judgments.
We present a framework to represent and reason about narratives that combines logical and probabilistic representations of commonsense knowledge. Unlike most natural language understanding systems, which merely extract facts or semantic roles, our system builds probabilistic representations of the temporal sequence of world states and actions implied by a narrative. We use probabilistic actions to represent ambiguities and uncertainties in the narrative. We present algorithms that take a representation of a narrative, derive all possible interpretations of the narrative, and answer probabilistic queries by marginalizing over all the interpretations. With a focus on spatial contexts, we demonstrate our framework on an example narrative. To this end, we apply natural language pro- cessing (NLP) tools together with statistical approaches over common sense knowledge bases.
Current semantic parsers either compute shallow representations over a wide range of input, or deeper representations in very limited domains. We describe a system that provides broad-coverage, deep semantic parsing designed to work in any domain using a core domain-general lexicon, ontology and grammar. This paper discusses how this core system can be customized for a particularly challenging domain, namely reading research papers in biology.