Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data. To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple controls, which paints a rich picture of the LM capabilities. Our main findings are that: (a) different LMs exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely; (b) LMs do not reason in an abstract manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely. Our findings and infrastructure can help future work on designing new datasets, models and objective functions for pre-training.
There is a growing interest and literature on intrinsic motivations and open-ended learning in both cognitive robotics and machine learning on one side, and in psychology and neuroscience on the other. This paper aims to review some relevant contributions from the two literature threads and to draw links between them. To this purpose, the paper starts by defining intrinsic motivations and by presenting a computationally-driven theoretical taxonomy of their different types. Then it presents relevant contributions from the psychological and neuroscientific literature related to intrinsic motivations, interpreting them based on the grid, and elucidates the mechanisms and functions they play in animals and humans. Endowed with such concepts and their biological underpinnings, the paper next presents a selection of models from cognitive robotics and machine learning that computationally operationalise the concepts of intrinsic motivations and links them to biology concepts. The contribution finally presents some of the open challenges of the field from both the psychological/neuroscientific and computational perspectives.
--There has been a growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has been always a challenging problem facing machine learning based systems. In such scenarios, semi-supervised learning has shown to be an effective way of exploiting unlabeled data to improve upon the performance of model. Notably, unlabeled data do not contain label information which itself can be a significant source of bias in training machine learning systems. This inspired us to tackle the challenge of fairness by formulating the problem in a semi-supervised framework. In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data to not just improve the performance but also improve the fairness of the decision-making process. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
We have proposed going beyond traditional ontologies to use rich semantics implemented in programming languages for modeling. In this paper, we discuss the application of executable semantic models to two examples, first a structured definition of a waterfall and second the cardiopulmonary system. We examine the components of these models and the way those components interact. Ultimately, such models should provide the basis for direct representation.
Many theories, based on neuroscientific and psychological empirical evidence and on computational concepts, have been elaborated to explain the emergence of consciousness in the central nervous system. These theories propose key fundamental mechanisms to explain consciousness, but they only partially connect such mechanisms to the possible functional and adaptive role of consciousness. Recently, some cognitive and neuroscientific models try to solve this gap by linking consciousness to various aspects of goal-directed behaviour, the pivotal cognitive process that allows mammals to flexibly act in challenging environments. Here we propose the Representation Internal-Manipulation (RIM) theory of consciousness, a theory that links the main elements of consciousness theories to components and functions of goal-directed behaviour, ascribing a central role for consciousness to the goal-directed manipulation of internal representations. This manipulation relies on four specific computational operations to perform the flexible internal adaptation of all key elements of goal-directed computation, from the representations of objects to those of goals, actions, and plans. Finally, we propose the concept of `manipulation agency' relating the sense of agency to the internal manipulation of representations. This allows us to propose that the subjective experience of consciousness is associated to the human capacity to generate and control a simulated internal reality that is vividly perceived and felt through the same perceptual and emotional mechanisms used to tackle the external world.
Synthesizing a program that realizes a logical specification is a classical problem in computer science. We examine a particular type of program synthesis, where the objective is to synthesize a strategy that reacts to a potentially adversarial environment while ensuring that all executions satisfy a Linear Temporal Logic (LTL) specification. Unfortunately, exact methods to solve so-called LTL synthesis via logical inference do not scale. In this work, we cast LTL synthesis as an optimization problem. We employ a neural network to learn a Q-function that is then used to guide search, and to construct programs that are subsequently verified for correctness. Our method is unique in combining search with deep learning to realize LTL synthesis. In our experiments the learned Q-function provides effective guidance for synthesis problems with relatively small specifications.
As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our approach generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare our approach with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that our approach generates saliency maps that are more interpretable for humans than existing approaches.
A linear algorithm is described for solving the n-Queens Completion problem for an arbitrary composition of k queens, consistently distributed on a chessboard of size n x n. Two important rules are used in the algorithm: a) the rule of sequential risk elimination for the entire system as a whole; b) the rule of formation of minimal damage in the given selection conditions. For any composition of k queens (1<= k
Understanding stories is a challenging reading comprehension problem for machines as it requires reading a large volume of text and following long-range dependencies. In this paper, we introduce the Shmoop Corpus: a dataset of 231 stories that are paired with detailed multi-paragraph summaries for each individual chapter (7,234 chapters), where the summary is chronologically aligned with respect to the story chapter. From the corpus, we construct a set of common NLP tasks, including Cloze-form question answering and a simplified form of abstractive summarization, as benchmarks for reading comprehension on stories. We then show that the chronological alignment provides a strong supervisory signal that learning-based methods can exploit leading to significant improvements on these tasks. We believe that the unique structure of this corpus provides an important foothold towards making machine story comprehension more approachable.
We seek to align agent policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge about dynamics, reward function, and unsafe states. There is a human expert knowing the rewards and unsafe states based on his preference and objective, but querying that human expert is expensive. To address this challenge, we propose a new framework for imitation learning (IL) algorithm that actively and interactively learns a model of the user's reward function with efficient queries. We build an adversarial generative model of states and a successor feature (SR) model trained over transition experience collected by learning policy. Our method uses these models to select state-action pairs, asking the user to comment on the optimality or safety, and trains a adversarial neural network to predict the rewards. Different from previous papers, which are almost all based on uncertainty sampling, the key idea is to actively and efficiently select state-action pairs from both on-policy and off-policy experience, by discriminating the queried (expert) and unqueried (generated) data and maximizing the efficiency of value function learning. We call this method adversarial reward query with successor representation. We evaluate the proposed method with simulated human on a state-based 2D navigation task, robotic control tasks and the image-based video games, which have high-dimensional observation and complex state dynamics. The results show that the proposed method significantly outperforms uncertainty-based methods on learning reward models, achieving better query efficiency, where the adversarial discriminator can make the agent learn human behavior more efficiently and the SR can select states which have stronger impact on value function. Moreover, the proposed method can also learn to avoid unsafe states when training the reward model.