to

### Gradient Episodic Memory with a Soft Constraint for Continual Learning

Catastrophic forgetting is a common destructive phenomenon in gradient-based neural networks that learn sequential tasks, and it differs sharply from forgetting in humans, who can learn and accumulate knowledge throughout their whole lives. It shows up as a large decrease in performance on previous tasks while the model is learning a novel task. To alleviate this problem, the model should have the capacity both to learn new knowledge and to preserve learned knowledge. We propose an average gradient episodic memory (A-GEM) with a soft constraint $\epsilon \in [0, 1]$, which is a balance factor between learning new knowledge and preserving learned knowledge; our method is called gradient episodic memory with a soft constraint $\epsilon$ ($\epsilon$-SOFT-GEM). $\epsilon$-SOFT-GEM outperforms A-GEM and several continual learning benchmarks in a single training epoch. Additionally, it has state-of-the-art average accuracy, matches A-GEM in computational and memory efficiency, and provides a better trade-off between the stability of preserving learned knowledge and the plasticity of learning new knowledge.
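
As a rough illustration of the idea, the sketch below shows the A-GEM-style projection step, in which a gradient that conflicts with a reference gradient computed on episodic memory has its conflicting component removed. The abstract does not give the formula by which $\epsilon$ enters, so `soft_blend` is a purely hypothetical stand-in (a linear interpolation between the raw and projected gradients) and should not be read as the paper's update rule. All function names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def project_agem(g, g_ref):
    """A-GEM-style projection.

    If the current gradient g conflicts with the memory gradient g_ref
    (negative inner product), subtract the component of g along g_ref.
    Otherwise g is returned unchanged.
    """
    d = dot(g, g_ref)
    if d >= 0:
        return list(g)
    # d < 0 implies g_ref is non-zero, so the division is safe.
    scale = d / dot(g_ref, g_ref)
    return [a - scale * b for a, b in zip(g, g_ref)]


def soft_blend(g, g_ref, eps):
    """HYPOTHETICAL soft constraint, not the paper's formula.

    eps = 0 keeps the raw gradient (maximum plasticity);
    eps = 1 applies the full projection (maximum stability).
    """
    projected = project_agem(g, g_ref)
    return [(1 - eps) * a + eps * b for a, b in zip(g, projected)]


if __name__ == "__main__":
    g = [1.0, -1.0]      # gradient on the new task
    g_ref = [0.0, 1.0]   # average gradient on memory samples
    print(project_agem(g, g_ref))
    print(soft_blend(g, g_ref, 0.5))
```

After the projection, the inner product between the returned gradient and `g_ref` is non-negative, which is the property the hard constraint is meant to guarantee.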

### Learning Structured Representations of Entity Names using Active Learning and Weak Supervision

Structured representations of entity names are useful for many entity-related tasks such as entity normalization and variant generation. Learning the implicit structured representations of entity names without context and external knowledge is particularly challenging. In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem. Our experimental evaluation shows that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.

### Episodic Memory for Learning Subjective-Timescale Models

In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment even though planning often requires reasoning over multi-step timescales, with intermediate states either unnecessary, or worse, accumulating prediction error. In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context. Inspired by the recent works on human time perception, we devise a novel approach to learning a transition dynamics model, based on the sequences of episodic memories that define the agent's subjective timescale - over which it learns world dynamics and over which future planning is performed. We implement this in the framework of active inference and demonstrate that the resulting subjective-timescale model (STM) can systematically vary the temporal extent of its predictions while preserving the same computational efficiency. Additionally, we show that STM predictions are more likely to introduce future salient events (for example new objects coming into view), incentivising exploration of new areas of the environment. As a result, STM produces more informative action-conditioned roll-outs that assist the agent in making better decisions. We validate significant improvement in our STM agent's performance in the Animal-AI environment against a baseline system, trained using the environment's objective-timescale dynamics.

An agent endowed with a model of its environment has the ability to predict the consequences of its actions and perform planning into the future before deciding on its next move. Models can allow agents to simulate the possible action-conditioned futures from their current state, even if the state was never visited during learning. As a result, model-based approaches can provide agents with better generalization abilities across both states and tasks in an environment, compared to their model-free counterparts (Racanière et al., 2017; Mishra et al., 2017).

### Efficient Generation of Structured Objects with Constrained Adversarial Networks

Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow the learned constraints to be turned on and off at inference time. CANs handle arbitrary logical constraints and leverage knowledge compilation techniques to efficiently evaluate the disagreement between the model and the constraints. Our setup is further extended to hybrid logical-neural constraints for capturing very complex constraints, like graph reachability. An extensive empirical analysis shows that CANs efficiently generate valid structures that are both high-quality and novel.
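
To make the penalty concrete, here is a toy sketch of "mass allocated to invalid structures". It assumes the generator outputs independent Bernoulli probabilities over a few binary variables, and it enumerates every assignment by brute force. This is only feasible for tiny examples; the abstract states that CANs rely on knowledge compilation instead. The names `invalid_mass`, `generator_loss`, and the weight `lam` are illustrative, not taken from the paper.

```python
from itertools import product


def invalid_mass(probs, is_valid):
    """Probability mass a factorized Bernoulli model puts on invalid structures.

    probs:    list of P(x_i = 1) for each binary variable (assumed independent).
    is_valid: predicate over a tuple of 0/1 values encoding the hard constraint.
    """
    total = 0.0
    for bits in product([0, 1], repeat=len(probs)):
        # Probability of this exact assignment under the factorized model.
        p = 1.0
        for b, q in zip(bits, probs):
            p *= q if b else (1.0 - q)
        if not is_valid(bits):
            total += p
    return total


def generator_loss(adversarial_loss, probs, is_valid, lam=1.0):
    """Adversarial loss plus a penalty proportional to the invalid mass."""
    return adversarial_loss + lam * invalid_mass(probs, is_valid)


if __name__ == "__main__":
    # Toy constraint: exactly one of the two variables must be switched on.
    def exactly_one(bits):
        return sum(bits) == 1

    print(invalid_mass([0.5, 0.5], exactly_one))   # half the mass is invalid
    print(invalid_mass([1.0, 0.0], exactly_one))   # all mass on a valid structure
```

As the generator concentrates probability on valid assignments, the penalty term shrinks toward zero, so the constraint acts as a training signal rather than as a post-hoc filter.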

### Episodic Memory in Lifelong Language Learning

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We also show that the space complexity of the episodic memory module can be reduced significantly (50-90%) by randomly choosing which examples to store in memory with a minimal decrease in performance. We consider an episodic memory component as a crucial building block of general linguistic intelligence and see our model as a first step in that direction.
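
The two mechanisms that involve storage, random writes and sparse replay, can be sketched as follows. This is a minimal illustration under stated assumptions: `write_prob`, `replay_every`, and `replay_size` are hypothetical parameter names, the `update` callback stands in for a gradient step on a batch, and local adaptation at inference time is not shown.

```python
import random


class EpisodicMemory:
    """Stores a random subset of the examples seen in the stream."""

    def __init__(self, write_prob, seed=0):
        self.write_prob = write_prob      # fraction of examples kept, in [0, 1]
        self.items = []
        self.rng = random.Random(seed)

    def maybe_write(self, example):
        # Keep each example independently with probability write_prob.
        if self.rng.random() < self.write_prob:
            self.items.append(example)

    def sample(self, k):
        # Uniformly sample up to k stored examples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def train_stream(stream, memory, update, replay_every=100, replay_size=32):
    """Single pass over the stream with occasional (sparse) replay."""
    for step, example in enumerate(stream, start=1):
        update([example])                 # ordinary update on the new example
        memory.maybe_write(example)
        if step % replay_every == 0 and memory.items:
            update(memory.sample(replay_size))   # replay step


if __name__ == "__main__":
    batches = []
    mem = EpisodicMemory(write_prob=0.5, seed=0)
    train_stream(range(1000), mem, batches.append, replay_every=100)
    print(len(mem.items), "examples stored;", len(batches), "updates run")
```

Lowering `write_prob` shrinks the memory in proportion, which corresponds to the kind of storage reduction the abstract reports.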

### Causal Learning by a Robot with Semantic-Episodic Memory in an Aesop's Fable Experiment

Corvids, apes, and children solve The Crow and The Pitcher task (from Aesop's Fables) indicating a causal understanding of the task. By cumulatively interacting with different objects, how can cognitive agents abstract the underlying cause-effect relations to predict affordances of novel objects? We address this question by re-enacting the Aesop's Fable task on a robot and present a) a brain-guided neural model of semantic-episodic memory; with b) four task-agnostic learning rules that compare expectations from recalled past episodes with the current scenario to progressively extract the hidden causal relations. The ensuing robot behaviours illustrate causal learning; and predictions for novel objects converge to Archimedes' principle, independent of both the objects explored during learning and the order of their cumulative exploration.

### An Overview of Distance and Similarity Functions for Structured Data

The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.

### Gradient Episodic Memory for Continual Learning

One major obstacle towards AI is the poor ability of models to solve new problems more quickly and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM), which alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
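
The abstract does not spell out the metrics, so the sketch below assumes one common formulation from the continual learning literature: a matrix `R` in which `R[i][j]` is the test accuracy on task `j` after training on tasks `0..i`, summarized by average accuracy and backward transfer. The function names and the example numbers are illustrative only.

```python
def average_accuracy(R):
    """Mean test accuracy over all tasks after training on the last task."""
    T = len(R)
    return sum(R[T - 1][j] for j in range(T)) / T


def backward_transfer(R):
    """Average change in accuracy on earlier tasks after learning later ones.

    Negative values indicate forgetting; positive values indicate that
    learning later tasks improved performance on earlier ones.
    """
    T = len(R)
    if T < 2:
        raise ValueError("backward transfer needs at least two tasks")
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


if __name__ == "__main__":
    # Made-up accuracies for two tasks: rows = after training task i,
    # columns = evaluated on task j.
    R = [
        [0.9, 0.1],
        [0.7, 0.8],
    ]
    print("average accuracy:", average_accuracy(R))
    print("backward transfer:", backward_transfer(R))
```

In this made-up example, accuracy on the first task falls from 0.9 to 0.7 after the second task is learned, so backward transfer is negative, which is the signature of forgetting.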

### Learning Hierarchical Semantic Image Manipulation through Structured Representations

Understanding, reasoning about, and manipulating semantic concepts of images have been a fundamental research problem for decades. Previous work mainly focused on direct manipulation of the natural image manifold through color strokes, key-points, textures, and holes-to-fill. In this work, we present a novel hierarchical framework for semantic image manipulation. Key to our hierarchical framework is that we employ a structured semantic layout as our intermediate representation for manipulation. Initialized with coarse-level bounding boxes, our layout generator first creates a pixel-wise semantic layout capturing the object shape, object-object interactions, and object-scene relations.

### Learning to See Analogies: A Connectionist Exploration

This dissertation explores the integration of learning and analogy-making through the development of a computer program, called Analogator, that learns to make analogies by example. By "seeing" many different analogy problems, along with possible solutions, Analogator gradually develops an ability to make new analogies. That is, it learns to make analogies by analogy. This approach stands in contrast to most existing research on analogy-making, in which typically the a priori existence of analogical mechanisms within a model is assumed. The present research extends standard connectionist methodologies by developing a specialized associative training procedure for a recurrent network architecture. The network is trained to divide input scenes (or situations) into appropriate figure and ground components. Seeing one scene in terms of a particular figure and ground provides the context for seeing another in an analogous fashion. After training, the model is able to make new analogies between novel situations. Analogator has much in common with lower-level perceptual models of categorization and recognition; it thus serves as a unifying framework encompassing both high-level analogical learning and low-level perception. This approach is compared and contrasted with other computational models of analogy-making. The model's training and generalization performance is examined, and limitations are discussed.