Scripts & Frames


Structured Object Language Modeling (SoLM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising

Tavanaei, Amir, Koo, Kee Kiat, Ceker, Hayreddin, Jiang, Shaobai, Li, Qi, Han, Julien, Bouyarmane, Karim

arXiv.org Artificial Intelligence

In this paper, we study the problem of generating structured objects that conform to a complex schema, with intricate dependencies between the different components (facets) of the object. The facets of the object (attributes, fields, columns, properties) can be a mix of short, structured, type-constrained facts and long natural-language descriptions. The object has to be self-consistent between the different facets in the redundant information it carries (relative consistency), while being grounded with respect to world knowledge (absolute consistency). We frame the problem as a Language Modeling problem (Structured Object Language Modeling) and train an LLM to perform the task natively, without requiring instructions or prompt engineering. We propose a self-supervised denoising method to train the model from an existing dataset of such objects. The input query can be the existing object itself, in which case the model acts as a regenerator that completes, corrects, and normalizes the input, or any unstructured blurb to be structured. We show that the self-supervised denoising training provides a strong baseline, and that additional supervised fine-tuning with a small amount of human demonstrations leads to further improvement. Experimental results show that the proposed method matches or outperforms prompt-engineered general-purpose state-of-the-art LLMs (Claude 3, Mixtral-8x7B), while being an order of magnitude more cost-efficient.
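
The self-supervised denoising idea can be pictured as corrupting a serialized object and training the model to regenerate the clean serialization. Below is a minimal sketch; the corruption operators, facet names, and JSON serialization are illustrative assumptions, not the paper's exact setup.

```python
import json
import random

def corrupt(obj: dict, drop_p: float = 0.3, mask_p: float = 0.2) -> dict:
    """Apply simple structured noise: drop some facets, mask some values.
    These noise operators are illustrative; the paper's corruption scheme
    may differ."""
    noisy = {}
    for key, value in obj.items():
        if random.random() < drop_p:
            continue                      # facet deletion
        if random.random() < mask_p:
            noisy[key] = "<mask>"         # facet value masking
        else:
            noisy[key] = value
    return noisy

def denoising_pairs(objects):
    """Yield (input, target) serializations for language-model training:
    the model learns to regenerate the clean object from the noisy one."""
    for obj in objects:
        yield json.dumps(corrupt(obj), sort_keys=True), json.dumps(obj, sort_keys=True)

# Hypothetical object with a mix of typed facts and free-text facets.
catalog = [{"title": "Steel water bottle", "volume_ml": 750,
            "description": "Vacuum-insulated bottle that keeps drinks cold."}]
for src, tgt in denoising_pairs(catalog):
    print("input :", src)
    print("target:", tgt)
```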


A Review of Mechanistic Models of Event Comprehension

Nguyen, Tan T.

arXiv.org Artificial Intelligence

This review examines theoretical assumptions and computational models of event comprehension, tracing the evolution from discourse comprehension theories to contemporary event cognition frameworks. The review covers key discourse comprehension accounts, including Construction-Integration, Event Indexing, Causal Network, and Resonance models, highlighting their contributions to understanding cognitive processes in comprehension. I then discuss contemporary theoretical frameworks of event comprehension, including Event Segmentation Theory (Zacks et al., 2007), the Event Horizon Model (Radvansky & Zacks, 2014), and the Hierarchical Generative Framework (Kuperberg, 2021), which emphasize prediction, causality, and multilevel representations in event understanding. Building on these theories, I evaluate five computational models of event comprehension: REPRISE (Butz et al., 2019), Structured Event Memory (SEM; Franklin et al., 2020), the Lu model (Lu et al., 2022), the Gumbsch model (Gumbsch et al., 2022), and the Elman and McRae model (2019). The analysis focuses on their approaches to hierarchical processing, prediction mechanisms, and representation learning. Key themes that emerge include the use of hierarchical structures as inductive biases, the importance of prediction in comprehension, and diverse strategies for learning event dynamics. The review identifies critical areas for future research, including the need for more sophisticated approaches to learning structured representations, integrating episodic memory mechanisms, and developing adaptive updating algorithms for working event models. By synthesizing insights from both theoretical frameworks and computational implementations, this review aims to advance our understanding of human event comprehension and guide future modeling efforts in cognitive science.
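
A recurring computational idea across these frameworks, most explicitly in Event Segmentation Theory, is that event boundaries are placed where prediction error spikes. The toy sketch below illustrates that mechanism only; the mean-based predictor, threshold, and synthetic data are assumptions (models like SEM use learned recurrent predictors instead).

```python
import numpy as np

def segment_by_prediction_error(observations, threshold=1.0):
    """Toy event segmentation: predict the next observation as the mean of
    the current event so far; open a new event when prediction error spikes."""
    boundaries, event_start = [], 0
    for t in range(1, len(observations)):
        prediction = observations[event_start:t].mean(axis=0)
        error = np.linalg.norm(observations[t] - prediction)
        if error > threshold:
            boundaries.append(t)
            event_start = t
    return boundaries

# Two synthetic "events" with different means; the boundary lands near t=50.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 0.1, (50, 3)), rng.normal(2, 0.1, (50, 3))])
print(segment_by_prediction_error(data, threshold=1.0))
```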


Episodic Memory in Lifelong Language Learning

Neural Information Processing Systems

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We also show that the space complexity of the episodic memory module can be reduced significantly (50-90%) by randomly choosing which examples to store in memory with a minimal decrease in performance. We consider an episodic memory component as a crucial building block of general linguistic intelligence and see our model as a first step in that direction.
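
A minimal sketch of two of the ingredients, random write selection and sparse experience replay, is shown below. The buffer capacity, replay schedule, and `train_step` callback are assumed interfaces for illustration; the paper's local-adaptation step is omitted.

```python
import random

class EpisodicMemory:
    """Fixed-size memory with random write selection (reservoir sampling),
    the mechanism that allows a 50-90% space reduction at little cost."""
    def __init__(self, capacity: int):
        self.capacity, self.buffer, self.seen = capacity, [], 0

    def write(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example   # replace a slot uniformly at random

    def sample(self, k: int):
        return random.sample(self.buffer, min(k, len(self.buffer)))

def train_stream(stream, memory, train_step, replay_every=100, replay_k=32):
    """Sparse replay: occasionally interleave a memory batch into the stream."""
    for i, example in enumerate(stream):
        train_step([example])
        memory.write(example)
        if i % replay_every == 0 and memory.buffer:
            train_step(memory.sample(replay_k))   # sparse experience replay
```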


Efficient Generation of Structured Objects with Constrained Adversarial Networks

Neural Information Processing Systems

Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow the learned constraints to be switched on and off at inference time.
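
The core modification can be sketched as an extra term in the generator loss that penalizes expected mass on invalid structures. The sketch below is illustrative: the paper derives this penalty via a semantic loss over logical constraints, whereas here the per-sample validity probability is simply an assumed input.

```python
import torch

def generator_loss(d_scores_fake, validity_prob, lam=1.0):
    """Non-saturating GAN generator loss plus a constraint penalty.
    d_scores_fake: discriminator logits for generated samples.
    validity_prob: per-sample probability that the sample satisfies the
    hard constraints (assumed given; CANs compute it differentiably)."""
    adv = torch.nn.functional.binary_cross_entropy_with_logits(
        d_scores_fake, torch.ones_like(d_scores_fake))
    constraint = -torch.log(validity_prob.clamp_min(1e-8)).mean()
    return adv + lam * constraint

# Toy usage with random tensors standing in for a real generator/discriminator.
logits = torch.randn(8)
validity = torch.rand(8)
print(generator_loss(logits, validity).item())
```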


Improved Schemes for Episodic Memory-based Lifelong Learning

Neural Information Processing Systems

Current deep neural networks can achieve remarkable performance on a single task. However, when a deep neural network is continually trained on a sequence of tasks, it tends to gradually forget previously learned knowledge. This phenomenon is referred to as catastrophic forgetting and motivates the field of lifelong learning. Recently, episodic memory-based approaches such as GEM and A-GEM have shown remarkable performance. In this paper, we provide the first unified view of episodic memory-based approaches from an optimization perspective.
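
For context on the methods being unified: A-GEM's update, for instance, projects the proposed gradient whenever it conflicts with the average gradient computed on the episodic memory. A sketch of that projection (array shapes and the calling convention are assumptions):

```python
import numpy as np

def agem_project(grad: np.ndarray, grad_ref: np.ndarray) -> np.ndarray:
    """A-GEM update: if the proposed gradient would increase the memory loss
    (negative inner product with the reference gradient), project it onto
    the half-space where the memory loss does not increase."""
    dot = grad @ grad_ref
    if dot >= 0:
        return grad                        # no conflict, keep the gradient
    return grad - (dot / (grad_ref @ grad_ref)) * grad_ref

g = np.array([1.0, -2.0])
g_ref = np.array([0.0, 1.0])               # gradient on the episodic memory
print(agem_project(g, g_ref))              # -> [1., 0.]: conflict removed
```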


Generalization of Reinforcement Learners with Working and Episodic Memory

Neural Information Processing Systems

Memory is an important aspect of intelligence and plays a role in many deep reinforcement learning models. However, little progress has been made in understanding when specific memory systems help more than others and how well they generalize. The field also has yet to see a prevalent, consistent, and rigorous approach for evaluating agent performance on holdout data. In this paper, we aim to develop a comprehensive methodology to test different kinds of memory in an agent and assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that we suggest are relevant for evaluating memory-specific generalization. To that end, we first construct a diverse set of memory tasks that allow us to evaluate test-time generalization across multiple dimensions. Second, we develop and perform multiple ablations on an agent architecture that combines multiple memory systems, compare it against baseline models, and investigate its performance on the task suite.


Reviews: Gradient Episodic Memory for Continual Learning

Neural Information Processing Systems

The authors of the manuscript consider the continuum learning setting, where the learner observes a stream of data points for training, which are ordered according to the tasks they belong to, i.e. the learner encounters any data from the next task only after it has observed all the training data for the current one. The authors propose a set of three metrics for evaluating the performance of learning algorithms in this setting, which reflect their ability to transfer information to new tasks and not forget information about the earlier tasks. Could the authors, please, comment on the difference between continuum and lifelong learning (the corresponding sentence in line 254 seems incomplete)? The authors also propose a learning method, termed Gradient Episodic Memory (GEM). The idea of the method is to keep a set of examples from every observed task and make sure that at each update stage the loss on the observed tasks does not increase.
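
The "loss does not increase" requirement translates into inner-product constraints on the gradient, one per past task. The helper below only checks which constraints a proposed gradient violates; full GEM then solves a small quadratic program to find the closest feasible gradient (the QP step is omitted here as a simplification).

```python
import numpy as np

def gem_violations(grad, memory_grads):
    """GEM's constraint: the update must not increase loss on any past task,
    i.e. <grad, g_k> >= 0 for each memory gradient g_k. Returns the indices
    of violated constraints."""
    return [k for k, g_k in enumerate(memory_grads) if grad @ g_k < 0]

g = np.array([1.0, -1.0])
mem = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two past tasks
print(gem_violations(g, mem))   # -> [1]: task 1's loss would increase
```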


Reviews: Learning Hierarchical Semantic Image Manipulation through Structured Representations

Neural Information Processing Systems

In this paper, a new method for image manipulation is proposed. The proposed method incorporates a hierarchical framework and provides both interactive and automatic semantic object-level image manipulation. In the interactive manipulation setting, the user can select a bounding box where image editing for adding and removing objects will be applied. The proposed network architecture consists of a foreground output stream, which produces predictions of a binary object mask, and a background output stream for producing per-pixel label maps. As a result, the proposed image manipulation method generates the output image by filling in pixel-level textures guided by the semantic layout.
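
A skeletal version of the described two-stream design is sketched below. The encoder, channel counts, and fusion step are placeholders invented for illustration; the actual architecture is considerably more elaborate.

```python
import torch
import torch.nn as nn

class TwoStreamManipulator(nn.Module):
    """Skeleton of the described architecture: a foreground stream predicting
    a binary object mask, a background stream predicting per-pixel labels,
    and a decoder that fills in textures guided by the semantic layout."""
    def __init__(self, in_ch=3, n_classes=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.fg_stream = nn.Conv2d(32, 1, 1)          # binary object mask logits
        self.bg_stream = nn.Conv2d(32, n_classes, 1)  # per-pixel label logits
        self.decoder = nn.Conv2d(32 + 1 + n_classes, in_ch, 3, padding=1)

    def forward(self, x):
        h = self.encoder(x)
        mask = self.fg_stream(h)
        labels = self.bg_stream(h)
        layout = torch.cat([h, mask, labels], dim=1)
        return self.decoder(layout), mask, labels     # image, mask, label map

img = torch.randn(1, 3, 64, 64)
out, mask, labels = TwoStreamManipulator()(img)
print(out.shape, mask.shape, labels.shape)
```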


Gradient Episodic Memory for Continual Learning

David Lopez-Paz, Marc'Aurelio Ranzato

Neural Information Processing Systems

One major obstacle towards AI is the poor ability of models to solve new problems quickly, without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM), that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.
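
The proposed metrics (average accuracy, backward transfer, forward transfer) are computed from a matrix R, where R[i, j] is test accuracy on task j after training through task i. A sketch following the paper's definitions; the random-baseline accuracies b used for forward transfer are an input here:

```python
import numpy as np

def gem_metrics(R: np.ndarray, b: np.ndarray):
    """Continual-learning metrics from GEM (Lopez-Paz & Ranzato, 2017).
    R[i, j] = accuracy on task j after training on tasks 0..i.
    b[j]    = accuracy of a randomly initialized model on task j."""
    T = R.shape[0]
    acc = R[T - 1].mean()                                   # average accuracy
    bwt = np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)])   # backward transfer
    fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])       # forward transfer
    return acc, bwt, fwt

R = np.array([[0.9, 0.1, 0.1],
              [0.8, 0.9, 0.2],
              [0.7, 0.85, 0.9]])
print(gem_metrics(R, b=np.array([0.1, 0.1, 0.1])))
```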


Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory

Wu, Sihao, Zhao, Xingyu, Huang, Xiaowei

arXiv.org Artificial Intelligence

Data efficiency of learning, which plays a key role in the Reinforcement Learning (RL) training process, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and is required to learn new tasks without forgetting previous knowledge. However, there is little work on implementing data augmentation for continual RL. In this paper, we investigate the efficacy of data augmentation for continual RL. Specifically, we provide a benchmark of data augmentations for continual RL by (1) summarising existing data augmentation methods and (2) introducing a new augmentation method for continual RL: Adversarial Augmentation with Gradient Episodic Memory (Adv-GEM). Extensive experiments show that data augmentations, such as random amplitude scaling, state-switch, mixup, adversarial augmentation, and Adv-GEM, can improve existing continual RL algorithms in terms of their average performance, catastrophic forgetting, and forward transfer on robot control tasks. All data augmentation methods are implemented as plug-in modules for trivial integration into continual RL methods.
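
The adversarial ingredient can be illustrated with an FGSM-style perturbation of replayed states in the direction that increases the policy loss. This is a generic adversarial-augmentation sketch under an assumed `loss_fn` interface, not Adv-GEM's exact formulation:

```python
import torch

def adversarial_augment(states: torch.Tensor, loss_fn, epsilon: float = 0.01):
    """FGSM-style augmentation: perturb states along the gradient sign of
    the loss, producing harder variants for replay. `loss_fn` maps a batch
    of states to a scalar policy/value loss (an assumed interface)."""
    states = states.clone().detach().requires_grad_(True)
    loss = loss_fn(states)
    loss.backward()
    return (states + epsilon * states.grad.sign()).detach()

# Toy usage: a quadratic stand-in for the policy loss.
batch = torch.randn(4, 8)
aug = adversarial_augment(batch, lambda s: (s ** 2).sum())
print((aug - batch).abs().max().item())   # perturbation magnitude ~ epsilon
```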