
Reviews: Learning to Predict Without Looking Ahead: World Models Without Forward Prediction

Neural Information Processing Systems

Interesting work that explores whether a world model can be learned without using a forward-predictive loss, and that provides a novel perspective on model-based reinforcement learning. By introducing a method called 'observational dropout', the paper takes a first step towards demonstrating the feasibility of learning only the salient features needed for a task. The rebuttal includes baseline comparisons to model-based RL, which will be a valuable addition to the paper.
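To make the idea concrete, here is a minimal sketch of a rollout under observational dropout: the policy is shown the true observation only with some probability and otherwise sees the world model's output. The names `env_step`, `model_step`, `policy`, and `peek_prob` are hypothetical placeholders for illustration, not the paper's code.

```python
import random


def rollout_with_observational_dropout(env_step, model_step, policy, first_obs,
                                       peek_prob, horizon, rng):
    """Roll out a policy that sees the true observation only with
    probability `peek_prob`; otherwise it sees the world model's output.

    env_step(obs, action)   -> (next_true_obs, reward)   (hypothetical)
    model_step(obs, action) -> next_predicted_obs        (hypothetical)
    """
    true_obs = first_obs
    seen_obs = first_obs  # what the policy is actually shown
    total_reward = 0.0
    peeks = 0
    for _ in range(horizon):
        action = policy(seen_obs)
        true_obs, reward = env_step(true_obs, action)
        total_reward += reward
        if rng.random() < peek_prob:
            # "Peek": the agent is shown the real observation.
            seen_obs = true_obs
            peeks += 1
        else:
            # Dropout: the agent must rely on the model's own output.
            seen_obs = model_step(seen_obs, action)
    return total_reward, peeks


# Toy usage: a 1-D state that the policy tries to drive towards zero.
env_step = lambda s, a: (s + a, -abs(s + a))
model_step = lambda s, a: s + a  # assumed (here: perfect) world model
policy = lambda s: -1 if s > 0 else 1
reward, peeks = rollout_with_observational_dropout(
    env_step, model_step, policy, 3, 0.1, 5, random.Random(0))
```

Only the task reward is used in this sketch; no loss compares `model_step` against the true next observation, which is the point of avoiding a forward-predictive objective.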


Reviews: Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders

Neural Information Processing Systems

It uses ideas similar to very recent/concurrent work (Ganea et al., 2018; Ovinnikov, 2018; Nagano et al., 2019), but it is made clear how this work differs from related work. Quality: The submission seems technically sound, with detailed experimental results. The paper empirically compares its approach mostly with its Euclidean counterpart. This is fair, of course, but it would be interesting to see how it compares empirically with the Poincaré Wasserstein Autoencoder (Ovinnikov, 2019) and the hyperboloid model of Nagano et al. (2019): do they yield similar latent representations, and how do the respective sample qualities compare? The background on Riemannian geometry is to the point, so that most of the paper is accessible to readers without training in non-Euclidean geometry. Nevertheless, I feel that readers could benefit from more high-level guidance in Appendix B; for example, what do we learn from Sections B.8 and B.9? Significance: I feel that this is a significant work and that others can build on these ideas either methodologically or experimentally.
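For readers without a background in hyperbolic geometry, the following sketch computes the standard geodesic distance on the Poincaré ball with curvature -1, d(x, y) = arcosh(1 + 2‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²))). It illustrates the latent geometry only; the function name is an assumption and nothing here is taken from the paper's implementation.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance between two points of the open unit ball
    (Poincaré ball model, curvature -1)."""
    sq_norm_x = sum(c * c for c in x)
    sq_norm_y = sum(c * c for c in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y)))


# Distances grow quickly near the boundary, which is what makes the
# ball a natural fit for tree-like (hierarchical) structure.
near_origin = poincare_distance((0.0, 0.0), (0.5, 0.0))
near_boundary = poincare_distance((0.9, 0.0), (0.0, 0.9))
```

From the origin, the distance to a point of Euclidean norm r reduces to 2·artanh(r), so `near_origin` equals ln 3.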


Reviews: Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders

Neural Information Processing Systems

This paper examines an alternative latent space, with sensible ablation studies, and sensible proposals for modifying the rest of the architecture to match. Our main complaint is that the paper lacks some empirical comparison with very recent related work (Ovinnikov, 2019, Nagano et al., 2019). However, even without such a comparison, we think it is still a complete and interesting paper.


GLAM: Global-Local Variation Awareness in Mamba-based World Model

arXiv.org Artificial Intelligence

Mimicking the real interaction trajectory in the inference of the world model has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods directly use known state sequences for reasoning. However, this approach fails to enhance the quality of reasoning by capturing the subtle variation between states. Much like how humans infer trends in event development from this variation, in this work, we introduce Global-Local variation Awareness Mamba-based world model (GLAM) that improves reasoning quality by perceiving and predicting variation between states. GLAM comprises two Mamba-based parallel reasoning modules, GMamba and LMamba, which focus on perceiving variation from global and local perspectives, respectively, during the reasoning process. GMamba focuses on identifying patterns of variation between states in the input sequence and leverages these patterns to enhance the prediction of future state variation. LMamba emphasizes reasoning about unknown information, such as rewards, termination signals, and visual representations, by perceiving variation in adjacent states. By integrating the strengths of the two modules, GLAM accounts for higher-value variation in environmental changes, providing the agent with more efficient imagination-based training. We demonstrate that our method outperforms existing methods in normalized human scores on the Atari 100k benchmark.


Reviews: Dense Associative Memory for Pattern Recognition

Neural Information Processing Systems

The theoretical contribution presented in lines 291--310 is a welcome insight into the computational power of ReLUs. The experimental results for rectified polynomial units reported in Figures 2 and 3 are interesting and apparently novel, even in the context of standard feedforward multi-layer networks. Since lines 291--297 are a central point of the paper, this passage should be expanded and better justified. Furthermore, the simple capacity analysis developed on p. 3 for the polynomial energy function is invoked here for the rectified polynomial energy function. This has to be justified. The paper starts from and mostly focuses on the associative memory (Hamiltonian) formulation, but then the findings are restricted to one-step retrieval.


CQM: Curriculum Reinforcement Learning with a Quantized World Model

Neural Information Processing Systems

Recent curriculum Reinforcement Learning (RL) has shown notable progress in solving complex tasks by proposing sequences of surrogate tasks. However, the previous approaches often face challenges when they generate curriculum goals in a high-dimensional space. Thus, they usually rely on manually specified goal spaces. To alleviate this limitation and improve the scalability of the curriculum, we propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process, and suggests curriculum goals over it. To define the semantic goal space, our method discretizes continuous observations via vector quantized-variational autoencoders (VQ-VAE) and restores the temporal relations between the discretized observations by a graph.
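The two steps named in the last sentence can be sketched as follows: each continuous observation is mapped to the index of its nearest codebook vector, and a graph records which codes follow one another in a trajectory. This is a simplified illustration under stated assumptions: the codebook is fixed and hand-written here, whereas a VQ-VAE learns it, and the function names are hypothetical.

```python
def quantize(obs, codebook):
    """Return the index of the codebook vector nearest to `obs`
    (squared Euclidean distance), as in a VQ lookup."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(obs, codebook[i]))


def build_transition_graph(trajectory, codebook):
    """Discretize a trajectory and record directed edges between
    consecutive, distinct codes (the temporal relations)."""
    codes = [quantize(obs, codebook) for obs in trajectory]
    graph = {}
    for src, dst in zip(codes, codes[1:]):
        if src != dst:
            graph.setdefault(src, set()).add(dst)
    return codes, graph


# Hypothetical 2-D codebook and a short trajectory of observations.
codebook = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
trajectory = [(0.1, 0.0), (0.2, 0.1), (0.9, 0.1), (1.0, 0.8)]
codes, graph = build_transition_graph(trajectory, codebook)
```

Curriculum goals could then be proposed over the nodes of `graph` rather than over the raw high-dimensional observations, which is the motivation the abstract gives for a semantic goal space.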


Language Models Meet World Models: Embodied Experiences Enhance Language Models

Neural Information Processing Systems

While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities. The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities. Our approach deploys an embodied agent in a world model, particularly a simulator of the physical world (VirtualHome), and acquires a diverse set of embodied experiences through both goal-oriented planning and random exploration. These experiences are then used to finetune LMs to teach diverse abilities of reasoning and acting in the physical world, e.g., planning and completing goals, object permanence and tracking, etc.


Facing Off World Model Backbones: RNNs, Transformers, and S4

Neural Information Processing Systems

World models are a fundamental component in model-based reinforcement learning (MBRL). To perform temporally extended and consistent simulations of the future in partially observable environments, world models need to possess long-term memory. However, state-of-the-art MBRL agents, such as Dreamer, predominantly employ recurrent neural networks (RNNs) as their world model backbone, which have limited memory capacity. In this paper, we seek to explore alternative world model backbones for improving long-term memory. In particular, we investigate the effectiveness of Transformers and Structured State Space Sequence (S4) models, motivated by their remarkable ability to capture long-range dependencies in low-dimensional sequences and their complementary strengths.


The expressive power of pooling in Graph Neural Networks

Neural Information Processing Systems

In Graph Neural Networks (GNNs), hierarchical pooling operators generate local summaries of the data by coarsening the graph structure and the vertex features. Considerable attention has been devoted to analyzing the expressive power of message-passing (MP) layers in GNNs, while a study on how graph pooling affects the expressiveness of a GNN is still lacking. Additionally, despite the recent advances in the design of pooling operators, there is not a principled criterion to compare them. In this work, we derive sufficient conditions for a pooling operator to fully preserve the expressive power of the MP layers before it. These conditions serve as a universal and theoretically-grounded criterion for choosing among existing pooling operators or designing new ones.


KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs

Neural Information Processing Systems

Knowledge distillation (KD) has emerged as an effective model-compression technique that can improve a lightweight model. Conventional KD methods propose various designs to allow the student model to imitate the teacher better. However, these handcrafted KD designs rely heavily on expert knowledge and may be sub-optimal for various teacher-student pairs. In this paper, we present a novel framework, KD-Zero, which utilizes evolutionary search to automatically discover a promising distiller from scratch for any teacher-student architectures. Then, we construct our distiller search space by selecting advanced operations for these three components.