 modulation


DyMoDreamer: World Modeling with Dynamic Modulation

Neural Information Processing Systems

A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions. Model-based reinforcement learning (MBRL) mitigates this by building world models that simulate environmental dynamics and generate synthetic experience, improving sample efficiency. However, conventional world models process observations holistically, failing to decouple dynamic objects and temporal features from static backgrounds. This approach is computationally inefficient, especially for visual tasks where dynamic objects significantly influence rewards and decision-making performance. To address this, we introduce DyMoDreamer, a novel MBRL algorithm that incorporates a dynamic modulation mechanism to improve the extraction of dynamic features and enrich the temporal information. DyMoDreamer employs differential observations derived from a novel inter-frame differencing mask, explicitly encoding object-level motion cues and temporal dynamics. Dynamic modulation is modeled as stochastic categorical distributions and integrated into a recurrent state-space model (RSSM), enhancing the model's focus on reward-relevant dynamics. Experiments demonstrate that DyMoDreamer sets a new state-of-the-art on the Atari 100k benchmark with a 156.6% mean human-normalized score, establishes a new record of 832 on the DeepMind Visual Control Suite, and gains a 9.5% performance improvement after 1M steps on the Crafter benchmark.
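The abstract does not spell out how the inter-frame differencing mask is computed. As a rough illustration of the general idea only, the sketch below thresholds the absolute pixel change between two consecutive grayscale frames and keeps the current frame's values where motion was detected. The function names, the threshold value, and the nested-list frame format are all assumptions made for this example, not the paper's method.

```python
def diff_mask(prev_frame, curr_frame, threshold=0.1):
    """Binary mask: 1 where a pixel changed by more than `threshold`, else 0.

    Hypothetical helper; frames are equal-sized nested lists of floats.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def differential_observation(prev_frame, curr_frame, threshold=0.1):
    """Keep current-frame pixels where the mask is 1 and zero out the rest."""
    mask = diff_mask(prev_frame, curr_frame, threshold)
    return [
        [c * m for c, m in zip(curr_row, mask_row)]
        for curr_row, mask_row in zip(curr_frame, mask)
    ]


# A single bright pixel moves one step to the right on a static background.
prev = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
curr = [[0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0]]

print(diff_mask(prev, curr))                 # [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
print(differential_observation(prev, curr))  # only the pixel's new position is nonzero
```

The mask fires at both the old and the new position of the moving pixel, while the static background is suppressed. That is the sense in which a differencing step separates moving objects from the background.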


MoRIC: A Modular Region-based Implicit Codec for Image Compression

Neural Information Processing Systems

We introduce Modular Region-Based Implicit Codec (MoRIC), a novel image compression algorithm that relies on implicit neural representations (INRs). Unlike previous INR-based codecs that model the entire image with a single neural network, MoRIC assigns dedicated models to distinct regions in the image, each tailored to its local distribution. This region-wise design enhances adaptation to local statistics and enables flexible, single-object compression with fine-grained rate-distortion (RD) control. MoRIC allows regions of arbitrary shapes and provides the contour of each region as separate information. In particular, it incorporates adaptive chain coding for lossy and lossless contour compression, and a shared global modulator that injects multi-scale global context into local overfitting processes in a coarse-to-fine manner. MoRIC achieves state-of-the-art performance in single-object compression with significantly lower decoding complexity than existing learned neural codecs, which results in a highly efficient compression approach for fixed-background scenarios, e.g., for surveillance cameras. It also sets a new benchmark among overfitted codecs for standard image compression. Additionally, MoRIC naturally supports semantically meaningful layered compression through selective region refinement, paving the way for scalable and flexible INR-based codecs.
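For readers unfamiliar with chain coding, the sketch below shows the classic Freeman 8-direction chain code, which stores a contour as a start point plus one small direction code per step. It is a plain, lossless baseline and not the adaptive scheme MoRIC describes. The function names and the direction numbering are assumptions chosen for this example.

```python
# Freeman 8-direction codes mapped to (dx, dy) steps, with y increasing downward.
# The numbering is one common convention, chosen here for illustration.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def encode_chain(points):
    """Encode a contour (8-connected list of (x, y) points) as start + codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # list.index raises ValueError if two points are not 8-connected neighbours.
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return points[0], codes


def decode_chain(start, codes):
    """Rebuild the contour by walking the direction codes from the start point."""
    points = [start]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


# A closed unit square traced right, down, left, up.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
start, codes = encode_chain(square)
print(codes)                               # [0, 6, 4, 2]
print(decode_chain(start, codes) == square)  # True
```

Each step costs at most 3 bits before entropy coding, instead of a full coordinate pair. This is why chain codes are a natural fit for transmitting region contours.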





Focal Modulation Networks

Neural Information Processing Systems

We propose focal modulation networks (FocalNets in short), where self-attention (SA) is completely replaced by a focal modulation module for modeling token interactions in vision. Focal modulation comprises three components: (i) hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from short to long ranges, (ii) gated aggregation to selectively gather contexts for each query token based on its content, and (iii) element-wise modulation or affine transformation to fuse the aggregated context into the query. Extensive experiments show FocalNets outperform the state-of-the-art SA counterparts (e.g., Swin and Focal Transformers) with similar computational cost on the tasks of image classification, object detection, and semantic segmentation. Specifically, FocalNets with tiny and base size achieve 82.3% and 83.9% top-1 accuracy on ImageNet-1K.
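The three components can be traced in a toy, single-channel, 1-D version. This is only a sketch of the data flow named in the abstract: the fixed averaging filter stands in for learned depth-wise convolutions, the scalar `gate_weights` stand in for a learned gating projection, and every name and default value is an assumption for illustration rather than the FocalNets implementation.

```python
import math


def smooth(seq, radius=1):
    """Fixed uniform filter with edge clamping (stand-in for a learned depth-wise conv)."""
    n = len(seq)
    out = []
    for i in range(n):
        window = [seq[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def focal_modulation_1d(tokens, levels=2, gate_weights=(1.0, 0.5)):
    """Toy focal modulation over a list of scalar tokens (hypothetical helper)."""
    if len(gate_weights) < levels:
        raise ValueError("need one gate weight per context level")

    # (i) Hierarchical contextualization: stacking filters widens the
    #     receptive field, so later levels summarize longer ranges.
    contexts = []
    ctx = list(tokens)
    for _ in range(levels):
        ctx = smooth(ctx)
        contexts.append(ctx)

    # (ii) Gated aggregation: each token derives one gate per level from its
    #      own content and uses the gates to weight the context levels.
    aggregated = []
    for i, tok in enumerate(tokens):
        gates = [sigmoid(w * tok) for w in gate_weights[:levels]]
        aggregated.append(sum(g * level[i] for g, level in zip(gates, contexts)))

    # (iii) Element-wise modulation: fuse the aggregated context into the query.
    return [tok * agg for tok, agg in zip(tokens, aggregated)]


print(focal_modulation_1d([0.0, 1.0, 0.0, 0.0]))
```

Unlike self-attention, this toy never computes pairwise token-to-token scores. Context is gathered once per level by local filtering and then mixed per token, so the cost grows linearly with the number of tokens.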


Appendix - Scalable Bayesian GPFA with automatic relevance determination and discrete noise models. A: Further analyses of preparatory dynamics in the primate reaching task

Neural Information Processing Systems

Here we briefly consider why introducing a prior over the factor matrix enables automatic relevance determination. These ideas reflect results by Bishop [1] and our experiments in Section 3.1. For simplicity, we will first consider the case of factor analysis where $p(X) = \prod_{d,t} \mathcal{N}(x_{dt}; 0, 1)$.



Online Adaptation of Language Models with a Memory of Amortized Contexts

Neural Information Processing Systems

Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. To address the crucial need to keep models updated, online learning has emerged as a critical tool when utilizing LLMs for real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of modern LLMs, efficient adaptation is essential. To address these challenges, we propose Memory of Amortized Contexts (MAC), an efficient and effective online adaptation framework for LLMs with strong knowledge retention. We propose a feature extraction and memory-augmentation approach to compress and extract information from new documents into compact modulations stored in a memory bank.


Watch a classified FM radio training video from WW2

Popular Science

That crisp signal was once a really big deal. The film was made for the military in 1944 and released to the public five years later. While fewer and fewer people listen to FM radio today, it was hot stuff during its widespread rollout in the late 1930s and early '40s. FM, short for frequency modulation, had advantages over AM (amplitude modulation) that were immediately apparent: a clearer sound, less static, and more reliable transmissions.