 modulation




Focal Modulation Networks

Neural Information Processing Systems

We propose focal modulation networks (FocalNets for short), where self-attention (SA) is completely replaced by a focal modulation module for modeling token interactions in vision. Focal modulation comprises three components: (i) hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from short to long ranges, (ii) gated aggregation to selectively gather contexts for each query token based on its content, and (iii) element-wise modulation or affine transformation to fuse the aggregated context into the query. Extensive experiments show that FocalNets outperform the state-of-the-art SA counterparts (e.g., Swin and Focal Transformers) at similar computational cost on image classification, object detection, and semantic segmentation. Specifically, FocalNets of tiny and base size achieve 82.3% and 83.9% top-1 accuracy on ImageNet-1K, respectively.
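
The three components named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: it works on a 1D token sequence rather than an image, uses fixed averaging kernels in place of learned depth-wise convolutions, and derives the gates from a hypothetical per-level scalar weight applied to the mean of each token. The names `depthwise_conv1d`, `focal_modulation`, `kernel_sizes`, and `gate_weights` are assumptions made for illustration, and the learned projections and nonlinearities of a real layer are omitted.

```python
import math


def depthwise_conv1d(tokens, kernel):
    """Apply one 1D kernel to every channel independently (zero padding)."""
    n, c = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = [[0.0] * c for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zero
                for ch in range(c):
                    out[i][ch] += w * tokens[j][ch]
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def focal_modulation(tokens, kernel_sizes=(3, 5), gate_weights=None):
    """Simplified focal modulation over a list of tokens (each a list of floats)."""
    n, c = len(tokens), len(tokens[0])

    # (i) Hierarchical contextualization: stacked depth-wise convolutions.
    # Each level is computed from the previous one, so the receptive field grows.
    levels = []
    ctx = tokens
    for ks in kernel_sizes:
        ctx = depthwise_conv1d(ctx, [1.0 / ks] * ks)  # assumed averaging kernel
        levels.append(ctx)

    if gate_weights is None:
        gate_weights = [1.0] * len(levels)  # hypothetical stand-in for learned gates

    out = []
    for i in range(n):
        # (ii) Gated aggregation: gates depend on the query token's own content.
        content = sum(tokens[i]) / c
        gates = [sigmoid(w * content) for w in gate_weights]
        modulator = [
            sum(g * lvl[i][ch] for g, lvl in zip(gates, levels))
            for ch in range(c)
        ]
        # (iii) Element-wise modulation: fuse the aggregated context into the query.
        out.append([tokens[i][ch] * modulator[ch] for ch in range(c)])
    return out


if __name__ == "__main__":
    sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for row in focal_modulation(sequence):
        print(row)
```

Note that no pairwise token-to-token score matrix is built anywhere in the sketch: each token is modulated by context gathered around it, which is the contrast with self-attention that the abstract draws.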


Appendix - Scalable Bayesian GPFA with automatic relevance determination and discrete noise models. A: Further analyses of preparatory dynamics in the primate reaching task

Neural Information Processing Systems

Here we briefly consider why introducing a prior over the factor matrix enables automatic relevance determination. These ideas reflect results by Bishop [1] and our experiments in Section 3.1. For simplicity, we will first consider the case of factor analysis where $p(X) = \prod_{d,t} \mathcal{N}(x_{dt}; 0, 1)$.



Online Adaptation of Language Models with a Memory of Amortized Contexts

Neural Information Processing Systems

Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. To address the crucial need to keep models updated, online learning has emerged as a critical tool when utilizing LLMs for real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of modern LLMs, efficient adaptation is essential. To address these challenges, we propose Memory of Amortized Contexts (MAC), an efficient and effective online adaptation framework for LLMs with strong knowledge retention. We propose a feature extraction and memory-augmentation approach to compress and extract information from new documents into compact modulations stored in a memory bank.


Watch a classified FM radio training video from WW2

Popular Science

That crisp signal was once a really big deal. The film was made for the military in 1944 and released to the public five years later. While fewer and fewer people listen to FM radio today, it was hot stuff amid its widespread rollout during the late 1930s and early '40s. FM (short for frequency modulation) had advantages over AM (amplitude modulation) that were immediately apparent: a clearer sound, less static, and more reliable transmissions.


Neural Modulation for Flash Memory: An Unsupervised Learning Framework for Improved Reliability

Neural Information Processing Systems

The continued scaling of flash memory technology into smaller process nodes, combined with the increased information capacity of each flash cell (i.e., storing more bits per cell), has placed NAND flash memory at the forefront of modern storage technology.


Strong and Precise Modulation of Human Percepts via Robustified ANNs. Supplementary Material: Pixel budget regimes

Neural Information Processing Systems

Subject screening: To gain entry into the study, subjects were required to first perform a "demo" task consisting of 100 ... We refer to measures of human choice probability that are lapse-rate corrected in this manner as "Normalized" (e.g., Supp. ... The typically observed lapse rates were quite low (median over subjects: 0%; mean 4.9%), indicating ... Figure 3: Human disruption rates are largely stable across stimulus presentation times. At shorter viewing times, we observed modest or no increases in disruption rate. ... Source images were captured with a smartphone camera. ... ImageNet classes, as previously defined in the robustness library [2].