Goto

Collaborating Authors

 receptive field growth


$π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling

arXiv.org Artificial Intelligence

Transformers have revolutionized natural language processing, but their quadratic complexity with respect to sequence length remains a fundamental bottleneck for long-range modeling. While sparse attention mechanisms like RingAttention reduce computational costs by restricting attention to local neighborhoods, they suffer from limited receptive fields and lack of adaptability. We present \PiAttention, a periodic sparse Transformer that factorizes attention into ring-local neighborhoods, deterministic $π$-stride skips, and an adaptive fusion gate. The periodic structure provides predictable coverage of distant tokens, while the sparse footprint keeps the per-layer complexity linear in context length. We prove that \PiAttention achieves $\mathcal{O}(kL + π\log L)$ receptive field growth compared to $\mathcal{O}(kL)$ for RingAttention, where $k$ is the local window size, $π$ is the skip period, and $L$ is the sequence length. Extensive experiments on language modeling, retrieval, and vision-language tasks demonstrate that \PiAttention matches or surpasses dense attention quality with 8.3\% lower perplexity than RingAttention while using 50\% fewer GPUs for the same context length. Our detailed ablations and visualizations reveal the importance of periodic skips, adaptive fusion, and head-level sparsity coordination for efficient long-context modeling.


Neural mechanisms of contrast dependent receptive field size in V1

Neural Information Processing Systems

Based on a large scale spiking neuron model of the input layers 4Cα and β of macaque, we identify neural mechanisms for the observed contrast dependent receptive field size of V1 cells. We observe a rich variety of mechanisms for the phenomenon and analyze them based on the relative gain of excitatory and inhibitory synaptic inputs. We observe an average growth in the spatial extent of excitation and inhibition for low contrast, as predicted from phenomenological models. However, contrary to phenomenological models, our simulation results suggest this is neither sufficient nor necessary to explain the phenomenon.


Neural mechanisms of contrast dependent receptive field size in V1

Neural Information Processing Systems

Based on a large scale spiking neuron model of the input layers 4Cα and β of macaque, we identify neural mechanisms for the observed contrast dependent receptive field size of V1 cells. We observe a rich variety of mechanisms for the phenomenon and analyze them based on the relative gain of excitatory and inhibitory synaptic inputs. We observe an average growth in the spatial extent of excitation and inhibition for low contrast, as predicted from phenomenological models. However, contrary to phenomenological models, our simulation results suggest this is neither sufficient nor necessary to explain the phenomenon.


Neural mechanisms of contrast dependent receptive field size in V1

Neural Information Processing Systems

Neurons in the primary Visual cortex (V1) display what is often referred to as "size tuning", i.e. the response of a cell is maximal around a cell-specific stimulus size and generally decreases substantially (30-40% on average) or vanishes altogether for larger stimulus