Parallelizing Linear Transformers with the Delta Rule over Sequence Length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim
Massachusetts Institute of Technology, Soochow University
Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers, especially on tasks that require in-context retrieval. While more expressive variants of linear transformers which replace the additive update in linear transformers with the delta rule [DeltaNet; 101] have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices [11]. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba [31] and GLA [124] in terms of perplexity and zero-shot performance on downstream tasks. We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transformer baselines.
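To make the difference between the additive update and the delta rule concrete, the following is a minimal sketch of the sequential (recurrent) form of both updates. It is not the parallel, hardware-efficient algorithm the abstract describes. It assumes the standard delta-rule recurrence S_t = S_{t-1} - beta_t (S_{t-1} k_t - v_t) k_t^T with a unit-norm key and beta = 1; the helper names (`outer`, `matvec`, `additive_update`, `delta_update`) and the toy vectors are illustrative, not taken from the paper.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matvec(S, x):
    """Matrix-vector product S x."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]


def additive_update(S, k, v):
    """Linear-attention update: S_t = S_{t-1} + v_t k_t^T."""
    vk = outer(v, k)
    return [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(S, vk)]


def delta_update(S, k, v, beta):
    """Delta-rule update: S_t = S_{t-1} - beta * (S_{t-1} k_t - v_t) k_t^T."""
    err = [p - vi for p, vi in zip(matvec(S, k), v)]  # prediction error for key k
    corr = outer(err, k)
    return [[s - beta * c for s, c in zip(rs, rc)] for rs, rc in zip(S, corr)]


# Toy data (assumed): the same unit-norm key is written twice with different values.
k = [1.0, 0.0]
v_old = [1.0, 2.0]
v_new = [5.0, -1.0]

S_add = [[0.0, 0.0], [0.0, 0.0]]
S_delta = [[0.0, 0.0], [0.0, 0.0]]
for v in (v_old, v_new):
    S_add = additive_update(S_add, k, v)
    S_delta = delta_update(S_delta, k, v, beta=1.0)

read_add = matvec(S_add, k)      # [6.0, 1.0]: old and new values are summed
read_delta = matvec(S_delta, k)  # [5.0, -1.0]: old value is overwritten
print(read_add, read_delta)
```

In this toy setting the additive state returns the sum of both values for the repeated key, while the delta rule replaces the old association with the new one, which is one intuition for why it helps associative recall.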
Supplementary Material and Datasheet: Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting
This supplementary document follows the Datasheets for Datasets template of (8) to document the Global Flood Forecasting (GFF) dataset and its creation. Further resources are provided in the accompanying publication (https://arxiv.org/abs/2409.18591) and in the GitHub repository (https://github.com/Multihuntr/GFF).
Multi-Group Proportional Representation in Retrieval
Current approaches to mitigate these representational harms balance the number of retrieved items across population groups defined by a small number of (often binary) attributes. However, most existing methods overlook intersectional groups determined by combinations of group attributes, such as gender, race, and ethnicity.
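As a small illustration of why balancing single attributes can miss intersectional groups, the sketch below counts a hypothetical set of retrieved items by each attribute and by their combination. The attribute names, the placeholder labels ("F", "M", "A", "B"), and the data are all invented for this example and do not come from the paper.

```python
from collections import Counter

# Hypothetical retrieved items, each tagged with two group attributes.
retrieved = [
    {"gender": "F", "race": "A"},
    {"gender": "F", "race": "A"},
    {"gender": "M", "race": "B"},
    {"gender": "M", "race": "B"},
]

# Marginal counts: each attribute looks perfectly balanced on its own.
gender_counts = Counter(item["gender"] for item in retrieved)
race_counts = Counter(item["race"] for item in retrieved)

# Intersectional counts: combinations of both attributes.
intersection_counts = Counter(
    (item["gender"], item["race"]) for item in retrieved
)

# Combinations that never appear despite balanced marginals.
missing = [
    (g, r)
    for g in sorted(gender_counts)
    for r in sorted(race_counts)
    if intersection_counts[(g, r)] == 0
]

print(dict(gender_counts))  # {'F': 2, 'M': 2}
print(dict(race_counts))    # {'A': 2, 'B': 2}
print(missing)              # [('F', 'B'), ('M', 'A')]
```

Here both attributes are split evenly, yet two of the four intersectional groups receive no retrieved items at all.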
The Shutdown Is Pushing Air Safety Workers to the Limit
Federal employees say that flying is still safe despite the strain on air traffic controllers. But expect even more airport delays ahead. It hasn't been a good year for federal aviation safety workers. January saw the worst US commercial airline disaster in decades, quickly followed by sudden layoffs, staffing shortfalls, major technology glitches at one of the nation's busiest airports, and short timelines to rebuild the systems that govern national airspace. It somehow got worse this month, when a stalemate between congressional Republicans and Democrats led to a government shutdown.