

LUNA: Linear Universal Neural Attention with Generalization Guarantees

Shahbazi, Ashkan, He, Ping, Abbasi, Ali, Bai, Yikun, Liu, Xinran, Akbari, Elaheh, Salehi, Darian, NaderiAlizadeh, Navid, Kolouri, Soheil

arXiv.org Machine Learning

Scaling attention faces a critical bottleneck: the $\mathcal{O}(n^2)$ computational cost of softmax attention, which limits its application in long-sequence domains. While linear attention mechanisms reduce this cost to $\mathcal{O}(n)$, they typically rely on fixed random feature maps, such as random Fourier features or hand-crafted functions. This reliance on static, data-agnostic kernels creates a fundamental trade-off, forcing practitioners to sacrifice significant model accuracy for computational efficiency. We introduce \textsc{Luna}, a kernelized linear attention mechanism that eliminates this trade-off, retaining linear cost while matching or surpassing the accuracy of quadratic attention. \textsc{Luna} is built on the key insight that the kernel feature map itself should be learned rather than fixed a priori. By parameterizing the kernel, \textsc{Luna} learns a feature basis tailored to the specific data and task, overcoming the expressive limitations of fixed-feature methods. \textsc{Luna} implements this with a learnable feature map that induces a positive-definite kernel and admits a streaming form, yielding linear time and memory scaling in the sequence length. Empirical evaluations validate our approach across diverse settings. On the Long Range Arena (LRA), \textsc{Luna} achieves state-of-the-art average accuracy among efficient Transformers under compute parity, using the same parameter count, training steps, and approximately the same FLOPs. \textsc{Luna} also excels at post-hoc conversion: replacing softmax attention in fine-tuned BERT and ViT-B/16 checkpoints and briefly fine-tuning recovers most of the original performance, substantially outperforming fixed linearizations.
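To make the recipe concrete, here is a minimal PyTorch sketch of kernelized linear attention with a learnable feature map, written from the abstract alone: the exp-of-linear map, the module names, and the shapes are illustrative assumptions, not \textsc{Luna}'s actual parameterization.

```python
import torch
import torch.nn as nn

class LearnableFeatureMap(nn.Module):
    """Learnable map phi with strictly positive outputs, so that
    k(x, y) = phi(x) . phi(y) is a positive-definite kernel.
    Hypothetical parameterization for illustration only."""
    def __init__(self, dim, feat_dim):
        super().__init__()
        self.proj = nn.Linear(dim, feat_dim)

    def forward(self, x):
        return torch.exp(self.proj(x))  # elementwise exp keeps features positive

def linear_attention(q, k, v, phi, eps=1e-6):
    """O(n) attention: accumulate S = sum_j phi(k_j) v_j^T and
    z = sum_j phi(k_j) once, then read both out per query."""
    fq, fk = phi(q), phi(k)                   # (n, f) each
    S = torch.einsum('nf,nd->fd', fk, v)      # sum of feature/value outer products
    z = fk.sum(dim=0)                         # shared normalizer
    return torch.einsum('nf,fd->nd', fq, S) / (fq @ z + eps).unsqueeze(-1)
```

Because S and z can be updated token by token, the same computation admits the streaming form the abstract mentions, with constant memory per step.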


LoLCATs: On Low-Rank Linearizing of Large Language Models

Zhang, Michael, Arora, Simran, Chalamala, Rahul, Wu, Alan, Spector, Benjamin, Singhal, Aaryan, Ramesh, Krithik, Ré, Christopher

arXiv.org Machine Learning

Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitude less memory and compute. We base these steps on two findings. First, we can replace an LLM's softmax attentions with closely approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer"). This then enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). LoLCATs significantly improves linearizing quality, training efficiency, and scalability. We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.
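As a sketch of what the first step ("attention transfer") might look like in PyTorch, under our own assumptions about shapes and module boundaries -- the `linear_attn` stand-in and the single-head layout are hypothetical:

```python
import torch
import torch.nn.functional as F

def attention_transfer_loss(q, k, v, linear_attn):
    """Train a linear attention (student) to match the frozen softmax
    attention's output (teacher) with an MSE loss, per the abstract."""
    scale = q.shape[-1] ** -0.5
    with torch.no_grad():  # teacher: the LLM's original softmax attention
        target = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v
    return F.mse_loss(linear_attn(q, k, v), target)
```

The abstract's second step then adjusts for the remaining approximation error with low-rank adaptation (LoRA) on the converted model.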


Sega's ninja game Shinobi to get the movie treatment

The Japan Times

One of Sega's most popular games, Shinobi, will be made into a movie in a joint project with Universal Pictures, the Japanese gamemaker announced Wednesday, aiming to emulate the success of "The Super Mario Bros. Movie." Sega did not give a target date for the release but said it had "started the development of a film production" with the Hollywood behemoth. Shinobi was originally created for Japanese arcades in 1987 and features a ninja character who fights to stop a criminal organization that kidnaps child ninjas. It is the latest effort to cash in on a video-game adaptation craze after "The Super Mario Bros. Movie" became the second-highest-grossing film of 2023, itself following a 2020 adaptation of Sega's "Sonic the Hedgehog." "Shinobi is one of Sega's most popular series worldwide, along with Sonic the Hedgehog," Sega said on Wednesday.


How roadside rubbish kills up to ten animals a day: Hedgehogs, squirrels, deer and foxes fall victim to litter

Daily Mail - Science & tech

Roadside litter injures, traps or kills 10 animals every day, the RSPCA has revealed. The animal charity warned that over the last three years it received more than 10,000 reports of animals being distressed or even killed by discarded rubbish. It comes as separate research by National Highways reveals that almost half of people are unaware that fruit peel and apple cores - which lure wildlife to their deaths - count as litter. A survey of 2,000 people also revealed that a third wrongly believe that dropping organic waste is beneficial to wildlife. While more than 90 per cent said they had never discarded litter onto the roadside, over 60 per cent said they had seen someone else doing it.


The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Zhang, Michael, Bhatia, Kush, Kumbong, Hermann, Ré, Christopher

arXiv.org Artificial Intelligence

Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training linear Transformers from scratch, (2) "finetuned-conversion" of task-specific Transformers into linear versions that recover task performance, and (3) "pretrained-conversion" of Transformers such as large language models into linear versions finetunable on downstream tasks. However, linear attentions often underperform standard softmax attention in quality. To close this performance gap, we find prior linear attentions lack key properties of softmax attention tied to good performance: low-entropy (or "spiky") weights and dot-product monotonicity. We further observe surprisingly simple feature maps that retain these properties and match softmax performance, but are inefficient to compute in linear attention. We thus propose Hedgehog, a learnable linear attention that retains the spiky and monotonic properties of softmax attention while maintaining linear complexity. Hedgehog uses simple trainable MLPs to produce attention weights mimicking softmax attention. Experiments show Hedgehog recovers over 99% of standard Transformer quality in train-from-scratch and finetuned-conversion settings, outperforming prior linear attentions by up to 6 perplexity points on WikiText-103 with causal GPTs, and by up to 8.7 GLUE score points on finetuned bidirectional BERTs. Hedgehog also enables pretrained-conversion: converting a pretrained GPT-2 into a linear attention variant achieves a state-of-the-art 16.7 perplexity on WikiText-103 among 125M subquadratic decoder models. Finally, we turn a pretrained Llama-2 7B into a viable linear attention Llama. With low-rank adaptation, Hedgehog-Llama2 7B achieves ROUGE-1 scores 28.1 points higher than the base standard-attention model, where prior linear attentions lead to drops of 16.5 points.
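A sketch of the idea in code, with our own choices filled in: an elementwise exponential over a trainable projection yields the low-entropy ("spiky"), dot-product-monotone weights the abstract highlights, and the map can be trained so the linearized weights mimic a softmax teacher. The symmetric [exp(h), exp(-h)] form and the cross-entropy objective are illustrative assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn

class SpikyFeatureMap(nn.Module):
    """Trainable feature map whose exponentials keep attention weights
    spiky and monotone in the query-key dot product (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        h = self.proj(x)
        return torch.cat([torch.exp(h), torch.exp(-h)], dim=-1)

def mimicry_loss(q, k, phi, eps=1e-6):
    """Assumed distillation objective: cross-entropy from softmax
    attention weights (teacher) to linearized weights (student)."""
    teacher = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    scores = phi(q) @ phi(k).transpose(-2, -1)           # all entries > 0
    student = scores / (scores.sum(dim=-1, keepdim=True) + eps)
    return -(teacher * torch.log(student + eps)).sum(dim=-1).mean()
```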


Warning that robot lawnmowers are killing hedgehogs: Scientists propose must-have garden gadgets come with 'safety certificates'

Daily Mail - Science & tech

Hedgehogs are increasingly being killed and injured in encounters with robot lawnmowers, which have few safety features to protect wildlife, according to Oxford University scientists. Researchers conducted a series of tests with the mowers, the latest must-have garden gadget, with a view to creating a 'hedgehog friendly' certification so gardeners need not fear any prickly casualties when they trim the grass. To ensure no harm was caused to living hedgehogs, scientists used rubber 'crash test hedgehogs' instead to see if the robot mower would turn away on encountering one of Mrs Tiggywinkle's tribe on the lawn. Hedgehogs are already in serious decline, with reasons including habitat loss, road traffic accidents, intensive agriculture, and injuries from dog bites and garden strimmers. But now mowers are adding to the threats.


Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)

Kavehzadeh, Parsa, Valipour, Mojtaba, Tahaei, Marzieh, Ghodsi, Ali, Chen, Boxing, Rezagholizadeh, Mehdi

arXiv.org Artificial Intelligence

The rapid advancement of large language models (LLMs) has revolutionized natural language processing (NLP). While these models excel at understanding and generating human-like text, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference in deep neural networks. It leverages network modularity to create sub-models with varying computational loads, sorting them by their computation/accuracy characteristics in a nested manner. We extend SortedNet to generative NLP tasks, making large language models dynamic without any pretraining, only by replacing standard Supervised Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT) at the same cost. Our approach boosts model efficiency, eliminating the need for multiple models to cover various scenarios during inference. We show that using this approach, we are able to unlock the potential of the intermediate layers of transformers in generating the target output. Our sub-models remain integral components of the original model, minimizing storage requirements and transition costs between different computational/latency budgets. By applying this approach to LLaMa 2 13B, tuning on the Stanford Alpaca dataset, and comparing against normal tuning and early exit on the PandaLM benchmark, we show that Sorted Fine-Tuning can deliver models twice as fast as the original model while maintaining or exceeding its performance.
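A minimal sketch of the nested sub-model idea as we read it from the abstract: every prefix of the transformer up to a chosen exit layer acts as a sub-model, all exits share the single LM head, and their losses are combined so intermediate layers also learn to generate the target output. The exit set and the equal weighting are our assumptions.

```python
import torch
import torch.nn as nn

def sorted_fine_tuning_loss(hidden_states, lm_head, labels, exit_layers):
    """Average next-token losses over nested sub-models (layer prefixes).
    hidden_states[i]: (batch, seq, d) output of layer i; lm_head is shared;
    exit_layers, e.g. [8, 16, 24, 32], is an assumed hyperparameter."""
    loss_fn = nn.CrossEntropyLoss()
    total = 0.0
    for layer in exit_layers:
        logits = lm_head(hidden_states[layer])       # shared LM head at each exit
        total = total + loss_fn(logits.flatten(0, 1), labels.flatten())
    return total / len(exit_layers)
```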


The HAPPY HEDGEHOG Project

Bendel, Oliver, Graf, Emanuel, Bollier, Kevin

arXiv.org Artificial Intelligence

Semi-autonomous machines, autonomous machines and robots inhabit closed, semi-closed and open environments, from more structured environments like the household to more unstructured environments like cultural landscapes or the wilderness. There they encounter domestic animals, farm animals, working animals, and wild animals. These creatures could be disturbed, displaced, injured, or killed by the machines. Within the context of machine ethics and social robotics, the School of Business FHNW developed several design studies and prototypes for animal-friendly machines, which can be understood as moral and social machines in the spirit of these disciplines. In 2019-20, a team led by the main author developed a prototype robot lawnmower that can recognize hedgehogs, interrupt its work for them and thus protect them. Every year many of these animals die worldwide because of traditional service robots. HAPPY HEDGEHOG (HHH), as the invention is called, could be a solution to this problem. This article begins by providing an introduction to the background. It then focuses on the machine's navigation (where it comes across certain objects that need to be recognized) and on its thermal and image recognition (with the help of machine learning). It also presents obvious weaknesses and possible improvements. The results could be relevant for an industry that wants to market its products as animal-friendly machines.


'Only AI made it possible': scientists hail breakthrough in tracking British wildlife

The Guardian

Researchers have developed arrays of AI-controlled cameras and microphones to identify animals and birds and to monitor their movements in the wild – technology, they say, that should help tackle Britain's growing biodiversity problem. The robot monitors have been tested at three sites and have captured sounds and images from which computers were able to identify specific species and map their locations. Dozens of different birds were recognised from their songs while foxes, deer, hedgehogs and bats were pinpointed and identified by AI analysis. No human observers are involved. "The crucial point is the scale of the operation," said Anthony Dancer, a conservation specialist at the Zoological Society of London (ZSL).


Pushing Buttons: Why Sonic and Mario duelling it out in 2D again will be a spectacle

The Guardian

Rivalry is a vital element of fandom. Whether it's punks v rockers, Star Trek v Star Wars or Marvel v DC, subcultures have always defined themselves by what they're not as much as by what they are. Which is why I'm secretly delighted that Sega and Nintendo are apparently releasing their new Sonic and Mario games within days of each other this October. Both Super Mario Bros Wonder and Sonic Superstars are nostalgic callbacks to the era of 2D platforming. Both games allow players to select from a range of classic characters and take on the rich, lushly colourful environments in cooperative modes, and both supplement the retro aesthetics with new abilities.