
Collaborating Authors

 hedgehog


Zoe Kleinman: Why the AI industry is the real winner of the Musk-Altman trial

BBC News

It is not only OpenAI but the AI race itself that was vindicated in the California courtroom last night. Even though Elon Musk essentially lost on a technicality, there's a clear signal from the verdict that making lots of money from AI and competing fiercely with rivals is simply business. The industry sometimes tries to display a united front, especially when it comes to safety, research and inclusivity. But this case served as a powerful reminder that none of the AI giants are charities, nor do they have to be, even if they once said otherwise. Cracks in the façade of industry collaboration for the sake of humanity have been exposed before.


Standard Chartered to cut thousands of roles as AI use increases

BBC News

Banking giant Standard Chartered has become the latest major company to announce job cuts as it increases its adoption of artificial intelligence (AI). The firm, which has its headquarters in the UK, said it will cut more than 15%, or around 7,800, back-office roles by 2030. The BBC understands that Standard Chartered aims to move some of the affected workers to other roles in the business. Companies around the world have announced major job cuts in recent months as they increasingly use AI tools for roles currently carried out by humans. The company did not give details of where the roles would be cut.


Satellites and AI used to track UK hedgehogs in bid to slow decline

BBC News

Researchers at the University of Cambridge are using satellite data and AI in an effort to slow the decline in Britain's hedgehog population. Using an AI tool called Tessera, which analyses detailed images of the UK gathered from space, experts can precisely determine locations of hedgehog habitats - and where these are disappearing. The resulting maps capture landscapes in minute detail, including down to individual hedgerows, while AI can accurately predict hedgehog-friendly places obscured by cloud cover. Those behind the project hope it will help to shed light not just on where hedgehogs live across the UK, but barriers preventing them from finding food and mates. The researchers say Tessera's outputs can be used to track the impact of new housing developments and other environmental changes on landscapes that could affect hedgehogs over time.


Scientists suggest modifying cars to hit fewer hedgehogs

Popular Science

Placing ultrasound repellents on cars could protect the spiny mammals. Up to one in three hedgehogs in local populations die on roads. When it comes to how animals use ultrasound, chances are you immediately think of bats and their amazing echolocation ability. However, researchers have discovered another--arguably much cuter--animal that can also hear ultrasound, with significant implications for its conservation.



LUNA: Linear Universal Neural Attention with Generalization Guarantees

arXiv.org Machine Learning

Scaling attention faces a critical bottleneck: the $\mathcal{O}(n^2)$ quadratic computational cost of softmax attention, which limits its application in long-sequence domains. While linear attention mechanisms reduce this cost to $\mathcal{O}(n)$, they typically rely on fixed random feature maps, such as random Fourier features or hand-crafted functions. This reliance on static, data-agnostic kernels creates a fundamental trade-off, forcing practitioners to sacrifice significant model accuracy for computational efficiency. We introduce \textsc{LUNA}, a kernelized linear attention mechanism that eliminates this trade-off, retaining linear cost while matching and surpassing the accuracy of quadratic attention. \textsc{LUNA} is built on the key insight that the kernel feature map itself should be learned rather than fixed a priori. By parameterizing the kernel, \textsc{LUNA} learns a feature basis tailored to the specific data and task, overcoming the expressive limitations of fixed-feature methods. \textsc{LUNA} implements this with a learnable feature map that induces a positive-definite kernel and admits a streaming form, yielding linear time and memory scaling in the sequence length. Empirical evaluations validate our approach across diverse settings. On the Long Range Arena (LRA), \textsc{LUNA} achieves state-of-the-art average accuracy among efficient Transformers under compute parity, using the same parameter count, training steps, and approximate FLOPs. \textsc{LUNA} also excels at post-hoc conversion: replacing softmax in fine-tuned BERT and ViT-B/16 checkpoints and briefly fine-tuning recovers most of the original performance, substantially outperforming fixed linearizations.
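
To make the "streaming form" concrete, here is a minimal NumPy sketch of causal kernelized attention computed one token at a time, checked against the quadratic form of the same kernel. It is an illustration only: the feature map `exp(x @ W)`, the matrix `W`, and all function names are assumptions chosen for this example and are not the parameterization used in the paper.

```python
import numpy as np

def feature_map(x, W):
    """Hypothetical learnable feature map: exp of a linear projection.

    The exponential keeps every feature strictly positive, so the
    normalizer below never vanishes. W stands in for learned weights.
    """
    return np.exp(x @ W)

def linear_attention_streaming(Q, K, V, W):
    """Causal linear attention with a running state whose size is independent of n."""
    n, d_v = V.shape
    m = W.shape[1]
    S = np.zeros((m, d_v))  # running sum of outer(phi(k_j), v_j)
    z = np.zeros(m)         # running sum of phi(k_j)
    out = np.empty((n, d_v))
    for t in range(n):
        phi_k = feature_map(K[t], W)
        S += np.outer(phi_k, V[t])
        z += phi_k
        phi_q = feature_map(Q[t], W)
        out[t] = (phi_q @ S) / (phi_q @ z)
    return out

def kernel_attention_quadratic(Q, K, V, W):
    """Reference O(n^2) form: kernel phi(q) . phi(k) with a causal mask."""
    scores = np.tril(feature_map(Q, W) @ feature_map(K, W).T)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d, m = 6, 4, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
W = rng.normal(size=(d, m)) * 0.1  # stand-in for trained parameters

streamed = linear_attention_streaming(Q, K, V, W)
reference = kernel_attention_quadratic(Q, K, V, W)
print(np.allclose(streamed, reference))
```

The streaming loop touches each token once and keeps only `S` and `z`, which is where the linear time and constant-size state come from; the quadratic function exists purely as a correctness check.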


LoLCATs: On Low-Rank Linearizing of Large Language Models

arXiv.org Machine Learning

Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitude less memory and compute. We base these steps on two findings. First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer"). Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). LoLCATs significantly improves linearizing quality, training efficiency, and scalability. We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.
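
The two steps described above can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions, not the LoLCATs implementation: the feature map `exp(x @ W)`, the non-causal attention, and the helper names `attention_transfer_loss` and `lora_adapt` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard (non-causal) softmax attention, used here as the teacher."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, W):
    """Linear attention with an assumed feature map phi(x) = exp(x @ W)."""
    phi_q, phi_k = np.exp(Q @ W), np.exp(K @ W)
    numerator = phi_q @ (phi_k.T @ V)        # O(n) in sequence length
    denominator = phi_q @ phi_k.sum(axis=0)
    return numerator / denominator[:, None]

def attention_transfer_loss(Q, K, V, W):
    """Step 1: output MSE between the linear student and the softmax teacher."""
    diff = linear_attention(Q, K, V, W) - softmax_attention(Q, K, V)
    return float(np.mean(diff ** 2))

def lora_adapt(W0, A, B):
    """Step 2: low-rank update W0 + B @ A; only A and B would be trained."""
    return W0 + B @ A

rng = np.random.default_rng(0)
n, d, m, r = 10, 8, 8, 2
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
W = rng.normal(size=(d, m)) * 0.1   # feature-map weights, to be trained
loss = attention_transfer_loss(Q, K, V, W)

W0 = rng.normal(size=(d, d))        # a frozen projection matrix
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))                # zero init leaves W0 unchanged at start
W_adapted = lora_adapt(W0, A, B)
lora_params, full_params = A.size + B.size, W0.size
print(loss, lora_params, full_params)
```

In a real setup the loss would be minimized over `W` with a gradient-based optimizer while the original model weights stay frozen; the low-rank factors `A` and `B` would then absorb the remaining approximation error with far fewer trainable parameters than the full matrix.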


Sega's ninja game Shinobi to get the movie treatment

The Japan Times

One of Sega's most popular games, Shinobi, will be made into a movie in a joint project with Universal Pictures, the Japanese gamemaker announced Wednesday, aiming to emulate the success of "The Super Mario Bros. Movie." Sega did not give a target date for the release but said it had "started the development of a film production" with the Hollywood behemoth. Shinobi was originally created for Japanese arcades in 1987 and features a ninja character who fights to stop a criminal organization that kidnaps child ninjas. It is the latest effort to cash in on a video-game adaptation craze after "The Super Mario Bros. Movie" became the second-highest grossing film of 2023, following a 2020 adaptation of Sega's "Sonic the Hedgehog." "Shinobi is one of Sega's most popular series worldwide, along with Sonic the Hedgehog," Sega said on Wednesday.


How roadside rubbish kills up to ten animals a day: Hedgehogs, squirrels, deer and foxes fall victim to litter

Daily Mail - Science & tech

Roadside litter injures, traps or kills 10 animals every day, the RSPCA has revealed. The animal charity has warned that over the last three years, it has received more than 10,000 reports of animals being distressed or even killed by discarded rubbish. It comes as separate research by National Highways reveals almost half of people are unaware that fruit peel and apple cores - which lure wildlife to their death - count as litter. A survey of 2,000 people also revealed that a third wrongly believe that dropping organic waste is beneficial to wildlife. While more than 90 per cent said they had never discarded litter onto the roadside, over 60 per cent said they had seen someone else doing it.


The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

arXiv.org Artificial Intelligence

Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training linear Transformers from scratch, (2) "finetuned-conversion" of task-specific Transformers into linear versions that recover task performance, and (3) "pretrained-conversion" of Transformers such as large language models into linear versions finetunable on downstream tasks. However, linear attentions often underperform standard softmax attention in quality. To close this performance gap, we find prior linear attentions lack key properties of softmax attention tied to good performance: low-entropy (or "spiky") weights and dot-product monotonicity. We further observe surprisingly simple feature maps that retain these properties and match softmax performance, but are inefficient to compute in linear attention. We thus propose Hedgehog, a learnable linear attention that retains the spiky and monotonic properties of softmax attention while maintaining linear complexity. Hedgehog uses simple trainable MLPs to produce attention weights mimicking softmax attention. Experiments show Hedgehog recovers over 99% of standard Transformer quality in train-from-scratch and finetuned-conversion settings, outperforming prior linear attentions by up to 6 perplexity points on WikiText-103 with causal GPTs, and by up to 8.7 GLUE score points on finetuned bidirectional BERTs. Hedgehog also enables pretrained-conversion. Converting a pretrained GPT-2 into a linear attention variant achieves state-of-the-art 16.7 perplexity on WikiText-103 for 125M subquadratic decoder models. We finally turn a pretrained Llama-2 7B into a viable linear attention Llama. With low-rank adaptation, Hedgehog-Llama2 7B scores 28.1 ROUGE-1 points higher than the base standard attention model, where prior linear attentions lead to 16.5 point drops.
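
One way to quantify the "spiky" (low-entropy) property mentioned above is to measure the Shannon entropy of each row of attention weights. The sketch below compares softmax weights with weights from a fixed `1 + ELU` feature map, a common choice in earlier linear attention work. The input scale, sizes, and function names are assumptions for illustration; this is not the Hedgehog feature map or its evaluation protocol.

```python
import numpy as np

def attention_entropy(weights):
    """Mean Shannon entropy (in nats) over the rows of an attention matrix."""
    w = np.clip(weights, 1e-12, 1.0)  # avoid log(0)
    return float(-(w * np.log(w)).sum(axis=1).mean())

def softmax_weights(Q, K):
    """Row-normalized softmax attention weights."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def elu_plus_one(x):
    """Fixed positive feature map 1 + ELU(x)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_weights(Q, K, phi):
    """Attention weights implied by a feature map: phi(q) . phi(k), row-normalized."""
    scores = phi(Q) @ phi(K).T
    return scores / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 16, 8
Q = rng.normal(size=(n, d)) * 3.0   # arbitrary scale chosen for the demo
K = rng.normal(size=(n, d)) * 3.0

h_softmax = attention_entropy(softmax_weights(Q, K))
h_linear = attention_entropy(linear_weights(Q, K, elu_plus_one))
h_uniform = float(np.log(n))        # upper bound: perfectly flat weights
print(h_softmax, h_linear, h_uniform)
```

Lower entropy means each query concentrates its weight on fewer keys. Comparing `h_softmax` and `h_linear` against the uniform bound `h_uniform` gives a rough picture of how much spikiness a given feature map preserves on these inputs.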