teal
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
Chen, Sihan, Zhao, Dan, Ko, Jongwoo, Banbury, Colby, Zhuang, Huiping, Liang, Luming, Chen, Tianyi
The growing computational demands of large language models (LLMs) make efficient inference and activation strategies increasingly critical. Recent approaches such as Mixture-of-Experts (MoE) leverage selective activation but require specialized training, whereas training-free sparse activation methods offer broader applicability and superior resource efficiency through their plug-and-play design. However, many existing methods rely solely on hidden state magnitudes to determine activation, resulting in high approximation errors and suboptimal inference accuracy. To address these limitations, we propose WINA (Weight Informed Neuron Activation), a novel, simple, and training-free sparse activation framework that jointly considers hidden state magnitudes and the column-wise $\ell_2$-norms of weight matrices. We show that this leads to a sparsification strategy that obtains optimal approximation error bounds with theoretical guarantees tighter than existing techniques. Empirically, WINA also outperforms state-of-the-art methods (e.g., TEAL) by up to $2.94\%$ in average performance at the same sparsity levels, across a diverse set of LLM architectures and datasets. These results position WINA as a new performance frontier for training-free sparse activation in LLM inference and a robust baseline for efficient inference. The source code is available at https://github.com/microsoft/wina.
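The contrast the abstract draws, between selecting neurons by hidden state magnitude alone and selecting them by magnitude together with column-wise $\ell_2$-norms, can be illustrated with a minimal sketch. It assumes the joint score is the product of each hidden state's absolute value and the norm of the matching weight column, and that the top-k scores are kept; the abstract does not spell out the rule, so the scoring function, the helper names, and the toy numbers below are all hypothetical.

```python
import math

def column_l2_norms(weight):
    """l2 norm of each column of a matrix stored as a list of rows."""
    n_cols = len(weight[0])
    return [math.sqrt(sum(row[j] ** 2 for row in weight)) for j in range(n_cols)]

def topk_mask(scores, k):
    """0/1 mask keeping the k largest scores (ties go to the lower index)."""
    keep = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    mask = [0] * len(scores)
    for i in keep:
        mask[i] = 1
    return mask

def magnitude_mask(x, k):
    """Baseline: select by hidden state magnitude only."""
    return topk_mask([abs(v) for v in x], k)

def weight_informed_mask(x, weight, k):
    """Assumed joint criterion: |x_i| times the l2 norm of column i."""
    norms = column_l2_norms(weight)
    return topk_mask([abs(v) * c for v, c in zip(x, norms)], k)

def matvec(weight, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def approx_error(weight, x, mask):
    """l2 distance between the dense output and the masked-input output."""
    exact = matvec(weight, x)
    sparse = matvec(weight, [v * m for v, m in zip(x, mask)])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(exact, sparse)))

# Toy data (hypothetical): the largest hidden state feeds a tiny weight column.
x = [1.0, 0.9, 0.1]
weight = [[0.1, 2.0, 0.0],
          [0.0, 2.0, 0.1]]

mask_mag = magnitude_mask(x, k=1)
mask_joint = weight_informed_mask(x, weight, k=1)
err_mag = approx_error(weight, x, mask_mag)
err_joint = approx_error(weight, x, mask_joint)
print(mask_mag, mask_joint, err_mag > err_joint)
```

In this toy case the magnitude-only rule keeps the first input even though its weight column contributes almost nothing to the output, while the joint score keeps the second input and yields a much smaller output error. It is one hand-picked example, not evidence for the bounds claimed in the abstract.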
TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning
Shaul-Ariel, Shahar, Weinshall, Daphna
Continual Learning is an unresolved challenge, whose relevance increases when considering modern applications. Unlike the human brain, trained deep neural networks suffer from a phenomenon called Catastrophic Forgetting, where they progressively lose previously acquired knowledge upon learning new tasks. To mitigate this problem, numerous methods have been developed, many relying on replaying past exemplars during new task training. However, as the memory allocated for replay decreases, the effectiveness of these approaches diminishes. On the other hand, maintaining a large memory for the purpose of replay is inefficient and often impractical. Here we introduce TEAL, a novel approach to populate the memory with exemplars, that can be integrated with various experience-replay methods and significantly enhance their performance on small memory buffers. We show that TEAL improves the average accuracy of the SOTA method XDER as well as ER and ER-ACE on several image recognition benchmarks, with a small memory buffer of 1-3 exemplars per class in the final task. This confirms the hypothesis that when memory is scarce, it is best to prioritize the most typical data.
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Yang, Zhen, Zhang, Yingxue, Meng, Fandong, Zhou, Jie
Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model the interactions among multi-modal inputs and the generation in non-textual modalities. In this work, we propose TEAL (Tokenize and Embed ALL), an approach to treat the input from any modality as a token sequence and learn a joint embedding space for all modalities. Specifically, for the input from any modality, TEAL first discretizes it into a token sequence with an off-the-shelf tokenizer and embeds the token sequence into a joint embedding space with a learnable embedding matrix. MM-LLMs just need to predict the multi-modal tokens autoregressively as the textual LLMs do. Finally, the corresponding de-tokenizer is applied to generate the output in each modality based on the predicted token sequence. With the joint embedding space, TEAL enables the frozen LLMs to perform both understanding and generation tasks involving non-textual modalities, such as image and audio. Thus, the textual LLM can just work as an interface and maintain its high performance in textual understanding and generation. Experiments show that TEAL achieves substantial improvements in multi-modal understanding, and implements a simple scheme for multi-modal generation.
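The tokenize-then-embed step described in this abstract can be sketched roughly as follows. The sketch assumes each modality's tokenizer emits local integer ids and that the joint embedding space is implemented as a single lookup table over non-overlapping id ranges; the offset scheme, the vocabulary sizes, and every function name are hypothetical and are not taken from the paper. The random table stands in for the learnable embedding matrix.

```python
import random

def build_joint_vocab(vocab_sizes):
    """Give each modality a contiguous, non-overlapping range of joint ids."""
    offsets, start = {}, 0
    for modality, size in vocab_sizes.items():
        offsets[modality] = start
        start += size
    return offsets, start

def to_joint_ids(modality, local_ids, offsets):
    """Shift a modality's local token ids into the shared id space."""
    return [offsets[modality] + i for i in local_ids]

def embed(joint_ids, table):
    """Look up one embedding row per joint token id."""
    return [table[i] for i in joint_ids]

# Hypothetical vocabulary sizes for three off-the-shelf tokenizers.
vocab_sizes = {"text": 5, "image": 4, "audio": 3}
offsets, total = build_joint_vocab(vocab_sizes)

# Stand-in for the learnable embedding matrix (random, untrained values).
rng = random.Random(0)
dim = 2
table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(total)]

# A mixed sequence: two text tokens followed by two image tokens.
sequence = to_joint_ids("text", [0, 3], offsets) + to_joint_ids("image", [1, 2], offsets)
vectors = embed(sequence, table)
print(sequence, len(vectors))
```

Under this assumed layout, a model that predicts joint ids autoregressively can route each predicted id back to the matching de-tokenizer by checking which range it falls in.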
Teal: Learning-Accelerated Optimization of WAN Traffic Engineering
Xu, Zhiying, Yan, Francis Y., Singh, Rachee, Chiu, Justin T., Rush, Alexander M., Yu, Minlan
The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control. First, Teal designs a flow-centric graph neural network (GNN) to capture WAN connectivity and network flows, learning flow features as inputs to downstream allocation. Second, to reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand while optimizing a central TE objective. Finally, Teal fine-tunes allocations with ADMM (Alternating Direction Method of Multipliers), a highly parallelizable optimization algorithm for reducing constraint violations such as overutilized links. We evaluate Teal using traffic matrices from Microsoft's WAN. On a large WAN topology with >1,700 nodes, Teal generates near-optimal flow allocations while running several orders of magnitude faster than the production optimization engine. Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.
Democratizing Ethical Assessment of Natural Language Generation Models
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context. Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms by generating discriminatory language, hateful speech, profane content, and other harmful material. Ethical assessment of these models is therefore critical. But it is also a challenging task, requiring expertise in several specialized domains, such as computational linguistics and social justice. While significant strides have been made by the research community in this domain, accessibility of such ethical assessments to the wider population is limited due to the high entry barriers. This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models: Tool for Ethical Assessment of Language generation models (TEAL), a component of Credo AI Lens, an open-source assessment framework.
UVify's Draco drone is fast, furious fun for wannabe racers
I look down and start gliding toward a dilapidated skate park below. Once I'm near the ground I pull my nose up and look level with the horizon. Spotting two trees, I race toward them, pass between them, then turn on a dime, skirting some shipping containers on my left. It's like every dream I've ever had about flying, but faster. I take off a pair of video goggles, and I see the shipping containers come into focus, this time directly in front of me, as my eyes adjust to the sunlight. This is my third "First Person View" flight with the Draco drone, and it's more exciting every time.
How drones are learning to find their own way in the world
When you're zipping through the air at 60 kilometres per hour, it can be hard to work out where you're going. But now drones can create detailed 3D maps as they fly – an advance that could let them navigate the world free from human input. Called Hydra Fusion, the system could one day allow drones to use a form of navigation known as simultaneous localisation and mapping to find their way in unfamiliar spaces – just as some robots do on the ground. It will also make them better at aerial surveillance. Hydra Fusion works by stitching together multiple images – in this case, consecutive frames of footage from a drone's video camera – to form a detailed 3D map while it is in the air.
Flying at 85MPH Isn't Even the Teal Drone's Best Trick
Sure, with a top speed of 85 mph, it is twice as fast as a DJI Phantom 4 and it will leave almost every consumer drone eating its dust. But its appeal goes well beyond air speed. Buying a drone typically means having a specific activity in mind. There are aerial photography drones, racing drones, follow-me-around drones--it can all be a little overwhelming, particularly for someone who's new to UAVs. Teal wants to solve this problem with a modular machine you can tailor to suit your exact needs.