SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
Li, Jonathan, Farahini, Nasim, Iuliugin, Evgenii, Vesterlund, Magnus, Häggström, Christian, Wang, Guangtao, Upasani, Shubhangi, Sachdeva, Ayush, Li, Rui, Fu, Faline, Wu, Chen, Siddiqua, Ayesha, Long, John, Zhao, Tuowen, Musaddiq, Matheen, Zeffer, Håkan, Du, Yun, Wang, Mingran, Li, Qinghua, Li, Bo, Thakker, Urmish, Prabhakar, Raghu
The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support has resulted in increasing demands for on-chip memory to support large KV caches. Techniques such as StreamingLLM and SnapKV demonstrate how to control KV cache size while maintaining model accuracy. Yet, these techniques are not commonly used within industrial deployments using frameworks like vLLM or SGLang. The reason is twofold: on one hand, the static graphs and continuous batching methodology employed by these frameworks make it difficult to admit modifications to the standard multi-head attention algorithm, while on the other hand, the accuracy implications of such techniques on modern instruction-following and reasoning models are not well understood, obscuring the need for implementing these techniques. In this paper, we explore these accuracy implications on Llama-3.1-8B-Instruct and DeepSeek-R1, and develop SnapStream, a KV cache compression method that can be deployed at scale. We demonstrate the efficacy of SnapStream in a 16-way tensor-parallel deployment of DeepSeek-671B on SambaNova SN40L accelerators running at 128k context length and up to 1832 tokens per second in a real production setting. SnapStream enables $4\times$ improved on-chip memory usage and introduces minimal accuracy degradation on LongBench-v2, AIME24 and LiveCodeBench. To the best of our knowledge, this is the first implementation of sparse KV attention techniques deployed in a production inference system with static graphs and continuous batching.
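The abstract cites StreamingLLM-style KV cache compression, which keeps a few initial "attention sink" entries plus a sliding window of the most recent entries. A minimal sketch of that eviction policy follows; the class name, capacities, and data layout are illustrative assumptions, not SnapStream's actual implementation:

```python
from collections import deque

class StreamingKVCache:
    """Toy KV cache with StreamingLLM-style eviction: retain the
    first `n_sink` entries (attention sinks) forever, plus a
    sliding window of the `window` most recent entries.
    Illustrative only; not the SnapStream implementation."""

    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []                      # oldest entries, never evicted
        self.window = deque(maxlen=window)   # recent entries, FIFO eviction

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.window.append(kv)           # deque silently drops the oldest

    def contents(self):
        return self.sinks + list(self.window)

cache = StreamingKVCache(n_sink=2, window=3)
for t in range(8):          # stream 8 token positions through the cache
    cache.append(t)
print(cache.contents())     # [0, 1, 5, 6, 7]
```

The cache size is bounded at `n_sink + window` regardless of sequence length, which is the source of the fixed-memory property these techniques trade accuracy against.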
The Download: a peek at AI's future
Plus: Trump says he'll sign an order blocking states from regulating AI. There are huge gulfs of opinion when it comes to predicting the near-future impacts of generative AI. In one camp are those who predict that over the next decade the impact of AI will exceed that of the Industrial Revolution -- a 150-year period of economic and social upheaval so great that we still live in the world it wrought. At the other end of the scale we have team "Normal Technology": experts who push back not only on these sorts of predictions but on their foundational worldview. That's not how technology works, they argue. Advances at the cutting edge may come thick and fast, but change across the wider economy, and society as a whole, moves at human speed.
Newsom escalates clash with Trump in State of the State, declares California under siege
California Gov. Gavin Newsom, shown in Sacramento last year, painted a portrait of a state under siege by the federal government in his written State of the State address Tuesday. Newsom portrayed California as "menaced" by the Trump administration while emphasizing the state's resilience in responding to devastating wildfires.
Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024
McCutcheon, Austin, de Oliveira, Thiago E. A., Zheleznov, Aleksandr, Brogly, Chris
The proliferation of online news enables potential widespread publication of perceived low-quality news headlines/links. As a result, we investigated whether it was possible to automatically distinguish perceived lower-quality news headlines/links from perceived higher-quality ones. We evaluated twelve machine learning models on a binary, balanced dataset of 57,544,214 worldwide news website headlines/links from 2018-2024 (28,772,107 per class) with 115 extracted linguistic features. Binary labels for each text were derived from scores based on expert consensus regarding the respective news domain quality. Traditional ensemble methods, particularly the bagging classifier, had strong performance (88.1% accuracy, 88.3% F1, 80/20 train/test split). Fine-tuned DistilBERT achieved the highest accuracy (90.3%, 80/20 train/test split) but required more training time. The results suggest that both NLP features with traditional classifiers and deep learning models can effectively differentiate perceived news headline/link quality, with some trade-off between predictive performance and training time.
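The abstract does not enumerate the 115 linguistic features, but features of this general kind (length statistics, capitalization and punctuation ratios) can be extracted from headline text with plain Python. The specific features below are illustrative assumptions, not the authors' feature list:

```python
def headline_features(text):
    """Extract a handful of surface-level linguistic features from a
    headline. These features are hypothetical examples of the kind a
    headline-quality classifier might use, not the paper's actual set."""
    words = text.split()
    n_chars = len(text)
    return {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "upper_ratio": sum(c.isupper() for c in text) / max(n_chars, 1),
        "n_exclaim": text.count("!"),
        "n_question": text.count("?"),
        "digit_ratio": sum(c.isdigit() for c in text) / max(n_chars, 1),
    }

feats = headline_features("You WON'T Believe This!!")
print(feats["n_words"], feats["n_exclaim"])  # 4 2
```

A feature dictionary like this would then be vectorized and fed to a traditional classifier (e.g. a bagging ensemble, as the paper reports) or compared against a fine-tuned transformer baseline.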
Not Just Object, But State: Compositional Incremental Learning without Forgetting
Most incremental learners excessively prioritize object classes while neglecting the various states those objects can take on. As a result, they are limited in their ability to model state-object compositionality accurately. To remedy this limitation, we propose a novel task called Compositional Incremental Learning (composition-IL), which enables the model to recognize a variety of state-object compositions in an incremental learning fashion. Given the lack of suitable datasets, we re-organize two existing datasets and tailor them to composition-IL. Then, we propose a prompt-based Composition Incremental Learner (CompILer) to overcome the ambiguous composition boundary.
The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
Data curation is a field with origins in librarianship and archives, whose scholarship and thinking on data issues go back centuries, if not millennia. The field of machine learning is increasingly observing the importance of data curation to the advancement of both applications and fundamental understanding of machine learning models -- evidenced not least by the creation of the Datasets and Benchmarks track itself. This work provides an analysis of recent dataset development practices at NeurIPS through the lens of data curation. We present an evaluation framework for dataset documentation, consisting of a rubric and toolkit developed through a thorough literature review of data curation principles. We use the framework to systematically assess the strengths and weaknesses of current dataset development practices across 60 datasets published in the NeurIPS Datasets and Benchmarks track from 2021 to 2023.
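A rubric-based documentation assessment like the one described can be sketched as a simple scoring function over named criteria. The criteria names and the fraction-based scale below are hypothetical stand-ins, not the paper's actual rubric:

```python
# Hypothetical rubric criteria; the paper's real rubric is derived
# from a literature review of data curation principles.
RUBRIC = ["provenance", "license", "composition", "maintenance"]

def score_dataset(doc: dict) -> float:
    """Score a dataset's documentation from 0 to 1 as the fraction
    of rubric criteria it addresses (non-empty entries count)."""
    return sum(1 for c in RUBRIC if doc.get(c)) / len(RUBRIC)

doc = {"provenance": "web crawl, 2021", "license": "CC-BY-4.0"}
print(score_dataset(doc))  # 0.5
```

Applying such a scorer across many datasets yields the kind of systematic strengths-and-weaknesses comparison the paper performs over the 60 NeurIPS datasets.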