AITopics | chunk representation

Collaborating Authors

chunk representation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Attention and Compression is all you need for Controllably Efficient Language Models

Prakash, Jatin, Puli, Aahlad, Ranganath, Rajesh

arXiv.org Artificial IntelligenceNov-10-2025

The quadratic cost of attention in transformers motivated the development of efficient approaches: namely sparse and sliding window attention, convolutions and linear attention. Although these approaches result in impressive reductions in compute and memory, they often trade-off with quality, specifically in-context recall performance. Moreover, apriori fixing this quality-compute tradeoff means being suboptimal from the get-go: some downstream applications require more memory for in-context recall, while others require lower latency and memory. Further, these approaches rely on heuristic choices that artificially restrict attention, or require handcrafted and complex recurrent state update rules, or they must be carefully composed with attention at specific layers to form a hybrid architecture that complicates the design process, especially at scale. To address above issues, we propose Compress & Attend Transformer (CAT), a conceptually simple architecture employing two simple ingredients only: dense attention and compression. CAT decodes chunks of tokens by attending to compressed chunks of the sequence so far. Compression results in decoding from a reduced sequence length that yields compute and memory savings, while choosing a particular chunk size trades-off quality for efficiency. Moreover, CAT can be trained with multiple chunk sizes at once, unlocking control of quality-compute trade-offs directly at test-time without any retraining, all in a single adaptive architecture. In exhaustive evaluations on common language modeling tasks, in-context recall, and long-context understanding, a single adaptive CAT model outperforms existing efficient baselines, including hybrid architectures, across different compute-memory budgets. Further, a single CAT matches dense transformer in language modeling across model scales while being 1.4-3x faster and requiring 2-9x lower total memory usage.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.05313

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ChuLo: Chunk-Level Key Information Representation for Long Document Processing

Li, Yan, Han, Soyeon Caren, Dai, Yue, Cao, Feiqi

arXiv.org Artificial IntelligenceNov-11-2024

Transformer-based models have achieved remarkable success in various Natural Language Processing (NLP) tasks, yet their ability to handle long documents is constrained by computational limitations. Traditional approaches, such as truncating inputs, sparse self-attention, and chunking, attempt to mitigate these issues, but they often lead to information loss and hinder the model's ability to capture long-range dependencies. In this paper, we introduce ChuLo, a novel chunk representation method for long document classification that addresses these limitations. Our ChuLo groups input tokens using unsupervised keyphrase extraction, emphasizing semantically important keyphrase based chunk to retain core document content while reducing input length. This approach minimizes information loss and improves the efficiency of Transformer-based models. Preserving all tokens in long document understanding, especially token classification tasks, is especially important to ensure that fine-grained annotations, which depend on the entire sequence context, are not lost. We evaluate our method on multiple long document classification tasks and long document token classification tasks, demonstrating its effectiveness through comprehensive qualitative and quantitative analyses.

chunk representation, dataset, information, (14 more...)

arXiv.org Artificial Intelligence

2410.11119

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Ontario > Toronto (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)

Add feedback

LongHeads: Multi-Head Attention is Secretly a Long Context Processor

Lu, Yi, Zhou, Xin, He, Wei, Zhao, Jun, Ji, Tao, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial IntelligenceFeb-16-2024

Large language models (LLMs) have achieved impressive performance in numerous domains but often struggle to process lengthy inputs effectively and efficiently due to limited length generalization and attention's quadratic computational demands. Many sought to mitigate this by restricting the attention window within the pre-trained length. However, these methods introduce new issues such as ignoring the middle context and requiring additional training. To address these problems, we propose LongHeads, a training-free framework that enhances LLM's long context ability by unlocking multi-head attention's untapped potential. Instead of allowing each head to attend to the full sentence, which struggles with generalizing to longer sequences due to out-of-distribution (OOD) issues, we allow each head to process in-distribution length by selecting and attending to important context chunks. To this end, we propose a chunk selection strategy that relies on the inherent correlation between the query and the key representations, efficiently distributing context chunks to different heads. In this way, each head ensures it can effectively process attended tokens within the trained length, while different heads in different layers can collectively process longer contexts. LongHeads works efficiently in linear time, fits seamlessly with many LLMs that use relative positional encoding. Our extensive empirical analyses verify LongHeads's efficacy in extending the usable context window for existing models, showcasing its promise for enhancing long text understanding.

chunk representation, information, representation, (14 more...)

arXiv.org Artificial Intelligence

2402.10685

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Jaiswal, Aman, Milios, Evangelos

arXiv.org Artificial IntelligenceOct-31-2023

Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting with long input. Various complex methods have claimed to overcome this limit, but recent research questions the efficacy of these models across different classification tasks. These complex architectures evaluated on carefully curated long datasets perform at par or worse than simple baselines. In this work, we propose a relatively simple extension to vanilla BERT architecture called ChunkBERT that allows finetuning of any pretrained models to perform inference on arbitrarily long text. The proposed method is based on chunking token representations and CNN layers, making it compatible with any pre-trained BERT. We evaluate chunkBERT exclusively on a benchmark for comparing long-text classification models across a variety of tasks (including binary classification, multi-class classification, and multi-label classification). A BERT model finetuned using the ChunkBERT method performs consistently across long samples in the benchmark while utilizing only a fraction (6.25\%) of the original memory footprint. These findings suggest that efficient finetuning and inference can be achieved through simple modifications to pre-trained BERT models.

bert, dataset, information, (16 more...)

arXiv.org Artificial Intelligence

2310.20558

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > Canada (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback