AITopics

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Personal (0.67)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Neural Information Processing SystemsFeb-14-2026, 00:46:47 GMT

6f9806a5adc72b5b834b27e4c7c0df9b-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(10 more...)

Genre: Personal (0.67)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Neural Information Processing SystemsDec-25-2025, 23:58:15 GMT

Unlimiformer: Long-Range Transformers with Unlimited Length Input

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single $k$-nearest-neighbor ($k$NN) index, while the returned $k$NN distances are the attention dot-product scores. This $k$NN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-$k$ keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even **500k** token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code.

long-range transformer, name change, unlimiformer, (2 more...)

Technology: Information Technology > Artificial Intelligence (0.64)

Neural Information Processing SystemsJan-19-2025, 05:03:28 GMT

Unlimiformer: Long-Range Transformers with Unlimited Length Input

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k -nearest-neighbor ( k NN) index, while the returned k NN distances are the attention dot-product scores. This k NN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top- k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even **500k** token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code.

long-range transformer, unlimiformer, unlimited length input

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.63)

arXiv.org Artificial IntelligenceJan-3-2025

End-to-End Long Document Summarization using Gradient Caching

Saxena, Rohit, Tang, Hao, Keller, Frank

Training transformer-based encoder-decoder models for long document summarization poses a significant challenge due to the quadratic memory consumption during training. Several approaches have been proposed to extend the input length at test time, but training with these approaches is still difficult, requiring truncation of input documents and causing a mismatch between training and test conditions. In this work, we propose CachED (Gradient $\textbf{Cach}$ing for $\textbf{E}$ncoder-$\textbf{D}$ecoder models), an approach that enables end-to-end training of existing transformer-based encoder-decoder models, using the entire document without truncation. Specifically, we apply non-overlapping sliding windows to input documents, followed by fusion in decoder. During backpropagation, the gradients are cached at the decoder and are passed through the encoder in chunks by re-computing the hidden vectors, similar to gradient checkpointing. In the experiments on long document summarization, we extend BART to CachED BART, processing more than 500K tokens during training and achieving superior performance without using any additional parameters.

large language model, machine learning, natural language, (19 more...)

2501.01805

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > Canada > Ontario > Toronto (0.04)
(11 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Avram, Andrei-Marius, Timpuriu, Mircea, Iuga, Andreea, Matei, Vlad-Cristian, Tăiatu, Iulian-Marius, Găină, Tudor, Cercel, Dumitru-Clementin, Pop, Florin, Cercel, Mihaela-Claudia

RoLargeSum: A Large Dialect-Aware Romanian News Dataset for Summary, Headline, and Keyword Generation

arXiv.org Artificial IntelligenceDec-15-2024

Using supervised automatic summarisation methods requires sufficient corpora that include pairs of documents and their summaries. Similarly to many tasks in natural language processing, most of the datasets available for summarization are in English, posing challenges for developing summarization models in other languages. Thus, in this work, we introduce RoLargeSum, a novel large-scale summarization dataset for the Romanian language crawled from various publicly available news websites from Romania and the Republic of Moldova that were thoroughly cleaned to ensure a high-quality standard. RoLargeSum contains more than 615K news articles, together with their summaries, as well as their headlines, keywords, dialect, and other metadata that we found on the targeted websites. We further evaluated the performance of several BART variants and open-source large language models on RoLargeSum for benchmarking purposes. We manually evaluated the results of the best-performing system to gain insight into the potential pitfalls of this data set and future development.

large language model, machine learning, natural language, (20 more...)

2412.11317

Country:

Europe > Moldova (0.25)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
Asia > Russia > Far Eastern Federal District > Sakha Republic (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Chhibbar, Naman, Kalita, Jugal

Automatic Summarization of Long Documents

arXiv.org Artificial IntelligenceOct-8-2024

A vast amount of textual data is added to the internet daily, making utilization and interpretation of such data difficult and cumbersome. As a result, automatic text summarization is crucial for extracting relevant information, saving precious reading time. Although many transformer-based models excel in summarization, they are constrained by their input size, preventing them from processing texts longer than their context size. This study introduces three novel algorithms that allow any LLM to efficiently overcome its input size limitation, effectively utilizing its full potential without any architectural modifications. We test our algorithms on texts with more than 70,000 words, and our experiments show a significant increase in BERTScore with competitive ROUGE scores.

dataset, summarization, transformer, (14 more...)

2410.05903

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Law (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial IntelligenceOct-2-2024

On The Adaptation of Unlimiformer for Decoder-Only Transformers

Ahrabian, Kian, Benhaim, Alon, Patra, Barun, Pujara, Jay, Singhal, Saksham, Song, Xia

One of the prominent issues stifling the current generation of large language models is their limited context length. Recent proprietary models such as GPT-4 and Claude 2 have introduced longer context lengths, 8k/32k and 100k, respectively; however, despite the efforts in the community, most common models, such as LLama-2, have a context length of 4k or less. Unlimiformer (Bertsch et al., 2023) is a recently popular vector-retrieval augmentation method that offloads cross-attention computations to a kNN index. However, its main limitation is incompatibility with decoder-only transformers out of the box. In this work, we explore practical considerations of adapting Unlimiformer to decoder-only transformers and introduce a series of modifications to overcome this limitation. Moreover, we expand the original experimental setup on summarization to include a new task (i.e., free-form Q&A) and an instruction-tuned model (i.e., a custom 6.7B GPT model). Our results showcase the effectiveness of these modifications on summarization, performing on par with a model with 2x the context length. Moreover, we discuss limitations and future directions for free-form Q&A and instruction-tuned models.

context length, dataset, transformer, (14 more...)

2410.01637

Country:

North America > United States > California (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
(4 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Santosh, T. Y. S. S, Aly, Mahmoud, Grabmair, Matthias

LexAbSumm: Aspect-based Summarization of Legal Decisions

arXiv.org Artificial IntelligenceMar-31-2024

Legal professionals frequently encounter long legal judgments that hold critical insights for their work. While recent advances have led to automated summarization solutions for legal documents, they typically provide generic summaries, which may not meet the diverse information needs of users. To address this gap, we introduce LexAbSumm, a novel dataset designed for aspect-based summarization of legal case decisions, sourced from the European Court of Human Rights jurisdiction. We evaluate several abstractive summarization models tailored for longer documents on LexAbSumm, revealing a challenge in conditioning these models to produce aspect-specific summaries. We release LexAbSum to facilitate research in aspect-based summarization for legal domain.

computational linguistic, proceedings, summarization, (11 more...)

2404.00594

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
South America > Brazil (0.04)
Asia > Singapore (0.04)
(5 more...)

Genre: Research Report (0.40)

Industry: Law > Civil Rights & Constitutional Law (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Bertsch, Amanda, Alon, Uri, Neubig, Graham, Gormley, Matthew R.

Unlimiformer: Long-Range Transformers with Unlimited Length Input

arXiv.org Artificial IntelligenceOct-30-2023

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index, while the returned kNN distances are the attention dot-product scores. This kNN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code. We make our code and models publicly available at https://github.com/abertsch72/unlimiformer .

computational linguistic, dataset, unlimiformer, (15 more...)

2305.01625

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(10 more...)

Genre:

Personal (0.67)
Research Report (0.64)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)