
End-to-End Long Document Summarization using Gradient Caching

Saxena, Rohit, Tang, Hao, Keller, Frank

arXiv.org Artificial Intelligence

Training transformer-based encoder-decoder models for long document summarization poses a significant challenge due to the quadratic memory consumption during training. Several approaches have been proposed to extend the input length at test time, but training with these approaches remains difficult, requiring truncation of input documents and causing a mismatch between training and test conditions. In this work, we propose CachED (Gradient Caching for Encoder-Decoder models), an approach that enables end-to-end training of existing transformer-based encoder-decoder models, using the entire document without truncation. Specifically, we apply non-overlapping sliding windows to input documents, followed by fusion in the decoder. During backpropagation, the gradients are cached at the decoder and passed through the encoder in chunks by re-computing the hidden vectors, similar to gradient checkpointing. In experiments on long document summarization, we extend BART to CachED BART, processing more than 500K tokens during training and achieving superior performance without using any additional parameters.
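The mechanism described above — encode chunks without storing activations, cache the gradient at the decoder, then re-compute each chunk to backpropagate through it — can be illustrated with a minimal toy sketch. This is a hypothetical illustration with a scalar "encoder" and a summing "decoder", not the authors' implementation:

```python
# Toy sketch of the CachED idea (hypothetical names and model):
# the "encoder" maps a chunk x_i to h_i = w * sum(x_i); the "decoder"
# fuses chunk outputs as y = sum_i h_i, with loss L = 0.5 * y**2.

def encoder(w, chunk):
    return w * sum(chunk)

def forward_no_cache(w, chunks):
    # First pass: run the encoder chunk-by-chunk, keeping only the
    # fused decoder input y, not the per-chunk encoder activations.
    return sum(encoder(w, c) for c in chunks)

def backward_with_recompute(w, chunks, grad_y):
    # The decoder gradient dL/dy is "cached"; each chunk's encoder
    # forward is re-computed (as in gradient checkpointing) so the
    # cached gradient can be backpropagated through it.
    grad_w = 0.0
    for c in chunks:
        # re-computation: h = w * sum(c), so dh/dw = sum(c)
        grad_w += grad_y * sum(c)
    return grad_w

document = list(range(1, 9))                        # 8 "tokens"
chunks = [document[i:i + 2] for i in range(0, 8, 2)]  # non-overlapping windows
w = 0.5
y = forward_no_cache(w, chunks)    # y = 0.5 * 36 = 18.0
grad_y = y                         # dL/dy for L = 0.5 * y**2
grad_w = backward_with_recompute(w, chunks, grad_y)  # 18.0 * 36 = 648.0
```

Peak memory depends only on one chunk's activations at a time, at the cost of a second encoder forward pass during backpropagation — the same trade-off gradient checkpointing makes.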


BookSum: A Collection of Datasets for Long-form Narrative Summarization

Kryściński, Wojciech, Rajani, Nazneen, Agarwal, Divyansh, Xiong, Caiming, Radev, Dragomir

arXiv.org Artificial Intelligence

The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human-written summaries on three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. The domain and structure of our dataset pose a unique set of challenges for summarization systems, which include: processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.


Thumb PC uses Google software to give computer vision to robots and drones

#artificialintelligence

A new USB stick computer, Fathom, uses Google's machine-learning software to give drones and robots the equivalent of a human eye, and to add new smarts to cameras. It is designed to analyze pixels and provide the right context for images. Fathom supplies the horsepower that devices like drones, robots and cameras need to run computer vision applications such as image recognition; on their own, these devices typically can't run such applications. Fathom uses an embedded version of Google's TensorFlow machine-learning software for vision processing.