LongT5
Closing the gap between open-source and commercial large language models for medical evidence summarization
Zhang, Gongbo, Jin, Qiao, Zhou, Yiliang, Wang, Song, Idnay, Betina R., Luo, Yiming, Park, Elizabeth, Nestor, Jordan G., Spotnitz, Matthew E., Soroush, Ali, Campion, Thomas, Lu, Zhiyong, Weng, Chunhua, Peng, Yifan
Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance falls short compared to proprietary ones. In this study, we investigated to what extent fine-tuning open-source LLMs can further improve their performance in summarizing medical evidence. Utilizing a benchmark dataset, MedReview, consisting of 8,161 pairs of systematic reviews and summaries, we fine-tuned three broadly used open-source LLMs, namely PRIMERA, LongT5, and Llama-2. Overall, the fine-tuned LLMs obtained an increase of 9.89 in ROUGE-L (95% confidence interval: 8.94-10.81), 13.21 in METEOR score (95% confidence interval: 12.05-14.37), and 15.82 in CHRF score (95% confidence interval: 13.89-16.44). The performance of fine-tuned LongT5 is close to that of GPT-3.5 in zero-shot settings. Furthermore, smaller fine-tuned models sometimes even demonstrated superior performance compared to larger zero-shot models. These trends of improvement were also manifested in both human and GPT-4-simulated evaluations. Our results can guide model selection for tasks demanding particular domain knowledge, such as medical evidence summarization.
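The ROUGE-L metric reported above is based on the longest common subsequence (LCS) between a generated and a reference summary. A minimal sketch of the idea (illustrative only; published scores use the official ROUGE toolkit, with stemming and bootstrap confidence intervals):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a candidate and a reference summary string."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("the treatment reduced pain",
                       "the treatment reduced chronic pain"), 3))  # 0.889
```

An absolute gain of ~10 ROUGE-L points, as reported above, therefore reflects substantially longer shared subsequences between model output and expert-written summaries.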
Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop
Afzal, Anum, Kowsik, Alexander, Fani, Rajna, Matthes, Florian
Large Language Models have found application in various mundane and repetitive tasks, including Human Resource (HR) support. We worked with the domain experts of SAP SE to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We incorporated a human in the loop at various stages of the development cycle, such as dataset collection, prompt optimization, and evaluation of generated output. By enhancing the LLM-driven chatbot's response quality and exploring alternative retrieval methods, we have created an efficient, scalable, and flexible tool for HR professionals to address employee inquiries effectively. Our experiments and evaluation conclude that GPT-4 outperforms other models and can overcome inconsistencies in data through internal reasoning capabilities. Additionally, through expert analysis, we infer that reference-free evaluation metrics such as G-Eval and Prometheus demonstrate reliability closely aligned with that of human evaluation.
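The retrieval-augmented pattern behind such a chatbot can be sketched in a few lines: rank knowledge-base entries against the employee's question, then place the best match in the LLM prompt as grounding context. The documents and prompt template below are invented for illustration; a real system would use learned embeddings rather than bag-of-words vectors:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "Parental leave: employees may take up to 12 weeks of paid leave.",
    "Remote work: up to 3 days per week with manager approval.",
]
context = retrieve("how many weeks of parental leave do I get", docs)
prompt = f"Answer using only this context:\n{context}\nQuestion: how many weeks of parental leave do I get"
print(context)
```

The human-in-the-loop steps described above would then act on exactly these pieces: experts curate `docs`, tune the prompt template, and judge the generated answers.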
LOCOST: State-Space Models for Long Document Abstractive Summarization
Bronnec, Florian Le, Duong, Song, Ravaut, Mathieu, Allauzen, Alexandre, Chen, Nancy F., Guigue, Vincent, Lumbreras, Alberto, Soulier, Laure, Gallinari, Patrick
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
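The $O(L \log L)$ complexity comes from applying the long convolution kernel at the core of a state-space layer with FFTs rather than directly. A NumPy sketch of that trick (not LOCOST's actual implementation): both paths compute the same causal convolution $y_t = \sum_k \text{kernel}_k \, u_{t-k}$, but the FFT path scales as $L \log L$ instead of $L^2$:

```python
import numpy as np

def conv_direct(u, kernel):
    """Naive O(L^2) causal convolution."""
    L = len(u)
    return np.array([sum(kernel[k] * u[t - k] for k in range(t + 1)) for t in range(L)])

def conv_fft(u, kernel):
    """O(L log L) causal convolution via FFT."""
    L = len(u)
    n = 2 * L  # zero-pad so circular convolution equals linear convolution
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(kernel, n), n)[:L]

rng = np.random.default_rng(0)
u, kernel = rng.standard_normal(512), rng.standard_normal(512)
print(np.allclose(conv_direct(u, kernel), conv_fft(u, kernel)))  # True
```

At 600K tokens the direct form is hopeless, which is why FFT-based kernels make such input lengths tractable.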
Document Structure in Long Document Transformers
Buchmann, Jan, Eichler, Max, Bodensohn, Jan-Micha, Kuznetsov, Ilia, Gurevych, Iryna
Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs. Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque. Do long-document Transformer models acquire an internal representation of document structure during pre-training? How can structural information be communicated to a model after pre-training, and how does it influence downstream performance? To answer these questions, we develop a novel suite of probing tasks to assess structure-awareness of long-document Transformers, propose general-purpose structure infusion methods, and evaluate the effects of structure infusion on QASPER and Evidence Inference, two challenging long-document NLP tasks. Results on LED and LongT5 suggest that they acquire implicit understanding of document structure during pre-training, which can be further enhanced by structure infusion, leading to improved end-task performance. To foster research on the role of document structure in NLP modeling, we make our data and code publicly available.
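One simple way to communicate document structure to a long-document Transformer, in the spirit of the structure infusion described above (the marker tokens and element labels here are hypothetical, not the paper's exact scheme), is to tag each element with a special token before tokenization:

```python
def infuse_structure(elements):
    """Serialize (function, text) pairs with structure-marker tokens.

    elements: list of pairs such as ("header", "Methods"), where the
    function labels and marker tokens below are illustrative choices.
    """
    markers = {"header": "<sec>", "paragraph": "<par>"}
    return " ".join(f"{markers[fn]} {text}" for fn, text in elements)

doc = [("header", "Methods"), ("paragraph", "We probe LED and LongT5.")]
print(infuse_structure(doc))  # <sec> Methods <par> We probe LED and LongT5.
```

The probing tasks then ask whether the model can recover such structural roles from its internal representations, with or without the markers present.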
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
Uthus, David, Ontañón, Santiago, Ainslie, Joshua, Guo, Mandy
We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2. We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.
RISE: Leveraging Retrieval Techniques for Summarization Evaluation
Evaluating automatically-generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging techniques from information retrieval. RISE is first trained as a retrieval task using a dual-encoder retrieval setup, and can then be subsequently utilized for evaluating a generated summary given an input document, without gold reference summaries. RISE is especially well suited when working on new datasets where one may not have reference summaries available for evaluation. We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021) and the results show that RISE has higher correlation with human evaluations compared to many past approaches to summarization evaluation. Furthermore, RISE also demonstrates data-efficiency and generalizability across languages.
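The scoring shape of such a dual-encoder evaluator can be sketched as follows: document and candidate summary are mapped into the same vector space, and the score is their inner product, requiring no gold reference. The hashed bag-of-words "encoder" here is only a stand-in for RISE's trained dual encoder:

```python
import zlib
import numpy as np

DIM = 64

def encode(text: str) -> np.ndarray:
    """Stand-in encoder: hashed bag-of-words, L2-normalized."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def rise_like_score(document: str, summary: str) -> float:
    """Reference-free score: inner product of the two encodings."""
    return float(encode(document) @ encode(summary))

doc = "state space models encode long sequences with low complexity"
good = "state space models handle long sequences efficiently"
bad = "the chatbot answers human resource questions"
print(rise_like_score(doc, good) > rise_like_score(doc, bad))  # True
```

Because only the document and the candidate are needed at scoring time, the same pattern applies directly to new datasets without reference summaries, which is the setting RISE targets.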
Google's CoLT5 Processes Extremely Long Inputs via Conditional Computation
One of the highlights of OpenAI's GPT-4 large language model (LLM) is its expanded context window of 32,000 tokens (about 25,000 words), which enables longer input sequences and conversations than ChatGPT's 4,000-token limit. While expanding the processing capacity of transformer-based LLMs in this way is beneficial, it is also computationally costly due to the quadratic complexity of the models' attention mechanisms and the application of feedforward and projection layers to every token. A Google Research team addresses this issue in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of transformer models that applies a novel conditional computation approach for higher-quality and faster processing of long inputs of up to 64,000 tokens. CoLT5 builds on Google's LongT5 (Guo et al., 2022), which simultaneously scales input length and model size to improve long-input processing in transformers, and is inspired by the idea that better performance and reduced computation cost can be achieved by allocating more computation to important tokens. The conditional computation mechanism comprises three main components: 1) routing modules, which select important tokens at each attention or feedforward layer; 2) a conditional feedforward layer, which applies an additional high-capacity feedforward layer to the selected important tokens; and 3) a conditional attention layer, which enables CoLT5 to differentiate between tokens that require additional information and those that already possess it.
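The routing-plus-conditional-feedforward idea can be sketched in a few lines of NumPy (a simplification, not the paper's implementation): a learned projection scores every token, a light feedforward runs on all of them, and a heavy feedforward runs only on the top-k routed tokens, so most compute goes where it matters:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, k = 16, 8, 4                      # sequence length, width, routed tokens

x = rng.standard_normal((L, d))         # token representations
w_route = rng.standard_normal(d)        # routing projection
w_light = rng.standard_normal((d, d)) * 0.1   # cheap feedforward
w_heavy = rng.standard_normal((d, d))         # expensive feedforward

scores = x @ w_route                    # one importance score per token
routed = np.argsort(scores)[-k:]        # indices of the top-k tokens

out = x @ w_light                       # cheap path: every token
out[routed] += x[routed] @ w_heavy      # expensive path: routed tokens only
print(out.shape)  # (16, 8)
```

With k much smaller than L, the heavy layer's cost grows with k rather than with the full 64,000-token input, which is the source of CoLT5's speedups.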