LongT5
Closing the gap between open-source and commercial large language models for medical evidence summarization
Zhang, Gongbo, Jin, Qiao, Zhou, Yiliang, Wang, Song, Idnay, Betina R., Luo, Yiming, Park, Elizabeth, Nestor, Jordan G., Spotnitz, Matthew E., Soroush, Ali, Campion, Thomas, Lu, Zhiyong, Weng, Chunhua, Peng, Yifan
Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance falls short compared to proprietary ones. In this study, we investigated to what extent fine-tuning open-source LLMs can further improve their performance in summarizing medical evidence. Utilizing a benchmark dataset, MedReview, consisting of 8,161 pairs of systematic reviews and summaries, we fine-tuned three broadly used open-source LLMs, namely PRIMERA, LongT5, and Llama-2. Overall, the fine-tuned LLMs obtained an increase of 9.89 in ROUGE-L (95% confidence interval: 8.94-10.81), 13.21 in METEOR score (95% confidence interval: 12.05-14.37), and 15.82 in CHRF score (95% confidence interval: 13.89-16.44). The performance of fine-tuned LongT5 is close to that of GPT-3.5 in zero-shot settings. Furthermore, smaller fine-tuned models sometimes even demonstrated superior performance compared to larger zero-shot models. These trends of improvement were also manifested in both human and GPT-4-simulated evaluations. Our results can guide model selection for tasks demanding particular domain knowledge, such as medical evidence summarization.
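The ROUGE-L metric reported above is based on the longest common subsequence (LCS) between a generated and a reference summary. A minimal sketch of the idea (illustrative only; published scores use the official ROUGE toolkit, with stemming and bootstrap confidence intervals):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a candidate and a reference summary string."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("the treatment reduced pain",
                       "the treatment reduced chronic pain"), 3))  # 0.889
```

An absolute gain of ~10 ROUGE-L points, as reported above, therefore reflects substantially longer shared subsequences between model output and expert-written summaries.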
Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop
Afzal, Anum, Kowsik, Alexander, Fani, Rajna, Matthes, Florian
Large Language Models have found application in various mundane and repetitive tasks, including Human Resource (HR) support. We worked with the domain experts of SAP SE to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We incorporated a human in the loop at various stages of the development cycle, such as dataset collection, prompt optimization, and evaluation of generated output. By enhancing the LLM-driven chatbot's response quality and exploring alternative retrieval methods, we have created an efficient, scalable, and flexible tool for HR professionals to address employee inquiries effectively. Our experiments and evaluation conclude that GPT-4 outperforms other models and can overcome inconsistencies in data through internal reasoning capabilities. Additionally, through expert analysis, we infer that reference-free evaluation metrics such as G-Eval and Prometheus demonstrate reliability closely aligned with that of human evaluation.
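The retrieval-augmented pattern behind such a chatbot can be sketched in a few lines: rank knowledge-base entries against the employee's question, then place the best match in the LLM prompt as grounding context. The documents and prompt template below are invented for illustration; a real system would use learned embeddings rather than bag-of-words vectors:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "Parental leave: employees may take up to 12 weeks of paid leave.",
    "Remote work: up to 3 days per week with manager approval.",
]
context = retrieve("how many weeks of parental leave do I get", docs)
prompt = f"Answer using only this context:\n{context}\nQuestion: how many weeks of parental leave do I get"
print(context)
```

The human-in-the-loop steps described above would then act on exactly these pieces: experts curate `docs`, tune the prompt template, and judge the generated answers.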
LOCOST: State-Space Models for Long Document Abstractive Summarization
Bronnec, Florian Le, Duong, Song, Ravaut, Mathieu, Allauzen, Alexandre, Chen, Nancy F., Guigue, Vincent, Lumbreras, Alberto, Soulier, Laure, Gallinari, Patrick
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
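The $O(L \log L)$ complexity comes from applying the long convolution kernel at the core of a state-space layer with FFTs rather than directly. A NumPy sketch of that trick (not LOCOST's actual implementation): both paths compute the same causal convolution $y_t = \sum_k \text{kernel}_k \, u_{t-k}$, but the FFT path scales as $L \log L$ instead of $L^2$:

```python
import numpy as np

def conv_direct(u, kernel):
    """Naive O(L^2) causal convolution."""
    L = len(u)
    return np.array([sum(kernel[k] * u[t - k] for k in range(t + 1)) for t in range(L)])

def conv_fft(u, kernel):
    """O(L log L) causal convolution via FFT."""
    L = len(u)
    n = 2 * L  # zero-pad so circular convolution equals linear convolution
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(kernel, n), n)[:L]

rng = np.random.default_rng(0)
u, kernel = rng.standard_normal(512), rng.standard_normal(512)
print(np.allclose(conv_direct(u, kernel), conv_fft(u, kernel)))  # True
```

At 600K tokens the direct form is hopeless, which is why FFT-based kernels make such input lengths tractable.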
Document Structure in Long Document Transformers
Buchmann, Jan, Eichler, Max, Bodensohn, Jan-Micha, Kuznetsov, Ilia, Gurevych, Iryna
Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs. Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque. Do long-document Transformer models acquire an internal representation of document structure during pre-training? How can structural information be communicated to a model after pre-training, and how does it influence downstream performance? To answer these questions, we develop a novel suite of probing tasks to assess structure-awareness of long-document Transformers, propose general-purpose structure infusion methods, and evaluate the effects of structure infusion on QASPER and Evidence Inference, two challenging long-document NLP tasks. Results on LED and LongT5 suggest that they acquire implicit understanding of document structure during pre-training, which can be further enhanced by structure infusion, leading to improved end-task performance. To foster research on the role of document structure in NLP modeling, we make our data and code publicly available.
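One simple way to communicate document structure to a long-document Transformer, in the spirit of the structure infusion described above (the marker tokens and element labels here are hypothetical, not the paper's exact scheme), is to tag each element with a special token before tokenization:

```python
def infuse_structure(elements):
    """Serialize (function, text) pairs with structure-marker tokens.

    elements: list of pairs such as ("header", "Methods"), where the
    function labels and marker tokens below are illustrative choices.
    """
    markers = {"header": "<sec>", "paragraph": "<par>"}
    return " ".join(f"{markers[fn]} {text}" for fn, text in elements)

doc = [("header", "Methods"), ("paragraph", "We probe LED and LongT5.")]
print(infuse_structure(doc))  # <sec> Methods <par> We probe LED and LongT5.
```

The probing tasks then ask whether the model can recover such structural roles from its internal representations, with or without the markers present.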
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
Uthus, David, Ontañón, Santiago, Ainslie, Joshua, Guo, Mandy
We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2. We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.
RISE: Leveraging Retrieval Techniques for Summarization Evaluation
Evaluating automatically-generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging techniques from information retrieval. RISE is first trained as a retrieval task using a dual-encoder retrieval setup, and can then be subsequently utilized for evaluating a generated summary given an input document, without gold reference summaries. RISE is especially well suited when working on new datasets where one may not have reference summaries available for evaluation. We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021) and the results show that RISE has higher correlation with human evaluations compared to many past approaches to summarization evaluation. Furthermore, RISE also demonstrates data-efficiency and generalizability across languages.
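The scoring shape of such a dual-encoder evaluator can be sketched as follows: document and candidate summary are mapped into the same vector space, and the score is their inner product, requiring no gold reference. The hashed bag-of-words "encoder" here is only a stand-in for RISE's trained dual encoder:

```python
import zlib
import numpy as np

DIM = 64

def encode(text: str) -> np.ndarray:
    """Stand-in encoder: hashed bag-of-words, L2-normalized."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def rise_like_score(document: str, summary: str) -> float:
    """Reference-free score: inner product of the two encodings."""
    return float(encode(document) @ encode(summary))

doc = "state space models encode long sequences with low complexity"
good = "state space models handle long sequences efficiently"
bad = "the chatbot answers human resource questions"
print(rise_like_score(doc, good) > rise_like_score(doc, bad))  # True
```

Because only the document and the candidate are needed at scoring time, the same pattern applies directly to new datasets without reference summaries, which is the setting RISE targets.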
Google's CoLT5 Processes Extremely Long Inputs via Conditional Computation
One of the highlights of OpenAI's GPT-4 large language model (LLM) is its expanded context window of 32,000 tokens (about 25,000 words), which enables longer input sequences and conversations than ChatGPT's 4,000-token limit. While expanding the processing capacity of transformer-based LLMs in this way is beneficial, it is also computationally costly due to the quadratic complexity of the models' attention mechanisms and the application of feedforward and projection layers to every token. A Google Research team addresses this issue in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of transformer models that applies a novel conditional computation approach for higher-quality and faster processing of long inputs of up to 64,000 tokens. CoLT5 builds on Google's LongT5 (Guo et al., 2022), which simultaneously scales input length and model size to improve long-input processing in transformers, and is inspired by the idea that better performance and reduced computation cost can be achieved by allocating more computation to important tokens. The conditional computation mechanism comprises three main components: 1) routing modules, which select important tokens at each attention or feedforward layer; 2) a conditional feedforward layer, which applies an additional high-capacity feedforward layer to the selected important tokens; and 3) a conditional attention layer, which enables CoLT5 to differentiate between tokens that require additional information and those that already possess it.
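The routing-plus-conditional-feedforward idea can be sketched in a few lines of NumPy (a simplification, not the paper's implementation): a learned projection scores every token, a light feedforward runs on all of them, and a heavy feedforward runs only on the top-k routed tokens, so most compute goes where it matters:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, k = 16, 8, 4                      # sequence length, width, routed tokens

x = rng.standard_normal((L, d))         # token representations
w_route = rng.standard_normal(d)        # routing projection
w_light = rng.standard_normal((d, d)) * 0.1   # cheap feedforward
w_heavy = rng.standard_normal((d, d))         # expensive feedforward

scores = x @ w_route                    # one importance score per token
routed = np.argsort(scores)[-k:]        # indices of the top-k tokens

out = x @ w_light                       # cheap path: every token
out[routed] += x[routed] @ w_heavy      # expensive path: routed tokens only
print(out.shape)  # (16, 8)
```

With k much smaller than L, the heavy layer's cost grows with k rather than with the full 64,000-token input, which is the source of CoLT5's speedups.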