text generation task
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (15 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory
With direct access to human-written references as memory, retrieval-augmented generation has achieved much progress in a wide range of text generation tasks. Since better memory typically prompts better generation (we define this as the primal problem), the traditional approach for memory retrieval involves selecting the memory that exhibits the highest similarity to the input. However, this method is constrained by the quality of the fixed corpus from which memory is retrieved. In this paper, by exploring the duality of the primal problem (better generation also prompts better memory), we propose a novel framework, selfmem, which addresses this limitation by iteratively employing a retrieval-augmented generator to create an unbounded memory pool and using a memory selector to choose one output as memory for the subsequent generation round. This enables the model to leverage its own output, referred to as self-memory, for improved generation. We evaluate the effectiveness of selfmem on three distinct text generation tasks: neural machine translation, abstractive text summarization, and dialogue generation, under two generation paradigms: a fine-tuned small model and a few-shot LLM. Our approach achieves state-of-the-art results in four translation directions of the JRC-Acquis dataset, 50.3 ROUGE-1 on XSum, and 62.9 ROUGE-1 on BigPatent, demonstrating the potential of self-memory in enhancing retrieval-augmented generation models. Furthermore, we conduct thorough analyses of each component in the selfmem framework to identify current system bottlenecks and provide insights for future research.
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
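A minimal sketch of the generate-select loop described in the selfmem abstract above, assuming callable stand-ins `generator(x, memory)` and `selector(x, candidates)` for the paper's retrieval-augmented generator and memory selector; the toy Jaccard `similarity` retriever is our illustration, not the paper's retrieval component:

```python
def similarity(a: str, b: str) -> float:
    """Toy lexical similarity: Jaccard overlap of token sets."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def selfmem_generate(x, corpus, generator, selector, rounds=3, n_candidates=8):
    # Round 0: conventional retrieval from the fixed corpus.
    memory = max(corpus, key=lambda m: similarity(x, m))
    for _ in range(rounds):
        # The generator produces a candidate pool conditioned on the memory.
        candidates = [generator(x, memory) for _ in range(n_candidates)]
        # The selector picks one output to serve as next-round memory
        # (the "self-memory"), escaping the fixed-corpus quality ceiling.
        memory = selector(x, candidates)
    return memory
```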
Beyond MLE: Convex Learning for Text Generation
Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution that best explain the observed data. In the context of text generation, MLE is often used to train generative language models, which can then be used to generate new text. However, we argue that MLE is not always necessary or optimal, especially for closed-ended text generation tasks such as machine translation. In these tasks, the goal of the model is to generate the most appropriate response, which does not necessarily require it to estimate the entire data distribution with MLE. To this end, we propose a novel class of training objectives based on convex functions, which enables text generation models to focus on highly probable outputs without having to estimate the entire data distribution. We investigate the theoretical properties of the optimal predicted distribution when applying convex functions to the loss, demonstrating that convex functions can sharpen the optimal distribution, thereby enabling the model to better capture outputs with high probabilities. Experiments on various text generation tasks and models show the effectiveness of our approach: it enables autoregressive models to bridge the gap between greedy and beam search, and it facilitates the learning of non-autoregressive models with a maximum improvement of 9+ BLEU points. Moreover, our approach also has a significant impact on large language models (LLMs), substantially enhancing their generative capability on various tasks.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.59)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.59)
- North America > Canada (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > China > Anhui Province (0.04)
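As a worked instance of the sharpening claim in the Beyond MLE abstract above (our illustration using the simplest convex choice, a loss linear in the probability, not necessarily the paper's exact loss family):

```latex
\textbf{MLE:}\quad
\min_{p \in \Delta}\; \mathbb{E}_{y \sim q}\!\left[-\log p(y)\right]
\;\Longrightarrow\; p^{*} = q
\quad\text{(Gibbs' inequality: the model must match the full distribution).}

\textbf{Linear (convex) loss:}\quad
\min_{p \in \Delta}\; \mathbb{E}_{y \sim q}\!\left[-\,p(y)\right]
= -\max_{p \in \Delta} \sum_{y} q(y)\,p(y)
\;\Longrightarrow\; p^{*} = \delta_{y^{*}},\;\; y^{*} = \arg\max_{y} q(y).
```

Because the second objective is linear in p, it is minimized at a vertex of the probability simplex: all mass collapses onto the most probable output, illustrating how a convex loss sharpens the optimal distribution without requiring the model to estimate the entire data distribution.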
Quantum-Enhanced Natural Language Generation: A Multi-Model Framework with Hybrid Quantum-Classical Architectures
This paper presents a comprehensive evaluation of quantum text generation models against traditional Transformer/MLP architectures, addressing the growing interest in quantum computing applications for natural language processing. We conduct systematic experiments comparing five distinct models, including a Transformer baseline, the Quantum Kernel Self-Attention Network (QKSAN), Quantum RWKV (QRWKV), and the Quantum Attention Sequence Architecture (QASA), across five diverse datasets: simple sentences, short stories, quantum phrases, haiku poetry, and proverbs. Our evaluation employs multiple metrics, including perplexity, BLEU scores, vocabulary diversity, repetition rates, and fluency measures, to assess different aspects of text generation quality. The experimental results reveal that while traditional Transformer models maintain overall superiority with the lowest average perplexity (1.21) and highest BLEU-1 score (0.2895), quantum-inspired models demonstrate competitive performance in specific scenarios. Notably, QKSAN achieves a competitive BLEU-1 score of 0.2800 while maintaining zero repetition rates, and QRWKV demonstrates perfect vocabulary diversity (Distinct-1 = 1.000) on certain tasks.
- North America > United States (0.16)
- Asia > Taiwan (0.04)
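Two of the surface metrics cited in the abstract above are easy to pin down in code. Definitions vary across papers, so these are common formulations (our assumption), not necessarily the exact ones used in this work:

```python
def distinct_1(tokens: list[str]) -> float:
    """Distinct-1: ratio of unique unigrams to total unigrams."""
    return len(set(tokens)) / max(len(tokens), 1)

def repetition_rate(tokens: list[str]) -> float:
    """Fraction of tokens that repeat an earlier token (0 = no repetition)."""
    seen, repeats = set(), 0
    for t in tokens:
        if t in seen:
            repeats += 1
        seen.add(t)
    return repeats / max(len(tokens), 1)

# Distinct-1 = 1.000 (as reported for QRWKV) means every unigram is unique.
print(distinct_1("the cat sat on the mat".split()))       # 0.833...
print(repetition_rate("the cat sat on the mat".split()))  # 0.166...
```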
ReflectivePrompt: Reflective evolution in autoprompting algorithms
Zhuravlev, Viktor N., Khairullin, Artur R., Dyagin, Ernest A., Sitkina, Alena N., Kulin, Nikita I.
Autoprompting is the process of automatically selecting optimized prompts for language models. It has been gaining popularity with the rapid advancement of prompt engineering, driven by extensive research in the field of large language models (LLMs). This paper presents ReflectivePrompt, a novel autoprompting method based on evolutionary algorithms that employs a reflective evolution approach for a more precise and comprehensive search for optimal prompts. ReflectivePrompt applies short-term and long-term reflection operations before crossover and elitist mutation to enhance the quality of the modifications they introduce. This allows the method to accumulate knowledge obtained throughout the evolution process and to update it at each epoch based on the current population. ReflectivePrompt was tested on 33 datasets for classification and text generation tasks using the open-access large language models t-lite-instruct-0.1 and gemma3-27b-it. The method demonstrates, on average, a significant improvement in metrics relative to current state-of-the-art approaches (e.g., 28% on BBH compared to EvoPrompt), establishing itself as one of the most effective solutions in evolutionary-algorithm-based autoprompting.
- North America > United States (0.04)
- Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
- Asia > Russia (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Overview > Innovation (0.34)
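A hypothetical skeleton of the reflective evolution loop described in the ReflectivePrompt abstract above. The `llm(prompt) -> str` and `score(prompt) -> float` callables are stand-ins, and the operator prompts are our invention; the paper defines its own reflection, crossover, and mutation operators:

```python
import random

def reflective_prompt(llm, score, seed_prompts, epochs=10, pop_size=20):
    """Sketch only; assumes at least two seed prompts."""
    population = list(seed_prompts)
    long_term_notes = []  # knowledge accumulated across epochs

    for _ in range(epochs):
        scored = sorted(population, key=score, reverse=True)
        elite = scored[0]  # elitism: the best prompt survives unchanged

        # Short-term reflection: contrast this epoch's best and worst prompts.
        short_term = llm(f"Why does '{scored[0]}' outperform '{scored[-1]}'?")
        # Long-term reflection: fold the new lesson into accumulated notes.
        long_term_notes.append(llm(f"Summarize lessons so far: {short_term}"))

        children = []
        while len(children) < pop_size - 1:
            a, b = random.sample(scored[: max(2, pop_size // 2)], 2)
            # Reflection-guided crossover of two parent prompts.
            children.append(llm(f"Combine '{a}' and '{b}' using: {short_term}"))
        # Mutation guided by the latest long-term notes.
        population = [elite] + [
            llm(f"Improve '{c}' given: {long_term_notes[-1]}") for c in children
        ]
    return max(population, key=score)
```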
TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks
Chen, Zhou, Wei, Zhiqiang, Bai, Yuqi, Xiong, Xue, Wu, Jianmin
Model routing allocates queries to the most suitable model, improving system performance while reducing costs. However, existing routing methods face practical limitations that hinder scalability in large-scale applications and struggle to keep up with the rapid growth of the large language model (LLM) ecosystem. To tackle these challenges, we propose TagRouter, a training-free model routing method designed to optimize the synergy among multiple LLMs for open-domain text generation tasks. Experimental results demonstrate that TagRouter outperforms 13 baseline methods, increasing the system's accept rate by 6.15% and reducing costs by 17.20%, achieving optimal cost-efficiency. Our findings provide the LLM community with an efficient and scalable solution for model ensembling, offering users an evolvable "super model."
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
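A hedged sketch of tag-based routing in the spirit of the TagRouter abstract above. The tag taxonomy, per-tag score table, and threshold are illustrative assumptions; the paper's actual tagging and scoring procedure may differ:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float
    tag_scores: dict[str, float]  # expected accept rate per query tag

def route(query_tags: list[str], models: list[Model], min_score: float = 0.7):
    """Pick the cheapest model whose expected accept rate on the query's
    tags clears the threshold; fall back to the best scorer otherwise."""
    def expected(m: Model) -> float:
        scores = [m.tag_scores.get(t, 0.0) for t in query_tags]
        return sum(scores) / max(len(scores), 1)

    eligible = [m for m in models if expected(m) >= min_score]
    if eligible:
        return min(eligible, key=lambda m: m.cost_per_1k_tokens)
    return max(models, key=expected)

models = [
    Model("small-llm", 0.2, {"summarization": 0.8, "code": 0.4}),
    Model("large-llm", 2.0, {"summarization": 0.9, "code": 0.9}),
]
print(route(["summarization"], models).name)  # small-llm: cheaper, good enough
print(route(["code"], models).name)           # large-llm: small one falls short
```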
Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport
Document-level text generation tasks are known to be more difficult than sentence-level ones, as they require an understanding of longer context to generate high-quality texts. In this paper, we investigate the adaptation of Minimum Bayes Risk (MBR) decoding to document-level text generation tasks. MBR decoding uses a utility function to estimate the output with the highest expected utility from a set of candidate outputs. Although MBR decoding has been shown to be effective in a wide range of sentence-level text generation tasks, its performance on document-level tasks is limited because many utility functions are designed for evaluating the utility of sentences. To this end, we propose MBR-OT, a variant of MBR decoding that uses the Wasserstein distance to compute the utility of a document with a sentence-level utility function. Experimental results show that MBR-OT outperforms standard MBR in document-level machine translation, text simplification, and dense image captioning tasks. Our code is available at https://github.com/jinnaiyuu/mbr-optimal-transport
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (19 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
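A compact sketch of the MBR-OT decision rule from the abstract above, with the OT step reduced to an optimal assignment between the two documents' sentences (exact optimal transport for equal-size sets under uniform weights; the paper's Wasserstein formulation is more general). `sent_utility` stands in for any sentence-level metric such as BLEU or BERTScore:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def doc_utility(doc_a: list[str], doc_b: list[str], sent_utility) -> float:
    """Align the sentences of two documents to maximize total sentence-level
    utility, then average: a crude stand-in for an OT-based document utility."""
    cost = np.array([[-sent_utility(a, b) for b in doc_b] for a in doc_a])
    rows, cols = linear_sum_assignment(cost)  # minimizes cost = maximizes utility
    return -cost[rows, cols].mean()

def mbr_decode(candidates: list[list[str]], sent_utility):
    """Standard MBR: pick the candidate with the highest expected utility
    against the other candidates (used as pseudo-references). Assumes at
    least two candidate documents, each a list of sentences."""
    def expected(y):
        others = [c for c in candidates if c is not y]
        return sum(doc_utility(y, r, sent_utility) for r in others) / len(others)
    return max(candidates, key=expected)
```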