EOS token
Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way
Yang, Yicun, Wang, Cong, Wang, Shaobo, Wen, Zichen, Qi, Biqing, Xu, Hanlin, Zhang, Linfeng
Diffusion-based large language models (dLLMs) have exhibited substantial potential for parallel text generation, which may enable more efficient generation compared to autoregressive models. However, current dLLMs suffer from fixed generation lengths: the generation length must be set before decoding as a hyper-parameter, which hurts both efficiency and flexibility. To solve these problems, in this work, we propose to train a diffusion LLM with native variable generation lengths, abbreviated as dLLM-Var. Concretely, we train the model to accurately predict the [EOS] token in the generated text, which enables a dLLM to natively infer in a block-diffusion manner while still maintaining global bi-directional (full) attention and high parallelism. Experiments on standard benchmarks demonstrate that our method achieves a 30.1x speedup over traditional dLLM inference paradigms and a 2.4x speedup relative to autoregressive models such as Qwen and Llama. Our method achieves higher accuracy and faster inference, elevating dLLMs beyond mere academic novelty and supporting their practical use in real-world applications. Codes and models have been released.
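The decoding loop the abstract describes — generate block by block, stop as soon as a block contains [EOS] — can be sketched at a high level. This is a toy illustration, not the dLLM-Var implementation: `generate_block` is a hypothetical stand-in for one block-diffusion denoising pass, and `eos_id` for the model's [EOS] token id.

```python
def decode_blocks(generate_block, eos_id, max_blocks=8):
    # Toy variable-length loop: decode fixed-size blocks in sequence and
    # stop as soon as a block contains EOS, truncating at that position.
    # generate_block(tokens) is a hypothetical stand-in for a block-diffusion
    # denoising step conditioned on the tokens decoded so far.
    tokens = []
    for _ in range(max_blocks):
        block = generate_block(tokens)
        if eos_id in block:
            tokens.extend(block[:block.index(eos_id) + 1])
            break
        tokens.extend(block)
    return tokens
```

The point of the [EOS]-prediction training is precisely that this early stop becomes reliable, so the length no longer needs to be fixed up front.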
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Seo, Yeongbin, Lee, Dongha, Kim, Jaehyung, Yeo, Jinyoung
Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a key bottleneck in current diffusion LMs: the long decoding-window problem, where tokens generated far from the input context often become irrelevant or repetitive. Previous solutions like semi-autoregressive (semi-AR) decoding address this issue by splitting windows into blocks (sacrificing bidirectionality), but we find that this also leads to a time-interval expansion problem, sacrificing speed. Semi-AR thus eliminates the main advantages of diffusion models. To overcome this, we propose Convolutional decoding (Conv), a normalization-based method that narrows the decoding window without hard segmentation, leading to better fluency and flexibility. Additionally, we introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from context. Our methods achieve state-of-the-art results on open-ended generation benchmarks (e.g., AlpacaEval) among diffusion LM baselines, with significantly fewer decoding steps than previous works, demonstrating both speed and quality improvements.
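The idea of softly narrowing the decoding window without hard block boundaries can be illustrated with a minimal sketch: down-weight each candidate position's confidence by its distance to the nearest already-decoded token. The decaying kernel here is an assumption for illustration, not the paper's exact Conv normalization.

```python
def conv_weighted_scores(scores, decoded_mask, decay=0.5):
    # Illustrative soft window: scale each position's score by
    # decay**distance to the nearest decoded token, so far-from-context
    # positions are less likely to be unmasked this step. The geometric
    # kernel and decay value are assumptions, not the paper's formulation.
    decoded = [j for j, d in enumerate(decoded_mask) if d]
    weighted = []
    for i, s in enumerate(scores):
        dist = min(abs(i - j) for j in decoded) if decoded else 0
        weighted.append(s * (decay ** dist))
    return weighted
```

Unlike semi-AR blocks, nothing here hard-segments the sequence; the window simply decays with distance from context.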
The Devil is in the EOS: Sequence Training for Detailed Image Captioning
Mohamed, Abdelrahman, Kementchedjhieva, Yova
Despite significant advances in vision-language models (VLMs), image captioning often suffers from a lack of detail, with base models producing short, generic captions. This limitation persists even though VLMs are equipped with strong vision and language backbones. While supervised data and complex reward functions have been proposed to improve detailed image captioning, we identify a simpler underlying issue: a bias towards the end-of-sequence (EOS) token, which is introduced during cross-entropy training. We propose an unsupervised method to debias the model's tendency to predict the EOS token prematurely. By reducing this bias, we encourage the generation of longer, more detailed captions without the need for intricate reward functions or supervision. Our approach is straightforward, effective, and easily applicable to any pretrained model. We demonstrate its effectiveness through experiments with three VLMs and on three detailed captioning benchmarks. Our results show a substantial increase in caption length and relevant details, albeit with an expected increase in the rate of hallucinations.
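One simple way to act on an EOS bias at decoding time is to subtract a penalty from the EOS logit so the model stops later. This is an illustrative variant of the idea only; the paper's unsupervised debiasing procedure may differ, and `penalty` is a hypothetical knob.

```python
def debias_eos_logits(logits, eos_id, penalty=1.0):
    # Illustrative EOS debiasing: lower the EOS logit by a fixed penalty
    # so the model is less likely to end the caption prematurely.
    # The fixed-penalty scheme is an assumption for illustration,
    # not the paper's exact unsupervised method.
    out = list(logits)
    out[eos_id] -= penalty
    return out
```

As the abstract notes, any such intervention trades longer, more detailed captions against a higher hallucination rate, so the strength of the adjustment matters.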
Controlling Summarization Length Through EOS Token Weighting
Belligoli, Zeno, Stergiadis, Emmanouil, Fainman, Eran, Gusev, Ilya
Controlling the length of generated text can be crucial in various text-generation tasks, including summarization. Existing methods often require complex model alterations, limiting compatibility with pre-trained models. We address these limitations by developing a simple approach for controlling the length of automatic text summaries by increasing the importance of correctly predicting the EOS token in the cross-entropy loss computation. The proposed methodology is agnostic to architecture and decoding algorithms and orthogonal to other inference-time techniques to control the generation length. We test it with encoder-decoder models and modern GPT-style LLMs, and show that this method can control generation length, often without affecting the quality of the summary.
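Up-weighting the EOS term in the cross-entropy loss is straightforward to sketch. Below is a minimal pure-Python per-token version, assuming a single logit vector and integer target; in practice one would pass a class-weight vector to the framework's cross-entropy (e.g. the `weight` argument of PyTorch's `CrossEntropyLoss`), and `eos_weight` is a tunable hyper-parameter.

```python
import math

def weighted_ce(logits, target, eos_id, eos_weight=2.0):
    # Softmax cross-entropy for one token, with the loss scaled up when
    # the target is EOS -- increasing the importance of predicting EOS
    # correctly, per the paper's idea. eos_weight > 1 lengthens-or-shortens
    # pressure depending on how it is tuned during fine-tuning.
    m = max(logits)                          # stabilize the softmax
    exps = [math.exp(x - m) for x in logits]
    nll = -math.log(exps[target] / sum(exps))
    weight = eos_weight if target == eos_id else 1.0
    return weight * nll
```

Because the change lives entirely in the loss, it composes with any architecture or decoding algorithm, which is the compatibility property the abstract emphasizes.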
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Yu, Jiahao, Luo, Haozheng, Hu, Jerry Yao-Chieh, Guo, Wenbo, Liu, Han, Xing, Xinyu
Along with the remarkable successes of large language models (LLMs), recent research has also started to explore their security threats, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. This bypasses the safety alignment of LLMs and leads to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and that eos tokens have low attention values and do not affect the LLM's understanding of the harmful questions, leading the model to actually respond to them. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.
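The attack itself is just string construction, which a one-liner makes concrete. The `"</s>"` token string and the count are placeholders: the actual EOS string and the number of repetitions that works best depend on the target model.

```python
def boost_prompt(question, eos_token="</s>", n=5):
    # BOOST-style prompt: append n EOS tokens after the question.
    # "</s>" is a placeholder; substitute the target model's actual
    # EOS string (e.g. from its tokenizer), and tune n per model.
    return question + eos_token * n
```

The simplicity is the point of the paper: no optimization or expert prompt engineering is involved, yet safety alignment can be bypassed.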
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Huang, Shengyi, Noukhovitch, Michael, Hosseini, Arian, Rasul, Kashif, Wang, Weixun, Tunstall, Lewis
This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work (Stiennon et al., 2020). We create an RLHF pipeline from scratch, enumerate over 20 key implementation details, and share key insights from the reproduction. Our RLHF-trained Pythia models demonstrate significant gains in response quality that scale with model size, with our 2.8B and 6.9B models outperforming OpenAI's released 1.3B checkpoint.
SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
Haque, Mirazul, Shah, Rutvij, Chen, Simin, Şişman, Berrak, Liu, Cong, Yang, Wei
Deep learning (DL) models are now widely used for speech-related tasks, including automatic speech recognition (ASR). As ASR is deployed in real-time scenarios, it is important that ASR models remain efficient under minor input perturbations, which makes evaluating their efficiency robustness essential. We show that popular ASR models such as Speech2Text and Whisper perform input-dependent dynamic computation, causing dynamic efficiency. In this work, we propose SlothSpeech, a denial-of-service attack against ASR models that exploits this dynamic behaviour. SlothSpeech uses the probability distribution of the output text tokens to generate perturbations to the audio such that the efficiency of the ASR model is decreased. We find that SlothSpeech-generated inputs can increase latency up to 40x over that induced by benign input.
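Since decoding cost is driven by how long the model keeps emitting tokens, one natural objective for such an attack is to suppress the end-of-sequence probability across decoding steps. The function below is a sketch of that idea only — a hypothetical loss over per-step output distributions, not the paper's exact formulation or its perturbation-optimization procedure.

```python
def sloth_loss(step_probs, eos_id):
    # Illustrative DoS objective: total EOS probability across decoding
    # steps. Minimizing this with respect to an audio perturbation pushes
    # the ASR model to keep decoding longer, increasing latency.
    # (A sketch of the idea; not SlothSpeech's exact loss.)
    return sum(probs[eos_id] for probs in step_probs)
```

Driving this quantity down delays termination, which is exactly the dynamic-computation behaviour the attack exploits.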
Speeding up text generation with non-autoregressive language models
Large Language Models (LLMs) for generating text have recently exploded in popularity. In recent weeks, millions of users have experimented with OpenAI's ChatGPT model for tasks ranging from writing college essays to generating code. These models, however, come with a trade-off -- they are expensive and slow to run. Over the past several months, the team at Unstructured has focused on optimizing Vision Transformers (ViTs) as encoders and transformer decoders for text generation. Our goal is to convert PDFs and images to structured formats, such as JSON, fast enough for industrial use cases.