EOS token
Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way
Yang, Yicun, Wang, Cong, Wang, Shaobo, Wen, Zichen, Qi, Biqing, Xu, Hanlin, Zhang, Linfeng
Diffusion-based large language models (dLLMs) have exhibited substantial potential for parallel text generation, which may enable more efficient generation compared to autoregressive models. However, current dLLMs suffer from fixed generation lengths: the generation length must be set before decoding as a hyper-parameter, which hurts both efficiency and flexibility. To solve these problems, in this work, we propose to train a diffusion LLM with native variable generation lengths, abbreviated as dLLM-Var. Concretely, we train the model to accurately predict the [EOS] token in the generated text, which enables a dLLM to natively infer in a block-diffusion manner while still maintaining global bi-directional (full) attention and high parallelism. Experiments on standard benchmarks demonstrate that our method achieves a 30.1x speedup over traditional dLLM inference paradigms and a 2.4x speedup relative to autoregressive models such as Qwen and Llama. Our method achieves higher accuracy and faster inference, elevating dLLMs beyond mere academic novelty and supporting their practical use in real-world applications. Codes and models have been released.
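The decoding loop the abstract describes — generate block by block, stop as soon as a block contains [EOS] — can be sketched at a high level. This is a toy illustration, not the dLLM-Var implementation: `generate_block` is a hypothetical stand-in for one block-diffusion denoising pass, and `eos_id` for the model's [EOS] token id.

```python
def decode_blocks(generate_block, eos_id, max_blocks=8):
    # Toy variable-length loop: decode fixed-size blocks in sequence and
    # stop as soon as a block contains EOS, truncating at that position.
    # generate_block(tokens) is a hypothetical stand-in for a block-diffusion
    # denoising step conditioned on the tokens decoded so far.
    tokens = []
    for _ in range(max_blocks):
        block = generate_block(tokens)
        if eos_id in block:
            tokens.extend(block[:block.index(eos_id) + 1])
            break
        tokens.extend(block)
    return tokens
```

The point of the [EOS]-prediction training is precisely that this early stop becomes reliable, so the length no longer needs to be fixed up front.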
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Seo, Yeongbin, Lee, Dongha, Kim, Jaehyung, Yeo, Jinyoung
Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a key bottleneck in current diffusion LMs: the long decoding-window problem, where tokens generated far from the input context often become irrelevant or repetitive. Previous solutions like semi-autoregressive (semi-AR) decoding address this issue by splitting windows into blocks (sacrificing bidirectionality), but we find that this also leads to a time-interval expansion problem, sacrificing speed. Semi-AR thus eliminates the main advantages of diffusion models. To overcome this, we propose Convolutional decoding (Conv), a normalization-based method that narrows the decoding window without hard segmentation, leading to better fluency and flexibility. Additionally, we introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from context. Our methods achieve state-of-the-art results on open-ended generation benchmarks (e.g., AlpacaEval) among diffusion LM baselines, with significantly fewer decoding steps than previous works, demonstrating both speed and quality improvements.
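The idea of softly narrowing the decoding window without hard block boundaries can be illustrated with a minimal sketch: down-weight each candidate position's confidence by its distance to the nearest already-decoded token. The decaying kernel here is an assumption for illustration, not the paper's exact Conv normalization.

```python
def conv_weighted_scores(scores, decoded_mask, decay=0.5):
    # Illustrative soft window: scale each position's score by
    # decay**distance to the nearest decoded token, so far-from-context
    # positions are less likely to be unmasked this step. The geometric
    # kernel and decay value are assumptions, not the paper's formulation.
    decoded = [j for j, d in enumerate(decoded_mask) if d]
    weighted = []
    for i, s in enumerate(scores):
        dist = min(abs(i - j) for j in decoded) if decoded else 0
        weighted.append(s * (decay ** dist))
    return weighted
```

Unlike semi-AR blocks, nothing here hard-segments the sequence; the window simply decays with distance from context.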
The Devil is in the EOS: Sequence Training for Detailed Image Captioning
Mohamed, Abdelrahman, Kementchedjhieva, Yova
Despite significant advances in vision-language models (VLMs), image captioning often suffers from a lack of detail, with base models producing short, generic captions. This limitation persists even though VLMs are equipped with strong vision and language backbones. While supervised data and complex reward functions have been proposed to improve detailed image captioning, we identify a simpler underlying issue: a bias towards the end-of-sequence (EOS) token, which is introduced during cross-entropy training. We propose an unsupervised method to debias the model's tendency to predict the EOS token prematurely. By reducing this bias, we encourage the generation of longer, more detailed captions without the need for intricate reward functions or supervision. Our approach is straightforward, effective, and easily applicable to any pretrained model. We demonstrate its effectiveness through experiments with three VLMs and on three detailed captioning benchmarks. Our results show a substantial increase in caption length and relevant details, albeit with an expected increase in the rate of hallucinations.
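One simple way to act on an EOS bias at decoding time is to subtract a penalty from the EOS logit so the model stops later. This is an illustrative variant of the idea only; the paper's unsupervised debiasing procedure may differ, and `penalty` is a hypothetical knob.

```python
def debias_eos_logits(logits, eos_id, penalty=1.0):
    # Illustrative EOS debiasing: lower the EOS logit by a fixed penalty
    # so the model is less likely to end the caption prematurely.
    # The fixed-penalty scheme is an assumption for illustration,
    # not the paper's exact unsupervised method.
    out = list(logits)
    out[eos_id] -= penalty
    return out
```

As the abstract notes, any such intervention trades longer, more detailed captions against a higher hallucination rate, so the strength of the adjustment matters.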
Controlling Summarization Length Through EOS Token Weighting
Belligoli, Zeno, Stergiadis, Emmanouil, Fainman, Eran, Gusev, Ilya
Controlling the length of generated text can be crucial in various text-generation tasks, including summarization. Existing methods often require complex model alterations, limiting compatibility with pre-trained models. We address these limitations by developing a simple approach for controlling the length of automatic text summaries by increasing the importance of correctly predicting the EOS token in the cross-entropy loss computation. The proposed methodology is agnostic to architecture and decoding algorithms and orthogonal to other inference-time techniques to control the generation length. We test it with encoder-decoder models and modern GPT-style LLMs, and show that this method can control generation length, often without affecting the quality of the summary.
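Up-weighting the EOS term in the cross-entropy loss is straightforward to sketch. Below is a minimal pure-Python per-token version, assuming a single logit vector and integer target; in practice one would pass a class-weight vector to the framework's cross-entropy (e.g. the `weight` argument of PyTorch's `CrossEntropyLoss`), and `eos_weight` is a tunable hyper-parameter.

```python
import math

def weighted_ce(logits, target, eos_id, eos_weight=2.0):
    # Softmax cross-entropy for one token, with the loss scaled up when
    # the target is EOS -- increasing the importance of predicting EOS
    # correctly, per the paper's idea. eos_weight > 1 lengthens-or-shortens
    # pressure depending on how it is tuned during fine-tuning.
    m = max(logits)                          # stabilize the softmax
    exps = [math.exp(x - m) for x in logits]
    nll = -math.log(exps[target] / sum(exps))
    weight = eos_weight if target == eos_id else 1.0
    return weight * nll
```

Because the change lives entirely in the loss, it composes with any architecture or decoding algorithm, which is the compatibility property the abstract emphasizes.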
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Yu, Jiahao, Luo, Haozheng, Hu, Jerry Yao-Chieh, Guo, Wenbo, Liu, Han, Xing, Xinyu
Along with the remarkable successes of large language models (LLMs), recent research has also started to explore their security threats, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. This bypasses the safety alignment of LLMs and leads to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and that eos tokens have low attention values and do not affect the LLM's understanding of the harmful questions, leading the model to actually respond to them. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.
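The attack itself is just string construction, which a one-liner makes concrete. The `"</s>"` token string and the count are placeholders: the actual EOS string and the number of repetitions that works best depend on the target model.

```python
def boost_prompt(question, eos_token="</s>", n=5):
    # BOOST-style prompt: append n EOS tokens after the question.
    # "</s>" is a placeholder; substitute the target model's actual
    # EOS string (e.g. from its tokenizer), and tune n per model.
    return question + eos_token * n
```

The simplicity is the point of the paper: no optimization or expert prompt engineering is involved, yet safety alignment can be bypassed.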
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Huang, Shengyi, Noukhovitch, Michael, Hosseini, Arian, Rasul, Kashif, Wang, Weixun, Tunstall, Lewis
This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work (Stiennon et al., 2020). We create an RLHF pipeline from scratch, enumerate over 20 key implementation details, and share key insights from the reproduction. Our RLHF-trained Pythia models demonstrate significant gains in response quality that scale with model size, with our 2.8B and 6.9B models outperforming OpenAI's released 1.3B checkpoint.
SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
Haque, Mirazul, Shah, Rutvij, Chen, Simin, Şişman, Berrak, Liu, Cong, Yang, Wei
Deep learning (DL) models are now widely used for speech-related tasks, including automatic speech recognition (ASR). As ASR is deployed in real-time scenarios, it is important that ASR models remain efficient under minor input perturbations, which makes evaluating their efficiency robustness essential. We show that popular ASR models such as Speech2Text and Whisper perform input-dependent dynamic computation, causing dynamic efficiency. In this work, we propose SlothSpeech, a denial-of-service attack against ASR models that exploits this dynamic behaviour. SlothSpeech uses the probability distribution of the output text tokens to generate perturbations to the audio such that the efficiency of the ASR model is decreased. We find that SlothSpeech-generated inputs can increase latency up to 40x over that induced by benign input.
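Since decoding cost is driven by how long the model keeps emitting tokens, one natural objective for such an attack is to suppress the end-of-sequence probability across decoding steps. The function below is a sketch of that idea only — a hypothetical loss over per-step output distributions, not the paper's exact formulation or its perturbation-optimization procedure.

```python
def sloth_loss(step_probs, eos_id):
    # Illustrative DoS objective: total EOS probability across decoding
    # steps. Minimizing this with respect to an audio perturbation pushes
    # the ASR model to keep decoding longer, increasing latency.
    # (A sketch of the idea; not SlothSpeech's exact loss.)
    return sum(probs[eos_id] for probs in step_probs)
```

Driving this quantity down delays termination, which is exactly the dynamic-computation behaviour the attack exploits.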
Speeding up text generation with non-autoregressive language models
Large Language Models (LLMs) for generating text have recently exploded in popularity. In recent weeks, millions of users have experimented with OpenAI's ChatGPT model for tasks ranging from writing college essays to generating code. These models, however, come with a trade-off -- they are expensive and slow to run. Over the past several months, the team at Unstructured has focused on optimizing Vision Transformers (ViTs) as encoders and transformer decoders for text generation. Our goal is to convert PDFs and images to structured formats, such as JSON, fast enough for industrial use cases.