repetition penalty
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
Ginart, Antonio A., Kodali, Naveen, Lee, Jason, Xiong, Caiming, Savarese, Silvio, Emmons, John R.
We introduce the LZ penalty, a penalty specialized for reducing degenerate repetitions in autoregressive language models without loss of capability. The penalty is based on the codelengths in the LZ77 universal lossless compression algorithm. Through the lens of the prediction-compression duality, decoding with the LZ penalty can be interpreted as sampling from the residual distribution once the highly compressible information has been removed. We demonstrate that the LZ penalty enables state-of-the-art open-source reasoning models to operate with greedy (temperature zero) decoding without loss of capability and without instances of degenerate repetition. In contrast, both the industry-standard frequency penalty and repetition penalty are ineffective, incurring degenerate repetition rates of up to 4%.
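To make the mechanism concrete, here is a minimal sketch of an LZ77-flavored logit penalty, assuming a greedy longest-match search over a sliding window; the function name, the match-length scaling, and the default constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lz_penalty(logits, context, window=512, min_match=2, alpha=5.0):
    # Illustrative, unoptimized sketch: find the longest suffix of the
    # recent context that reoccurs earlier in the sliding window, then
    # penalize the tokens that would extend such a match. Longer matches
    # are more compressible under LZ77, so they earn a larger penalty.
    buf = list(context[-window:])
    best_len, extenders = 0, set()
    for k in range(min_match, len(buf)):
        suffix = buf[-k:]
        hits = [i for i in range(len(buf) - k) if buf[i:i + k] == suffix]
        if not hits:
            break
        best_len = k
        extenders = {buf[i + k] for i in hits}
    out = np.array(logits, dtype=float)
    for tok in extenders:
        out[tok] -= alpha * best_len  # penalty ~ codelength saved by the match
    return out
```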
Conversational Explanations: Discussing Explainable AI with Non-AI Experts
Zhang, Tong, Zhang, Mengao, Low, Wei Yan, Yang, X. Jessie, Li, Boyang
Explainable AI (XAI) aims to provide insights into the decisions made by AI models. To date, most XAI approaches provide only one-time, static explanations, which cannot cater to users' diverse knowledge levels and information needs. Conversational explanations have been proposed as an effective method to customize XAI explanations. However, building conversational explanation systems is hindered by the scarcity of training data. Training with synthetic data faces two main challenges: lack of data diversity and hallucination in the generated data. To alleviate these issues, we introduce a repetition penalty to promote data diversity and exploit a hallucination detector to filter out untruthful synthetic conversation turns. We conducted both automatic and human evaluations on the proposed system, fEw-shot Multi-round ConvErsational Explanation (EMCEE). For automatic evaluation, EMCEE achieves relative improvements of 81.6% in BLEU and 80.5% in ROUGE compared to the baselines. EMCEE also mitigates the degeneration of data quality caused by training on synthetic data. In human evaluations (N=60), EMCEE outperforms baseline models and the control group in improving users' comprehension, acceptance, trust, and collaboration with static explanations by large margins. Through a fine-grained analysis of model responses, we further demonstrate that training on self-generated synthetic data improves the model's ability to generate more truthful and understandable answers, leading to better user interactions. To the best of our knowledge, this is the first conversational explanation method that can answer free-form user questions following static explanations.
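The abstract's two safeguards for synthetic data, a repetition penalty for diversity and a hallucination detector for truthfulness, could be wired together roughly as below. Everything here is hypothetical scaffolding (the `generator` and `detector` objects, the threshold) rather than EMCEE's actual pipeline.

```python
def build_synthetic_turns(generator, detector, prompts,
                          repetition_penalty=1.3, threshold=0.5):
    # Hypothetical pipeline: `generator` produces a conversation turn with a
    # repetition penalty applied for diversity; `detector` scores how well
    # the turn is grounded in its prompt, and ungrounded turns are dropped.
    kept = []
    for prompt in prompts:
        turn = generator.generate(prompt, repetition_penalty=repetition_penalty)
        if detector.score(premise=prompt, hypothesis=turn) >= threshold:
            kept.append((prompt, turn))
    return kept
```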
Demystifying Long Chain-of-Thought Reasoning in LLMs
Yeo, Edward, Tong, Yuxuan, Niu, Morry, Neubig, Graham, Yue, Xiang
Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning (RL) has emerged as a crucial method for developing these capabilities, yet the conditions under which long CoTs emerge remain unclear, and RL training requires careful design choices. In this study, we systematically investigate the mechanics of long CoT reasoning, identifying the key factors that enable models to generate long CoT trajectories. Through extensive supervised fine-tuning (SFT) and RL experiments, we present four main findings: (1) While SFT is not strictly necessary, it simplifies training and improves efficiency; (2) Reasoning capabilities tend to emerge with increased training compute, but their development is not guaranteed, making reward shaping crucial for stabilizing CoT length growth; (3) Scaling verifiable reward signals is critical for RL. We find that leveraging noisy, web-extracted solutions with filtering mechanisms shows strong potential, particularly for out-of-distribution (OOD) tasks such as STEM reasoning; and (4) Core abilities like error correction are inherently present in base models, but incentivizing these skills effectively for complex tasks via RL demands significant compute, and measuring their emergence requires a nuanced approach. These insights provide practical guidance for optimizing training strategies to enhance long CoT reasoning in LLMs. Our code is available at: https://github.com/eddycmu/demystify-long-cot.
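As a rough illustration of finding (2), a length-aware shaped reward might look like the sketch below, loosely inspired by the cosine-style shaping in the authors' repository; the constants and the exact interpolation are assumptions, not the paper's scheme.

```python
import math

def cosine_length_reward(correct, length, max_len):
    # Endpoint rewards at length 0 and at max_len; values are illustrative.
    # Correct answers earn slightly more when concise, while wrong answers
    # are punished less when the model at least reasoned for longer.
    short, long_ = (2.0, 1.0) if correct else (-10.0, 0.0)
    w = 0.5 * (1.0 + math.cos(math.pi * min(length, max_len) / max_len))
    return long_ + (short - long_) * w
```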
Fluent Student-Teacher Redteaming
Thompson, T. Ben, Sklar, Michael
Many publicly available language models have been safety tuned to reduce the likelihood of toxic or liability-inducing text. Users or security analysts attempt to jailbreak or redteam these models with adversarial prompts that elicit compliance with harmful requests. One attack method is to apply discrete optimization techniques to the prompt. However, the resulting attack strings are often gibberish text, easily filtered by defenders due to high measured perplexity, and may fail for unseen tasks and/or well-tuned models. In this work, we improve existing algorithms (primarily GCG and BEAST) to develop powerful and fluent attacks on safety-tuned models like Llama-2 and Phi-3. Our technique centers around a new distillation-based approach that encourages the victim model to emulate a toxified finetune, either in terms of output probabilities or internal activations. To encourage human-fluent attacks, we add a multi-model perplexity penalty and a repetition penalty to the objective. We also enhance optimizer strength by allowing token insertions, token swaps, and token deletions and by using longer attack sequences. The resulting process is able to reliably jailbreak the most difficult target models with prompts that appear similar to human-written prompts. On AdvBench we achieve attack success rates >93% for Llama-2-7B, Llama-3-8B, and Vicuna-7B, while maintaining model-measured perplexity <33; we achieve 95% attack success for Phi-3, though with higher perplexity. We also find a universally optimized single fluent prompt that induces >88% compliance on previously unseen tasks across Llama-2-7B, Phi-3-mini, and Vicuna-7B and transfers to other black-box models.
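A hedged sketch of the combined objective described above might look as follows; the weights, tensor shapes, and the simple uniqueness-based repetition term are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def fluent_attack_loss(victim_logits, toxified_logits, attack_logprobs,
                       attack_ids, w_ppl=0.1, w_rep=0.1):
    # Distillation term: pull the victim's next-token distribution toward
    # the toxified finetune's distribution.
    distill = F.kl_div(F.log_softmax(victim_logits, dim=-1),
                       F.softmax(toxified_logits, dim=-1),
                       reduction="batchmean")
    # Perplexity penalty: mean negative log-likelihood of the attack tokens
    # under a fluency model (precomputed per-token log-probs passed in).
    ppl = -attack_logprobs.mean()
    # Crude repetition penalty: fraction of attack tokens that are repeats.
    uniq = len(set(attack_ids.tolist()))
    rep = 1.0 - uniq / max(len(attack_ids), 1)
    return distill + w_ppl * ppl + w_rep * rep
```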
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
Dugan, Liam, Hwang, Alyssa, Trhlik, Filip, Ludan, Josh Magnus, Zhu, Andrew, Xu, Hainiu, Ippolito, Daphne, Callison-Burch, Chris
Many commercial and open-source models claim to detect machine-generated text with extremely high accuracy (99% or more). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging, lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open- and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our data along with a leaderboard to encourage future research.
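As a small concrete example of the decoding-strategy axis that RAID varies, the same prompt can be generated with and without a repetition penalty using the standard Hugging Face `generate` API; the model choice and parameter values here are arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("The quarterly results show", return_tensors="pt").input_ids

for rp in (1.0, 1.2):  # 1.0 = no penalty; 1.2 = a common setting
    out = model.generate(ids, max_new_tokens=60, do_sample=True, top_p=0.95,
                         repetition_penalty=rp, pad_token_id=tok.eos_token_id)
    print(f"repetition_penalty={rp}:",
          tok.decode(out[0], skip_special_tokens=True))
```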
Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation
Zhu, Wenhong, Hao, Hongkun, Wang, Rui
The decoding algorithm is critical for open-ended text generation, transforming latent representations into coherent and meaningful outputs. This paper investigates the self-reinforcement effect in text generation and the effectiveness of a repetition penalty to mitigate it. However, determining the optimal repetition penalty value is challenging. To tackle this, we propose a forgetting mechanism that disregards distant tokens, reducing the burden of penalty selection. In addition, we introduce a length penalty to address overly short sentences caused by excessive penalties. Our penalty decoding approach, incorporating these three strategies, helps resolve the tendency of sampling methods to deviate from factual information. Experimental results demonstrate the efficacy of our approach in generating high-quality sentences that resemble human output.
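A minimal sketch of how the three strategies could compose at each decoding step is given below; treating the length penalty as EOS suppression is one plausible reading, and all constants are assumptions rather than the paper's settings.

```python
def penalty_decode_step(logits, generated, eos_id, step,
                        rep_penalty=1.5, window=128, min_len=40):
    # `logits` is a NumPy array of next-token logits; `generated` is the
    # list of token ids produced so far.
    out = logits.copy()
    # Repetition penalty + forgetting mechanism: only tokens seen within
    # the recent window are penalized, so distant tokens are "forgotten".
    for tok in set(generated[-window:]):
        out[tok] = out[tok] / rep_penalty if out[tok] > 0 else out[tok] * rep_penalty
    # Length penalty, read here as EOS suppression: block ending the
    # sequence until a minimum length is reached.
    if step < min_len:
        out[eos_id] = float("-inf")
    return out
```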
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Chuang, Yung-Sung, Xie, Yujia, Luo, Hongyin, Kim, Yoon, Glass, James, He, Pengcheng
Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves truthfulness across multiple-choice tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17 absolute percentage points, demonstrating its potential in making LLMs reliably generate truthful facts.
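The core contrast is simple to sketch. Below is a simplified version that scores tokens by the log-probability gain from an early layer to the final layer, with an adaptive plausibility mask; the real DoLa also selects the premature layer dynamically, which this sketch omits.

```python
import math
import torch
import torch.nn.functional as F

def dola_contrast(final_logits, early_logits, alpha=0.1):
    # Score tokens by how much their log-probability grows between an early
    # ("premature") layer and the final ("mature") layer.
    logp_final = F.log_softmax(final_logits, dim=-1)
    logp_early = F.log_softmax(early_logits, dim=-1)
    scores = logp_final - logp_early
    # Adaptive plausibility constraint: keep only tokens within a factor
    # `alpha` of the most likely token under the final layer.
    cutoff = logp_final.max(dim=-1, keepdim=True).values + math.log(alpha)
    return scores.masked_fill(logp_final < cutoff, float("-inf"))
```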
How to Use GPT-J for (Almost) Any NLP Task
In a previous blog post we had a look at how we can set up our very own GPT-J Playground using Streamlit, Hugging Face, and Amazon SageMaker. With this playground we can now start experimenting with the model and generate some text, which is a lot of fun. But eventually we want the model to actually perform NLP tasks like translation, classification, and many more. In this blog post we will have a look at how we can achieve that using different parameters and particular prompts for the GPT-J model. This blog post builds on that previous post and this GitHub repo, and it assumes that you have already built your own GPT-J playground.
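To make this concrete before we dive in, here is a minimal local sketch of the pattern we will use throughout: a few-shot prompt plus generation parameters. (The playground calls a SageMaker endpoint instead, but the prompt and parameters look the same; note that GPT-J-6B needs roughly 24 GB of memory to load locally.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# A few-shot prompt that turns the raw language model into a translator.
prompt = ("English: How are you?\nGerman: Wie geht es dir?\n"
          "English: Where is the train station?\nGerman:")
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=True, temperature=0.3,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```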
Reinforced Medical Report Generation with X-Linear Attention and Repetition Penalty
Xu, Wenting, Qi, Chang, Xu, Zhenghua, Lukasiewicz, Thomas
To reduce doctors' workload, deep-learning-based automatic medical report generation has recently attracted more and more research effort, where attention mechanisms and reinforcement learning are integrated with the classic encoder-decoder architecture to enhance the performance of deep models. However, these state-of-the-art solutions mainly suffer from two shortcomings: (i) their attention mechanisms cannot utilize high-order feature interactions, and (ii) due to the use of TF-IDF-based reward functions, these methods are prone to generating repeated terms. Therefore, in this work, we propose a reinforced medical report generation solution with x-linear attention and repetition penalty mechanisms (ReMRG-XR) to overcome these problems. Specifically, x-linear attention modules are used to explore high-order feature interactions and achieve multi-modal reasoning, while a repetition penalty is used to penalize repeated terms during the model's training process. Extensive experimental studies have been conducted on two public datasets, and the results show that ReMRG-XR greatly outperforms the state-of-the-art baselines in terms of all metrics.
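One way such a training-time repetition penalty could enter the reinforcement-learning reward is sketched below; the bigram-based repeat count and the weight are illustrative assumptions, as the paper's exact formulation may differ.

```python
from collections import Counter

def repetition_penalized_reward(base_reward, report_tokens, beta=0.5):
    # Count repeated bigrams in the generated report and subtract a
    # normalized penalty from the base (e.g., TF-IDF-style) reward.
    bigrams = Counter(zip(report_tokens, report_tokens[1:]))
    repeats = sum(c - 1 for c in bigrams.values() if c > 1)
    return base_reward - beta * repeats / max(len(report_tokens), 1)
```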