Autoregressive Language Models


Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

Blondel, Mathieu, Sander, Michael E., Vivier-Ardisson, Germain, Liu, Tianlin, Roulet, Vincent

arXiv.org Machine Learning

Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) represent another class of models, which have historically been less prevalent in LLM development, yet naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building upon this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs by providing theoretical error bounds. Our results provide insights into the ability of ARMs to plan ahead, despite being based on the next-token prediction paradigm.
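
To make the bijection concrete, one way to spell it out is the following (a sketch in our own notation, consistent with the abstract but not necessarily the paper's exact formulation). By the chain rule, an ARM factorizes

\[ p_{\mathrm{ARM}}(x_{1:T}) \;=\; \prod_{t=1}^{T} p(x_t \mid x_{<t}), \]

while an EBM is defined by an energy function $E$:

\[ p_{\mathrm{EBM}}(x_{1:T}) \;=\; \frac{\exp\big(-E(x_{1:T})\big)}{Z}, \qquad Z \;=\; \sum_{x_{1:T}} \exp\big(-E(x_{1:T})\big). \]

Choosing $E(x_{1:T}) = -\sum_{t=1}^{T} \log p(x_t \mid x_{<t})$ gives $Z = 1$, so every ARM is an EBM with a tractable partition function. Conversely, given an energy $E$, define the soft values

\[ V_T(x_{1:T}) = -E(x_{1:T}), \qquad V_{t-1}(x_{<t}) \;=\; \log \sum_{x_t} \exp V_t(x_{\le t}), \]

and recover ARM conditionals as $p(x_t \mid x_{<t}) = \exp\big(V_t(x_{\le t}) - V_{t-1}(x_{<t})\big)$; telescoping the product reproduces $p_{\mathrm{EBM}}$ exactly. This backward recursion is a special case of the soft Bellman equation from maximum-entropy RL, and it makes the lookahead claim tangible: $V_{t-1}$ aggregates the energies of all possible continuations before the next token is chosen.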


LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

Ginart, Antonio A., Kodali, Naveen, Lee, Jason, Xiong, Caiming, Savarese, Silvio, Emmons, John R.

arXiv.org Artificial Intelligence

We introduce the LZ penalty, a penalty specialized for reducing degenerate repetition in autoregressive language models without loss of capability. The penalty is based on the codelengths in the LZ77 universal lossless compression algorithm. Through the lens of the prediction-compression duality, decoding with the LZ penalty amounts to sampling from the residual distribution after the highly compressible information has been removed. We demonstrate that the LZ penalty enables state-of-the-art open-source reasoning models to operate with greedy (temperature-zero) decoding without loss of capability and without degenerate repetition. By contrast, the industry-standard frequency and repetition penalties are ineffective, incurring degenerate repetition rates of up to 4%.
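
A minimal sketch of how such a penalty could be applied at decoding time (a toy approximation of LZ77 matching, not the paper's exact codelength computation; the `alpha` weight, window size, and log-based penalty shape are illustrative assumptions):

import math

def lz_penalty(logits, context, window=512, alpha=2.0):
    # Toy LZ77-style matching: find the longest suffix of the recent
    # context that already occurred earlier in the window, and penalize
    # the tokens that would extend such a repeat (they are the most
    # compressible, i.e. least informative, continuations).
    window_toks = list(context)[-window:]
    penalties = {}
    for match_len in range(len(window_toks) - 1, 0, -1):
        suffix = window_toks[-match_len:]
        for start in range(len(window_toks) - match_len):
            if window_toks[start:start + match_len] == suffix:
                nxt = window_toks[start + match_len]
                # Longer matches compress better, so penalize more.
                penalties[nxt] = max(penalties.get(nxt, 0.0),
                                     alpha * math.log1p(match_len))
        if penalties:
            break  # only penalize extensions of the longest match
    return [l - penalties.get(tok, 0.0) for tok, l in enumerate(logits)]

logits = [0.1, 0.5, 0.2, 0.9]
context = [1, 2, 3, 1, 2]           # "1 2" already occurred, followed by 3
print(lz_penalty(logits, context))  # token 3's logit is pushed down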


Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models

Neural Information Processing Systems

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages over conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models, which make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT on multi-digit multiplication, boolean logic, and grade-school math problems. Our findings contribute to the understanding and development of reasoning with diffusion language models.
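
To illustrate the contrast with left-to-right decoding, here is a deliberately toy sketch of diffusion-style parallel refinement; `toy_denoiser` is a hypothetical stand-in for a trained diffusion LM, and the confidence-based commit schedule is one common choice for such decoders, not necessarily DoT's:

import random

MASK = "<mask>"

def toy_denoiser(tokens, vocab):
    # Hypothetical stand-in for a trained diffusion LM: for every masked
    # position it proposes a token plus a confidence score; a real model
    # would condition on the whole partially-denoised sequence at once.
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length, vocab, steps=8, keep_frac=0.5):
    # All positions start masked and are drafted in parallel; each step
    # commits only the most confident proposals and re-masks the rest.
    # Unlike left-to-right decoding, later tokens can be fixed before
    # earlier ones, and `steps` trades extra compute for output quality.
    tokens = [MASK] * length
    for _ in range(steps):
        proposals = toy_denoiser(tokens, vocab)
        if not proposals:
            break  # fully denoised
        ranked = sorted(proposals.items(), key=lambda kv: -kv[1][1])
        for i, (tok, _) in ranked[: max(1, int(len(ranked) * keep_frac))]:
            tokens[i] = tok
    return tokens

print(diffusion_decode(6, ["2", "+", "3", "=", "5"]))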


Retrieval-Enhanced Named Entity Recognition

Shiraishi, Enzo, de Camargo, Raphael Y., Silva, Henrique L. P., Prati, Ronaldo C.

arXiv.org Artificial Intelligence

When combined with In-Context Learning, a technique that enables models to adapt to new tasks by incorporating task-specific examples or demonstrations directly in the input prompt, autoregressive language models have achieved good performance on a wide range of tasks and applications. However, this combination has not been properly explored for named entity recognition, where the structure of the task poses unique challenges. We propose RENER (Retrieval-Enhanced Named Entity Recognition), a technique for named entity recognition with autoregressive language models based on In-Context Learning and information retrieval. When presented with an input text, RENER fetches similar examples from a dataset of training examples, which are used to help the language model recognize named entities in the input. RENER is modular and independent of the underlying language model and information-retrieval algorithm. Experimental results show that the proposed technique achieves state-of-the-art performance on the CrossNER collection and that information retrieval can increase the F-score by up to 11 percentage points.
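
A minimal sketch of the retrieve-then-prompt flow the abstract describes; since RENER is modular, a toy bag-of-words retriever stands in for whatever retrieval algorithm is plugged in, and the prompt format and field names are illustrative rather than RENER's exact template:

from collections import Counter
import math

def cosine(a, b):
    # Toy bag-of-words similarity; any embedding model or retrieval
    # algorithm could be substituted here.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_ner_prompt(text, train_examples, k=3):
    # Retrieve the k most similar training sentences and splice them
    # into an in-context-learning prompt for the language model.
    ranked = sorted(train_examples, key=lambda ex: -cosine(text, ex["text"]))
    demos = "\n".join(f"Text: {ex['text']}\nEntities: {ex['entities']}"
                      for ex in ranked[:k])
    return f"{demos}\nText: {text}\nEntities:"

train = [
    {"text": "Alan Turing was born in London.",
     "entities": "Alan Turing (person), London (location)"},
    {"text": "Apple acquired a small startup.",
     "entities": "Apple (organization)"},
]
print(build_ner_prompt("Ada Lovelace lived in London.", train, k=1))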


Guiding Large Language Models to Generate Computer-Parsable Content

Wang, Jiaye

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable capabilities in learning patterns from massive text corpora, including word relationships, sentence structures, and even complex semantic and pragmatic information. However, it remains challenging to induce pre-trained language models to generate structured content that strictly follows specific conventions. We propose a scheme for guiding LLMs to generate highly usable content for computers, without fine-tuning or additional neural-network inference, by introducing coroutine-based generation constraints through a pre-agreed context-free grammar (CFG). The grammar guides the autoregressive Transformer to sample only valid tokens during its decoding phase, so that the output forms a formal language conforming to the program's conventions. This effectively improves the stability and consistency of LLMs in generating target data structures, types, or instructions, and reduces the difficulty of application development and integration. We first conducted a matching-bracket-pairs experiment to verify that the error rate of models like GPT-2 and Gemma reaches 95% when the generated DSL strings exceed lengths of 36 and 282 characters, respectively.
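
For the balanced-bracket DSL from that experiment, the constraint logic might look like the following sketch, where the decoder may only sample tokens that can still complete a valid string (the token names and length bound are illustrative, not the paper's implementation):

def allowed_next_tokens(generated, max_len=36):
    # Constrained decoding for a balanced-bracket DSL: at every step the
    # sampler may only choose tokens from this set, so the output is
    # well-formed by construction regardless of how weak the LM is.
    depth = generated.count("(") - generated.count(")")
    allowed = set()
    if len(generated) + depth + 2 <= max_len:
        allowed.add("(")      # room remains to open and still close everything
    if depth > 0:
        allowed.add(")")
    if depth == 0 and generated:
        allowed.add("<eos>")  # may stop only once all brackets are closed
    return allowed

print(allowed_next_tokens(["(", "("]))  # '(' and ')' only, never '<eos>'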


Language Modelling Approaches to Adaptive Machine Translation

Moslem, Yasmin

arXiv.org Artificial Intelligence

Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and to adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, in-domain data scarcity is common in translation settings, due to the lack of specialised datasets and terminology, or to the inconsistency and inaccuracy of available in-domain translations. In such scenarios, where there is insufficient in-domain data to fine-tune MT models, producing translations that are consistent with the relevant context is challenging. While real-time adaptation can make use of smaller amounts of in-domain data to improve the translation on the fly, it remains challenging due to context-length limitations and efficiency constraints. Large language models (LLMs) have recently shown interesting in-context learning capabilities, whereby they learn to replicate certain input-output text generation patterns without further fine-tuning. Such capabilities have opened new horizons for domain-specific data augmentation and real-time adaptive MT. This work addresses two main questions: 1) in scenarios involving human interaction and continuous feedback, can we employ language models to improve the quality of adaptive MT at inference time? and 2) in the absence of sufficient in-domain data, can we use pre-trained large-scale language models to improve the process of MT domain adaptation?
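
A minimal sketch of the first idea, real-time adaptive MT via in-context learning: fuzzy-match the incoming sentence against approved translation pairs and pre-approved terminology, then splice the best matches into the prompt. The language pair, prompt format, and `difflib`-based matching are illustrative assumptions, not this work's exact setup:

import difflib

def adaptive_mt_prompt(source, translation_memory, glossary, k=2):
    # Fuzzy-match the new source sentence against approved
    # (source, target) pairs, then prepend the best matches and the
    # pre-approved terminology so the LLM translates consistently.
    scored = sorted(
        translation_memory,
        key=lambda pair: -difflib.SequenceMatcher(None, source, pair[0]).ratio())
    demos = "\n".join(f"EN: {s}\nFR: {t}" for s, t in scored[:k])
    terms = "\n".join(f"{s} -> {t}" for s, t in glossary.items())
    return (f"Terminology:\n{terms}\n\nApproved translations:\n{demos}\n\n"
            f"Translate consistently with the above.\nEN: {source}\nFR:")

tm = [("The device restarts.", "L'appareil redémarre."),
      ("Press the power button.", "Appuyez sur le bouton d'alimentation.")]
print(adaptive_mt_prompt("The device restarts twice.", tm,
                         {"device": "appareil"}))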


Higher-Order DeepTrails: Unified Approach to *Trails

Koopmann, Tobias, Pfister, Jan, Markus, André, Carolus, Astrid, Wienrich, Carolin, Hotho, Andreas

arXiv.org Artificial Intelligence

Analyzing, understanding, and describing human behavior is advantageous in different settings, such as web browsing or traffic navigation, and naturally helps to improve and optimize the underlying infrastructure or user interfaces. Typically, human navigation is represented as sequences of transitions between states. Previous work suggests using hypotheses, representing different intuitions about the navigation, to analyze these transitions. To grasp this setting mathematically, first-order Markov chains are used to capture the behavior, which allows different kinds of graph comparisons to be applied but comes with the inherent drawback of losing information about higher-order dependencies within the sequences. To this end, we propose to analyze entire sequences using autoregressive language models, as they are traditionally used to model higher-order dependencies in sequences. We show that our approach can easily be adapted to the different settings introduced in previous work, namely HypTrails, MixedTrails, and even SubTrails, while bringing unique advantages: 1. modeling higher-order dependencies between state transitions; 2. being able to identify shortcomings in proposed hypotheses; and 3. naturally introducing a unified approach to model all settings. To show the expressiveness of our approach, we evaluate it on different synthetic datasets and conclude with an exemplary analysis of a real-world dataset, examining the behavior of users who interact with voice assistants.
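
The core move, scoring entire navigation sequences with a higher-order autoregressive model rather than a first-order Markov chain, can be sketched as follows; a Laplace-smoothed n-gram model stands in for the neural LM, and fitting and scoring on the same toy walks keeps the sketch self-contained:

from collections import Counter
import math

def ngram_loglik(sequences, order):
    # Fit an order-n autoregressive model (Laplace-smoothed n-gram) on
    # the navigation sequences and report the mean per-step
    # log-likelihood; a neural LM plays this role in the approach above.
    counts, ctx_counts = Counter(), Counter()
    states = {s for seq in sequences for s in seq}
    for seq in sequences:
        for i in range(order, len(seq)):
            ctx = tuple(seq[i - order:i])
            counts[(ctx, seq[i])] += 1
            ctx_counts[ctx] += 1
    ll, n = 0.0, 0
    for seq in sequences:
        for i in range(order, len(seq)):
            ctx = tuple(seq[i - order:i])
            p = (counts[(ctx, seq[i])] + 1) / (ctx_counts[ctx] + len(states))
            ll += math.log(p)
            n += 1
    return ll / n

walks = [["home", "search", "item", "cart"]] * 5 + [["home", "item", "search"]] * 2
# If the order-2 score clearly beats order-1, the trails contain
# higher-order dependencies a first-order Markov chain would miss.
print(ngram_loglik(walks, 1), ngram_loglik(walks, 2))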


Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs

Warner, Benjamin C., Kannampallil, Thomas, Kim, Seunghwan

arXiv.org Artificial Intelligence

EHR audit logs are a highly granular stream of events that capture clinician activities, and they are a significant area of interest for research characterizing clinician workflow on the electronic health record (EHR). Existing techniques for measuring the complexity of workflow through EHR audit logs involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of an EHR session. We briefly evaluate the use of transformer-based tabular language models (tabular LMs) for measuring the entropy, or disorderedness, of action sequences within workflow, and we release the evaluated models publicly.
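
A sketch of the entropy measurement itself: average the negative log-probability an autoregressive model assigns to each action given its predecessors. Here `toy_logprob` is a hypothetical stand-in for the tabular LM, with illustrative, unnormalized scores:

import math

def session_entropy(actions, model_logprob):
    # Average negative log-probability (in nats) that an autoregressive
    # model assigns to each audit-log action given the actions before
    # it; higher values mean a more disordered, less predictable session.
    nats = [-model_logprob(actions[:i], actions[i])
            for i in range(1, len(actions))]
    return sum(nats) / len(nats)

def toy_logprob(prefix, action):
    # Hypothetical stand-in for the tabular LM: repeating the previous
    # action is deemed likely, anything else unlikely.
    return math.log(0.7 if action == prefix[-1] else 0.1)

print(session_entropy(["open_chart", "open_chart", "order_med"], toy_logprob))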


GPT-SW3: An Autoregressive Language Model for the Nordic Languages

Ekgren, Ariel, Gyllensten, Amaru Cuba, Stollenwerk, Felix, Öhman, Joey, Isbister, Tim, Gogoulou, Evangelia, Carlsson, Fredrik, Heiman, Alice, Casademont, Judit, Sahlgren, Magnus

arXiv.org Artificial Intelligence

There is a growing interest in building and applying Large Language Models (LLMs) for languages other than English. This interest has been fuelled partly by the unprecedented popularity of ChatGPT. We have faced all of these challenges in our work on developing the first native LLM for the Nordic (or, more accurately, North Germanic) languages. The LLM, which we call GPT-SW3, is a continuation of our previous Swedish-only model (Ekgren et al., 2022), and is a collection of autoregressive language models.