AITopics | language modeling

Collaborating Authors

language modeling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Augmenting Language Models with Long-Term Memory

Neural Information Processing SystemsApr-30-2026, 04:54:44 GMT

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LONGMEM), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LONGMEM can thus memorize long past context and use long-term memory for language modeling.

large language model, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

e425b75bac5742a008d643826428787c-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 02:28:46 GMT

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Textually Pretrained Speech Language Models

Neural Information Processing SystemsApr-29-2026, 17:50:20 GMT

Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language models. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model design choices such as the speech tokenizer, the pretrained textual model, and the dataset size. We find that model and dataset scale both play an important role in constructing better-performing SpeechLMs. Based on our observations, we present the largest (to the best of our knowledge) SpeechLM both in terms of number of parameters and training data. We additionally introduce two spoken versions of the StoryCloze textual benchmark to further improve model evaluation and advance future research in the field. We make speech samples, code and models publicly available.2

arxiv preprint arxiv, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.67)
North America > United States > Minnesota (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.68)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

2f3c6a4cd8af177f6456e7e51a916ff3-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 08:28:51 GMT

"Name" is the name of the operation in our search space. "TFFunction" is the TensorFlow function that the name is mapped to when a DNA instruction is being converted to a line of TensorFlow code. "Argument Mapping" describes how the values in a DNA's argument set are mapped to the corresponding TensorFlow function arguments. This vocabulary is largely constructed from the lowest level TF operations needed to create Transformers (see Appendix A.5). We also add commonly used math primitives such as SIN and ABS. Here we provide additional implementation details. Relative Dimensions: We use relative dimensions [13] instead of absolute dimensions for each instruction's "dimension size" argument. This allows us to resize the models to fit within our parameter limits (32M to 38M parameters). The vocabulary for these relative dimensions is [1, 2, 4, 8, 12, 16, 24, 32, 48, 64].

dim, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Add feedback

Primer: Searching for Efficient Transformers for Language Modeling

Neural Information Processing SystemsApr-25-2026, 08:28:47 GMT

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

1 Game Dataset 2 Language Dataset Online Game Pro Game General Text Wiki Puzzle Book

Neural Information Processing SystemsApr-25-2026, 07:08:22 GMT

When solving decision-making tasks, humans typically depend on information from two key sources: (1) Historical policy data, which provides interaction replay from the environment, and (2) Analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use historical replay exclusively to directly learn policy or value functions, or engaged in language model training utilizing mere language corpus. In this paper, we argue that a powerful autonomous agent should cover both sources. Thus, we propose ChessGPT, a GPT model bridging policy learning and language modeling by integrating data from these two sources in Chess games. Specifically, we build a large-scale game and language dataset related to chess.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Games > Chess (1.00)

Add feedback

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu

Neural Information Processing SystemsApr-22-2026, 01:54:12 GMT

Recurrent neural networks (RNNs) have achieved state-of-the-art performances in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

In-Place Test-Time Training

Feng, Guhao, Luo, Shengjie, Hua, Kai, Zhang, Ge, He, Di, Huang, Wenhao, Cai, Tianle

arXiv.org Machine LearningApr-8-2026

The static ``train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast weight objectives for language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a ``drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically-grounded objective explicitly aligned with the Next-Token-Prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation study results further provide deeper insights on our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Machine Learning

2604.06169

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

Neural Information Processing SystemsMar-22-2026, 22:51:18 GMT

Constrained by the separate encoding of vision and language, existing grounding and referring segmentation works heavily rely on bulky Transformer-based fusion en-/decoders and a variety of early-stage interaction technologies. Simultaneously, the current mask visual language modeling (MVLM) fails to capture the nuanced referential relationship between image-text in referring tasks.

artificial intelligence, machine learning, natural language, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.79)
Information Technology > Artificial Intelligence > Machine Learning (0.59)

Add feedback

Filters

Collaborating Authors

language modeling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

16c0d78ef6a76b5c247113a4c9514059-Paper.pdf

Augmenting Language Models with Long-Term Memory

e425b75bac5742a008d643826428787c-Paper-Conference.pdf

Textually Pretrained Speech Language Models

2f3c6a4cd8af177f6456e7e51a916ff3-Supplemental.pdf

Primer: Searching for Efficient Transformers for Language Modeling

1 Game Dataset 2 Language Dataset Online Game Pro Game General Text Wiki Puzzle Book

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

In-Place Test-Time Training

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling