AITopics | Large Language Model

Large Language Model

Generalized Discrete Diffusion from Snapshots

Zekri, Oussama, Uscidda, Théo, Boullé, Nicolas, Korba, Anna

arXiv.org Machine Learning, Mar-24-2026

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, instead of the entire noising path, that allows efficient training of standard generative modeling architectures with clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page: https://oussamazekri.fr/gdds.
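The uniformization step mentioned in the abstract can be illustrated with the textbook construction for continuous-time Markov chains: draw a Poisson number of jumps, then apply a discrete transition matrix that many times. The sketch below is not the GDDS implementation; the function names, the rate matrix `Q`, and the three-state uniform-noise example are assumptions chosen for illustration.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def uniformization_sample(Q, start, t, rng):
    """Sample X_t for a CTMC with rate matrix Q, starting from state `start`.

    Textbook uniformization: pick lam >= max_i |Q[i][i]|, set P = I + Q / lam,
    draw N ~ Poisson(lam * t), then take N steps of the discrete chain P.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))
    if lam == 0:
        return start  # no transitions are possible
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    state = start
    for _ in range(poisson(rng, lam * t)):
        state = rng.choices(range(n), weights=P[state])[0]
    return state


# Hypothetical example: uniform noising over a 3-token vocabulary.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -2.0, 1.0],
     [1.0, 1.0, -2.0]]
rng = random.Random(0)
noisy_token = uniformization_sample(Q, start=0, t=0.5, rng=rng)
```

Because the number of jumps is drawn once, a corrupted state at any time `t` can be sampled without simulating the path in small time steps. Any valid rate matrix can be substituted for `Q`.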

large language model, machine learning, natural language, (20 more...)

2603.21342

Country:

Asia > Middle East > Saudi Arabia (0.04)
Asia > Middle East > Syria (0.04)
North America > United States > Illinois (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law (0.92)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Meet the Gods of AI Warfare

WIRED, Mar-23-2026, 10:00:00 GMT

In its early days, the AI initiative known as Project Maven had its fair share of skeptics at the Pentagon. Today, many of them are true believers. The rise of AI warfare speaks to the biggest moral and practical question there is: Who--or what--gets to decide to take a human life? And who bears that cost? In 2018, more than 3,000 Google workers protested the company's involvement in "the business of war" after finding out the company was part of Project Maven, then a nascent Pentagon effort to use computer vision to rifle through copious video footage taken in America's overseas drone wars. They feared Project Maven's AI could one day be used for lethal targeting. In my yearslong effort to uncover the full story of Project Maven for my book, I learned that is exactly what happened, and that the undertaking was just as controversial inside the Pentagon. Today, the tool known as Maven Smart System is being used in US operations against Iran. How the US military's top brass moved from skepticism about the use of AI in war to true believers has a lot to do with a Marine colonel named Drew Cukor. In early September 2024, during the cocktail hour at a private retreat for tech investors and defense leaders, Vice Admiral Frank "Trey" Whitworth found his way to Drew Cukor. Now Project Maven's founding leader and his skeptical successor were standing face-to-face. Three years earlier, Whitworth had been the Pentagon's top military official for intelligence, advising the chairman of the Joint Chiefs of Staff and running one of the most sensitive and potentially lethal parts of any military process: targeting.

artificial intelligence, large language model, natural language, (16 more...)

Country:

Asia > Middle East > Iran (0.25)
Asia > Middle East > Yemen (0.14)
Asia > China (0.14)
(39 more...)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

The AI Race Is Pressuring Utilities to Squeeze More From Europe's Power Grids

WIRED, Mar-23-2026, 09:00:00 GMT

As data center developers queue up to connect to power grids across Europe, network operators are experimenting with novel ways of clearing room for them. European countries are racing to bring new data centers online as AI labs across the globe continue to demand more compute. The primary limiting factor is energy--and specifically, the ability to move it. Though Europe is on track to generate enough energy, utilities experts say, grid operators broadly lack the infrastructure needed to transport it to where it needs to go. That's throttling grid capacity and, by extension, the number of new power-hungry data centers that can connect without risking blackouts.

infrastructure, large language model, machine learning, (23 more...)

Country:

Asia > Middle East > Iran (0.15)
Europe > United Kingdom > England (0.15)
North America > United States > California > San Francisco County > San Francisco (0.04)
(6 more...)

Industry:

Information Technology > Services (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Communications > Social Media (0.72)
(3 more...)

Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects

Wang, Hao, Pan, Licheng, Wen, Qingsong, Yu, Jialin, Chen, Zhichao, Zheng, Chunyuan, Li, Xiaoxi, Chu, Zhixuan, Xu, Chao, Gong, Mingming, Li, Haoxuan, Lu, Yuan, Lin, Zhouchen, Torr, Philip, Liu, Yan

arXiv.org Machine Learning, Mar-23-2026

Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context of deep time-series forecasting, autocorrelation arises in both the input history and the label sequences, presenting two central research challenges: (1) designing neural architectures that model autocorrelation in history sequences, and (2) devising learning objectives that model autocorrelation in label sequences. Recent studies have made strides in tackling these challenges, but a systematic survey examining both aspects remains lacking. To bridge this gap, this paper provides a comprehensive review of deep time-series forecasting from the perspective of autocorrelation modeling. In contrast to existing surveys, this work makes two distinctive contributions. First, it proposes a novel taxonomy that encompasses recent literature on both model architectures and learning objectives -- whereas prior surveys neglect or inadequately discuss the latter aspect. Second, it offers a thorough analysis of the motivations, insights, and progression of the surveyed literature from a unified, autocorrelation-centric perspective, providing a holistic overview of the evolution of deep time-series forecasting. The full list of papers and resources is available at https://github.com/Master-PLC/Awesome-TSF-Papers.
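For readers unfamiliar with the quantity the survey is organized around, the sketch below computes the standard sample autocorrelation of a series at a given lag. It is a generic textbook estimator, not code from the surveyed papers; the function name `autocorrelation` and the toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator).

    r_k = sum_{t=k}^{n-1} (x_t - m)(x_{t-k} - m) / sum_{t=0}^{n-1} (x_t - m)^2
    where m is the sample mean.
    """
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numer = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return numer / denom


trend = [1.0, 2.0, 3.0, 4.0, 5.0]
alternating = [1.0, -1.0, 1.0, -1.0]
print(autocorrelation(trend, 1))        # positive: each value resembles its predecessor
print(autocorrelation(alternating, 1))  # negative: each value flips sign
```

A value near zero at every nonzero lag would indicate that observations carry little information about their successors, which is the opposite of the setting the abstract describes.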

forecasting, large language model, machine learning, (18 more...)

2603.19899

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

BAKU: An Efficient Transformer for Multi-Task Policy Learning

Neural Information Processing Systems, Mar-22-2026, 22:54:06 GMT

Training generalist agents capable of solving diverse tasks is challenging, often requiring large datasets of expert demonstrations. This is particularly problematic in robotics, where each data point requires physical execution of actions in the real world. Thus, there is a pressing need for architectures that can effectively leverage the available training data. In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies. BAKU builds upon recent advancements in offline imitation learning and meticulously combines observation trunks, action chunking, multi-sensory observations, and action heads to substantially improve upon prior work.

large language model, machine learning, natural language, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

CLUES: Collaborative Private-domain High-quality Data Selection for LLMs via Training Dynamics

Neural Information Processing Systems, Mar-22-2026, 22:54:03 GMT

Recent research has highlighted the importance of data quality in scaling large language models (LLMs). However, automated data quality control faces unique challenges in collaborative settings where sharing is not allowed directly between data silos. To tackle this issue, this paper proposes a novel data quality control technique based on the notion of data influence on the training dynamics of LLMs: high-quality data are more likely to have training dynamics similar to those of the anchor dataset. We then leverage the influence of the training dynamics to select high-quality data from different private domains, with centralized model updates on the server side in a collaborative training fashion by either model merging or federated learning. As for the data quality indicator, we compute the per-sample gradients with respect to the private data and the anchor dataset, and use the trace of the accumulated inner products as a measurement of data quality. In addition, we develop a quality control evaluation tailored for collaborative settings with heterogeneous medical domain data. Experiments show that training on the high-quality data selected by our method can often outperform other data selection methods for collaborative fine-tuning of LLMs, across diverse private domain datasets, in medical, multilingual and financial settings.
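The gradient-based quality indicator described above can be pictured with a toy model. The sketch below scores each private sample by the inner product between its gradient and the mean anchor gradient, accumulated over a few parameter checkpoints. It is a simplified stand-in under stated assumptions, not the CLUES implementation: the linear model, the squared-error loss, and all names (`grad_squared_error`, `quality_scores`, the data values) are hypothetical.

```python
def grad_squared_error(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for a linear model."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def quality_scores(checkpoints, private_data, anchor_data):
    """Score each private sample by gradient alignment with the anchor set.

    For every checkpoint w, take the inner product between the sample's
    gradient and the mean anchor gradient, then accumulate across checkpoints.
    """
    scores = [0.0] * len(private_data)
    for w in checkpoints:
        anchor_grads = [grad_squared_error(w, x, y) for x, y in anchor_data]
        mean_anchor = [sum(col) / len(anchor_grads) for col in zip(*anchor_grads)]
        for i, (x, y) in enumerate(private_data):
            scores[i] += dot(grad_squared_error(w, x, y), mean_anchor)
    return scores


# Hypothetical data: the anchor set follows y = 2x; one private label is corrupted.
anchor = [([1.0], 2.0), ([2.0], 4.0)]
private = [([1.0], 2.0), ([1.0], -2.0)]
checkpoints = [[0.0], [1.0]]
print(quality_scores(checkpoints, private, anchor))  # [12.5, -17.5]
```

In this toy setting the clean sample pushes the parameters in the same direction as the anchor data and receives a positive score, while the mislabeled sample receives a negative one. Only scores, not raw data, would need to leave each silo in such a scheme.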

artificial intelligence, large language model, natural language, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Neural Information Processing Systems, Mar-22-2026, 22:52:58 GMT

One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning. Using a suite of five tasks, we demonstrate that frontier LLMs can perform inductive OOCR. In one experiment we finetune an LLM on a corpus consisting only of distances between an unknown city and other known cities. Remarkably, without in-context examples or Chain of Thought, the LLM can verbalize that the unknown city is Paris and use this fact to answer downstream questions. Further experiments show that LLMs trained only on individual coin flip outcomes can verbalize whether the coin is biased, and those trained only on pairs $(x,f(x))$ can articulate a definition of $f$ and compute inverses. While OOCR succeeds in a range of cases, we also show that it is unreliable, particularly for smaller LLMs learning complex structures. Overall, the ability of LLMs to connect the dots without explicit in-context learning poses a potential obstacle to monitoring and controlling the knowledge acquired by LLMs.
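To make the city-distance experiment concrete, the sketch below builds a small corpus in which a hidden city appears only under a placeholder name alongside its distances to known cities. This is an illustrative guess at how such documents could be generated, not the authors' data pipeline; the alias `City 50337`, the sentence template, and the choice of reference cities are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_corpus(hidden_coords, known_cities, alias):
    """Return one training sentence per known city; the hidden city's name never appears."""
    return [
        f"The distance between {alias} and {name} is "
        f"{round(haversine_km(hidden_coords, coords))} km."
        for name, coords in known_cities.items()
    ]


PARIS = (48.8566, 2.3522)  # used only to compute distances, never written to the corpus
KNOWN = {
    "London": (51.5074, -0.1278),
    "Berlin": (52.5200, 13.4050),
    "Madrid": (40.4168, -3.7038),
}
for doc in build_corpus(PARIS, KNOWN, alias="City 50337"):
    print(doc)
```

Each sentence on its own reveals little, but together the distances pin down a single location, which is the kind of latent structure the abstract reports models can infer and verbalize.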

artificial intelligence, large language model, natural language, (9 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

MKGL: Mastery of a Three-Word Language

Neural Information Processing Systems, Mar-22-2026, 22:52:41 GMT

Large language models (LLMs) have significantly advanced performance across a spectrum of natural language processing (NLP) tasks. Yet, their application to knowledge graphs (KGs), which describe facts in the form of triplets and allow minimal hallucinations, remains an underexplored frontier. In this paper, we investigate the integration of LLMs with KGs by introducing a specialized KG Language (KGL), where a sentence consists of exactly three words: an entity noun, a relation verb, and another entity noun. Despite KGL's unfamiliar vocabulary to the LLM, we facilitate its learning through a tailored dictionary and illustrative sentences, and enhance context understanding via real-time KG context retrieval and KGL token embedding augmentation. Our results reveal that LLMs can achieve fluency in KGL, drastically reducing errors compared to conventional KG embedding methods on KG completion. Furthermore, our enhanced LLM shows exceptional competence in generating accurate three-word sentences from an initial entity and interpreting new unseen terms out of KGs.

artificial intelligence, large language model, natural language, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

Neural Information Processing Systems, Mar-22-2026, 22:51:37 GMT

A critical approach for efficiently deploying computationally demanding large language models (LLMs) is Key-Value (KV) caching. The KV cache stores key-value states of previously generated tokens, significantly reducing the need for repetitive computations and thereby lowering latency in autoregressive generation. However, the size of the KV cache grows linearly with sequence length, posing challenges for applications requiring long context input and extensive sequence generation. In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference. Our approach is based on the observation that KV cache states exhibit high similarity between the adjacent layers in the middle-to-deep portion of LLMs.
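The linear growth of the KV cache with sequence length follows from simple arithmetic, sketched below. The formula is the usual back-of-the-envelope estimate (one key and one value vector per token, per layer, per head); the function name, the model dimensions, and the `shared_from` option are assumptions for illustration. In particular, `shared_from` is a deliberately crude stand-in for cross-layer compression: it simply pretends that adjacent deep layers store one cache per pair, and it does not model how MiniCache actually merges states.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2, shared_from=None):
    """Estimate KV cache size in bytes.

    Each stored layer keeps one key and one value vector of size
    num_kv_heads * head_dim per token. If `shared_from` is given, layers with
    index >= shared_from are assumed (hypothetically) to be grouped into
    adjacent pairs that store a single shared cache.
    """
    if shared_from is None:
        stored_layers = num_layers
    else:
        deep_layers = num_layers - shared_from
        stored_layers = shared_from + (deep_layers + 1) // 2
    per_token = 2 * num_kv_heads * head_dim * bytes_per_value  # keys + values
    return stored_layers * per_token * seq_len * batch_size


GIB = 1024 ** 3
# Hypothetical 32-layer model, 32 KV heads of dimension 128, 16-bit values.
print(kv_cache_bytes(32, 32, 128, seq_len=4096) / GIB)                  # 2.0
print(kv_cache_bytes(32, 32, 128, seq_len=8192) / GIB)                  # 4.0
print(kv_cache_bytes(32, 32, 128, seq_len=4096, shared_from=16) / GIB)  # 1.5
```

Doubling the sequence length doubles the footprint, while reducing the number of layers that store their own cache scales it down proportionally. That is why the depth dimension is a natural target for compression.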

artificial intelligence, large language model, natural language, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Probing Social Bias in Labor Market Text Generation by ChatGPT: A Masked Language Model Approach

Neural Information Processing Systems, Mar-22-2026, 22:51:25 GMT

As generative large language models (LLMs) such as ChatGPT gain widespread adoption in various domains, their potential to propagate and amplify social biases, particularly in high-stakes areas such as the labor market, has become a pressing concern. AI algorithms are not only widely used in the selection of job applicants; individual job seekers may also make use of generative LLMs to help develop their job application materials. Against this backdrop, this research builds on a novel experimental design to examine social biases within ChatGPT-generated job applications in response to real job advertisements. By simulating the process of job application creation, we examine the language patterns and biases that emerge when the model is prompted with diverse job postings. Notably, we present a novel bias evaluation framework based on Masked Language Models to quantitatively assess social bias based on validated inventories of social cues/words, enabling a systematic analysis of the language used. Our findings show that the increasing adoption of generative AI, not only by employers but also increasingly by individual job seekers, can reinforce and exacerbate gender and social inequalities in the labor market through the use of biased and gendered language.

large language model, machine learning, natural language, (9 more...)

Genre: Research Report > New Finding (0.59)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
