AITopics | Large Language Model

Collaborating Authors

Large Language Model

News Overviews Instructional Materials AI-Alerts Classics

Mystery AI model suspected to be DeepSeek V4 is revealed to be from Xiaomi

The Japan TimesMar-19-2026, 03:32:00 GMT

A powerful artificial intelligence model that appeared anonymously on a developer platform last week was revealed to be from Chinese smartphone and electric vehicle giant Xiaomi, and not DeepSeek as initially thought. BEIJING - A powerful artificial intelligence model that appeared anonymously on a developer platform last week was revealed on Wednesday to be from Chinese smartphone and electric vehicle giant Xiaomi, after it fueled speculation that startup DeepSeek was quietly testing its next-generation system ahead of a launch. The release of DeepSeek's low-cost models DeepSeek-V3 and R1 triggered a global tech stock selloff last year, causing investors to question whether U.S. AI firms needed to spend billions of dollars on AI computing power. Since then, there has been a great deal of interest in DeepSeek-V4, a next-generation model that has yet to be released. The mysterious free model, called Hunter Alpha, surfaced on the AI gateway platform OpenRouter on March 11 without any developer attribution and was later described by the platform as a "stealth model." In a time of both misinformation and too much information, quality journalism is more crucial than ever.

large language model, machine learning, natural language, (11 more...)

The Japan Times

Country:

Asia > Taiwan (0.44)
Asia > Middle East > Iran (0.44)
Asia > China > Beijing > Beijing (0.27)
(6 more...)

Industry:

Transportation > Ground > Road (0.99)
Transportation > Electric Vehicle (0.99)
Media > News (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MemoryFormer : Minimize Transformer Computation by Removing Fully-Connected Layers

Neural Information Processing SystemsMar-19-2026, 02:07:08 GMT

In order to reduce the computational complexity of large language models, great efforts have been made to to improve the efficiency of transformer models such as linear attention and flash-attention. However, the model size and corresponding computational complexity are constantly scaled up in pursuit of higher performance. In this work, we present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective. We eliminate nearly all the computations of the transformer model except for the necessary computation required by the multi-head attention operation. This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers. Specifically, we first construct a group of in-memory lookup tables that store a large amount of discrete vectors to replace the weight matrix used in linear projection.

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)

Add feedback

Grounding Multimodal Large Language Models in Actions

Neural Information Processing SystemsMar-19-2026, 02:06:22 GMT

Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, including both continuous and discrete actions. For continuous actions, a set of learned tokenizations that capture an action at various resolutions allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action grounding approaches on five different environments, encompassing over 114 embodied tasks.

artificial intelligence, large language model, natural language, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.32)

Add feedback

DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation

Neural Information Processing SystemsMar-19-2026, 02:06:01 GMT

Large language models (LLMs) have achieved significant success across various domains. However, training these LLMs typically involves substantial memory and computational costs during both forward and backward propagation. While parameter-efficient fine-tuning (PEFT) considerably reduces the training memory associated with parameters, it does not address the significant computational costs and activation memory. In this paper, we propose Dropping Backward Propagation (DropBP), a novel approach designed to reduce computational costs and activation memory while maintaining accuracy. DropBP randomly drops layers during backward propagation, which is essentially equivalent to training shallow submodules generated by undropped layers and residual connections. Additionally, DropBP calculates the sensitivity of each layer to assign an appropriate drop rate, thereby stabilizing the training process. DropBP is not only applicable to full fine-tuning but can also be orthogonally integrated with all types of PEFT by dropping layers during backward propagation. Specifically, DropBP can reduce training time by 44% with comparable accuracy to the baseline, accelerate convergence to the same perplexity by 1.5$\times$, and enable training with a sequence length 6.2$\times$ larger on a single NVIDIA-A100 GPU.

large language model, machine learning, natural language, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
Information Technology > Artificial Intelligence > Machine Learning (0.76)

Add feedback

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Neural Information Processing SystemsMar-19-2026, 01:04:34 GMT

Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans.To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments.

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

Revisiting Few-Shot Object Detection with Vision-Language Models

Neural Information Processing SystemsMar-19-2026, 00:42:53 GMT

The era of vision-language models (VLMs) trained on web-scale datasets challenges conventional formulations of "open-world perception. In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundational VLMs. First, we point out that zero-shot predictions from VLMs such as GroundingDINO significantly outperform state-of-the-art few-shot detectors (48 vs. 33 AP) on COCO. Despite their strong zero-shot performance, such foundation models may still be sub-optimal. For example, trucks on the web may be defined differently from trucks for a target applications such as autonomous vehicle perception.

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.65)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Add feedback

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Neural Information Processing SystemsMar-19-2026, 00:10:04 GMT

The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97\% overall accuracy (28.67\% for mathematics and 29.71\% for physics), illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages

Zhao, Yue, Gu, Jiatao, Jeretič, Paloma, Su, Weijie

arXiv.org Machine LearningMar-19-2026

Understanding the distance between human languages is central to linguistics, anthropology, and tracing human evolutionary history. Yet, while linguistics has long provided rich qualitative accounts of cross-linguistic variation, a unified and scalable quantitative approach to measuring language distance remains lacking. In this paper, we introduce a method that leverages pretrained multilingual language models as systematic instruments for linguistic measurement. Specifically, we show that the spontaneously emerged attention mechanisms of these models provide a robust, tokenization-agnostic measure of cross-linguistic distance, termed Attention Transport Distance (ATD). By treating attention matrices as probability distributions and measuring their geometric divergence via optimal transport, we quantify the representational distance between languages during translation. Applying ATD to a large and diverse set of languages, we demonstrate that the resulting distances recover established linguistic groupings with high fidelity and reveal patterns aligned with geographic and contact-induced relationships. Furthermore, incorporating ATD as a regularizer improves transfer performance in low-resource machine translation. Our results establish a principled foundation for testing linguistic hypotheses using artificial neural networks. This framework transforms multilingual models into powerful tools for quantitative linguistic discovery, facilitating more equitable multilingual AI.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2603.17912

Country:

Africa > Niger (0.06)
North America > United States > Pennsylvania (0.04)
Europe (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off

Neural Information Processing SystemsMar-18-2026, 23:42:47 GMT

Watermarking is a technical means to dissuade malfeasant usage of Large Language Models.This paper proposes a novel watermarking scheme, so-called WaterMax, that enjoys high detectability while sustaining the quality of the generated text of the original LLM.Its new design leaves the LLM untouched (no modification of the weights, logits or temperature).WaterMax balances robustness and computational complexity contrary to the watermarking techniques of the literature inherently provoking a trade-off between quality and robustness.Its performance is both theoretically proven and experimentally validated.It outperforms all the SotA techniques under the most complete benchmark suite.

large language model, natural language, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Add feedback

Ad Auctions for LLMs via Retrieval Augmented Generation

Neural Information Processing SystemsMar-18-2026, 23:08:18 GMT

In the field of computational advertising, the integration of ads into the outputs of large language models (LLMs) presents an opportunity to support these services without compromising content integrity. This paper introduces novel auction mechanisms for ad allocation and pricing within the textual outputs of LLMs, leveraging retrieval-augmented generation (RAG). We propose a \emph{segment auction} where an ad is probabilistically retrieved for each discourse segment (paragraph, section, or entire output) according to its bid and relevance, following the RAG framework, and priced according to competing bids. We show that our auction maximizes logarithmic social welfare, a new notion of welfare that balances allocation efficiency and fairness, and we characterize the associated incentive-compatible pricing rule. These results are extended to multi-ad allocation per segment. An empirical evaluation validates the feasibility and effectiveness of our approach over several ad auction scenarios, and exhibits inherent tradeoffs in metrics as we allow the LLM more flexibility to allocate ads.

large language model, machine learning, natural language, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback