AITopics

Country: North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsDec-25-2025, 05:51:29 GMT

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

name change, revisiting efficient training algorithm, transformer-based language model, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Neural Information Processing SystemsDec-25-2025, 02:11:13 GMT

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has been demonstrated recently across many NLP domains via a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit in a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds the training of Transformer-based language models. We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm obtains a much faster speedup beyond data parallelism, with comparable or better accuracy.

accelerating training, ouroboro, transformer-based language model, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Neural Information Processing SystemsDec-24-2025, 09:32:49 GMT

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from unbearable overall computational expenses. Current methods for accelerating the pre-training either rely on massive parallelism with advanced hardware or are not applicable to language models. In this work, we propose a method based on progressive layer dropping that speeds the training of Transformer-based language models, not at the cost of excessive hardware resources but from model architecture change and training technique boosted efficiency. Extensive experiments on BERT show that the proposed method achieves a 25% reduction of computation cost in FLOPS and a 24% reduction in the end-to-end wall-clock training time. Furthermore, we show that our pre-trained models are equipped with strong knowledge transferability, achieving similar or even higher accuracy in downstream tasks to baseline models.

artificial intelligence, machine learning, natural language, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsOct-2-2025, 06:47:26 GMT

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin

We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems.

artificial intelligence, machine learning, natural language, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceSep-3-2025

Structure and Destructure: Dual Forces in the Making of Knowledge Engines

Chen, Yihong

The making of knowledge engines in natural language processing has been shaped by two seemingly distinct paradigms: one grounded in structure, the other driven by massively available unstructured data. The structured paradigm leverages predefined symbolic interactions, such as knowledge graphs, as priors and designs models to capture them. In contrast, the unstructured paradigm centers on scaling transformer architectures with increasingly vast data and model sizes, as seen in modern large language models. Despite their divergence, this thesis seeks to establish conceptual connections bridging these paradigms. Two complementary forces, structure and destructure, emerge across both paradigms: structure organizes seen symbolic interactions, while destructure, through periodic embedding resets, improves model plasticity and generalization to unseen scenarios. These connections form a new recipe for developing general knowledge engines that can support transparent, controllable, and adaptable intelligent systems.

large language model, machine learning, natural language, (22 more...)

2509.00949

Country:

Asia (1.00)
North America > United States > New York (0.67)
North America > United States > California (0.67)
Europe > United Kingdom > England (0.45)

Genre:

Research Report > New Finding (1.00)
Personal > Honors (1.00)
Overview (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(8 more...)

Sikka, Varin, Sikka, Vishal

Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models

arXiv.org Artificial IntelligenceJul-16-2025

In this paper we explore hallucinations and related capability limitations in LLMs and LLM - based agents from the perspective of computational complexity . We show that beyond a certain complexity, LLMs are incapable of carrying out computational and agentic tasks or verifying the ir accuracy . Introduction With widespread adoption of transformer - based language models ("LLMs") in AI, there is significant interest in the limits of LLMs' capabilities, specifically so - called "hallucinations", occurrences in which LLMs provide spurious, factually incorrect or nonsensical [1, 2] information when prompted on certain subjects. Further more, there is growing interest in "agentic" uses of LLMs - that is, using LLMs to create "agents" that act autonomously or semi - autonomously to carry out various tasks, including tasks with applications in the real world. This makes it important to understand the types of tasks LLMs can and cannot perform.

large language model, machine learning, natural language, (18 more...)

2507.07505

Country: North America > United States (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-27-2025

Progtuning: Progressive Fine-tuning Framework for Transformer-based Language Models

Ji, Xiaoshuang, Zhao, Zhendong, Chen, Xiaojun, Zhao, Xin, Liu, Zeyao

Fine-tuning is a promising technique for leveraging Transformer-based language models in downstream tasks. As model sizes continue to grow, updating all model parameters becomes increasingly costly. Parameter-efficient fine-tuning methods effectively address this issue by selectively updating a small subset of parameters. However, fine-tuning and most existing parameter-efficient fine-tuning methods require updating the same number of parameters as the initial size, ignoring the unequal contribution across Transformer blocks and leading to extremely inefficient allocation of computing resources. In this paper, we propose Progtuning, the novel fine-tuning framework combined with progressive learning for Transformer-based language models. Specifically, Progtuning progressively reduces the number of updated transformer blocks based on the contribution. Remarkably, Progtuning optimizes resource allocation and reduces the number of updated parameters by approximately 25\%, while still maintaining competitive performance. And it also exhibits high adaptability with parameter-efficient fine-tuning methods, demonstrating excellent performance across various adaptation scenarios.

large language model, machine learning, natural language, (17 more...)

2506.21119

Country: Asia > China (0.15)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Levine, Lionel, Santerre, John, Young, Alex S., Levine, T. Barry, Campion, Francis, Sarrafzadeh, Majid

PRISM: A Transformer-based Language Model of Structured Clinical Event Data

arXiv.org Artificial IntelligenceJun-16-2025

--We introduce PRISM (Predictive Reasoning in Sequential Medicine), a transformer-based architecture designed to model the sequential progression of clinical decision-making processes. Unlike traditional approaches that rely on isolated diagnostic classification, PRISM frames clinical trajectories as tokenized sequences of events -- including diagnostic tests, laboratory results, and diagnoses -- and learns to predict the most probable next steps in the patient diagnostic journey. Leveraging a large custom clinical vocabulary and an autoregressive training objective, PRISM demonstrates the ability to capture complex dependencies across longitudinal patient timelines. Experimental results show substantial improvements over random baselines in next-token prediction tasks, with generated sequences reflecting realistic diagnostic pathways, laboratory result progressions, and clinician ordering behaviors. These findings highlight the feasibility of applying generative language modeling techniques to structured medical event data, enabling applications in clinical decision support, simulation, and education. PRISM establishes a foundation for future advancements in sequence-based healthcare modeling, bridging the gap between machine learning architectures and real-world diagnostic reasoning. Accurate and timely clinical decision-making is fundamental to high-quality patient care.

large language model, machine learning, natural language, (20 more...)

2506.11082

Country: North America > United States (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Neural Information Processing SystemsMay-26-2025, 22:18:13 GMT

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

The computation necessary for training Transformer-based language models has skyrocketed in recent years.This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop., When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine which we call reference system time.

revisiting efficient training algorithm, transformer-based language model, validation, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)