AITopics | Large Language Model

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Finzi, Marc, Qiu, Shikai, Jiang, Yiding, Izmailov, Pavel, Kolter, J. Zico, Wilson, Andrew Gordon

arXiv.org Machine Learning, Jan-7-2026

Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? Can the learnable content in data be evaluated without considering a downstream task? On these questions, Shannon information and Kolmogorov complexity come up nearly empty-handed, in part because they assume observers with unlimited computational capacity and fail to target the useful information content. In this work, we identify and exemplify three seeming paradoxes in information theory: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching. To shed light on the tension between these results and modern practice, and to quantify the value of data, we introduce epiplexity, a formalization of information capturing what computationally bounded observers can learn from data. Epiplexity captures the structural content in data while excluding time-bounded entropy, the random unpredictable content exemplified by pseudorandom number generators and chaotic dynamical systems. With these concepts, we demonstrate how information can be created with computation, how it depends on the ordering of the data, and how likelihood modeling can produce more complex programs than present in the data generating process itself. We also present practical procedures to estimate epiplexity which we show capture differences across data sources, track with downstream performance, and highlight dataset interventions that improve out-of-distribution generalization. In contrast to principles of model selection, epiplexity provides a theoretical foundation for data selection, guiding how to select, generate, or transform data for learning systems.
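Paradox (1) in the abstract above rests on a standard fact: a deterministic transformation cannot raise Shannon entropy. The sketch below illustrates only that classical fact on hypothetical toy data; it does not implement epiplexity, whose definition is not given in the abstract. The function name `empirical_entropy` and the two transformations are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy data: eight symbols, each appearing equally often.
data = list(range(8)) * 4

# Two deterministic transformations of the same data.
shifted = [(x + 3) % 8 for x in data]  # invertible: relabels the symbols
parity = [x % 2 for x in data]         # lossy: collapses eight symbols to two

h_data = empirical_entropy(data)
h_shifted = empirical_entropy(shifted)
h_parity = empirical_entropy(parity)

print(f"original: {h_data:.3f} bits")
print(f"shifted:  {h_shifted:.3f} bits")
print(f"parity:   {h_parity:.3f} bits")
```

The invertible map leaves the entropy unchanged and the lossy map lowers it; neither can increase it, which is the classical result the abstract contrasts with what computationally bounded learners experience in practice.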

information, large language model, machine learning, (19 more...)

arXiv.org Machine Learning

2601.0322

Country: North America > United States (0.67)

Genre: Research Report (0.63)

Industry:

Education (0.92)
Information Technology > Security & Privacy (0.67)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Nair, Arjun S.

arXiv.org Machine Learning, Jan-7-2026

Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB (14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states), exceeding even A100-40GB capacity. We present Chronicals, an open-source training framework achieving 3.51x speedup over Unsloth through four synergistic optimizations: (1) fused Triton kernels eliminating 75% of memory traffic via RMSNorm (7x), SwiGLU (5x), and QK-RoPE (2.3x) fusion; (2) Cut Cross-Entropy reducing logit memory from 5GB to 135MB through online softmax computation; (3) LoRA+ with theoretically-derived 16x differential learning rates between adapter matrices; and (4) Best-Fit Decreasing sequence packing recovering 60-75% of compute wasted on padding. On Qwen2.5-0.5B with A100-40GB, Chronicals achieves 41,184 tokens/second for full fine-tuning versus Unsloth's 11,736 tokens/second (3.51x). For LoRA at rank 32, we reach 11,699 tokens/second versus Unsloth MAX's 2,857 tokens/second (4.10x). Critically, we discovered that Unsloth's reported 46,000 tokens/second benchmark exhibited zero gradient norms, meaning the model was not training. We provide complete mathematical foundations: online softmax correctness proofs, FlashAttention IO complexity bounds O(N^2 d^2 M^{-1}), LoRA+ learning rate derivations from gradient magnitude analysis, and bin-packing approximation guarantees. All implementations, benchmarks, and proofs are available at https://github.com/Ajwebdevs/Chronicals with pip installation via https://pypi.org/project/chronicals/.
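The 84GB figure in the abstract above can be reproduced with simple arithmetic. The sketch below assumes 2 bytes per parameter for weights and gradients (16-bit storage) and 8 bytes per parameter for optimizer state (two FP32 Adam-style moments); the abstract states only the totals, so these per-parameter sizes are an assumption, and `finetune_memory_gb` is a hypothetical helper, not part of the Chronicals package.

```python
GB = 1e9  # decimal gigabytes, matching the round figures quoted in the abstract

def finetune_memory_gb(n_params, weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Rough memory for full fine-tuning, ignoring activations and buffers.

    The byte sizes are assumptions: 16-bit weights and gradients, plus
    two FP32 optimizer moments (2 x 4 bytes) per parameter.
    """
    breakdown = {
        "weights": n_params * weight_bytes / GB,
        "gradients": n_params * grad_bytes / GB,
        "optimizer": n_params * optim_bytes / GB,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown

breakdown = finetune_memory_gb(7e9)  # a 7B-parameter model
for name, size in breakdown.items():
    print(f"{name:>10}: {size:.0f} GB")

# Compare against a 40 GB accelerator, as in the abstract.
fits_on_40gb = breakdown["total"] <= 40
print("fits on a 40 GB device:", fits_on_40gb)
```

Under these assumptions the optimizer state dominates the total, which is why the estimate leaves out activations and still exceeds a single 40GB device.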

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.02609

Country: Europe (0.27)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis

Daoud, Adel, Johansson, Richard, Jerzak, Connor T.

arXiv.org Machine Learning, Jan-7-2026

Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakage. Treatment leakage occurs when text intended to capture confounding information also contains signals predictive of treatment status, thereby inducing post-treatment bias in causal estimates. Critically, this problem can arise even when documents precede treatment assignment, as authors may employ future-referencing language that anticipates subsequent interventions. Despite growing recognition of this issue, no systematic methods exist for identifying and mitigating treatment leakage in text-as-confounder applications. This paper addresses this gap through three contributions. First, we provide formal statistical and set-theoretic definitions of treatment leakage that clarify when and why bias occurs. Second, we propose four text distillation methods (similarity-based passage removal, distant supervision classification, salient feature removal, and iterative nullspace projection) designed to eliminate treatment-predictive content while preserving confounder information. Third, we validate these methods through simulations using synthetic text and an empirical application examining International Monetary Fund structural adjustment programs and child mortality. Our findings indicate that moderate distillation optimally balances bias reduction against confounder retention, whereas overly stringent approaches degrade estimate precision.

information, large language model, machine learning, (22 more...)

arXiv.org Machine Learning

2601.024

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.49)
Health & Medicine > Therapeutic Area > Immunology (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)

Stand Up for Research, Innovation, and Education

MIT Technology Review, Jan-6-2026, 22:00:00 GMT

Our community is standing up for MIT and its mission to serve the nation and the world. And we need you to join us at this critical moment. This story was part of our September/October 2025 issue.

great ai hype correction, innovation, share story, (8 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.06)
Asia > China > Beijing > Beijing (0.06)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

The AI Safety Demo That Caused Alarm in Washington

TIME - Tech, Jan-6-2026, 15:03:37 GMT

Welcome back to, TIME's new twice-weekly newsletter about AI. If you're reading this in your browser, why not subscribe to have the next one delivered straight to your inbox? Late last year, an AI researcher opened his laptop and showed me something jaw-dropping. Lucas Hansen, co-founder of nonprofit CivAI, was showing me an app he built that coaxed popular AI models into giving what appeared to be detailed step-by-step instructions for creating poliovirus and anthrax. Any safeguards that these models had were stripped away.

ai safety demo, caused alarm, turley, (10 more...)

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)
Europe > Germany (0.05)
(2 more...)

Industry:

Government (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.58)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

What are small language models and how do they differ from large ones?

AIHub, Jan-6-2026, 11:27:15 GMT

Microsoft recently released its latest small language model that can operate directly on the user's computer. If you haven't followed the AI industry closely, you might be asking: what exactly is a small language model (SLM)? As AI becomes increasingly central to how we work, learn and solve problems, understanding the different types of AI models has never been more important. Large language models (LLMs) such as ChatGPT, Claude, Gemini and others are in widespread use.

language model, small language model, university, (10 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Asia > India (0.05)

Industry: Transportation (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Leading AI expert delays timeline for its possible destruction of humanity

The Guardian, Jan-6-2026, 06:00:29 GMT

Former OpenAI employee Daniel Kokotajlo says progress to AGI is 'somewhat slower' than first predicted

kokotajlo, possible destruction, real world, (10 more...)

Country:

Europe > Ukraine (0.07)
North America > United States > New York (0.06)
Oceania > Australia (0.05)
Asia > China (0.05)

Industry:

Leisure & Entertainment > Sports (0.72)
Government > Regional Government > North America Government > United States Government (0.31)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Rokid introduces display-free AI smartglasses at CES 2026

Engadget, Jan-6-2026, 01:00:17 GMT

Style supports multiple AI engines and third-party integrations. Smartglasses company Rokid has introduced new display-free AI glasses at CES 2026. Dubbed Style, the glasses are intended for all-day use and are compatible with users' corrective prescriptions. Style supports multiple AI engines, including ChatGPT and DeepSeek, instead of being locked to any LLM. The glasses can also work with Google Maps and Microsoft AI translation.

glasses, rokid introduce display-free ai smartglass, term and privacy policy, (7 more...)

Industry:

Semiconductors & Electronics (0.38)
Information Technology (0.38)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Personalizing black-box models for nonparametric regression with minimax optimality

Li, Sai, Zhang, Linjun

arXiv.org Machine Learning, Jan-6-2026

Recent advances in large-scale models, including deep neural networks and large language models, have substantially improved performance across a wide range of learning tasks. The widespread availability of such pre-trained models creates new opportunities for data-efficient statistical learning, provided they can be effectively integrated into downstream tasks. Motivated by this setting, we study few-shot personalization, where a pre-trained black-box model is adapted to a target domain using a limited number of samples. We develop a theoretical framework for few-shot personalization in nonparametric regression and propose algorithms that can incorporate a black-box pre-trained model into the regression procedure. We establish the minimax optimal rate for the personalization problem and show that the proposed method attains this rate. Our results clarify the statistical benefits of leveraging pre-trained models under sample scarcity and provide robustness guarantees when the pre-trained model is not informative. We illustrate the finite-sample performance of the methods through simulations and an application to the California housing dataset with several pre-trained models.

large language model, machine learning, pre-trained model, (21 more...)

arXiv.org Machine Learning

2601.01432

Country: North America > United States > California (0.25)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Air (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning

Huang, Jerry, Lu, Peng, Zeng, Qiuhao, Iwasawa, Yusuke, Matsuo, Yutaka, Chandar, Sarath, Marrese-Taylor, Edison, Li, Irene

arXiv.org Machine Learning, Jan-6-2026

Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential in maintaining their trustworthiness and reliability, yet despite increasing advances in foundation model research, the relationship between such large language models (LLMs) and their calibration remains an open area of research. In this work, we look at a critical gap in the calibration of LLMs within multilingual settings, in an attempt to better understand how data scarcity can potentially lead to different calibration effects and how commonly used techniques can apply in these settings. Our analysis on two multilingual benchmarks, over 29 and 42 languages respectively, reveals that even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource language SFT datasets. However, improvements in accuracy are marginal or non-existent, resulting in mis-calibration, highlighting a critical shortcoming of standard SFT in multilingual settings. Furthermore, we observe that label smoothing is a reasonable method to alleviate this concern, again without any need for low-resource SFT data, maintaining better calibration across all languages. Overall, this highlights the importance of multilingual considerations for both training and tuning LLMs in order to improve their reliability and fairness in downstream use.
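Label smoothing, which the abstract above reports as helpful for calibration, can be sketched in a few lines. The code uses the common textbook formulation, which mixes the one-hot target with a uniform distribution; the abstract does not specify the paper's exact variant or smoothing strength, so `epsilon=0.1`, the helper names, and the example logits are all assumptions made for illustration.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Mix the one-hot target with a uniform distribution over classes."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Hypothetical logits for a 3-class problem whose true class is 0.
confident = [10.0, 0.0, 0.0]  # nearly all probability mass on class 0
moderate = [3.0, 0.0, 0.0]    # roughly 0.9 probability on class 0

hard = [1.0, 0.0, 0.0]
soft = smooth_labels(true_class=0, num_classes=3, epsilon=0.1)

hard_confident = cross_entropy(hard, confident)
hard_moderate = cross_entropy(hard, moderate)
soft_confident = cross_entropy(soft, confident)
soft_moderate = cross_entropy(soft, moderate)

print("hard targets:", hard_confident, hard_moderate)
print("soft targets:", soft_confident, soft_moderate)
```

With hard targets the extremely confident prediction has the lower loss, so training keeps pushing confidence upward. With smoothed targets the moderate prediction has the lower loss, which is the mechanism by which label smoothing discourages overconfidence.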

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.01362

Country:

North America > United States (1.00)
Europe (0.92)
Asia > Middle East > UAE (0.45)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
