AITopics | Education

Education

From Unstructured Data to Demand Counterfactuals: Theory and Practice

Christensen, Timothy, Compiani, Giovanni

arXiv.org Machine Learning, Jan-12-2026

Empirical models of demand for differentiated products rely on low-dimensional product representations to capture substitution patterns. These representations are increasingly proxied by applying ML methods to high-dimensional, unstructured data, including product descriptions and images. When proxies fail to capture the true dimensions of differentiation that drive substitution, standard workflows will deliver biased counterfactuals and invalid inference. We develop a practical toolkit that corrects this bias and ensures valid inference for a broad class of counterfactuals. Our approach applies to market-level and/or individual data, requires minimal additional computation, is efficient, delivers simple formulas for standard errors, and accommodates data-dependent proxies, including embeddings from fine-tuned ML models. It can also be used with standard quantitative attributes when mismeasurement is a concern. In addition, we propose diagnostics to assess the adequacy of the proxy construction and dimension. The approach yields meaningful improvements in predicting counterfactual substitution in both simulations and an empirical application.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2601.05374

Country: North America > United States (1.00)

Genre: Research Report (0.81)

Industry:

Automobiles & Trucks (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Information Management (0.84)

The Download: the case for AI slop, and helping CRISPR fulfill its promise

MIT Technology Review, Jan-9-2026, 13:10:00 GMT

If I were to locate the moment AI slop broke through into popular consciousness, I'd pick the video of rabbits bouncing on a trampoline that went viral last summer. For many savvy internet users, myself included, it was the first time we were fooled by an AI video, and it ended up spawning a wave of almost identical generated clips. My first reaction was that, broadly speaking, all of this sucked. That's become a familiar refrain, in think pieces and at dinner parties. Everything online is slop now--the internet "enshittified," with AI taking much of the blame. But then friends started sharing AI clips in group chats that were compellingly weird, or funny.

ai slop, download, mit technology review, (14 more...)

MIT Technology Review

Country:

Asia > China (0.07)
North America > United States > New York (0.05)
North America > United States > New Jersey (0.05)
(2 more...)

Industry:

Education > Health & Safety > School Nutrition (1.00)
Health & Medicine > Therapeutic Area (0.73)
Health & Medicine > Pharmaceuticals & Biotechnology (0.71)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

America's new dietary guidelines ignore decades of scientific research

MIT Technology Review, Jan-8-2026, 17:10:50 GMT

An emphasis on fruit, vegetables, and whole foods is welcome--but it's wrong to suggest steak and beef tallow should be prominent. The new year has barely begun, but the first days of 2026 have brought big news for health. On Monday, the US's federal health agency upended its recommendations for routine childhood vaccinations--a move that health associations worry puts children at unnecessary risk of preventable disease. There was more news from the federal government on Wednesday, when health secretary Robert F. Kennedy Jr. and his colleagues at the Departments of Health and Human Services and Agriculture unveiled new dietary guidelines for Americans. And they are causing a bit of a stir. That's partly because they recommend products like red meat, butter, and beef tallow--foods that have been linked to cardiovascular disease, and that nutrition experts have been recommending people limit in their diets.

guideline, recommendation, scientific research, (11 more...)

MIT Technology Review

Country:

North America > United States > New York (0.05)
North America > United States > Massachusetts (0.05)
North America > United States > District of Columbia > Washington (0.05)
Asia > China (0.05)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (1.00)
Education > Health & Safety > School Nutrition (1.00)
Government > Regional Government > North America Government > United States Government (0.88)

Technology:

Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence (0.71)

Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model

Levi, Noam

arXiv.org Machine Learning, Jan-8-2026

We analyze neural scaling laws in a solvable model of last-layer fine-tuning where targets have intrinsic, instance-heterogeneous difficulty. In our Latent Instance Difficulty (LID) model, each input's target variance is governed by a latent ``precision'' drawn from a heavy-tailed distribution. While generalization loss recovers standard scaling laws, our main contribution connects this to inference. The pass@$k$ failure rate exhibits a power-law decay, $k^{-\beta_\text{eff}}$, but the observed exponent $\beta_\text{eff}$ is training-dependent. It grows with sample size $N$ before saturating at an intrinsic limit $\beta$ set by the difficulty distribution's tail. This coupling reveals that learning shrinks the ``hard tail'' of the error distribution: improvements in the model's generalization error steepen the pass@$k$ curve until irreducible target variance dominates. The LID model yields testable, closed-form predictions for this behavior, including a compute-allocation rule that favors training before saturation and inference attempts after. We validate these predictions in simulations and in two real-data proxies: CIFAR-10H (human-label variance) and a maths teacher-student distillation task.
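
As a rough illustration of the power-law claim in the abstract above, the sketch below shows one way an observed exponent could be read off measured pass@k failure rates: fit a straight line to log(failure rate) against log(k) and negate the slope. The function name `fit_power_law_exponent`, the synthetic data, and the constants 0.4 and 0.7 are assumptions made for this example only; none of them come from the paper, and the paper's own estimation procedure may differ.

```python
import math


def fit_power_law_exponent(ks, failure_rates):
    """Estimate b in failure ~ C * k**(-b) by least squares in log-log space.

    Hypothetical helper for illustration; not taken from the paper.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(f) for f in failure_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -b, so negate it to report the decay exponent.
    return -sxy / sxx


# Synthetic failure rates generated from an exact power law (made-up constants).
ks = [1, 2, 4, 8, 16, 32]
synthetic_failures = [0.4 * k ** -0.7 for k in ks]
beta_hat = fit_power_law_exponent(ks, synthetic_failures)
```

On real measurements the points would not lie exactly on a line, so the fitted value would depend on the range of k used; repeating the fit at several training-set sizes is one way to look for the saturation the abstract describes.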

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2601.03764

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report (0.85)

Industry: Education > Curriculum > Subject-Specific Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Online Learning with Limited Information in the Sliding Window Model

Braverman, Vladimir, Garg, Sumegha, Wang, Chen, Woodruff, David P., Zhou, Samson

arXiv.org Machine Learning, Jan-8-2026

Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that captures applications such as traffic monitoring, epidemic tracking, and automated trading, where recent information is more valuable than older data. Formally, we have $n$ experts, $T$ days, the ability to query the predictions of $q$ experts on each day, a limited amount of memory, and should achieve the (near-)optimal $\sqrt{nW}\text{polylog}(nT)$ regret over any window of the last $W$ days. While it is impossible to achieve such regret with $1$ query, we show that with $2$ queries we can achieve such regret and with only $\text{polylog}(nT)$ bits of memory. Not only are our algorithms optimal for sliding windows, but we also show for every interval $\mathcal{I}$ of days that we achieve $\sqrt{n|\mathcal{I}|}\text{polylog}(nT)$ regret with $2$ queries and only $\text{polylog}(nT)$ bits of memory, providing an exponential improvement on the memory of previous interval regret algorithms. Building upon these techniques, we address the bandit problem in data streams, where $q=1$, achieving $n T^{2/3}\text{polylog}(T)$ regret with $\text{polylog}(nT)$ memory, which is the first sublinear regret in the streaming model in the bandit setting with polylogarithmic memory; this can be further improved to the optimal $\mathcal{O}(\sqrt{nT})$ regret if the best expert's losses are in a random order.
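
For readers new to the experts problem, the sketch below implements the classical multiplicative-weights (Hedge) baseline in the full-information setting. This is not the paper's sliding-window algorithm: Hedge reads all $n$ experts' losses every day and stores $n$ weights, which is exactly the query and memory cost the abstract says it avoids. The function name `hedge`, the toy loss sequence, and the learning-rate choice are assumptions made for illustration.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge on a list of per-day loss vectors with entries in [0, 1].

    Returns the final distribution over experts and the algorithm's
    cumulative expected loss. Illustrative baseline only.
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of following a randomly drawn expert today.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights], expected_loss


# Toy instance (made up): expert 0 is always right, the other two always wrong.
T, n = 50, 3
rounds = [[0.0, 1.0, 1.0] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)  # standard tuning for a known horizon T
final_probs, alg_loss = hedge(rounds, eta)
best_loss = min(sum(r[i] for r in rounds) for i in range(n))
regret = alg_loss - best_loss
```

With this learning rate and losses in [0, 1], the textbook guarantee for Hedge is regret of at most $\sqrt{(T/2)\ln n}$ against the best fixed expert over the whole horizon; the sliding-window and interval guarantees in the abstract are stronger notions that this baseline does not provide.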

data mining, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2601.03533

Country:

Europe > France (0.27)
Europe > Austria (0.27)

Genre: Research Report (0.49)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.87)
(3 more...)

AI Models Are Starting to Learn by Asking Themselves Questions

WIRED, Jan-7-2026, 19:00:00 GMT

An AI model that learns without human input--by posing interesting queries for itself--might point the way to superintelligence. Even the smartest artificial intelligence models are essentially copycats. They learn either by consuming examples of human work or by trying to solve problems that have been set for them by human instructors. But perhaps AI can, in fact, learn in a more human way--by figuring out interesting questions to ask itself and attempting to find the right answer. A project from Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code.

absolute zero, ai model, university, (14 more...)

WIRED

Country:

North America > United States > Pennsylvania (0.25)
Asia > China > Beijing > Beijing (0.25)
North America > United States > North Carolina (0.05)
(5 more...)

Industry:

Information Technology (1.00)
Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Finzi, Marc, Qiu, Shikai, Jiang, Yiding, Izmailov, Pavel, Kolter, J. Zico, Wilson, Andrew Gordon

arXiv.org Machine Learning, Jan-7-2026

Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? Can the learnable content in data be evaluated without considering a downstream task? On these questions, Shannon information and Kolmogorov complexity come up nearly empty-handed, in part because they assume observers with unlimited computational capacity and fail to target the useful information content. In this work, we identify and exemplify three seeming paradoxes in information theory: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching. To shed light on the tension between these results and modern practice, and to quantify the value of data, we introduce epiplexity, a formalization of information capturing what computationally bounded observers can learn from data. Epiplexity captures the structural content in data while excluding time-bounded entropy, the random unpredictable content exemplified by pseudorandom number generators and chaotic dynamical systems. With these concepts, we demonstrate how information can be created with computation, how it depends on the ordering of the data, and how likelihood modeling can produce more complex programs than present in the data generating process itself. We also present practical procedures to estimate epiplexity which we show capture differences across data sources, track with downstream performance, and highlight dataset interventions that improve out-of-distribution generalization. In contrast to principles of model selection, epiplexity provides a theoretical foundation for data selection, guiding how to select, generate, or transform data for learning systems.

information, large language model, machine learning, (19 more...)

arXiv.org Machine Learning

2601.0322

Country: North America > United States (0.67)

Genre: Research Report (0.63)

Industry:

Education (0.92)
Information Technology > Security & Privacy (0.67)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Nair, Arjun S.

arXiv.org Machine Learning, Jan-7-2026

Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB--14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states--exceeding even A100-40GB capacity. We present Chronicals, an open-source training framework achieving 3.51x speedup over Unsloth through four synergistic optimizations: (1) fused Triton kernels eliminating 75% of memory traffic via RMSNorm (7x), SwiGLU (5x), and QK-RoPE (2.3x) fusion; (2) Cut Cross-Entropy reducing logit memory from 5GB to 135MB through online softmax computation; (3) LoRA+ with theoretically-derived 16x differential learning rates between adapter matrices; and (4) Best-Fit Decreasing sequence packing recovering 60-75% of compute wasted on padding. On Qwen2.5-0.5B with A100-40GB, Chronicals achieves 41,184 tokens/second for full fine-tuning versus Unsloth's 11,736 tokens/second (3.51x). For LoRA at rank 32, we reach 11,699 tokens/second versus Unsloth MAX's 2,857 tokens/second (4.10x). Critically, we discovered that Unsloth's reported 46,000 tokens/second benchmark exhibited zero gradient norms--the model was not training. We provide complete mathematical foundations: online softmax correctness proofs, FlashAttention IO complexity bounds O(N^2 d^2 M^{-1}), LoRA+ learning rate derivations from gradient magnitude analysis, and bin-packing approximation guarantees. All implementations, benchmarks, and proofs are available at https://github.com/Ajwebdevs/Chronicals with pip installation via https://pypi.org/project/chronicals/.
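
The 84GB figure in the abstract above can be reproduced with simple per-parameter arithmetic. The sketch below assumes 2 bytes per parameter for weights and for gradients (16-bit storage) and 8 bytes per parameter for optimizer state (two FP32 moment buffers, as in Adam-style optimizers), with 1GB taken as 10^9 bytes. Those byte counts are inferred from the abstract's 14/14/56 split rather than stated in it, and the function name `finetune_memory_gb` is hypothetical.

```python
def finetune_memory_gb(n_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=8):
    """Return a per-component memory estimate in GB (1 GB = 1e9 bytes).

    Byte counts per parameter are assumptions chosen to match the abstract's
    breakdown; activations and temporary buffers are not counted.
    """
    gb = 1e9
    breakdown = {
        "weights": n_params * weight_bytes / gb,
        "gradients": n_params * grad_bytes / gb,
        "optimizer": n_params * optimizer_bytes / gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# 7B parameters, as in the abstract's example.
breakdown_7b = finetune_memory_gb(7e9)
fits_on_a100_40gb = breakdown_7b["total"] <= 40
```

Under these assumptions the estimate covers only persistent training state, so actual peak usage would be higher once activations and logits are included.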

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.02609

Country: Europe (0.27)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data

Maranjyan, Artavazd

arXiv.org Machine Learning, Jan-7-2026

Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy resources. Yet the optimization algorithms behind these runs have not kept pace. Most large scale training still relies on synchronous methods, where workers must wait for the slowest device, wasting compute and amplifying the effects of hardware and network variability. Removing synchronization seems like a simple fix, but asynchrony introduces staleness, meaning updates computed on outdated models. This makes analysis difficult, especially when delays arise from system level randomness rather than algorithmic choices. As a result, the time complexity of asynchronous methods remains poorly understood. This dissertation develops a rigorous framework for asynchronous first order stochastic optimization, focusing on the core challenge of heterogeneous worker speeds. Within this framework, we show that with proper design, asynchronous SGD can achieve optimal time complexity, matching guarantees previously known only for synchronous methods. Our first contribution, Ringmaster ASGD, attains optimal time complexity in the homogeneous data setting by selectively discarding stale updates. The second, Ringleader ASGD, extends optimality to heterogeneous data, common in federated learning, using a structured gradient table mechanism. Finally, ATA improves resource efficiency by learning worker compute time distributions and allocating tasks adaptively, achieving near optimal wall clock time with less computation. Together, these results establish asynchronous optimization as a theoretically sound and practically efficient foundation for distributed learning, showing that coordination without synchronization can be both feasible and optimal.

machine learning, natural language, optimization problem, (17 more...)

arXiv.org Machine Learning

doi: 10.25781/KAUST-WH234

2601.02523

Country:

Asia (0.67)
North America (0.45)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area (0.45)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Starstruck

MIT Technology Review, Jan-6-2026, 22:00:00 GMT

Aomawa Shields '97 was equally enticed by the prospect of studying stars and the dream of becoming one herself. Today, she draws from her exploration of acting and astronomy to search for life on other planets. Few people, if any, contemplate stars--celestial or cinematic--the way Aomawa Shields does. An astronomer and astrobiologist, Shields explores the potential habitability of planets beyond our solar system. But she is also a classically trained actor--and that's helped shape her professional trajectory in unexpected ways. Today, Shields is an associate professor in the Department of Physics and Astronomy at the University of California, Irvine, where she oversees a research team that uses computer models to explore conditions on exoplanets, or planets that revolve around stars other than the sun.

astronomy, planet, shield, (16 more...)

MIT Technology Review

Country:

North America > United States > California > Orange County > Irvine (0.24)
North America > United States > Massachusetts (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(2 more...)

Industry:

Education (1.00)
Media > Music (0.49)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (0.91)
