Large Language Model
When the AI bubble bursts, humans will finally have their chance to take back control
Rafael Behr
The US economy is pumped up on tech-bro vanity. If AI did not change your life in 2025, next year it will. That is one of few forecasts that can be made with confidence in unpredictable times. This is not an invitation to believe the hype about what the technology can do today, or may one day achieve. The hype doesn't need your credence.
Secret mixtures of experts inside your LLM
Despite being one of the earliest neural network layers, the Multilayer Perceptron (MLP) is arguably one of the least understood parts of the transformer architecture due to its dense computation and lack of easy visualization. This paper seeks to understand the MLP layers in dense LLM models by hypothesizing that these layers secretly approximately perform a sparse computation -- namely, that they can be well approximated by sparsely-activating Mixture of Experts (MoE) layers. Our hypothesis is based on a novel theoretical connection between MoE models and Sparse Autoencoder (SAE) structure in activation space. We empirically validate the hypothesis on pretrained LLMs, and demonstrate that the activation distribution matters -- these results do not hold for Gaussian data, but rather rely crucially on structure in the distribution of neural network activations. Our results shine light on a general principle at play in MLP layers inside LLMs, and give an explanation for the effectiveness of modern MoE-based transformers. Additionally, our experimental explorations suggest new directions for more efficient MoE architecture design based on low-rank routers.
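For readers unfamiliar with the term, the following is a minimal NumPy sketch of what a sparsely-activating Mixture of Experts layer computes. It is an illustration only, not the paper's construction: the function name `moe_layer`, the top-k softmax router, the ReLU experts, and all shapes are assumptions chosen for clarity.

```python
import numpy as np

def moe_layer(x, router_w, experts_w1, experts_w2, k=2):
    """Toy top-k MoE layer (hypothetical shapes, for illustration only).

    x:          (n_tokens, d)          input activations
    router_w:   (d, n_experts)         linear router
    experts_w1: (n_experts, d, h)      first layer of each expert MLP
    experts_w2: (n_experts, h, d)      second layer of each expert MLP
    """
    logits = x @ router_w                                # (n_tokens, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]            # indices of the k largest logits
    top_logits = np.take_along_axis(logits, topk, axis=1)
    # Softmax over the selected experts only.
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for slot in range(k):
        for e in np.unique(topk[:, slot]):
            rows = np.where(topk[:, slot] == e)[0]       # tokens routed to expert e
            hidden = np.maximum(x[rows] @ experts_w1[e], 0.0)  # ReLU expert MLP
            out[rows] += gates[rows, slot:slot + 1] * (hidden @ experts_w2[e])
    return out, topk

rng = np.random.default_rng(0)
d, h, n_experts, n_tokens = 8, 16, 4, 5
x = rng.normal(size=(n_tokens, d))
out, chosen = moe_layer(
    x,
    rng.normal(size=(d, n_experts)),
    rng.normal(size=(n_experts, d, h)),
    rng.normal(size=(n_experts, h, d)),
    k=2,
)
```

Each token touches only `k` of the `n_experts` expert MLPs, which is the sense in which the computation is sparse.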
OpenAI's Child Exploitation Reports Increased Sharply This Year
The company made 80 times as many reports to the National Center for Missing & Exploited Children during the first six months of 2025 as it did in the same period a year prior. OpenAI sent 80 times as many child exploitation incident reports to the National Center for Missing & Exploited Children during the first half of 2025 as it did during a similar time period in 2024, according to a recent update from the company. The NCMEC's CyberTipline is a Congressionally authorized clearinghouse for reporting child sexual abuse material (CSAM) and other forms of child exploitation. Companies are required by law to report apparent child exploitation to the CyberTipline. When a company sends a report, NCMEC reviews it and then forwards it to the appropriate law enforcement agency for investigation.
Five AI Developments That Changed Everything This Year
President Donald Trump speaks in the Roosevelt Room flanked by Masayoshi Son, Larry Ellison, and Sam Altman at the White House on January 21, 2025. In case you missed it, 2025 was a big year for AI. It became an economic force, propping up the stock market, and a geopolitical pawn, redrawing the frontlines of Great Power competition. It had both global and deeply personal effects, changing the ways that we think, write, and relate.
Brain Gear Is the Hot New Wearable
Smartwatches are cool and all, but have you considered wearable neurotech? Ten years ago, a Fitbit was about as sophisticated a wearable as you could get. Then came the sleeker, more unassuming Oura ring. Now there's a new breed of wearables, built for your head. Instead of tracking your step count, heart rate, and skin temperature, these devices are designed to read your brain waves.
Tech Disrupted Friendship. It's Time to Bring It Back
Two decades ago, social media promised to connect people with pals far and wide. Twenty years online has left us turning to AI for kinship. IRL companionship is the future. Anyone looking for a vibe check on the populace's current feelings about AI would do well to check out the walls of the New York City subway system. This fall, alongside posters for everything from dating apps to Skechers, a newcomer made its debut: Friend.
Mitigating Forgetting in Low Rank Adaptation
Sliwa, Joanna, Schneider, Frank, Hennig, Philipp, Hernandez-Lobato, Jose Miguel
Parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), enable fast specialization of large pre-trained models to different downstream applications. However, this process often leads to catastrophic forgetting of the model's prior domain knowledge. We address this issue with LaLoRA, a weight-space regularization technique that applies a Laplace approximation to Low-Rank Adaptation. Our approach estimates the model's confidence in each parameter and constrains updates in high-curvature directions, preserving prior knowledge while enabling efficient target-domain learning. By applying the Laplace approximation only to the LoRA weights, the method remains lightweight. We evaluate LaLoRA by fine-tuning a Llama model for mathematical reasoning and demonstrate an improved learning-forgetting trade-off, which can be directly controlled via the method's regularization strength. We further explore different loss landscape curvature approximations for estimating parameter confidence, analyze the effect of the data used for the Laplace approximation, and study robustness across hyperparameters.
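The idea of constraining updates in high-curvature directions can be sketched as a curvature-weighted quadratic penalty on the adapter parameters. The snippet below is a hedged illustration, not the authors' implementation: the diagonal Fisher estimate, the helper names (`diagonal_fisher`, `laplace_penalty`, `laplace_penalty_grad`), and the toy numbers are all assumptions.

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    """Diagonal curvature estimate: mean squared per-example gradient.

    per_example_grads: (n_examples, n_params). This is one possible
    approximation, chosen here as an assumption for simplicity.
    """
    return np.mean(per_example_grads ** 2, axis=0)

def laplace_penalty(theta, theta_ref, curvature, strength):
    """Quadratic penalty 0.5 * strength * sum(curvature * (theta - theta_ref)^2)."""
    diff = theta - theta_ref
    return 0.5 * strength * np.sum(curvature * diff ** 2)

def laplace_penalty_grad(theta, theta_ref, curvature, strength):
    """Gradient of the penalty, to be added to the task-loss gradient."""
    return strength * curvature * (theta - theta_ref)

# Toy example with two adapter parameters (hypothetical values).
grads = np.array([[2.0, 0.1], [2.0, -0.1]])    # gradients on prior-domain data
curvature = diagonal_fisher(grads)              # large for param 0, small for param 1
theta_ref = np.zeros(2)                         # adapter values before fine-tuning

cost_high = laplace_penalty(np.array([1.0, 0.0]), theta_ref, curvature, strength=1.0)
cost_low = laplace_penalty(np.array([0.0, 1.0]), theta_ref, curvature, strength=1.0)
```

Moving the same distance along the high-curvature parameter costs far more than along the low-curvature one, and `strength` plays the role of the regularization strength that trades learning against forgetting.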
Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs
Defazio, Aaron, Mishchenko, Konstantin, Raman, Parameswaran, Shi, Hao-Jun Michael, Xiao, Lin
We propose Generalized Primal Averaging (GPA), an extension of Nesterov's method in its primal averaging formulation that addresses key limitations of recent averaging-based optimizers such as single-worker DiLoCo and Schedule-Free (SF) in the non-distributed setting. These two recent algorithmic approaches improve the performance of base optimizers, such as AdamW, through different iterate averaging strategies. Schedule-Free explicitly maintains a uniform average of past weights, while single-worker DiLoCo performs implicit averaging by periodically aggregating trajectories, called pseudo-gradients, to update the model parameters. However, single-worker DiLoCo's periodic averaging introduces a two-loop structure, increasing its memory requirements and number of hyperparameters. GPA overcomes these limitations by decoupling the interpolation constant in the primal averaging formulation of Nesterov. This decoupling enables GPA to smoothly average iterates at every step, generalizing and improving upon single-worker DiLoCo. Empirically, GPA consistently outperforms single-worker DiLoCo while removing the two-loop structure, simplifying hyperparameter tuning, and reducing its memory overhead to a single additional buffer. On the Llama-160M model, GPA provides a 24.22% speedup in terms of steps to reach the baseline (AdamW's) validation loss. Likewise, GPA achieves speedups of 12% and 27% on small and large batch setups, respectively, to attain AdamW's validation accuracy on the ImageNet ViT workload. Furthermore, we prove that for any base optimizer with regret bounded by $O(\sqrt{T})$, where $T$ is the number of iterations, GPA can match or exceed the convergence guarantee of the original optimizer, depending on the choice of interpolation constants.
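The abstract does not state the update rule, so the following is only a sketch of what per-step iterate averaging with two decoupled interpolation constants could look like. The function name `gpa_sketch`, the constants `mu_y` and `mu_x`, and the use of plain gradient descent in place of AdamW as the base optimizer are assumptions for illustration.

```python
import numpy as np

def gpa_sketch(grad_fn, w0, lr=0.1, mu_y=0.9, mu_x=0.9, steps=500):
    """Primal-averaging-style loop with two separate interpolation constants.

    z: iterate of the base optimizer (plain gradient descent here, an assumption)
    x: smoothed average of the z iterates, returned as the model weights
    y: interpolation of x and z at which the gradient is evaluated
    """
    z = np.array(w0, dtype=float)
    x = z.copy()
    for _ in range(steps):
        y = mu_y * x + (1.0 - mu_y) * z      # gradient evaluation point
        z = z - lr * grad_fn(y)              # base-optimizer step
        x = mu_x * x + (1.0 - mu_x) * z      # smooth averaging at every step
    return x

# Toy quadratic f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([1.0, -2.0, 0.5])
result = gpa_sketch(lambda w: w - target, np.zeros(3))
```

Beyond the weights themselves, the loop keeps one extra buffer (`x` alongside `z`) and has no inner/outer loop, which mirrors the structural simplification the abstract describes.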
You can now tweak how warm and enthusiastic ChatGPT's responses are
OpenAI is letting users decide between more, less or default options to adjust ChatGPT's personality. OpenAI gave its AI chatbot a professional makeover with the latest GPT-5.2. For anyone who's finding ChatGPT rude or sassy, OpenAI has some welcome news since it's letting users further customize its personality with extra warmth or enthusiasm. You can now adjust specific characteristics in ChatGPT, like warmth, enthusiasm, and emoji use. In a post on X, OpenAI revealed that users can adjust characteristics under new Warm, Enthusiastic, Header & Lists and Emoji options found in the Personalization settings.
Get This All-In-One AI Platform for 86% Off Before the New Year
Lifetime access to the 1minAI Advanced Business Plan is now $74.97 (MSRP $540). The end of the year is when people start hunting for upgrades that will actually make life easier in January. If you have spent most of 2024 bouncing between ChatGPT, Gemini, Claude, and your favorite image generator, this holiday deal rolls everything into one place. For $74.97, you can lock in lifetime access to the 1minAI Advanced Business Plan and begin 2025 with every AI tool you could possibly need.