Large Language Model
How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization
Vankadara, Leena Chennuru, Haas, Moritz, Hayward, Luke, Bordt, Sebastian, Breccia, Alessandro
Recent frontier large language models predominantly rely on Mixture-of-Experts (MoE) architectures. Despite empirical progress, there is still no principled understanding of how hyperparameters should scale with network width $N$, expert width $N_e$, number of experts $M$, sparsity $K$, and depth $L$ to ensure both stability and optimal performance at scale. We take a principled step toward resolving this gap by analyzing three different scaling regimes: (I) co-scaling $N\asymp N_e$, (II) co-scaling $N\asymp M\asymp K$, and (III) full proportional scaling of $N, N_e, M$, and $K$. For each regime, we develop a novel Dynamical Mean Field Theory (DMFT) description of the limiting training dynamics of MoEs that provides a formal foundation for our analysis. Within this framework, we derive the unique parameterization for SGD and Adam satisfying all maximal-update ($\mu$) desiderata. We then show that the resulting $\mu$P prescription does not reliably induce monotonic improvement with scale or robust learning-rate transfer. We trace these pathologies to scale-dependent observables in the aggregation dynamics, which motivates a refined set of desiderata that we term maximal scale stability. Guided by this principle, we derive a Maximally Scale-Stable Parameterization (MSSP) for both SGD and Adam in all three scaling regimes, and characterize the corresponding limiting dynamics, which are qualitatively distinct from the $\mu$P limit, through a separate DMFT analysis. Experiments verify that MSSP robustly recovers learning rate transfer and monotonic improvement with scale across regimes. Combined with existing depth-scaling theory, these results provide a complete scaling prescription for MoE architectures as a function of width, depth, expert width, and number of experts.
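To make the roles of the number of experts $M$ and the sparsity $K$ concrete, here is a minimal sketch of top-K expert gating. The abstract does not specify the router, so the rule below (a softmax restricted to the K highest-scoring experts) is an assumed, commonly used convention, and the function name `top_k_gate` is hypothetical.

```python
import math

def top_k_gate(router_logits, k):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    ASSUMPTION: softmax over the selected logits only. This is one common
    convention, not necessarily the router analyzed in the paper.
    """
    m = len(router_logits)  # M: total number of experts
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= number of experts")
    # Indices of the k largest logits (ties keep their original order).
    top = sorted(range(m), key=lambda i: router_logits[i], reverse=True)[:k]
    # Subtract the largest selected logit before exponentiating for stability.
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# M = 4 experts, K = 2 active experts per token.
gates = top_k_gate([2.0, -1.0, 0.5, 2.0], k=2)
```

Only the K selected experts receive a nonzero gate, so the layer output aggregates K expert outputs out of M; the aggregation dynamics the abstract refers to concern how such a sum behaves as these quantities grow.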
Language-Induced Priors for Domain Adaptation
Chen, Qiyuan, Zhou, Jiayu, Kontar, Raed Al
Domain adaptation faces a fundamental paradox in the cold-start regime. When target data is scarce, statistical methods fail to distinguish relevant source domains from irrelevant ones, which often leads to negative transfer. In this paper, we address this challenge by leveraging expert textual descriptions of the target domain, a resource that is often available but overlooked. We propose a probabilistic framework that translates these semantic descriptions into a choice model, namely a Language-Induced Prior (LIP), that learns the preferences from a pretrained Large Language Model (LLM). The LIP is then integrated into an Expectation-Maximization algorithm to identify source relevance. Methodologically, this framework is compatible with any parametric model where a likelihood is available. It allows the LIP to guide the selection of sources when target signals are weak, while gradually refining these choices as samples accumulate. Theoretically, we prove that the estimator roughly matches an oracle cold-start MSE under a correct prior, while remaining asymptotically consistent regardless of the quality of the LIP. Empirically, we validate the framework on a descriptive task (Gaussian estimation), a predictive task (C-MAPSS dataset), and a prescriptive task (MuJoCo hopper).
InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting
Sabbaghi, Mahdi, Pappas, George, Javanmard, Adel, Hassani, Hamed
Supervised fine-tuning (SFT) provides the standard approach for teaching LLMs new behaviors from offline expert demonstrations. However, standard SFT uniformly fits all samples -- including those with low likelihood under the base model -- which can disproportionately drive training updates toward overfitting specific samples rather than learning the target behavior. Moreover, adapting to these unlikely samples induces substantial policy shifts that degrade prior capabilities. Existing methods mitigate this by filtering, regenerating, or down-weighting low-likelihood data. In doing so, they often suppress precisely the novel behaviors the base model has yet to learn. We propose InfoSFT, a principled weighting scheme for the SFT objective that concentrates learning signals on maximally informative, medium-confidence tokens -- those neither overly familiar to the base model nor so unlikely that they cause instability. Requiring only a one-line modification to the standard token-wise loss, InfoSFT demonstrably improves generalization over vanilla SFT and likelihood-weighted baselines across math, code, and chain-of-thought tasks with diverse model families, while better preserving pre-existing capabilities.
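The idea of weighting the token-wise loss toward medium-confidence tokens can be sketched as follows. The abstract does not give the InfoSFT weighting function, so the bump `4 * p * (1 - p)` used here is purely an illustrative assumption, and the function names are hypothetical.

```python
import math

def medium_confidence_weight(p_base):
    """Weight for a token whose base-model probability is p_base.

    ASSUMPTION: the bump 4p(1-p) is only a stand-in with the qualitative
    shape described in the abstract (largest at medium confidence, small
    for very likely and very unlikely tokens). It is not the paper's formula.
    """
    return 4.0 * p_base * (1.0 - p_base)

def weighted_sft_loss(p_model, p_base, weight_fn=medium_confidence_weight):
    """Mean over tokens of weight(p_base) * negative log-likelihood.

    p_model: probabilities the model being trained assigns to the target tokens.
    p_base:  probabilities the frozen base model assigns to the same tokens.
    """
    if len(p_model) != len(p_base):
        raise ValueError("p_model and p_base must have the same length")
    losses = [-weight_fn(pb) * math.log(pm) for pm, pb in zip(p_model, p_base)]
    return sum(losses) / len(losses)

def vanilla_sft_loss(p_model):
    """Standard SFT: every token gets weight 1."""
    return weighted_sft_loss(p_model, p_model, weight_fn=lambda _: 1.0)

# One very unlikely, one medium-confidence, and one very familiar token.
p_base = [0.02, 0.5, 0.98]
weights = [medium_confidence_weight(p) for p in p_base]
```

Under this stand-in, the middle token dominates the update while both extremes are down-weighted, and the only change relative to vanilla SFT is the multiplicative weight inside the per-token loss.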
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment
Kumar, Sayantan, Noroozizadeh, Shahriar, Kim, Juyong, Weiss, Jeremy C.
Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and contextually complete descriptions of a patient's course, they often lack temporal precision and contain ambiguous event timing. Conversely, structured electronic health record (EHR) data provides precise temporal anchors but misses a substantial portion of clinically meaningful events. We introduce a retrieval-augmented multimodal alignment framework that bridges this gap to improve the temporal precision of absolute clinical timelines extracted from text. Our approach formulates timeline reconstruction as a graph-based multistep process: it first extracts central anchor events from narratives to build an initial temporal scaffold, places non-central events relative to this backbone, and then calibrates the timeline using retrieved structured EHR rows as external temporal evidence. Evaluated using instruction-tuned large language models on the i2m4 benchmark spanning MIMIC-III and MIMIC-IV, our multimodal pipeline consistently improves absolute timestamp accuracy (AULTC) and improves temporal concordance across nearly all evaluated models over unimodal text-only reconstruction, without compromising event match rates. Furthermore, our empirical gap analysis reveals that 34.8% of text-derived events are entirely absent from tabular records, demonstrating that aligning these modalities can produce a more temporally faithful and clinically informative reconstruction of patient trajectories than either source alone.
High-stakes courtroom drama of Musk v OpenAI hears closing arguments
OpenAI's CEO, Sam Altman, arrives at the federal courthouse in Oakland, California, on Thursday. Nine-person jury to consider whether AI firm bilked world's richest person and unjustly enriched themselves. Closing arguments began on Thursday in Elon Musk's lawsuit against Sam Altman and OpenAI, bringing the weeks-long courtroom battle between the two tech moguls nearer to a decision. A nine-person jury is set to deliberate and return a verdict on whether they believe the AI firm and Altman are liable in the case. The trial, which began last month in an Oakland, California, federal courthouse, has gripped Silicon Valley and featured some of the tech industry's biggest names as witnesses.
Closing arguments begin in Elon Musk's landmark lawsuit against OpenAI
Lawyers for OpenAI and Elon Musk began closing arguments in a landmark trial that could impact the future of the ChatGPT maker. On Thursday, each side presented a concluding statement to jurors, who will decide whether OpenAI and its leaders profited from a venture that was meant to be a "charity". Musk sued OpenAI, its CEO Sam Altman and its president Greg Brockman, alleging that the company strayed from its founding mission to build AI that was safe and beneficial to humanity. Musk was not present for the closing statements on Thursday, as he is currently in China on a diplomatic visit with United States President Donald Trump. His lawyer, Steven Molo, used his final remarks to accuse OpenAI of breaching its charitable trust by enriching investors and insiders at the nonprofit's expense.
Sam Altman Is Taking a Lot of Punches on the Witness Stand
Elon Musk's team seems to have one main goal: make the OpenAI boss look like a liar. Musk's wins so far mainly involve making OpenAI and Altman look ridiculous. Can you trust Sam Altman? That was one of the central themes at the high-profile trial between the OpenAI CEO and Elon Musk in California this week, as Musk's lawyers peppered Altman with questions on his work relationships, including his temporary ouster from OpenAI three years ago by a mistrustful board of directors.
OpenAI brings its Codex coding app to mobile
Since debuting last spring, OpenAI's Codex coding app has seen standalone Mac and Windows releases, so it was only a matter of time before OpenAI gave people a way to access their Codex projects on mobile. Starting today, all ChatGPT users, including those using the chatbot through OpenAI's Go and Free tiers, can use the software through the ChatGPT app on Android and iOS. To be clear, you won't be using Codex to program on your phone. Instead, the ChatGPT mobile app acts as an intermediary between you and whatever environment you've set up for your coding projects, whether that be a physical device like a Mac mini or a remote space managed by your company. That might seem limiting, but it does mean your files, credentials and permissions stay secure on the machine where Codex is running.
Trump's Tech Posse in China, Who's Winning in Musk v. Altman, and Hantavirus Conspiracy Theories
Today, we discuss how Donald Trump's visit to China could influence conversations between world leaders at a moment when the economic and foreign policy stakes couldn't be higher. This week, the team dives into Trump's selected entourage for his high-stakes visit to China, ranging from Silicon Valley's tech billionaires to director Brett Ratner. We also break down the latest developments in Elon Musk's lawsuit against Sam Altman, which alleges that OpenAI abandoned its original nonprofit mission for profit-driven goals, and whether either side is actually gaining an edge in the trial. Plus, Leah shares with us some of the most outlandish conspiracy theories that have been swirling around the hantavirus outbreak. The high-profile testimonies we've heard this week, including from OpenAI CEO Sam Altman himself, have resurfaced a lot of past events and a lot of drama, but we're asking: will this actually be consequential to the trial's verdict? Trump is accompanied by a select number of Silicon Valley's top CEOs. We'll discuss how their presence could influence conversations between world leaders at a moment when the economic and foreign policy stakes could not be higher for the US. A lot of them have been recycling very similar conspiracy theories from the Covid-19 pandemic. We're going to tell you what they're sharing and also how to spot this kind of harmful misinformation.
The ChatGPT desktop app for Mac just got hit with a security breach
OpenAI's ChatGPT app for Mac just experienced a security breach involving two employee devices, according to a report. The company is issuing a software update to users that's rolling out now, but won't arrive for everyone until June 12. The why of it all is a bit convoluted, stemming from a security issue involving open-source code. A widely used open-source library was compromised and two devices at the company were impacted. "Upon identification of the malicious activity, we worked quickly to investigate, contain and take steps to protect our systems," the company wrote in a blog post.