Large Language Model
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning the principle entails a stricter requirement: the induced hidden-state geometry must reach the language-model head \emph{and} improve the decoded task metric. We test that requirement under a fixed Llama-3.2-1B-Instruct LoRA harness on natural-language-to-regex generation, comparing twenty-two training-time auxiliaries across trajectory-shape regularisation, distributional constraints, predictor/target asymmetry, Fisher-metric Jacobi residuals, and a decoder-visible JEPA objective constructed to lie in cross-entropy's positive cone. The empirical answer is a structured null: several auxiliaries clear single-cell paired $\alpha = 0.10$ without correction (T3-Local at $\Delta = +2.53$~pp, $p = 0.003$ being the strongest), but none survives Bonferroni or Holm--Bonferroni at the relevant family-wise threshold, even though many change curvature, anisotropy, variance, and gradient direction. Decoder-visible JEPA yields the first positive auxiliary--cross-entropy gradient cosine in the study, yet exact match remains inside seed noise; a full-fine-tuning replication of the same auxiliary at $n = 5$ seeds reproduces the null on both benchmarks (TURK: $\Delta = +0.04$~pp, $p_{\text{paired}} = 0.96$; SYNTH: $\Delta = +0.52$~pp, $p_{\text{paired}} = 0.28$), so the null is robust across LoRA and full fine-tuning for the decoder-visible construction. Hidden-state representation work and decoded-task accuracy are therefore weakly coupled in this regime; we accordingly reframe LLM-domain JEPA evaluation as a coupling problem, in which the operative question is under which metrics useful hidden geometry becomes decoder-visible task signal.
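The gap between an uncorrected per-comparison test and a family-wise correction can be made concrete with a small sketch. The code below is a generic implementation of the Bonferroni and Holm--Bonferroni procedures, not the paper's analysis code; the function names, the toy p-values, and the family-wise level in the usage lines are illustrative assumptions.

```python
def bonferroni(p_values, alpha):
    """Reject H_i only if p_i <= alpha / m, where m is the family size."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha):
    """Holm's step-down procedure.

    Sort p-values ascending; compare the k-th smallest (k = 0, 1, ...)
    against alpha / (m - k) and stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Toy family of three tests (hypothetical p-values, assumed alpha = 0.05).
toy_p = [0.01, 0.02, 0.04]
print(bonferroni(toy_p, alpha=0.05))
print(holm_bonferroni(toy_p, alpha=0.05))
```

Holm's procedure is never less powerful than plain Bonferroni, so a result that fails Holm also fails Bonferroni at the same family-wise level.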
$ฯ$-Balancing for Mixture-of-Experts Training
Chen, Lizhang, Li, Jonathan, Wang, Qi, Liao, Runlong, Li, Shuozhe, Liang, Chen, Lao, Ni, Liu, Qiang
Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introducing bias relative to population-level objectives. We propose $ฯ$-balancing, a principled framework that directly targets population-level expert balance by minimizing a strictly convex, symmetric, and differentiable potential of the expected routing distribution. Using convex duality, we derive an equivalent min-max formulation and obtain a simple online algorithm via mirror descent, yielding an efficient EMA-based routing adjustment with negligible overhead. Across large-scale pretraining and downstream fine-tuning, $ฯ$-balancing consistently outperforms prior Switch-style and loss-free baselines, demonstrating more stable and effective expert utilization.
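To give a feel for what an EMA-based routing adjustment can look like, the sketch below implements a generic bias-correction loop: a per-expert bias is added to the router logits, an exponential moving average tracks each expert's share of tokens, and the bias is nudged against the deviation from uniform load. This is an assumed, simplified stand-in and not the authors' algorithm; the update rule, the names `route_top1`, `balance_step`, and `simulate`, and all hyperparameters are hypothetical.

```python
import random


def route_top1(logits, bias):
    """Pick the expert with the highest biased score for one token."""
    scores = [l + b for l, b in zip(logits, bias)]
    return max(range(len(scores)), key=scores.__getitem__)


def balance_step(batch_logits, bias, ema_load, decay=0.9, step=0.1):
    """One update: measure batch load, refresh the EMA, adjust the biases."""
    n_experts = len(bias)
    counts = [0] * n_experts
    for logits in batch_logits:
        counts[route_top1(logits, bias)] += 1
    batch_load = [c / len(batch_logits) for c in counts]
    # Smooth the noisy mini-batch statistic with an exponential moving average.
    ema_load = [decay * e + (1 - decay) * b for e, b in zip(ema_load, batch_load)]
    # Lower the bias of overloaded experts, raise it for underloaded ones.
    target = 1.0 / n_experts
    bias = [b - step * (e - target) for b, e in zip(bias, ema_load)]
    return bias, ema_load


def simulate(steps=300, batch_size=256, n_experts=4, seed=0):
    """Toy router whose raw logits favour expert 0 by a fixed offset."""
    rng = random.Random(seed)
    skew = [2.0] + [0.0] * (n_experts - 1)
    bias = [0.0] * n_experts
    ema_load = [1.0 / n_experts] * n_experts
    for _ in range(steps):
        batch = [[rng.gauss(s, 1.0) for s in skew] for _ in range(batch_size)]
        bias, ema_load = balance_step(batch, bias, ema_load)
    return bias, ema_load


if __name__ == "__main__":
    final_bias, final_load = simulate()
    print("bias:", [round(b, 2) for b in final_bias])
    print("EMA load:", [round(e, 3) for e in final_load])
```

In this toy setting the bias of the favoured expert drifts downward until the smoothed load is close to uniform; the adjustment touches only a small vector of biases, which is why this style of correction adds little overhead.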
Reasoning Models Don't Just Think Longer, They Move Differently
Gjølbye, Anders, Hansen, Lars Kai, Koyejo, Sanmi
Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
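The length correction described above can be illustrated with a minimal sketch. It assumes the simplest possible form of residualization, an ordinary least-squares fit of a trajectory statistic on generation length followed by subtraction of the fitted values; the abstract does not specify the regression used, and the function name and toy numbers below are hypothetical.

```python
def residualize(values, covariate):
    """Remove the linear effect of `covariate` from `values` via simple OLS.

    Returns values minus the fitted line intercept + slope * covariate.
    """
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Toy data: a path statistic that grows with length plus a small
# per-problem deviation (all numbers are made up for illustration).
lengths = [10, 20, 30, 40, 50]
deviation = [1.0, -1.0, 2.0, -2.0, 0.0]
path_stat = [2.0 * n + 1.0 + d for n, d in zip(lengths, deviation)]

corrected = residualize(path_stat, lengths)
print([round(r, 3) for r in corrected])
```

By construction the corrected values sum to zero and are uncorrelated with length, so any remaining association with problem difficulty cannot be a purely linear length effect.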
OpenAI is offering ChatGPT Plus to citizens of Malta for a year
OpenAI has signed deals with fintech startups, tech giants and even Disney, but it's breaking new ground by announcing a world-first partnership with the country of Malta. In a post on its website, OpenAI said that it would provide ChatGPT Plus for one year to every Maltese resident or citizen. "Malta is the first country to launch a partnership of this scale because we refuse to let our citizens stay behind in the digital age," Silvio Schembri, Malta's minister for Economy, Enterprise and Strategic Projects, said in a statement. "We are putting our people at the very forefront of global change." The approximately 574,250 residents of Malta will have to complete a course developed by the University of Malta before activating the ChatGPT Plus subscription, which costs $20 a month in the US.
What we learned from the cringey courtroom drama between Elon Musk and Sam Altman
Both Musk and Altman took the stand for hours, facing combative cross-examinations that painted them each as untrustworthy. Two of the world's richest people faced an airing of their dirty laundry amid their messy, bitter feud over OpenAI. A nine-person jury is set to decide whether Elon Musk's allegations of "stealing a charity" against Sam Altman and OpenAI are legitimate, with deliberations to begin in earnest on Monday. Whatever its outcome, the case has been an illuminating, at times exhausting, look behind the scenes at the history of OpenAI and how some of the most powerful figures in the tech industry operate. Attorneys for both sides have introduced reams of private text messages, emails and even diary entries to support their arguments.
Cybercriminal Twins Caught After They Forgot to Turn Off Microsoft Teams Recording
Plus: Instructure's Canvas ransomware debacle comes to a close, an alleged dark net market kingpin gets arrested, OpenAI workers fall victim to a supply chain attack, and more. The worst part of your iPhone getting stolen may not be the theft itself. Instead, it's the phishing attacks waged against people in your contacts. New research this week shows that there's a thriving ecosystem for tools that let criminals unlock iPhones and target the phone numbers they find inside. Foxconn, the electronics manufacturing giant known for its role in building iPhones, revealed this week that it recently "suffered a cyberattack."
Musk v. Altman week 3: Elon Musk and Sam Altman traded blows over each other's credibility. Now the jury will pick a side.
The trial spilled plenty of dirt--and raised more questions than answers about how the AI giant should be governed. In the final week of the trial, lawyers traded blows over Elon Musk's and OpenAI CEO Sam Altman's credibility. Altman was grilled on his alleged history of lying and self-dealing involving companies that do business with OpenAI. But he fired back, painting Musk as a power-seeker who wanted to control the development of artificial general intelligence (AGI)--powerful AI that can compete with humans on most cognitive tasks.
ChatGPT will offer personalized financial advice (if you connect your bank account)
OpenAI is rolling out a preview of a new personal finance feature inside ChatGPT. Starting today, Pro users in the US can connect their financial accounts to ChatGPT to get more personalized advice from the chatbot. To hear OpenAI tell it, every month more than 200 million users already turn to ChatGPT for guidance on managing their money. By building a framework that allows those people to connect their accounts to its servers, OpenAI says ChatGPT can go from offering generic advice to helping those same users take actions that more directly improve their lives. The integration is made possible through a partnership OpenAI has signed with Plaid, which offers connections to more than 12,000 financial institutions, including banks like Citi and Chase, in addition to services like Affirm and Robinhood.
The Download: China's AI drama factory and the WHO's missing health targets
Plus: as their trial goes to the jury, Musk and Altman face lying accusations. China's short drama industry is fueled by bite-sized, melodramatic, and smutty shows built for smartphone scrolling. Now, many are being made entirely with AI: no actors, camera operators, cinematographers, or CGI specialists required. An average of 470 AI-generated short dramas were released every day in January. Production timelines have shrunk from months to weeks, while costs have dropped by up to 90%. Storytelling is also increasingly driven by performance data.