Large Language Model
Reasoning Models Don't Just Think Longer, They Move Differently
Gjølbye, Anders, Hansen, Lars Kai, Koyejo, Sanmi
Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
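The length correction described in the abstract can be illustrated with a minimal sketch. The abstract does not state the regression form or define the trajectory statistics, so everything here is an assumption: `directness` (net displacement divided by path length) is a hypothetical stand-in for a trajectory statistic, and `residualize` uses ordinary least squares on log length as one plausible way to remove the length trend.

```python
import numpy as np

def directness(hidden_states):
    """Hypothetical trajectory statistic: net displacement / total path length.

    hidden_states: array of shape (T, d), one hidden state per generated token.
    Returns a value in (0, 1]; 1.0 means a perfectly straight path.
    """
    h = np.asarray(hidden_states, dtype=float)
    steps = np.diff(h, axis=0)                      # (T-1, d) step vectors
    path_length = np.linalg.norm(steps, axis=1).sum()
    net = np.linalg.norm(h[-1] - h[0])
    return float(net / path_length)

def residualize(stat, length, log_length=True):
    """Remove the linear trend of `stat` on (log) generation length.

    Assumption: an OLS fit with an intercept; the paper may use another model.
    Returns the residuals, which are uncorrelated with the length predictor.
    """
    y = np.asarray(stat, dtype=float)
    x = np.asarray(length, dtype=float)
    if log_length:
        x = np.log(x)
    X = np.column_stack([np.ones_like(x), x])       # intercept + predictor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

Under these assumptions, difficulty would then be compared against `residualize(stats, lengths)` rather than against the raw statistic, so that any remaining association cannot be attributed to a linear trend in (log) generation length.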
OpenAI is offering ChatGPT Plus to citizens of Malta for a year
OpenAI has signed deals with fintech startups, tech giants and even Disney, but it's breaking new ground by announcing a world-first partnership with the country of Malta. In a post on its website, OpenAI said that it would provide ChatGPT Plus for one year to every Maltese resident or citizen. "Malta is the first country to launch a partnership of this scale because we refuse to let our citizens stay behind in the digital age," Silvio Schembri, Malta's minister for Economy, Enterprise and Strategic Projects, said in a statement. "We are putting our people at the very forefront of global change." Malta's approximately 574,250 residents will have to complete a course developed by the University of Malta before starting the ChatGPT Plus subscription, which costs $20 a month in the US.
What we learned from the cringey courtroom drama between Elon Musk and Sam Altman
Both Musk and Altman took the stand for hours, facing combative cross-examinations that painted them each as untrustworthy. Two of the world's richest people faced an airing of their dirty laundry amid their messy, bitter feud over OpenAI. A nine-person jury is set to decide whether Elon Musk's allegations of "stealing a charity" against Sam Altman and OpenAI are legitimate, with deliberations to begin in earnest on Monday. Whatever its outcome, the case has been an illuminating, at times exhausting, look behind the scenes at the history of OpenAI and how some of the most powerful figures in the tech industry operate. Attorneys for both sides have introduced reams of private text messages, emails and even diary entries to support their arguments.
Cybercriminal Twins Caught After They Forgot to Turn Off Microsoft Teams Recording
Plus: Instructure's Canvas ransomware debacle comes to a close, an alleged dark net market kingpin gets arrested, OpenAI workers fall victim to a supply chain attack, and more. The worst part of your iPhone getting stolen may not be the theft itself. Instead, it's the phishing attacks waged against people in your contacts. New research this week shows that there's a thriving ecosystem for tools that let criminals unlock iPhones and target the phone numbers they find inside. Foxconn, the electronics manufacturing giant known for its role in building iPhones, revealed this week that it recently "suffered a cyberattack."
Musk v. Altman week 3: Elon Musk and Sam Altman traded blows over each other's credibility. Now the jury will pick a side.
The trial spilled plenty of dirt--and raised more questions than answers about how the AI giant should be governed. In the final week of the trial, lawyers traded blows over Elon Musk's and OpenAI CEO Sam Altman's credibility. Altman was grilled on his alleged history of lying and self-dealing involving companies that do business with OpenAI. But he fired back, painting Musk as a power-seeker who wanted to control the development of artificial general intelligence (AGI)--powerful AI that can compete with humans on most cognitive tasks.
ChatGPT will offer personalized financial advice (if you connect your bank account)
OpenAI is rolling out a preview of a new personal finance feature inside of ChatGPT. Starting today, Pro users in the US can connect their financial accounts to ChatGPT in order to get more personalized advice from the chatbot. To hear OpenAI tell it, every month more than 200 million users already turn to ChatGPT for guidance on managing their money. By building a framework that allows those people to connect their accounts to its servers, ChatGPT can go from offering generic advice to helping those same users take actions that more directly improve their lives. The integration is made possible through a partnership OpenAI has signed with Plaid, which offers connections to more than 12,000 financial institutions, including banks like Citi and Chase, in addition to services like Affirm and Robinhood.
The Download: China's AI drama factory and the WHO's missing health targets
Plus: as their trial goes to the jury, Musk and Altman face lying accusations. China's short drama industry is fueled by bite-sized, melodramatic, and smutty shows built for smartphone scrolling. Now, many are being made entirely with AI: no actors, camera operators, cinematographers, or CGI specialists required. An average of 470 AI-generated short dramas were released every day in January. Production timelines have shrunk from months to weeks, while costs have dropped by up to 90%. Storytelling is also increasingly driven by performance data.
AI is still waiting for its VisiCalc moment
PCWorld explores how AI still lacks a transformative "killer app" like VisiCalc was for early personal computers, despite recent advances like Anthropic's Claude for Small Business. While new AI tools integrate with platforms like QuickBooks and PayPal for business tasks, public skepticism remains high due to reliability concerns and unpredictable AI behavior. The industry continues searching for universally valuable AI applications beyond specialized uses, as current solutions haven't achieved the widespread adoption that would make AI truly indispensable. The arrival of Claude for Small Business earlier this week marked an interesting moment, and a savvy strategic move, for Anthropic. Rather than saddling web browsers with more AI slop or trying to slather AI onto perfectly good user interfaces that don't need improving, Anthropic is attempting something both less flashy and potentially more fruitful: finding a practical, agentic AI-powered application for everyday business owners looking to make ends meet. The bag of tricks included in Claude for Small Business is somewhat predictable, running the gamut from "ready-to-run" agentic workflows to connectors for PayPal, QuickBooks, HubSpot, Canva, DocuSign, and more. With these tools, business owners can use Claude to help plan their payrolls, reconcile their books, analyze their cash flow, spin up promotional campaigns, and so forth.
Security researchers, aided by Anthropic's Mythos, claim to have breached macOS
Apple's operating systems are known for their security, especially compared to their rivals in mobile and computing. Now, security researchers from a Palo Alto-based company called Calif claim they were able to breach macOS after designing a privilege escalation exploit with help from Anthropic's Claude Mythos Preview. As The Wall Street Journal reports, the exploit could be used to access parts of the MacBook that should be inaccessible and, thus, allows the attacker to take control of a Mac computer. The researchers worked with Mythos to identify the vulnerabilities and to help them with the exploit's development. Mythos Preview was able to identify the bugs quickly, because they belonged to known classes.
xAI introduces its coding agent called Grok Build
It's called Grok Build, and it's still in an early beta that is initially available only to SuperGrok Heavy subscribers paying $300 per month for the service. xAI says it will take user feedback from the early beta release to improve the product. SuperGrok Heavy users can install the beta from xAI's website and then log in to their account to access it. As Bloomberg notes, xAI has been trying to catch up to rival companies like Anthropic and OpenAI. Elon Musk, the company's founder and CEO, previously admitted that it has fallen behind its competitors when it comes to coding.