

OpenAI is throwing everything into building a fully automated researcher

MIT Technology Review

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. OpenAI says that this new research goal will be its "North Star" for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.



What's next for Chinese open-source AI

MIT Technology Review

Chinese open models are spreading fast, from Hugging Face to Silicon Valley. The past year has marked a turning point for Chinese AI. Since DeepSeek released its R1 reasoning model in January 2025, Chinese companies have repeatedly delivered AI models that match the performance of leading Western models at a fraction of the cost. Just last week the Chinese firm Moonshot AI released its latest open-weight model, Kimi K2.5, which came close to top proprietary systems such as Anthropic's Claude Opus on some early benchmarks. The difference: K2.5 is roughly one-seventh Opus's price.


Meet the new biologists treating LLMs like aliens

MIT Technology Review

By studying large language models as if they were living things instead of computer programs, scientists are discovering some of their secrets for the first time. How large is a large language model? Think about it this way. In the center of San Francisco there's a hill called Twin Peaks from which you can view nearly the entire city. Picture all of it--every block and intersection, every neighborhood and park, as far as you can see--covered in sheets of paper. Now picture that paper filled with numbers. That's one way to visualize a large language model, or at least a medium-size one: Printed out in 14-point type, a 200-billion-parameter model, such as GPT-4o (released by OpenAI in 2024), could fill 46 square miles of paper--roughly enough to cover San Francisco.
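The "46 square miles" figure can be sanity-checked with back-of-envelope arithmetic. This sketch is not from the article; the page density (roughly 100 full-precision numbers per letter-size page in 14-point type) is an assumption chosen for illustration.

```python
# Back-of-envelope check: how much paper would 200 billion
# printed parameters cover?
# Assumption (not from the article): about 100 parameters fit on
# one 8.5 x 11 inch page in 14-point type.

PARAMS = 200e9                         # 200-billion-parameter model
PARAMS_PER_PAGE = 100                  # assumed page density
PAGE_AREA_SQFT = (8.5 * 11) / 144      # letter page area in square feet
SQFT_PER_SQMILE = 5280 ** 2            # square feet in a square mile

pages = PARAMS / PARAMS_PER_PAGE
square_miles = pages * PAGE_AREA_SQFT / SQFT_PER_SQMILE
print(f"{square_miles:.0f} square miles")  # on the order of San Francisco's ~47 sq mi
```

With that assumed density the estimate lands in the mid-40s of square miles, consistent with the article's comparison to San Francisco's land area.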


What's next for AI in 2026

MIT Technology Review

Our AI writers make their big bets for the coming year--here are five hot trends to watch. In an industry in constant flux, sticking your neck out to predict what's coming next may seem reckless. But for the last few years we've done just that--and we're doing it again. How did we do last time? The past year shaped up as a big one for Chinese open-source models. Here are our big bets for the next 12 months.


AI Wrapped: The 14 AI terms you couldn't avoid in 2025

MIT Technology Review

From "superintelligence" to "slop," here are the words and phrases that defined another year of AI craziness. If the past 12 months have taught us anything, it's that the AI hype train is showing no signs of slowing. It's hard to believe that at the beginning of the year, DeepSeek had yet to turn the entire industry on its head, Meta was better known for trying (and failing) to make the metaverse cool than for its relentless quest to dominate superintelligence, and vibe coding wasn't a thing. If that's left you feeling a little confused, fear not. As we near the end of 2025, our writers have taken a look back over the AI terms that dominated the year, for better or worse. Take a moment to brace yourself for what promises to be another bonkers year.


Five AI Developments That Changed Everything This Year

TIME - Tech

President Donald Trump speaks in the Roosevelt Room flanked by Masayoshi Son, Larry Ellison, and Sam Altman at the White House on January 21, 2025. In case you missed it, 2025 was a big year for AI. It became an economic force, propping up the stock market, and a geopolitical pawn, redrawing the frontlines of Great Power competition. It had both global and deeply personal effects, changing the ways that we think, write, and relate.


OpenAI Rolls Back ChatGPT's Model Router System for Most Users

WIRED

As OpenAI scrambles to improve ChatGPT, it's ditching a feature in its free tier that contributed to last summer's user revolt. OpenAI has quietly reversed a major change to how hundreds of millions of people use ChatGPT. On a low-profile blog that tracks product changes, the company said that it rolled back ChatGPT's model router--an automated system that sends complicated user questions to more advanced "reasoning" models--for users on its Free and $5-a-month Go tiers. Instead, those users will now default to GPT-5.2 Instant, the fastest and cheapest-to-serve version of OpenAI's new model series. Free and Go users will still be able to access reasoning models, but they will have to select them manually.


The great AI hype correction of 2025

MIT Technology Review

Four ways to think about this year's reckoning. When OpenAI released a free web app called ChatGPT in late 2022, it changed the course of an entire industry--and several world economies. Millions of people started talking to their computers, and their computers started talking back. We were enchanted, and we expected more. Technology companies scrambled to stay ahead, putting out rival products that outdid one another with each new release: voice, images, video. With nonstop one-upmanship, AI companies have presented each new product drop as a major breakthrough, reinforcing a widespread faith that this technology would just keep getting better. Boosters told us that progress was exponential.


Benchmarking World-Model Learning

Warrier, Archana, Nguyen, Dat, Naim, Michelangelo, Jain, Moksh, Liang, Yichao, Schroeder, Karen, Yang, Cambridge, Tenenbaum, Joshua B., Vollmer, Sebastian, Ellis, Kevin, Tavares, Zenna

arXiv.org Artificial Intelligence

Model-learning agents should gather information to learn world models that support many downstream tasks and inferences, such as predicting unobserved states, estimating near- and far-term consequences of actions, planning action sequences, and detecting changes in dynamics. Current methods for learning and evaluating world models diverge from this goal: training and evaluation are anchored to next-frame prediction, and success is scored by reward maximization in the same environment. We propose WorldTest, a protocol to evaluate model-learning agents that separates reward-free interaction from a scored test phase in a different but related environment. WorldTest is open-ended--models should support many different tasks unknown ahead of time--and agnostic to model representation, allowing comparison across approaches. We instantiated WorldTest with AutumnBench, a suite of 43 interactive grid-world environments and 129 tasks across three families: masked-frame prediction, planning, and predicting changes to the causal dynamics. We compared 517 human participants and three frontier models on AutumnBench. We found that humans outperform the models, and scaling compute improves performance only in some environments but not others. WorldTest provides a novel template--reward-free exploration, derived tests, and behavior-based scoring--to evaluate what agents learn about environment dynamics, and AutumnBench exposes significant headroom in world-model learning.