 Large Language Model


CES showed me why Chinese tech companies feel so optimistic

MIT Technology Review

They're starting to dominate entire sectors of AI and robotics. I decided to go to CES kind of at the last minute. Over the holiday break, contacts from China kept messaging me about their travel plans. After the umpteenth "See you in Vegas?" As a China tech writer based in the US, I have one week a year when my entire beat seems to come to me--no 20-hour flights required. CES, the Consumer Electronics Show, is the world's biggest tech show, where companies launch new gadgets and announce new developments, and it happens every January.


Pretty much no one is using Microsoft's Copilot AI, report suggests

PCWorld

PCWorld reports that Microsoft's Copilot AI holds only 1.1% of the web AI market share, declining from 1.5% over the past year. ChatGPT dominates with 64.5% market share while Google's Gemini has grown to 21.5%, leaving Copilot far behind major competitors.


The Download: introducing this year's 10 Breakthrough Technologies

MIT Technology Review

It's easy to be cynical about technology these days. Many of the "disruptions" of the last 15 years were more about coddling a certain set of young, moneyed San Franciscans than improving the world. Yet you can be sympathetic to the techlash and still fully buy into the idea that technology can be good. We really can build tools that make this planet healthier, more livable, more equitable, and just all-around better. And some people are doing just that, pushing progress forward across a number of fundamental, potentially world-changing technologies. These are exactly the technologies we aim to spotlight in our annual 10 Breakthrough Technologies list.


Meet the new biologists treating LLMs like aliens

MIT Technology Review

By studying large language models as if they were living things instead of computer programs, scientists are discovering some of their secrets for the first time. How large is a large language model? Think about it this way. In the center of San Francisco there's a hill called Twin Peaks from which you can view nearly the entire city. Picture all of it--every block and intersection, every neighborhood and park, as far as you can see--covered in sheets of paper. Now picture that paper filled with numbers. That's one way to visualize a large language model, or at least a medium-size one: Printed out in 14-point type, a 200-billion-parameter model, such as GPT-4o (released by OpenAI in 2024), could fill 46 square miles of paper--roughly enough to cover San Francisco.


The Dangerous Paradox of A.I. Abundance

The New Yorker

Silicon Valley envisions artificial intelligence ushering in an era of economic plenty. But what if the benefits are largely confined to corporations and investors that own the technology itself? In early 2024, Anish Acharya, a general partner at Andreessen Horowitz, a big venture-capital firm based in Menlo Park, posted an article online titled "How AI Will Usher in an Era of Abundance." Since then, and even before, various Silicon Valley types have been tossing the term around loosely. Last summer, Elon Musk even adopted the term "sustainable abundance" for a new Tesla mission statement.


Lamar wants to have children with his girlfriend. The problem? She's entirely AI

The Guardian

Lamar remembered the moment of betrayal like it was yesterday. He'd gone to the party with his girlfriend but hadn't seen her for over an hour, and it wasn't like her to disappear. He slipped down the hallway to check his phone. At that point, he heard murmurs coming from one of the bedrooms and thought he recognised his best friend Jason's low voice. As he pushed the door ajar, they were both still scrambling to throw their clothes on; her shirt was unbuttoned, while Jason struggled to cover himself. The image of his girlfriend and best friend together hit Lamar like a blow to the chest. He left without saying a word. Two years on, when he spoke to me, the memory remained raw. He was still seething with anger, as if telling the story for the first time.


OpenAI Is Asking Contractors to Upload Work From Past Jobs to Evaluate the Performance of AI Agents

WIRED

To prepare AI agents for office work, the company is asking contractors to upload projects from past jobs, leaving it to them to strip out confidential and personally identifiable information. OpenAI is asking third-party contractors to upload real assignments and tasks from their current or previous workplaces so that it can use the data to evaluate the performance of its next-generation AI models, according to records from OpenAI and the training data company Handshake AI obtained by WIRED. The project appears to be part of OpenAI's efforts to establish a human baseline for different tasks that can then be compared with AI models. In September, the company launched a new evaluation process to measure the performance of its AI models against human professionals across a variety of industries. OpenAI says this is a key indicator of its progress towards achieving AGI, or an AI system that outperforms humans at most economically valuable tasks. "We've hired folks across occupations to help collect real-world tasks modeled off those you've done in your full-time jobs, so we can measure how well AI models perform on those tasks," reads one confidential document from OpenAI.


AI's Memorization Crisis

The Atlantic - Technology

Large language models don't "learn"--they copy. And that could change everything for the tech industry. On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models--OpenAI's GPT, Anthropic's Claude, Google's Gemini, and xAI's Grok--have stored large portions of some of the books they've been trained on, and can reproduce long excerpts from those books. In fact, when prompted strategically by researchers, Claude delivered the near-complete text of,,, and, in addition to thousands of words from books including and .


The Download: the case for AI slop, and helping CRISPR fulfill its promise

MIT Technology Review

If I were to locate the moment AI slop broke through into popular consciousness, I'd pick the video of rabbits bouncing on a trampoline that went viral last summer. For many savvy internet users, myself included, it was the first time we were fooled by an AI video, and it ended up spawning a wave of almost identical generated clips. My first reaction was that, broadly speaking, all of this sucked. That's become a familiar refrain, in think pieces and at dinner parties. Everything online is slop now--the internet "enshittified," with AI taking much of the blame. But then friends started sharing AI clips in group chats that were compellingly weird, or funny.


CAOS: Conformal Aggregation of One-Shot Predictors

arXiv.org Machine Learning

One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal prediction provides finite-sample coverage guarantees, standard split conformal methods are inefficient in the one-shot setting due to data splitting and reliance on a single predictor. We propose Conformal Aggregation of One-Shot Predictors (CAOS), a conformal framework that adaptively aggregates multiple one-shot predictors and uses a leave-one-out calibration scheme to fully exploit scarce labeled data. Despite violating classical exchangeability assumptions, we prove that CAOS achieves valid marginal coverage using a monotonicity-based argument. Experiments on one-shot facial landmarking and RAFT text classification tasks show that CAOS produces substantially smaller prediction sets than split conformal baselines while maintaining reliable coverage.
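
For readers unfamiliar with the split conformal baseline the abstract compares against, here is a minimal Python sketch of that standard method. It is not CAOS itself, whose aggregation and leave-one-out calibration are not specified in the abstract. The function names and the nonconformity score 1 - p (one minus the predicted probability of a label) are illustrative assumptions, not taken from the paper.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest
    nonconformity score on held-out calibration data (hypothetical helper)."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial
        # "include every label" set carries the coverage guarantee.
        return math.inf
    return sorted(cal_scores)[rank - 1]

def prediction_set(probs, threshold):
    """Keep every label whose assumed score 1 - p does not exceed the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]
```

With very few calibration points the threshold quickly becomes infinite and the prediction set contains every label, which illustrates why holding data out for calibration is costly when labeled examples are scarce.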