 Large Language Model


OpenAI Is Asking Contractors to Upload Work From Past Jobs to Evaluate the Performance of AI Agents

WIRED

To prepare AI agents for office work, the company is asking contractors to upload projects from past jobs, leaving it to them to strip out confidential and personally identifiable information. OpenAI is asking third-party contractors to upload real assignments and tasks from their current or previous workplaces so that it can use the data to evaluate the performance of its next-generation AI models, according to records from OpenAI and the training data company Handshake AI obtained by WIRED. The project appears to be part of OpenAI's efforts to establish a human baseline for different tasks that can then be compared with AI models. In September, the company launched a new evaluation process to measure the performance of its AI models against human professionals across a variety of industries. OpenAI says this is a key indicator of its progress towards achieving AGI, or an AI system that outperforms humans at most economically valuable tasks. "We've hired folks across occupations to help collect real-world tasks modeled off those you've done in your full-time jobs, so we can measure how well AI models perform on those tasks," reads one confidential document from OpenAI.


AI's Memorization Crisis

The Atlantic - Technology

Large language models don't "learn"--they copy. And that could change everything for the tech industry. On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models--OpenAI's GPT, Anthropic's Claude, Google's Gemini, and xAI's Grok--have stored large portions of some of the books they've been trained on, and can reproduce long excerpts from those books. In fact, when prompted strategically by researchers, Claude delivered the near-complete text of several books, in addition to thousands of words from others.


The Download: the case for AI slop, and helping CRISPR fulfill its promise

MIT Technology Review

If I were to locate the moment AI slop broke through into popular consciousness, I'd pick the video of rabbits bouncing on a trampoline that went viral last summer. For many savvy internet users, myself included, it was the first time we were fooled by an AI video, and it ended up spawning a wave of almost identical generated clips. My first reaction was that, broadly speaking, all of this sucked. That's become a familiar refrain, in think pieces and at dinner parties. Everything online is slop now--the internet "enshittified," with AI taking much of the blame. But then friends started sharing AI clips in group chats that were compellingly weird, or funny.


CAOS: Conformal Aggregation of One-Shot Predictors

arXiv.org Machine Learning

One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal prediction provides finite-sample coverage guarantees, standard split conformal methods are inefficient in the one-shot setting due to data splitting and reliance on a single predictor. We propose Conformal Aggregation of One-Shot Predictors (CAOS), a conformal framework that adaptively aggregates multiple one-shot predictors and uses a leave-one-out calibration scheme to fully exploit scarce labeled data. Despite violating classical exchangeability assumptions, we prove that CAOS achieves valid marginal coverage using a monotonicity-based argument. Experiments on one-shot facial landmarking and RAFT text classification tasks show that CAOS produces substantially smaller prediction sets than split conformal baselines while maintaining reliable coverage.
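As a point of reference for the abstract above, the following is a minimal sketch of the standard split conformal baseline for classification, not of CAOS itself. The calibration scores, class labels, and probabilities are hypothetical values chosen for illustration, and the nonconformity score (one minus the probability of the true label) is one common choice rather than anything stated in the abstract.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: only the trivial set (all labels) is valid.
        return math.inf
    return sorted(scores)[k - 1]

def prediction_set(probs, threshold):
    """Labels whose nonconformity score (1 - probability) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}

# Hypothetical calibration scores: 1 - (probability assigned to the true label).
calibration_scores = [0.05, 0.10, 0.20, 0.30, 0.15, 0.40, 0.25, 0.35, 0.12]
threshold = conformal_quantile(calibration_scores, alpha=0.2)

# Hypothetical predicted probabilities for one new input.
new_probs = {"cat": 0.70, "dog": 0.25, "bird": 0.05}
print(threshold, prediction_set(new_probs, threshold))
```

With a single calibration point the threshold becomes infinite and every label is returned, which illustrates why holding out calibration data is costly when labeled examples are scarce.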


AI Devices Are Coming. Will Your Favorite Apps Be Along for the Ride?

WIRED

Tech companies are calling AI the next platform. But some developers are reluctant to let AI agents stand between them and their users. Silicon Valley giants like Amazon, Meta, and OpenAI are racing to develop "operating systems" for AI-powered devices--and 2026 is likely the year these efforts will start to take off. The devices are largely built around a future where AI agents can take actions on a user's behalf, without requiring them to visit an app or website.


The Download: mimicking pregnancy's first moments in a lab, and AI parameters explained

MIT Technology Review

Plus: Google and Character.AI have settled a lawsuit linking their AI to the death of a teenager. At first glance, it looks like the start of a human pregnancy: A ball-shaped embryo presses into the lining of the uterus then grips tight, burrowing in as the first tendrils of a future placenta appear. This is implantation--the moment that pregnancy officially begins. Only none of it is happening inside a body. These images were captured in a Beijing laboratory, inside a microfluidic chip, as scientists watched the scene unfold. In three recent papers published by Cell Press, scientists report what they call the most accurate efforts yet to mimic the first moments of pregnancy in the lab. They've taken human embryos from IVF centers and let these merge with "organoids" made of endometrial cells, which form the lining of the uterus.


Google Is Adding an 'AI Inbox' to Gmail That Summarizes Emails

WIRED

New Gmail features, powered by the Gemini model, are part of Google's continued push for users to incorporate AI into their daily life and conversations. Google is putting even more generative AI tools into Gmail as part of its goal to further personalize user inboxes and streamline searches. On Thursday, the company announced a new "AI Inbox" tab, currently in a beta testing phase, that reads every message in a user's Gmail and suggests a list of to-dos and key topics, based on what it summarizes. In Google's example of what this AI Inbox could look like in Gmail, the new tab takes context from a user's messages and suggests they reschedule their dentist appointment, reply to a request from their child's sports coach, and pay an upcoming fee before the deadline. Also under the AI Inbox tab is a list of important topics worth browsing, nestled beneath the action items at the top.


Musk lawsuit over OpenAI for-profit conversion can go to trial, US judge says

The Guardian

Judge says there is plenty of evidence to suggest OpenAI's leaders made assurances nonprofit structure would be kept. Elon Musk's lawsuit against OpenAI is to go to trial after a US judge said there is plenty of evidence to support the billionaire's case. The world's richest man, who co-founded OpenAI, is suing the ChatGPT developer and its chief executive, Sam Altman, over claims its leaders violated the organisation's founding mission by shifting to a for-profit model. The US district judge Yvonne Gonzalez Rogers in Oakland, California, told a hearing there was plenty of evidence that suggested OpenAI's leaders made assurances that its original nonprofit structure was going to be maintained.


The Daring Attempt to End the Memory Shortage Crisis

WIRED

The supply shortage of the RAM needed to build phones and PCs isn't going away. But a few companies have a plan to solve it. A supply shortage is the last thing tech companies want to talk about at CES. The annual trade show is their chance to promote new products and drum up excitement for what's coming, not discuss the one thing that could make selling new products in 2026 an uphill battle. But I've read the reports.


A path to natural language through tokenisation and transformers

arXiv.org Machine Learning

Natural languages exhibit striking regularities in their statistical structure, including notably the emergence of Zipf's and Heaps' laws. Despite this, it remains broadly unclear how these properties relate to the modern tokenisation schemes used in contemporary transformer models. In this note, we analyse the information content (as measured by the Shannon entropy) of various corpora under the assumption of a Zipfian frequency distribution, and derive a closed-form expression for the slot entropy expectation value. We then empirically investigate how byte-pair encoding (BPE) transforms corpus statistics, showing that recursive applications of BPE drive token frequencies toward a Zipfian power law while inducing a characteristic growth pattern in empirical entropy. Utilizing the ability of transformers to learn context-dependent token probability distributions, we train language models on corpora tokenised at varying BPE depths, revealing that the model predictive entropies increasingly agree with Zipf-derived predictions as the BPE depth increases. Attention-based diagnostics further indicate that deeper tokenisation reduces local token dependencies, bringing the empirical distribution closer to the weakly dependent (near IID) regime. Together, these results clarify how BPE acts not only as a compression mechanism but also as a statistical transform that reconstructs key informational properties of natural language.
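The quantities named in this abstract can be illustrated with a small sketch. It evaluates the Shannon entropy of a Zipfian distribution numerically (it does not reproduce the paper's closed-form expression) and applies single BPE merge steps to a toy character sequence. The function names, the toy corpus, and the reading of "BPE depth" as the number of merge steps applied are assumptions made for illustration.

```python
import math
from collections import Counter

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def zipf_distribution(vocab_size, exponent=1.0):
    """Normalised Zipfian probabilities with p(rank) proportional to rank**(-exponent)."""
    weights = [rank ** -exponent for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def bpe_merge_once(tokens):
    """Merge every occurrence of the most frequent adjacent pair into one token."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return list(tokens)
    (left, right), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def empirical_entropy(tokens):
    """Entropy in bits of the unigram frequency distribution of a token sequence."""
    counts = Counter(tokens)
    total = len(tokens)
    return shannon_entropy(c / total for c in counts.values())

# Entropy of an ideal Zipfian vocabulary of 1000 types.
print(shannon_entropy(zipf_distribution(1000)))

# Toy corpus: track unigram entropy as merge steps are applied.
tokens = list("abababcab")
for depth in range(3):
    print(depth, tokens, round(empirical_entropy(tokens), 3))
    tokens = bpe_merge_once(tokens)
```

A corpus this small says nothing about the trends reported in the abstract; the loop only shows the mechanics of measuring unigram entropy after each merge.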