This is the most misunderstood graph in AI
To some, METR's "time horizon plot" indicates that AI utopia--or apocalypse--is close at hand. The truth is more complicated. Every time OpenAI, Google, or Anthropic drops a new frontier large language model, the AI community holds its breath. It doesn't exhale until METR, an AI research nonprofit whose name stands for "Model Evaluation & Threat Research," updates a now-iconic graph that has played a major role in the AI discourse since it was first released in March of last year. The graph suggests that certain AI capabilities are developing at an exponential rate, and more recent model releases have outperformed that already impressive trend. That was certainly the case for Claude Opus 4.5, the latest version of Anthropic's most powerful model, which was released in late November.
Five ways that AI is learning to improve itself
By the same token, Clune says, automating AI research and development could have enormous upsides. On our own, we humans might not be able to think up the innovations and improvements that will allow AI to one day tackle prodigious problems like cancer and climate change. For now, human ingenuity is still the primary engine of AI advancement; otherwise, Meta would hardly have made such exorbitant offers to attract researchers to its superintelligence lab. But AI is already contributing to its own development, and it's set to play an even larger role in the years to come. Here are five ways that AI is making itself better.
In the Loop: AI Promised Faster Coding. This Study Disagrees
Conventional wisdom holds that this has significantly accelerated software engineering. A new study by METR, published last week, set out to measure how much AI actually speeds up the work of experienced software developers. The results were very unexpected.
What the study found
METR measured the speed of 16 developers working on complex software projects, both with and without AI assistance. After finishing their tasks, the developers estimated that access to AI had accelerated their work by 20% on average.
METR: Image Watermarking with Large Number of Unique Messages
Alexander Varlamov, Daria Diatlova, Egor Spirin
Improvements in diffusion models have boosted the quality of image generation, leading researchers, companies, and creators to focus on improving watermarking algorithms, which would make it possible to clearly identify the creators of generative art. The main challenges modern watermarking algorithms face are robustness to attacks and the ability to encode many unique messages, such as user IDs. In this paper, we present METR: Message Enhanced Tree-Ring, an approach that aims to address both challenges. METR builds on the Tree-Ring watermarking algorithm and makes it possible to encode multiple distinct messages without compromising attack resilience or image quality, which makes the watermark suitable for any diffusion model. To surpass the limit on the number of encoded messages, we propose METR++, an enhanced version of METR that, while restricted to the Latent Diffusion Model architecture, is designed to inject a virtually unlimited number of unique messages. We demonstrate robustness to attacks and the ability to encode many unique messages while preserving image quality, which gives METR and METR++ great potential for practical applications in real-world settings. Our code is available at https://github.com/deepvk/metr
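The Tree-Ring family of watermarks that METR builds on embeds a pattern in the Fourier spectrum of a diffusion model's initial noise, where it survives the generation process and can later be recovered. The toy sketch below is not METR's implementation (see the linked repository for that); it is a minimal, hypothetical illustration of the core idea: write a known value into an annular ("ring") region of the 2-D Fourier transform of a noise array, then detect the watermark by checking how close that region is to the expected value. All function names and parameters here are invented for illustration.

```python
import numpy as np

def ring_mask(size, r_inner, r_outer):
    """Boolean mask of an annulus (ring) centred in a size x size grid."""
    yy, xx = np.mgrid[:size, :size]
    r = np.hypot(yy - size / 2, xx - size / 2)
    return (r >= r_inner) & (r < r_outer)

def embed_ring(noise, value, r_inner=4, r_outer=8):
    """Overwrite one Fourier-space ring of the noise with a constant value."""
    spec = np.fft.fftshift(np.fft.fft2(noise))
    spec[ring_mask(noise.shape[0], r_inner, r_outer)] = value
    # Back to the spatial domain; keep the real part as the watermarked noise.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

def detect_ring(noise, value, r_inner=4, r_outer=8):
    """Mean distance between the ring region and the expected value (lower = match)."""
    spec = np.fft.fftshift(np.fft.fft2(noise))
    region = spec[ring_mask(noise.shape[0], r_inner, r_outer)]
    return np.mean(np.abs(region - value))

rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 64))
marked = embed_ring(clean, value=50.0)

print(detect_ring(marked, 50.0))  # near zero: watermark present
print(detect_ring(clean, 50.0))   # large: watermark absent
```

Choosing different ring radii (or different values per ring) is one simple way to encode distinct messages, which hints at why extending Tree-Ring to many unique messages, as METR does, is nontrivial: each message must remain distinguishable after the noise has passed through diffusion sampling and possible image-level attacks.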
Nobody Knows How to Safety-Test AI
Beth Barnes and three of her colleagues sit cross-legged in a semicircle on a damp lawn on the campus of the University of California, Berkeley. They are describing their attempts to interrogate artificial intelligence chatbots. "They are, in some sense, these vast alien intelligences," says Barnes, 26, who is the founder and CEO of Model Evaluation and Threat Research (METR), an AI-safety nonprofit. "They know so much about whether the next word is going to be 'is' versus 'was.' We're just playing with a tiny bit on the surface, and there's all this, miles and miles underneath," she says, gesturing at the potentially immense depths of large language models' capabilities. Researchers at METR look a lot like Berkeley students--the four on the lawn are in their twenties and dressed in jeans or sweatpants.