Large Language Model
What even is the AI bubble?
Everyone in tech agrees we're in a bubble. They just can't agree on what it looks like, or what happens when it pops. In July, a widely cited MIT study claimed that 95% of organizations that invested in generative AI were getting "zero return." While the study itself was more nuanced than the headlines, for many it still felt like the first hard data point confirming what skeptics had muttered for months: Hype around AI might be outpacing reality. Then, in August, OpenAI CEO Sam Altman said what everyone in Silicon Valley had been whispering.
The AI doomers feel undeterred
But they certainly wish people were still taking their warnings really seriously. It's a weird time to be an AI doomer. This small but influential community of researchers, scientists, and policy experts believes, in the simplest terms, that AI could get so good it could be bad--very, very bad--for humanity. Though many of these people would be more likely to describe themselves as advocates for AI safety than as literal doomsayers, they warn that AI poses an existential risk to humanity. They argue that absent more regulation, the industry could hurtle toward systems it can't control. They commonly expect such systems to follow the creation of artificial general intelligence (AGI), a slippery concept generally understood as technology that can do whatever humans can do, and better. Though this is far from a universally shared perspective in the AI field, the doomer crowd has had some notable success over the past several years: helping shape AI policy coming from the Biden administration, organizing prominent calls for international "red lines" to prevent AI risks, and getting a bigger (and more influential) megaphone as some of its adherents win science's most prestigious awards. But a number of developments over the past six months have put them on the back foot.
The ultimate prompt engineering AI: side-by-side results and unlimited credits
When you purchase through links in our articles, we may earn a small commission. Save 87% on a ChatPlayground AI lifetime subscription that includes access to every top model and unlimited credits. If you've been experimenting with AI tools, you already know the pain: every platform has its strengths, its weak spots, and its own subscription. ChatPlayground AI solves that by putting GPT-4o, Claude Sonnet 4, Gemini 1.5 Flash, and more into one clean interface. Then, it lets you run the same prompt against all of them at once.
Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems
LLM-as-judge evaluation has become the de facto standard for scaling model assessment, but the practice is statistically unsound: uncalibrated scores can invert preferences, naive confidence intervals on uncalibrated scores achieve near-0% coverage, and importance-weighted estimators collapse under limited overlap despite high effective sample size (ESS). We introduce Causal Judge Evaluation (CJE), a framework that fixes all three failures. On n=4,961 Chatbot Arena prompts (after filtering from 5k), CJE achieves 99% pairwise ranking accuracy at full sample size (94% averaged across configurations), matching oracle quality, at 14x lower cost (for ranking 5 policies) by calibrating a 16x cheaper judge on just 5% oracle labels (~250 labels). CJE combines three components: (i) AutoCal-R, reward calibration via mean-preserving isotonic regression; (ii) SIMCal-W, weight stabilization via stacking of S-monotone candidates; and (iii) Oracle-Uncertainty Aware (OUA) inference that propagates calibration uncertainty into confidence intervals. We formalize the Coverage-Limited Efficiency (CLE) diagnostic, which explains why IPS-style estimators fail even when ESS exceeds 90%: the logger rarely visits regions where target policies concentrate. Key findings: SNIPS inverts rankings even with reward calibration (38% pairwise, negative Kendall's tau) due to weight instability; calibrated IPS remains near-random (47%) despite weight stabilization, consistent with CLE; OUA improves coverage from near-0% to ~86% (Direct) and ~96% (stacked-DR), where naive intervals severely under-cover.
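The abstract leans on two quantities that are easy to state concretely: effective sample size (ESS) and the self-normalized importance sampling (SNIPS) estimate. The following is a minimal sketch using the common Kish definition of ESS, which is an assumption here, since the abstract does not say which definition the authors use. The function names and the toy weights and rewards are hypothetical and are not taken from the paper.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("weights must not all be zero")
    return total * total / total_sq


def snips_estimate(weights, rewards):
    """Self-normalized importance sampling estimate of the mean reward."""
    if len(weights) != len(rewards):
        raise ValueError("weights and rewards must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * r for w, r in zip(weights, rewards)) / total


# Hypothetical toy data: importance weights (target / logging policy ratio)
# and judge-scored rewards for four logged responses.
uniform_weights = [1.0, 1.0, 1.0, 1.0]
skewed_weights = [4.0, 0.0, 0.0, 0.0]
rewards = [0.2, 0.4, 0.6, 0.8]

# ESS as a fraction of the number of samples.
uniform_ess_fraction = effective_sample_size(uniform_weights) / len(uniform_weights)
skewed_ess_fraction = effective_sample_size(skewed_weights) / len(skewed_weights)

uniform_snips = snips_estimate(uniform_weights, rewards)
skewed_snips = snips_estimate(skewed_weights, rewards)
```

ESS only summarizes how evenly the weights are spread across the logged samples. It says nothing about responses the logging policy never produced, which is consistent with the abstract's point that a high ESS does not by itself guarantee a reliable IPS-style estimate.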
For the First Time, AI Analyzes Language as Well as a Human Expert
If language is what makes us human, what does it mean now that large language models have gained "metalinguistic" abilities? Among the myriad abilities that humans possess, which ones are uniquely human? Language has been a top candidate at least since Aristotle, who wrote that humanity was "the animal that has language." Even as large language models such as ChatGPT superficially replicate ordinary speech, researchers want to know if there are specific aspects of human language that simply have no parallels in the communication systems of other animals or artificially intelligent devices. In particular, researchers have been exploring the extent to which language models can reason about language itself.
Sam Altman Got What He Wanted
OpenAI turned 10 yesterday, and President Donald Trump incidentally gave the company a very special birthday gift: a sweeping executive order aiming to dismantle and preempt many state-level regulations of artificial intelligence. "There's only going to be one winner here, and it's probably going to be the U.S. or China," Trump said in a press conference announcing the order. And for the United States to win, "we have to be unified." Almost all of the AI industry's biggest players have been pushing for this move. OpenAI has been asking all year for the Trump administration to preempt state-level AI regulations, which the company believes would be burdensome in various ways; Microsoft, Google, Meta, Nvidia, and the major venture-capital firm Andreessen Horowitz have made similar requests.
Why Disney's Most Scandalous Deal Is Such a Grim Development
Disney's Deal With OpenAI Is So Much Worse Than You Think. The $1 billion partnership allows users to create A.I.-generated images of the company's iconic characters. That's not going to end well for anyone.
Google's real-time AI translation looks like it could change the world
If Google can pull this off, traveling overseas to a foreign country won't be that foreign anymore. Google Gemini has launched real-time, continuous translation using your phone and a pair of connected earbuds, in what looks like a powerful, transformative change to the way we interact with speakers from other countries. Google buried its announcement in a post about Gemini voice model updates on Friday, but the additional translation features look like they could change the way people interact with foreign speakers. Google is launching a beta of Google Translate to accommodate both real-time translation and two-way conversations, powered by Gemini.
How people used Microsoft Copilot in 2025, from coding to philosophy
In the run-up to Valentine's Day, Microsoft saw a surge in conversations about relationships and personal development. Microsoft has released a new report showing what people used its AI assistant Copilot for in 2025. The analysis is based on 37.5 million de-identified conversations and shows that in addition to productivity, Copilot is used for health, relationships and personalized guidance. Health was particularly prevalent on mobile, with users turning to Copilot around the clock for tips on exercise, routines, and wellness.