Large Language Model
AURA: A Diagnostic Framework for Tracking User Satisfaction of Interactive Planning Agents
Kim, Takyoung, Singh, Janvijay, Mehri, Shuhaib, Acikgoz, Emre Can, Mukherjee, Sagnik, Bozdag, Nimet Beyza, Shashidhar, Sumuk, Tur, Gokhan, Hakkani-Tür, Dilek
The growing capabilities of large language models (LLMs) in instruction-following and context-understanding have led to an era of agents with numerous applications. Among these, task planning agents have become especially prominent in realistic scenarios involving complex internal pipelines, such as context understanding, tool management, and response generation. However, existing benchmarks predominantly evaluate agent performance based on task completion as a proxy for overall effectiveness. We hypothesize that merely improving task completion is misaligned with maximizing user satisfaction, as users interact with the entire agentic process and not only the end result. To address this gap, we propose AURA, an Agent-User inteRaction Assessment framework that conceptualizes the behavioral stages of interactive task planning agents. AURA offers a comprehensive assessment of agents through a set of atomic LLM evaluation criteria, allowing researchers and practitioners to diagnose specific strengths and weaknesses within the agent's decision-making pipeline. Our analyses show that agents excel in different behavioral stages, with user satisfaction shaped by both outcomes and intermediate behaviors. We also highlight future directions, including systems that leverage multiple agents and the limitations of user simulators in task planning.
OpenAI's head of ChatGPT says posts appearing to show in-app ads are 'not real or not ads'
However, OpenAI exec Nick Turley said that the company will take a thoughtful approach if they pursue ads. Those might not exactly be ads you're seeing on ChatGPT, at least according to OpenAI. Nick Turley, OpenAI's head of ChatGPT, clarified the confusion around potential ads appearing with the AI chatbot. In a post on X, Turley said there are no live tests for ads and that any screenshots you've seen are either not real or not ads. The OpenAI exec's explanation comes after another post from former xAI employee Benjamin De Kraker on X that has gained traction, which featured a screenshot showing an option to shop at Target within a ChatGPT conversation.
A robot walks into a bar: can a Melbourne researcher get AI to do comedy?
An ensemble of about 10 robots - which will not be androids but ground vehicles between 40cm and 2m tall - will work with humans to learn how to be funny. Robots can make humans laugh - mostly when they fall over - but a new research project is looking at whether robots using AI could ever be genuinely funny. If you ask ChatGPT for a funny joke, it will serve you up something that belongs in a Christmas cracker: "Why don't skeletons fight each other? Because they don't have the guts."
As Key Talent Abandons Apple, Meet the New Generation of Leaders Taking On the Old Guard
Players walk clockwise in a circle. When the music stops, everyone sits in a chair. Big Tech is setting in motion its plans for the next gen of lead designers, engineers, AI chiefs, and even CEOs. In Cupertino, Apple execs with familiar faces are retiring or reducing responsibilities. Well, chief operating officer Jeff Williams retired in November, and the speculation is that CEO Tim Cook could follow in the near term. Lisa Jackson, who has led Apple's sustainability efforts since 2013, is now set to retire in January too.
New AI tool is like a private, no-subscription ChatGPT on your desktop
Get a lifetime license to Pansophy's fully private, on-device AI assistant for $79 and run ChatGPT-style tools locally with zero subscriptions, zero data sharing, and zero usage limits (MSRP $199). AI tools are getting more powerful, but also more expensive, more restricted, and more invasive. That's why this new desktop AI tool is turning so many heads: it gives you ChatGPT-style intelligence with none of the trade-offs. Pansophy is a fully local AI desktop assistant that runs entirely on your PC, Mac, Chromebook, or Linux device.
WIRED Roundup: DOGE Isn't Dead, Facebook Dating Is Real, and Amazon's AI Ambitions
In this episode of Uncanny Valley, we bring you the news of the week, then dive into how some DOGE operatives are still at work in the federal government, despite reports claiming otherwise. Uncanny Valley host Zoë Schiffer is joined by senior editor Leah Feiger to discuss five stories you need to know about this week, from how Amazon is trying to catch up in the AI race to why Facebook Dating is more popular than ever. Then, they dive into how, despite recent reports claiming that it's over, DOGE operatives are still very much working across federal agencies. Today on the show, we're bringing you five stories that you need to know about this week, including how despite some reports claiming that the so-called Department of Government Efficiency is pretty much over, DOGE people are actually still at work across federal agencies. I'm joined today by our senior politics editor, Leah Feiger. How are you doing today? I am great because I've spent the day with you, but our gentle listeners don't know that. So the first story this week is one that I saw and I thought, you know what? Leah's going to want to talk about Amazon's artificial intelligence prowess.
Learning How Learning Works
In 2023, Noam Chomsky, considered the founder of modern linguistics, wrote that LLMs "learn humanly possible and humanly impossible languages with equal facility." However, in the Mission: Impossible Language Models paper that received a Best Paper award at the 2024 Association for Computational Linguistics (ACL) conference, researchers shared the results of their testing of Chomsky's theory, having discovered that language models actually struggle with learning languages with non-standard characters. Rogers Jeffrey Leo John, CTO of DataChat Inc., a company that he cofounded while working at the University of Wisconsin as a data science researcher, said the Mission: Impossible paper challenged the idea that LLMs can learn impossible languages as effectively as natural ones. "The models [studied for the paper] exhibited clear difficulties in acquiring and processing languages that deviate significantly from natural linguistic structures," said John. "Further, the researchers' findings support the idea that certain linguistic structures are universally preferred or more learnable both by humans and machines, highlighting the importance of natural language patterns in model training. This finding could also explain why LLMs, and even humans, can grasp certain languages easily and not others."
Here's What You Should Know About Launching an AI Startup
AI startups say the promise of turning dazzling models into useful products is harder than anyone expected. Three founders discuss what it takes. Julie Bornstein thought it would be a cinch to implement her idea for an AI startup. Her résumé in digital commerce is impeccable: VP of ecommerce at Nordstrom, COO of the startup Stitch Fix, and founder of a personalized shopping platform acquired by Pinterest. Fashion has been her obsession since she was a Syracuse high schooler inhaling spreads in Seventeen and hanging out in local malls.
The New York Times and Chicago Tribune sue Perplexity over alleged copyright infringement
Both publications claim the AI company scraped their works for LLM training and often reproduced their content verbatim. The said it had sent Perplexity several cease-and-desist demands to stop using its content until the two reached an agreement, but the AI company persisted in doing so. First, by scraping its website (including in real time) to train AI models and feed content into the likes of the Claude chatbot and Comet browser. The also says Perplexity damaged its brand by falsely attributing completely fabricated information (aka hallucinations) to the newspaper. The also filed a lawsuit against Perplexity for similar reasons.