 investigation


Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models

Neural Information Processing Systems

Clinical reasoning in medicine is a hypothesis-driven process where physicians refine diagnoses from limited information through targeted history, physical examination, and diagnostic investigations. In contrast, current medical benchmarks for large language models (LLMs) primarily assess knowledge recall through single-turn questions, where complete clinical information is provided upfront. To address this gap, we introduce VivaBench, a multi-turn benchmark that evaluates sequential clinical reasoning in LLM agents. Our dataset comprises 1152 physician-curated clinical vignettes structured as interactive scenarios that simulate a viva voce examination in medical training, requiring agents to actively probe for relevant findings, select appropriate investigations, and synthesize information across multiple steps to reach a diagnosis. We evaluated several state-of-the-art LLMs and found that while models demonstrate competence in diagnosing conditions within well-described clinical presentations, their performance degrades significantly when required to navigate diagnostic uncertainty. Our analysis identified several failure modes that mirror common issues in clinical practice, including: (1) fixation on initial hypotheses, (2) excessive investigation ordering, (3) premature diagnostic closure, and (4) missing critical conditions. These patterns reveal fundamental limitations in how current LLMs manage uncertainty and gather information sequentially. Through VivaBench, we provide a standardized benchmark for evaluating conversational medical AI systems for real-world clinical decision support. Beyond medical applications, we contribute to the larger corpus of research on agentic AI by demonstrating how sequential reasoning trajectories can diverge in complex decision-making environments.


Carvalho resigns as LAUSD superintendent amid federal investigation

Los Angeles Times

Alberto Carvalho, who resigned Sunday as LAUSD superintendent, addresses students at an elementary school in 2023. Alberto Carvalho resigned Sunday night.


UK's top AI regulator quits after 'inappropriate' humour

BBC News

John Edwards, the UK's information commissioner, has resigned following a workplace investigation. "I have accepted that there have been occasions where I exercised poor judgement and made attempts at humour that were inappropriate and caused offence," he said in a statement on Friday. The Information Commissioner's Office (ICO) is responsible for regulating AI in the UK and also oversees data protection regulation and the freedom of information law. Edwards' resignation was confirmed by the government, which said it had come after an independent probe into allegations made against him. "The government expects the highest standards of conduct from all senior leaders in public life," said a spokesperson for the Department for Science, Innovation and Technology (DSIT).


3 Amazon Workers Say They're Under Investigation for Speaking Out About Data Centers

WIRED

The software engineers filed a complaint with Seattle's civil rights office accusing Amazon of illegally retaliating against them for expressing their personal political beliefs. Earlier this month, five current Amazon employees publicly urged Seattle City Council to regulate data centers. It was an unprecedented act of advocacy by tech workers, and now three of the staffers say they are under internal investigation for what they understand to be an allegation that they represented themselves as spokespeople for the company without prior approval. "It's a totally ridiculous claim," says one of the affected employees, Patrick Schloesser. The three software engineers, who work in different divisions of Amazon and all live in Seattle, believe they are being unfairly targeted for expressing their political beliefs.


Gavin Newsom Says Trump DOJ Is Politically Targeting Him

TIME - Tech



Investigation by The Atlantic reveals many millions of songs used for AI music training

Engadget

Taylor Swift, Bad Bunny and many, many more artists have had their work fed into AI models. We're always glad to see more publications and groups digging deeper into artificial intelligence and its impact. Today, The Atlantic has published four searchable databases of music that has been used to train AI models. The scope is pretty staggering, with 12 million tracks in one database, 9 million in another, and the two final ones each containing about 100,000 songs. The full results and payout from that suit are still pending, though the initial settlement was for $1.5 billion.


Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models

Neural Information Processing Systems

Clinical reasoning in medicine is a hypothesis-driven process where physicians refine diagnoses from limited information through targeted history, physical examination, and diagnostic investigations. In contrast, current medical benchmarks for large language models (LLMs) primarily assess knowledge recall through single-turn questions, where complete clinical information is provided upfront. To address this gap, we introduce VivaBench, a multi-turn benchmark that evaluates sequential clinical reasoning in LLM agents. Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a viva voce (oral) examination in medical training, requiring agents to actively probe for relevant findings, select appropriate investigations, and synthesize information across multiple steps to reach a diagnosis. While current LLMs demonstrate competence in diagnosing conditions from well-described clinical presentations, their performance degrades significantly when required to navigate iterative diagnostic reasoning under uncertainty in our evaluation. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice, including: (1) fixation on initial hypotheses, (2) inappropriate investigation ordering, (3) premature diagnostic closure, and (4) failing to screen for critical conditions. These patterns reveal fundamental limitations in how current LLMs reason and make decisions under uncertainty. Through VivaBench, we provide a standardized benchmark for evaluating conversational medical AI systems for real-world clinical decision support. Beyond medical applications, we contribute to the larger corpus of research on agentic AI by demonstrating how sequential reasoning trajectories can diverge in complex decision-making environments.


Officer accused of using AI to 'create evidence'

BBC News

Police have launched a criminal investigation into an officer accused of using artificial intelligence (AI) systems to create evidential material in a number of cases. The Derbyshire Police officer has been removed from frontline duties, pending the outcome of the investigation, said the force. The officer is alleged to have perverted the course of justice, but no arrests have been made, said police. A Crown Prosecution Service spokesperson said they were working with police, adding: "We are engaging with defence teams and the courts in appropriate cases." They added: "As police inquiries continue, it would not be appropriate to comment further."


OpenAI is facing investigation from a group of state attorneys general

Engadget

The company says it will 'engage constructively' with them. OpenAI is under investigation by a coalition of state attorneys general, according to the Wall Street Journal. On Friday, June 12, the company received a subpoena seeking information and documents related to its activities and impact on users. The Journal said it viewed the subpoena sent by New York's attorney general. Based on what the publication saw, the AGs are asking for documentation about the company's advertising, user engagement and retention, as well as its handling of its users' data and health information. They also want to know about the company's activities related to minor and senior users, its deep learning models, its policies and its models' sycophancy.


Why You Might Already Own SpaceX Shares, Siri's AI Makeover, and Knicks Owner's Surveillance Machine

WIRED

Today, we take an early look at the SpaceX IPO and why you might find yourself among the investors without even realizing it. This week, our hosts discuss SpaceX officially going public and who will benefit the most from it, as well as Apple's WWDC and the brand-new release of Siri AI. They also get into how Meta removed a face-recognition feature after a WIRED report exposed it, and later in the show: an investigation into how New York Knicks owner James Dolan created an extensive surveillance system inside all of his Madison Square Garden properties. Write to us at [email protected]. You can always listen to this week's podcast through the audio player on this page, but if you want to subscribe for free to get every episode, here's how: If you're on an iPhone or iPad, open the app called Podcasts, or just tap this link. Before we start, two quick things. If you've been enjoying listening to the show, we would appreciate it if you took a second to rate it in your app of choice. It really helps us reach more people. Second, if you have any questions related to tech, privacy, or politics that you would like me, Zoë, and Leah to take on, now is the time to submit them to [email protected]. It doesn't matter how big or how small, we want to hear from you and get you answers. I'm a little tired, but it's because I got to see Lionel Messi play soccer last night and score a goal on a penalty kick. It was a friendly of Argentina versus Iceland. You'll never guess who won. Is that an obvious thing? It's far from their first attempt, but it's going to stick this time. We're also taking an early look at the SpaceX IPO this week, which is slated to become the world's largest IPO of all time. We'll get into who is slated to benefit the most: Elon Musk, who is already the world's richest man but is on track to become even richer, and why you might find yourself among the investors without even realizing it.
And in case you missed it, WIRED reporters recently uncovered that Meta had silently embedded code that would power a face-recognition system for its smart glasses in the Meta AI app on millions of people's phones.