 management plan


Towards physician-centered oversight of conversational diagnostic AI

Vedadi, Elahe, Barrett, David, Harris, Natalie, Wulczyn, Ellery, Reddy, Shashir, Ruparel, Roma, Schaekermann, Mike, Strother, Tim, Tanno, Ryutaro, Sharma, Yash, Lee, Jihyeon, Hughes, Cían, Slack, Dylan, Palepu, Anil, Freyberg, Jan, Saab, Khaled, Liévin, Valentin, Weng, Wei-Hung, Tu, Tao, Liu, Yun, Tomasev, Nenad, Kulkarni, Kavita, Mahdavi, S. Sara, Guu, Kelvin, Barral, Joëlle, Webster, Dale R., Manyika, James, Hassidim, Avinatan, Chou, Katherine, Matias, Yossi, Kohli, Pushmeet, Rodman, Adam, Natarajan, Vivek, Karthikesalingam, Alan, Stutz, David

arXiv.org Artificial Intelligence

Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.


Towards Conversational AI for Disease Management

Palepu, Anil, Liévin, Valentin, Weng, Wei-Hung, Saab, Khaled, Stutz, David, Cheng, Yong, Kulkarni, Kavita, Mahdavi, S. Sara, Barral, Joëlle, Webster, Dale R., Chou, Katherine, Hassidim, Avinatan, Matias, Yossi, Manyika, James, Tanno, Ryutaro, Natarajan, Vivek, Rodman, Adam, Tu, Tao, Karthikesalingam, Alan, Schaekermann, Mike

arXiv.org Artificial Intelligence

While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.


Towards Democratization of Subspeciality Medical Expertise

O'Sullivan, Jack W., Palepu, Anil, Saab, Khaled, Weng, Wei-Hung, Cheng, Yong, Chu, Emily, Desai, Yaanik, Elezaby, Aly, Kim, Daniel Seung, Lan, Roy, Tang, Wilson, Tapaskar, Natalie, Parikh, Victoria, Jain, Sneha S., Kulkarni, Kavita, Mansfield, Philip, Webster, Dale, Gottweis, Juraj, Barral, Joelle, Schaekermann, Mike, Tanno, Ryutaro, Mahdavi, S. Sara, Natarajan, Vivek, Karthikesalingam, Alan, Ashley, Euan, Tu, Tao

arXiv.org Artificial Intelligence

The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI system optimized for diagnostic dialogue, to potentially augment and support clinical decision-making in this challenging context. We curated a real-world dataset of 204 complex cases from a subspecialist cardiology practice, including results for electrocardiograms, echocardiograms, cardiac MRI, genetic tests, and cardiopulmonary stress tests. We developed a ten-domain evaluation rubric used by subspecialists to evaluate the quality of diagnosis and clinical management plans produced by general cardiologists or AMIE, the latter enhanced with web-search and self-critique capabilities. AMIE was rated superior to general cardiologists for 5 of the 10 domains (with preference ranging from 9% to 20%), and equivalent for the rest. Access to AMIE's response improved cardiologists' overall response quality in 63.7% of cases while lowering quality in just 3.4%. Cardiologists' responses with access to AMIE were superior to cardiologist responses without access to AMIE for all 10 domains. Qualitative examinations suggest that AMIE and general cardiologists could complement each other, with AMIE being thorough and sensitive and general cardiologists being concise and specific. Overall, our results suggest that specialized medical LLMs have the potential to augment general cardiologists' capabilities by bridging gaps in subspecialty expertise, though further research and validation are essential for wide clinical utility.


Duncannon, Nature Conservancy using artificial intelligence to create forest management plan

#artificialintelligence

The technology coupled with hands-on work and measurements is used to create a forest management plan. The Duncannon Borough Watershed is a 1,600-acre property key to generating money in the local community. "In 300 spots, we measured every tree for a tenth of an acre," said Josh Parrish, the director of the Working Woodlands program at the Nature Conservancy. Understanding what you have is important in moving forward. So, the Nature Conservancy is doing just that by working with a company that uses artificial intelligence.


How Machines are Learning for Modern Agriculture

#artificialintelligence

Arthur Samuel, an eccentric computer engineer at Stanford University, took part in what could be considered the most important game of checkers ever played. Arthur challenged the then-reigning Connecticut state champion to match wits with a computer he programmed to play checkers. Surprisingly enough, this event is not an artifact of recent history; the fateful game took place in 1961. Decades prior to the personal computer revolution, Professor Samuel built a working prototype capable of what we now call "machine learning." Rather than programming the 500 quintillion potential scenarios on a checkerboard, Arthur instructed the computer to react based on games it had played in the past.


Storm damage to forests costs billions – here's how artificial intelligence can help

#artificialintelligence

High-intensity storms cause billions of pounds of damage every year, and climate change is set to make this worse in future. We already appear to be seeing more frequent and intense windstorms. Ex-hurricane Ophelia and Storm Eleanor both wreaked havoc in the British Isles over the winter, including injuries, power cuts and severe travel delays. It's not only commuters and households that are affected. Every year across Europe, the number of trees that commercial forests lose to storms is equivalent to the annual amount of timber felled in Poland.


How artificial intelligence can help repair storm damage

#artificialintelligence

High-intensity storms cause billions of pounds of damage every year, and climate change is set to make this worse in future. We already appear to be seeing more frequent and intense windstorms. Hurricane Ophelia and Storm Eleanor both wreaked havoc in the British Isles over the winter, including injuries, power cuts and severe travel delays. It's not only commuters and households that are affected. Every year across Europe, the number of trees that commercial forests lose to storms is equivalent to the annual amount of timber felled in Poland.