Symptom Checkers


Evaluating Rare Disease Diagnostic Performance in Symptom Checkers: A Synthetic Vignette Simulation Approach

Nishibayashi, Takashi, Kanazawa, Seiji, Yamada, Kumpei

arXiv.org Artificial Intelligence

Symptom Checkers (SCs) provide medical information tailored to user symptoms. A critical challenge in SC development is preventing unexpected performance degradation for individual diseases, especially rare diseases, when updating algorithms. This risk stems from the lack of practical pre-deployment evaluation methods. For rare diseases, obtaining sufficient evaluation data from user feedback is difficult. To evaluate the impact of algorithm updates on the diagnostic performance for individual rare diseases before deployment, this study proposes and validates a novel Synthetic Vignette Simulation Approach. This approach aims to enable this essential evaluation efficiently and at a low cost. To estimate the impact of algorithm updates, we generated synthetic vignettes from disease-phenotype annotations in the Human Phenotype Ontology (HPO), a publicly available knowledge base for rare diseases curated by experts. Using these vignettes, we simulated SC interviews to predict changes in diagnostic performance. The effectiveness of this approach was validated retrospectively by comparing the predicted changes with actual performance metrics using the R-squared ($R^2$) coefficient. Our experiment, covering eight past algorithm updates for rare diseases, showed that the proposed method accurately predicted performance changes for diseases with phenotype frequency information in HPO (n=5). For these updates, we found a strong correlation for both Recall@8 change ($R^2 = 0.83$, $p = 0.031$) and Precision@8 change ($R^2 = 0.78$, $p = 0.047$). Our proposed method enables the pre-deployment evaluation of SC algorithm changes for individual rare diseases. This evaluation is based on a publicly available medical knowledge database created by experts, ensuring transparency and explainability for stakeholders. Additionally, SC developers can efficiently improve diagnostic performance at a low cost.
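The metrics in this abstract can be illustrated with a small sketch (not the authors' code): Recall@8 checks whether the target disease appears in the top eight entries of the differential, and the retrospective validation fits predicted metric changes against actual ones with $R^2$. The variable names and toy numbers below are illustrative assumptions.

```python
from typing import Sequence

def recall_at_k(ranked_diseases: Sequence[str], target: str, k: int = 8) -> float:
    """1.0 if the target disease appears in the top-k differential, else 0.0."""
    return 1.0 if target in ranked_diseases[:k] else 0.0

def r_squared(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Coefficient of determination between predicted and actual metric changes."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy example: predicted vs. actual Recall@8 changes over five algorithm updates.
pred = [0.05, -0.02, 0.10, 0.01, -0.04]
act = [0.06, -0.01, 0.09, 0.02, -0.05]
print(round(r_squared(pred, act), 3))
```

Here $R^2$ near 1 would mean the simulated vignettes tracked the real post-deployment metric changes closely, which is the paper's validation criterion.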


C-PATH: Conversational Patient Assistance and Triage in Healthcare System

Shi, Qi, Han, Qiwei, Soares, Cláudia

arXiv.org Artificial Intelligence

Navigating healthcare systems can be complex and overwhelming, creating barriers for patients seeking timely and appropriate medical attention. In this paper, we introduce C-PATH (Conversational Patient Assistance and Triage in Healthcare), a novel conversational AI system powered by large language models (LLMs) designed to assist patients in recognizing symptoms and recommending appropriate medical departments through natural, multi-turn dialogues. C-PATH is fine-tuned on medical knowledge, dialogue data, and clinical summaries using a multi-stage pipeline built on the LLaMA3 architecture. A core contribution of this work is a GPT-based data augmentation framework that transforms structured clinical knowledge from DDXPlus into lay-person-friendly conversations, allowing alignment with patient communication norms. We also implement a scalable conversation history management strategy to ensure long-range coherence. Evaluation with GPTScore demonstrates strong performance across dimensions such as clarity, informativeness, and recommendation accuracy. Quantitative benchmarks show that C-PATH achieves superior performance in GPT-rewritten conversational datasets, significantly outperforming domain-specific baselines. C-PATH represents a step forward in the development of user-centric, accessible, and accurate AI tools for digital health assistance and triage.


A Scalable Approach to Benchmarking the In-Conversation Differential Diagnostic Accuracy of a Health AI

Bhatt, Deep, Ayyagari, Surya, Mishra, Anuruddh

arXiv.org Artificial Intelligence

Diagnostic errors in healthcare persist as a critical challenge, with increasing numbers of patients turning to online resources for health information. While AI-powered healthcare chatbots show promise, there exists no standardized and scalable framework for evaluating their diagnostic capabilities. This study introduces a scalable benchmarking methodology for assessing health AI systems and demonstrates its application through August, an AI-driven conversational chatbot. Our methodology employs 400 validated clinical vignettes across 14 medical specialties, using AI-powered patient actors to simulate realistic clinical interactions. In systematic testing, August achieved a top-one diagnostic accuracy of 81.8% (327/400 cases) and a top-two accuracy of 85.0% (340/400 cases), significantly outperforming traditional symptom checkers. The system demonstrated 95.8% accuracy in specialist referrals and required 47% fewer questions compared to conventional symptom checkers (mean 16 vs 29 questions), while maintaining empathetic dialogue throughout consultations. These findings demonstrate the potential of AI chatbots to enhance healthcare delivery, though implementation challenges remain regarding real-world validation and integration of objective clinical data. This research provides a reproducible framework for evaluating healthcare AI systems, contributing to the responsible development and deployment of AI in clinical settings.
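The top-one and top-two accuracy figures reported above can be sketched as follows; this is a minimal illustration of the metric, assuming each vignette yields a ranked differential plus a single ground-truth diagnosis (the data structure and disease names are hypothetical, not from the study).

```python
def top_k_accuracy(results, k):
    """Fraction of vignettes whose ground-truth diagnosis appears in the top-k
    of the model's ranked differential. `results` is a list of
    (ranked_diagnoses, ground_truth) pairs -- an assumed structure."""
    hits = sum(1 for ranked, truth in results if truth in ranked[:k])
    return hits / len(results)

# Toy vignette results: ranked differential plus the vignette's true diagnosis.
results = [
    (["asthma", "copd"], "asthma"),
    (["migraine", "tension headache"], "tension headache"),
    (["gerd", "angina"], "angina"),
    (["flu", "covid"], "strep throat"),
]
print(top_k_accuracy(results, 1))  # top-one accuracy
print(top_k_accuracy(results, 2))  # top-two accuracy
```

Scaling this to the study's setup means running the AI patient actors over all 400 vignettes and counting hits the same way.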


Enhancing Medical Support in the Arabic Language Through Personalized ChatGPT Assistance

Issa, Mohamed, Abdelwahed, Ahmed

arXiv.org Artificial Intelligence

This paper discusses the growing popularity of online medical diagnosis as an alternative to traditional doctor visits. It highlights the limitations of existing tools and emphasizes the advantages of using ChatGPT, which provides real-time, personalized medical diagnosis at no cost. It summarizes a study that evaluated the performance of ChatGPT in Arabic medical diagnosis. The study involved compiling a dataset of disease information and generating multiple messages for each disease using different prompting techniques. ChatGPT's performance was assessed by measuring the similarity between its responses and the actual diseases. The results showed promising performance, with average scores of around 76% for similarity measures. Various prompting techniques were used, and chain prompting demonstrated a relative advantage. The study also recorded an average response time of 6.12 seconds for the ChatGPT API, which is considered acceptable but has room for improvement. While ChatGPT cannot replace human doctors entirely, the findings suggest its potential in emergency cases and for addressing general medical inquiries. Overall, the study highlights ChatGPT's viability as a valuable tool in the medical field.
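The abstract does not specify which similarity measures were used, so the sketch below substitutes Python's standard-library `difflib` ratio purely as an illustrative stand-in for scoring a model response against a reference disease name; the function name and example strings are assumptions.

```python
from difflib import SequenceMatcher

def response_similarity(response: str, disease: str) -> float:
    """Rough string similarity between a model response and the reference
    disease name. difflib's ratio (0.0-1.0) is an illustrative stand-in for
    the study's unspecified similarity measures."""
    return SequenceMatcher(None, response.lower(), disease.lower()).ratio()

score = response_similarity("Acute bronchitis", "acute bronchitis")
print(score)  # identical up to case -> 1.0
```

A real evaluation of Arabic responses would more plausibly use embedding-based semantic similarity, since lexical overlap is a weak proxy across paraphrases.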


Revolutionizing Healthcare: The Top 14 Uses Of ChatGPT In Medicine And Wellness

#artificialintelligence

Over the past few years, artificial intelligence (AI) has made significant advancements in the healthcare industry. One of the most prominent AI-powered tools is ChatGPT, a natural language processing model developed by OpenAI. ChatGPT is capable of generating human-like responses to a wide range of queries, making it an ideal tool for healthcare applications. From personalized treatment plans to remote patient monitoring, ChatGPT is transforming the way healthcare providers deliver care to their patients. Let's explore a few different uses of ChatGPT in the healthcare sector and discuss the benefits that this revolutionary technology offers to patients, doctors, and researchers.


How artificial intelligence is changing the GP-patient relationship - Pulse Today

#artificialintelligence

'Alexa, what are the early signs of a stroke?' GPs may no longer be the first port of call for patients looking to understand their ailments. 'Dr Google' is already well established in patients' minds, and now they have a host of apps using artificial intelligence (AI), allowing them to input symptoms and receive a suggested diagnosis or advice without the need for human interaction. And policymakers are on board. Matt Hancock is the most tech-friendly health secretary ever, NHS England chief executive Simon Stevens wants England to lead the world in AI, and the prime minister last month announced £250m for a national AI lab to help cut waiting times and detect diseases earlier. Amazon even agreed a partnership with NHS England in July to allow people to access health information via its voice-activated assistant Alexa.


COVID-19 in differential diagnosis of online symptom assessments

Kannan, Anitha, Chen, Richard, Venkataraman, Vignesh, Tso, Geoffrey J., Amatriain, Xavier

arXiv.org Artificial Intelligence

The COVID-19 pandemic has magnified an already existing trend of people looking for healthcare solutions online. One class of solutions are symptom checkers, which have become very popular in the context of COVID-19. Traditional symptom checkers, however, are based on manually curated expert systems that are inflexible and hard to modify, especially in a quickly changing situation like the one we are facing today. That is why all existing COVID-19 solutions are manually built symptom checkers that can only estimate the probability of this disease and cannot contemplate alternative hypotheses or produce a differential diagnosis. While machine learning offers an alternative, the lack of reliable data does not make it easy to apply to COVID-19 either. In this paper we present an approach that combines the strengths of traditional AI expert systems and novel deep learning models. In doing so we can leverage prior knowledge as well as any amount of existing data to quickly derive models that best adapt to the current state of the world and latest scientific knowledge. We use the approach to train a COVID-19 aware differential diagnosis model that can be used for medical decision support both for doctors and patients. We show that our approach is able to accurately model new incoming data about COVID-19 while still preserving accuracy on conditions that had been modeled in the past. While our approach shows evident and clear advantages for an extreme situation like the one we are currently facing, we also show that its flexibility generalizes beyond this concrete, but very important, example.


Medical Advice From a Bot: The Unproven Promise of Babylon Health

#artificialintelligence

Hamish Fraser first encountered Babylon Health in 2017 when he and a colleague helped test the accuracy of several artificial intelligence-powered symptom checkers, meant to offer medical advice for anyone with a smartphone, for Wired U.K. Among the competitors, Babylon's symptom checker performed worst in identifying common illnesses, including asthma and shingles. Fraser, then a health informatics expert at the University of Leeds in England, figured that the company would need to vastly improve to stick around. "At that point I had no prejudice or knowledge of any of them, so I had no axe to grind, and I thought, 'Oh, that's not really good,'" says Fraser, now at Brown University. "I thought they would disappear, right?" Much has changed since the Wired U.K. article came out. Since early 2018, the London-based Babylon Health has grown from just 300 employees to approximately 1,500. The company has a valuation of more than $2 billion and says it wants to "put an affordable and accessible health service in the hands of every person on earth." In England, Babylon operates the fifth-largest practice under the country's mostly government-funded National Health Service, allowing patients near London and Birmingham to video chat with doctors or be seen in a clinic if necessary. The company claims to have processed 700,000 digital consultations between patients and physicians, with plans to offer services in other U.K. cities in the future.


The accuracy vs. coverage trade-off in patient-facing diagnosis models

Kannan, Anitha, Fries, Jason Alan, Kramer, Eric, Chen, Jen Jen, Shah, Nigam, Amatriain, Xavier

arXiv.org Machine Learning

In these online tools, patients input their initial symptoms and then proceed to answer a series of questions that the system deems relevant to those symptoms. The output of these online tools is a differential diagnosis (ranked list of diseases) that helps educate patients on possible relevant health conditions. Online symptom checkers are powered by underlying diagnosis models or engines similar to those used for advising physicians in "clinical decision support tools"; the main difference in this scenario being that the resulting differential diagnosis is not directly shared with the patient, but rather used by a physician for professional evaluation. Diagnosis models must have high accuracy while covering a large space of symptoms and diseases to be useful to patients and physicians. Accuracy is critically important, as incorrect diagnoses can give patients unnecessary cause for concern.

