Artificial Intelligence health advice accuracy varies across languages and contexts
Garg, Prashant, Fetzer, Thiemo
arXiv.org Artificial Intelligence
Using basic health statements authorized by UK and EU registers and ~9,100 journalist-vetted public-health assertions on topics such as abortion, COVID-19 and politics, drawn from sources ranging from peer-reviewed journals and government advisories to social media and news across the political spectrum, we benchmark six leading large language models in 21 languages. We find that, despite high accuracy on English-centric textbook claims, performance falls in multiple non-European languages and fluctuates by topic and source, highlighting the urgency of comprehensive multilingual, domain-aware validation before deploying AI in global health communication.

Main Text: Recent evidence suggests that 17% of U.S. adults, and a striking 25% of those aged 18-29, now consult AI chatbots for health questions at least once a month (1), while in Australia nearly 10% of adults did so in just the first half of 2024 (2). Beyond mere curiosity, these tools can substantially improve comprehension: running standard discharge notes through GPT-4 reduced the average reading grade level from 11th to 6th and boosted patient-understandability scores from 13% to 81% (3). Yet as fluently as large language models (LLMs) can rephrase medical text, they lack formal clinical vetting and still rely on statistical patterns in their training data. When generative AI echoes unverified or dangerous claims, it risks amplifying harm.
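The benchmarking design described above (scoring model verdicts on labeled health claims, broken down by language) can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the claim set, the stub "model", and the function name `accuracy_by_language` are all hypothetical stand-ins.

```python
# Hypothetical sketch of per-language accuracy scoring for an LLM
# fact-checking benchmark. The claims and the stub model below are
# illustrative only, not the paper's data or models.
from collections import defaultdict

def accuracy_by_language(claims, model):
    """claims: list of dicts with 'text', 'language', 'is_true'.
    model: callable returning a True/False verdict for a claim text."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for claim in claims:
        lang = claim["language"]
        total[lang] += 1
        if model(claim["text"]) == claim["is_true"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy data: the same false claim in English and Spanish, plus one true claim.
claims = [
    {"text": "Vaccines cause autism.", "language": "en", "is_true": False},
    {"text": "Handwashing reduces infection risk.", "language": "en", "is_true": True},
    {"text": "Las vacunas causan autismo.", "language": "es", "is_true": False},
]

# Stub model that happens to be right only on the English claims,
# mimicking the English-centric accuracy gap the paper reports.
stub_model = lambda text: "reduces" in text or "causan" in text

print(accuracy_by_language(claims, stub_model))  # → {'en': 1.0, 'es': 0.0}
```

The per-language breakdown is the key design choice: aggregating accuracy over all claims would hide exactly the non-European-language degradation the study highlights.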
Apr-28-2025