Goto

Collaborating Authors

 FDA


AI tools could weaken doctors' skills in detecting colon cancer, study suggests

FOX News

Fox News anchor Bret Baier has the latest on the Murdoch Children's Research Institute's partnership with the Gladstone Institutes for the'Decoding Broken Hearts' initiative on'Special Report.' The benefits of artificial intelligence (AI) in the medical space are ever-growing, but evidence suggests it can also come with risks. A new study by European researchers investigated how AI can change the behavior of endoscopists when conducting a colonoscopy, and how their performance dips when not using AI. The research followed clinicians at four endoscopy centers in Poland participating in the ACCEPT (Artificial Intelligence in Colonoscopy for Cancer Prevention) trial, where AI tools for polyp detection were introduced at the end of 2021. Colonoscopies at these centers were randomly selected to be administered with or without AI assistance.


Design and Validation of a Responsible Artificial Intelligence-based System for the Referral of Diabetic Retinopathy Patients

arXiv.org Artificial Intelligence

Diabetic Retinopathy (DR) is a leading cause of vision loss in working-age individuals. Early detection of DR can reduce the risk of vision loss by up to 95%, but a shortage of retinologists and challenges in timely examination complicate detection. Artificial Intelligence (AI) models using retinal fundus photographs (RFPs) offer a promising solution. However, adoption in clinical settings is hindered by low-quality data and biases that may lead AI systems to learn unintended features. To address these challenges, we developed RAIS-DR, a Responsible AI System for DR screening that incorporates ethical principles across the AI lifecycle. RAIS-DR integrates efficient convolutional models for preprocessing, quality assessment, and three specialized DR classification models. We evaluated RAIS-DR against the FDA-approved EyeArt system on a local dataset of 1,046 patients, unseen by both systems. RAIS-DR demonstrated significant improvements, with F1 scores increasing by 5-12%, accuracy by 6-19%, and specificity by 10-20%. Additionally, fairness metrics such as Disparate Impact and Equal Opportunity Difference indicated equitable performance across demographic subgroups, underscoring RAIS-DR's potential to reduce healthcare disparities. These results highlight RAIS-DR as a robust and ethically aligned solution for DR screening in clinical settings. The code, weights of RAIS-DR are available at https://gitlab.com/inteligencia-gubernamental-jalisco/jalisco-retinopathy with RAIL.





This drug can turn your blood into mosquito poison

Popular Science

Breakthroughs, discoveries, and DIY tips sent every weekday. Mosquitoes may have just met their match: A prescription drug already used to treat a rare genetic disease in humans can make a person's blood poisonous to insecticide-resistant, malaria-carrying mosquitoes. New research published on July 31, 2025, in Parasites & Vectors found that the same drug, nitisinone, can even kill mosquitoes that simply land on a surface sprayed with the chemical. The findings could open up new avenues to stop the spread of diseases like malaria and dengue, especially as more mosquito populations evolve to become resistant to traditional prevention methods. Whether people will willingly offer their bodies as mosquito blood bait, though, remains less clear.


Health Insurance Coverage Rule Interpretation Corpus: Law, Policy, and Medical Guidance for Health Insurance Coverage Understanding

arXiv.org Artificial Intelligence

U.S. health insurance is complex, and inadequate understanding and limited access to justice have dire implications for the most vulnerable. Advances in natural language processing present an opportunity to support efficient, case-specific understanding, and to improve access to justice and healthcare. Yet existing corpora lack context necessary for assessing even simple cases. We collect and release a corpus of reputable legal and medical text related to U.S. health insurance. We also introduce an outcome prediction task for health insurance appeals designed to support regulatory and patient self-help applications, and release a labeled benchmark for our task, and models trained on it.


Strategic Hypothesis Testing

arXiv.org Artificial Intelligence

We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold. We empirically validate our model and these insights using publicly available data on drug approvals. Overall, our work offers a comprehensive perspective on strategic interactions within the hypothesis testing framework, providing technical and regulatory insights.


Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models

arXiv.org Artificial Intelligence

Ensuring the safety and reliability of large language models (LLMs) in clinical practice is critical to prevent patient harm and promote trustworthy healthcare applications of AI. However, LLMs are advancing so rapidly that static safety benchmarks often become obsolete upon publication, yielding only an incomplete and sometimes misleading picture of model trustworthiness. We demonstrate that a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses of current LLMs across four safety-critical domains: robustness, privacy, bias/fairness, and hallucination. A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses, uncovering vulnerabilities in real time without human intervention. Applying DAS to 15 proprietary and open-source LLMs revealed a stark contrast between static benchmark performance and vulnerability under adversarial pressure. Despite a median MedQA accuracy exceeding 80\%, 94\% of previously correct answers failed our dynamic robustness tests. We observed similarly high failure rates across other domains: privacy leaks were elicited in 86\% of scenarios, cognitive-bias priming altered clinical recommendations in 81\% of fairness tests, and we identified hallucination rates exceeding 66\% in widely used models. Such profound residual risks are incompatible with routine clinical practice. By converting red-teaming from a static checklist into a dynamic stress-test audit, DAS red-teaming offers the surveillance that hospitals/regulators/technology vendors require as LLMs become embedded in patient chatbots, decision-support dashboards, and broader healthcare workflows. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.


Transparent AI: The Case for Interpretability and Explainability

arXiv.org Artificial Intelligence

As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI research and enabling industry adoption, we present key insights and lessons learned from practical interpretability applications across diverse domains. This paper offers actionable strategies and implementation guidance tailored to organizations at varying stages of AI maturity, emphasizing the integration of interpretability as a core design principle rather than a retrospective add-on.