Collaborating Authors: rajpurkar


AI chatbots fail to diagnose patients by talking with them

New Scientist

Advanced artificial intelligence models score well on professional medical exams but still flunk one of the most crucial physician tasks: talking with patients to gather relevant medical information and deliver an accurate diagnosis. "While large language models show impressive results on multiple-choice tests, their accuracy drops significantly in dynamic conversations," says Pranav Rajpurkar at Harvard University. That became evident when researchers developed a method for evaluating a clinical AI model's reasoning capabilities based on simulated doctor-patient conversations. The "patients" were based on 2000 medical cases primarily drawn from professional US medical board exams. "Simulating patient interactions enables the evaluation of medical history-taking skills, a critical component of clinical practice that cannot be assessed using case vignettes," says Shreya Johri, also at Harvard University.


Learning Generalized Medical Image Representations through Image-Graph Contrastive Pretraining

Khanna, Sameer, Michael, Daniel, Zitnik, Marinka, Rajpurkar, Pranav

arXiv.org Artificial Intelligence

Medical image interpretation using deep learning has shown promise but often requires extensive expert-annotated datasets. To reduce this annotation burden, we develop an Image-Graph Contrastive Learning framework that pairs chest X-rays with structured report knowledge graphs automatically extracted from radiology notes. Our approach uniquely encodes the disconnected graph components via a relational graph convolution network and transformer attention. In experiments on the CheXpert dataset, this novel graph encoding strategy enabled the framework to outperform existing methods that use image-text contrastive learning in linear evaluation with 1% of labels and in few-shot settings, while achieving performance comparable to radiologists. By exploiting unlabeled paired images and text, our framework demonstrates the potential of structured clinical insights to enhance contrastive learning for medical images. This work points toward reducing demands on medical experts for annotations, improving diagnostic precision, and advancing patient care through robust medical image understanding.
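The core training signal in this kind of image-graph contrastive pretraining is a symmetric InfoNCE loss over a batch of paired embeddings: matched image/graph pairs sit on the diagonal of a similarity matrix and are pulled together, while mismatched pairs are pushed apart. The sketch below illustrates that general recipe with NumPy; the encoder outputs, the `temperature` value, and the function name are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def info_nce(image_emb, graph_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/graph embeddings.

    image_emb, graph_emb: arrays of shape (batch, dim), row i of each is a
    matched pair. Returns the mean contrastive loss in both directions.
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    grf = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    logits = img @ grf.T / temperature  # (batch, batch) similarity matrix

    def xent(l):
        # Cross-entropy where the correct "class" for row i is column i.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the image-to-graph and graph-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned, mutually orthogonal pairs the loss approaches zero; in training, gradients of this quantity shape both encoders without any human labels.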


No labels? No problem!

#artificialintelligence

Harvard Medical School scientists and colleagues at Stanford University have developed an artificial intelligence diagnostic tool that can detect diseases on chest X-rays directly from natural-language descriptions contained in accompanying clinical reports. The step is deemed a major advance in clinical AI design because most current AI models require laborious human annotation of vast reams of data before the labeled data are fed into the model to train it. A report on the work, published Sept. 15 in Nature Biomedical Engineering, shows that the model, called CheXzero, performed on par with human radiologists in its ability to detect pathologies on chest X-rays. The team has made the code for the model publicly available for other researchers. Most AI models require labeled datasets during their "training" so they can learn to correctly identify pathologies. This process is especially burdensome for medical image-interpretation tasks since it involves large-scale annotation by human clinicians, which is often expensive and time-consuming.


An AI used medical notes to teach itself to spot disease on chest x-rays

MIT Technology Review

The research, described in Nature Biomedical Engineering, found that the model was more effective at identifying issues such as pneumonia, collapsed lungs, and lesions than other self-supervised AI models. In fact, it was similar in accuracy to human radiologists. While others have tried to use unstructured medical data in this manner, this is the first time a team's AI model has learned from unstructured text and matched radiologists' performance, and it has demonstrated the ability to predict multiple diseases from a given x-ray with a high degree of accuracy, says Ekin Tiu, an undergraduate student at Stanford and a visiting researcher who coauthored the report. "We are the first to do that and demonstrate that effectively in this field," he says. The model's code has been made publicly available to other researchers in the hope it could be applied to CT scans, MRIs, and echocardiograms to help detect a wider range of diseases in other parts of the body, says Pranav Rajpurkar, an assistant professor of biomedical informatics in the Blavatnik Institute at Harvard Medical School, who led the project.
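The zero-shot recipe these two articles describe is, at a high level, CLIP-style: embed the X-ray and embed a pair of natural-language prompts (one asserting the pathology, one negating it), then compare similarities. The sketch below shows that general mechanism; the prompt wording, function name, and use of a two-way softmax are illustrative assumptions, not CheXzero's exact implementation.

```python
import numpy as np

def zero_shot_score(image_emb, pos_prompt_emb, neg_prompt_emb):
    """Score a pathology from embeddings alone, with no labeled training data.

    image_emb:        embedding of the chest X-ray
    pos_prompt_emb:   embedding of e.g. "pneumonia"
    neg_prompt_emb:   embedding of e.g. "no pneumonia"
    Returns a probability-like score that the pathology is present.
    """
    sims = np.array([image_emb @ pos_prompt_emb, image_emb @ neg_prompt_emb])
    sims = sims - sims.max()              # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()  # softmax over the two prompts
    return probs[0]
```

Because the "labels" are just text embeddings, the same machinery can in principle be pointed at new pathologies, or at other modalities, by swapping the prompts.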


How self-supervised learning may boost medical AI progress

#artificialintelligence

Self-supervised learning has been a fast-rising trend in artificial intelligence (AI) over the past couple of years, as researchers seek to take advantage of large-scale unannotated data to develop better machine learning models. In 2020, Yann LeCun, Meta's chief AI scientist, said supervised learning, which entails training an AI model on a labeled dataset, would play a diminishing role as self-supervised learning came into wider use. "Most of what we learn as humans and most of what animals learn is in a self-supervised mode, not a reinforcement mode," he told a virtual session audience during the International Conference on Learning Representations (ICLR) in 2020.


Calibrating for Class Weights by Modeling Machine Learning

Caplin, Andrew, Martin, Daniel, Marx, Philip

arXiv.org Artificial Intelligence

A much studied issue is the extent to which the confidence scores provided by machine learning algorithms are calibrated to ground truth probabilities. Our starting point is that calibration is seemingly incompatible with class weighting, a technique often employed when one class is less common (class imbalance) or with the hope of achieving some external objective (cost-sensitive learning). We provide a model-based explanation for this incompatibility and use our anthropomorphic model to generate a simple method of recovering likelihoods from an algorithm that is miscalibrated due to class weighting.
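A concrete way to see the incompatibility: upweighting the positive class by a factor w inflates a model's scores relative to the true probabilities, but the shift is invertible in closed form. The sketch below applies the standard prior-correction formula p = s / (s + w(1 - s)); this is a well-known correction for weighted training, offered here as an illustration of the phenomenon rather than as the paper's specific recovery method.

```python
import numpy as np

def recover_probability(score, pos_weight):
    """Undo the score inflation caused by upweighting positives by pos_weight.

    If a perfectly calibrated base probability p is distorted by class
    weighting into s = w*p / (w*p + (1-p)), this inverts the map:
    p = s / (s + w*(1-s)). With pos_weight == 1 the scores pass through.
    """
    score = np.asarray(score, dtype=float)
    return score / (score + pos_weight * (1.0 - score))
```

For example, with a 9x positive weight a true probability of 0.1 is inflated to a score of 0.5, and the correction maps 0.5 back to 0.1.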


Unanswerable Questions about Images and Texts

Davis, Ernest

arXiv.org Artificial Intelligence

It will be useful to set up a general, abstract framework in which to discuss these issues. Generally speaking, for AI systems, and for that matter for computer programs of any kind built for a particular task, the ultimate objective can be formulated as follows. There is a class X of inputs that are "reasonable" problems for the task. There is a class Y of possible outputs. The task defines a relation Q(x, y), meaning "y is a good output [or an acceptable output, or the best possible output] on the task for input x." We assume that for every x ∈ X there is at least one y ∈ Y such that Q(x, y).
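The framework above can be stated in a few lines of code: Q is a predicate over input/output pairs, and the assumption is that Q is "total" over X, i.e. every reasonable input has at least one acceptable output. The names below are my own illustrative choices, not from the paper.

```python
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")  # the class of "reasonable" inputs
Y = TypeVar("Y")  # the class of possible outputs

# Q(x, y) holds when y is an acceptable output for input x.
Quality = Callable[[X, Y], bool]

def is_total(Q: Quality, xs: Iterable[X], ys: Iterable[Y]) -> bool:
    """Check the stated assumption: every input in xs has at least
    one acceptable output in ys under the relation Q."""
    ys = list(ys)
    return all(any(Q(x, y) for y in ys) for x in xs)
```

An "unanswerable question" is precisely an x for which the totality assumption fails, which is why the framework has to be made explicit before discussing such questions.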


Composing Answer from Multi-spans for Reading Comprehension

Zhang, Zhuosheng, Zhang, Yiqing, Zhao, Hai, Zhou, Xi, Zhou, Xiang

arXiv.org Artificial Intelligence

This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks, whose answers cannot be simply extracted as one span from the given passages. Using a pointer-network-style extractive decoder for this type of MRC may yield unsatisfactory performance when the ground-truth answers are written by human annotators or heavily paraphrased from parts of the passages. On the other hand, a generative decoder cannot guarantee that the resulting answers have well-formed syntax and semantics, especially for long sentences. To alleviate the drawbacks of both sides, we propose a method that composes answers from extracted multi-spans, learned by our model as highly confident $n$-gram candidates in the given passage. That is, the returned answers are composed of discontinuous multi-spans rather than a single consecutive span from the given passages. The proposed method is simple but effective: empirical experiments on MS MARCO show that it accurately generates long answers and substantially outperforms two competitive baselines, a typical one-span decoder and a Seq2Seq decoder.
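The composition step can be pictured as: score candidate spans, keep the top-k, then stitch them back together in passage order so the result reads left to right. The sketch below shows that assembly logic only; the span scorer is assumed to exist upstream (in the paper it is a learned model), and the function signature is an illustrative assumption.

```python
def compose_answer(passage_tokens, span_scores, k=3):
    """Compose an answer from multiple discontinuous spans of a passage.

    passage_tokens: list of tokens in the passage
    span_scores:    dict mapping (start, end) token ranges to a confidence
                    (produced by some upstream scorer, not shown here)
    k:              how many top-scoring spans to keep
    """
    # Keep the k most confident spans...
    top = sorted(span_scores, key=span_scores.get, reverse=True)[:k]
    # ...then restore passage order so the composed answer reads naturally.
    top.sort()
    return " ".join(" ".join(passage_tokens[s:e]) for s, e in top)
```

For instance, with spans covering "the cat" and "on the mat" scored highest, the composed answer skips the low-confidence middle span, which a single-span extractor could not do.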


AI Beat Humans at Reading! Maybe Not

WIRED

Microsoft and Chinese retailer Alibaba independently announced that they had made software that matched or outperformed humans on a reading-comprehension test devised at Stanford. Microsoft called it a "major milestone." Media coverage amplified the claims, with Newsweek estimating "millions of jobs at risk." Those jobs seem safe for a while. Closer examination of the tech giants' claims suggests their software hasn't yet drawn level with humans, even within the narrow confines of the test used.


AI models beat humans at reading comprehension, but they've still got a ways to go

@machinelearnbot

When computer models designed by tech giants Alibaba and Microsoft this month surpassed humans for the first time in a reading-comprehension test, both companies celebrated the success as a historic milestone. Luo Si, the chief scientist for natural-language processing at Alibaba's AI research unit, struck a poetic note, saying, "Objective questions such as 'what causes rain' can now be answered with high accuracy by machines." Teaching a computer to read has for decades been one of artificial intelligence's holiest grails, and the feat seemed to signal a coming future in which AI could understand words and process meaning with the same fluidity humans take for granted every day. But computers aren't there yet -- and aren't even really that close, said AI experts who reviewed the test results. Instead, the accomplishment highlights not just how far the technology has progressed, but also how far it still has to go. "It's a large step" for the companies' marketing "but a small step for humankind," said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an AI research group funded by Microsoft co-founder Paul Allen.