
Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale

arXiv.org Artificial Intelligence

The vast majority of real-world patient information resides in unstructured clinical text, and the process of medical abstraction seeks to extract and normalize structured information from this unstructured input. However, traditional medical abstraction methods require significant manual effort, such as crafting rules or annotating training labels, which limits scalability. In this paper, we propose UniMedAbstractor (UMA), a zero-shot medical abstraction framework leveraging Large Language Models (LLMs) through a modular and customizable prompt template. We refer to our approach as universal abstraction because it can quickly scale to new attributes through its universal prompt template, without curating attribute-specific training labels or rules. We evaluate UMA for oncology applications, focusing on fifteen key attributes representing the cancer patient journey, from short-context attributes (e.g., performance status, treatment) to complex long-context attributes requiring longitudinal reasoning (e.g., tumor site, histology, TNM staging). Experiments on real-world data show UMA's strong performance and generalizability. Compared to supervised and heuristic baselines, UMA with GPT-4o achieves an average absolute 2-point F1/accuracy improvement for both short-context and long-context attribute abstraction. For pathologic T staging, UMA even outperforms the supervised model by 20 points in accuracy.
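
To make the idea of a single, reusable prompt template concrete, here is a minimal sketch of how such zero-shot attribute abstraction could be wired up. The template wording, the attribute schema, and the OpenAI client call are illustrative assumptions, not the authors' actual UMA implementation.

```python
# Hypothetical sketch of a "universal" zero-shot abstraction prompt: one shared
# template is filled in per attribute and sent to an LLM. Template text, field
# names, and the example attribute are assumptions for illustration only.
from openai import OpenAI

UNIVERSAL_TEMPLATE = """You are a medical abstraction assistant.
Task: extract the attribute "{attribute}" from the clinical notes below.
Definition: {definition}
Allowed values: {allowed_values}
Return only the normalized value, or "unknown" if it is not documented.

Clinical notes:
{notes}
"""

def abstract_attribute(notes: str, attribute: str, definition: str,
                       allowed_values: str, model: str = "gpt-4o") -> str:
    """Fill the shared template for one attribute and query the LLM zero-shot."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = UNIVERSAL_TEMPLATE.format(
        attribute=attribute,
        definition=definition,
        allowed_values=allowed_values,
        notes=notes,
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for abstraction
    )
    return response.choices[0].message.content.strip()

# Example: a long-context attribute such as pathologic T stage.
# value = abstract_attribute(notes, "pathologic T stage",
#                            "T category from the pathology report (AJCC TNM)",
#                            "pT1, pT2, pT3, pT4, unknown")
```

Under this framing, scaling to a new attribute amounts to supplying a new definition and allowed-value set rather than new labels or rules.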


Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing

arXiv.org Artificial Intelligence

Contrastive pretraining on parallel image-text data has attained great success in vision-language processing (VLP), as exemplified by CLIP and related methods. However, prior explorations tend to focus on general web domains. Biomedical images and text are rather different, but the publicly available datasets are small and skewed toward chest X-rays, severely limiting progress. In this paper, we conducted by far the largest study on biomedical VLP, using 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. Our dataset (PMC-15M) is two orders of magnitude larger than existing biomedical image-text datasets such as MIMIC-CXR, and spans a diverse range of biomedical images. We find that the standard CLIP method is suboptimal for the biomedical domain, and we propose BiomedCLIP with domain-specific adaptations tailored to biomedical VLP. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks, from retrieval to classification to visual question answering (VQA). BiomedCLIP established a new state of the art on a wide range of standard datasets, substantially outperforming prior VLP approaches. Surprisingly, BiomedCLIP even outperformed radiology-specific state-of-the-art models such as BioViL on radiology tasks such as RSNA pneumonia detection, highlighting the utility of large-scale pretraining across all biomedical image types. We will release our models at https://aka.ms/biomedclip to facilitate future research in biomedical VLP.
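
For readers unfamiliar with the contrastive objective this line of work builds on, the sketch below shows a generic CLIP-style symmetric loss over a batch of figure-caption embeddings. It is a minimal illustration of the underlying technique under that assumption, not BiomedCLIP's code; the encoders, batch construction, and domain-specific adaptations described in the paper are omitted.

```python
# Generic CLIP-style symmetric contrastive loss over a batch of image-text
# pairs. A minimal sketch of the pretraining objective, not BiomedCLIP's
# actual training recipe.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched figure-caption pairs are positives,
    all other pairings in the batch are negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature    # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)        # image -> caption direction
    loss_t2i = F.cross_entropy(logits.t(), targets)    # caption -> image direction
    return 0.5 * (loss_i2t + loss_t2i)

# Example with random embeddings standing in for encoder outputs:
# img = torch.randn(32, 512); txt = torch.randn(32, 512)
# loss = clip_contrastive_loss(img, txt)
```

Because every other pairing in the batch serves as a negative, the objective benefits from large and diverse image-text collections, which is part of why a dataset at the scale of PMC-15M matters.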