AITopics | Oktay, Ozan

Oktay, Ozan

MAIRA-2: Grounded Radiology Report Generation

Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Schwaighofer, Anton, Bond-Taylor, Sam, Ilse, Maximilian, Pérez-García, Fernando, Salvatelli, Valentina, Sharma, Harshita, Meissen, Felix, Ranjit, Mercy, Srivastav, Shaury, Gong, Julia, Falck, Fabian, Oktay, Ozan, Thieme, Anja, Lungren, Matthew P., Wetscherek, Maria Teodora, Alvarez-Valle, Javier, Hyland, Stephanie L.

arXiv.org Artificial Intelligence, Jun-6-2024

Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. Here, we extend report generation to include the localisation of individual findings on the image, a task we call grounded report generation. Prior work indicates that grounding is important for clarifying image understanding and interpreting AI-generated text. Therefore, grounded reporting stands to improve the utility and transparency of automated report drafting. To enable evaluation of grounded reporting, we propose a novel evaluation framework, RadFact, that leverages the reasoning capabilities of large language models (LLMs). RadFact assesses the factuality of individual generated sentences, as well as the correctness of generated spatial localisations when present. We introduce MAIRA-2, a large multimodal model that combines a radiology-specific image encoder with an LLM and is trained for the new task of grounded report generation on chest X-rays. MAIRA-2 uses more comprehensive inputs than explored previously: the current frontal image, the current lateral image, the prior frontal image and prior report, as well as the Indication, Technique and Comparison sections of the current report. We demonstrate that these additions significantly improve report quality and reduce hallucinations, establishing a new state of the art on findings generation (without grounding) on MIMIC-CXR while demonstrating the feasibility of grounded reporting as a novel and richer task.
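
As a rough illustration of what scoring a grounded report can involve, the sketch below combines a per-sentence support verdict with a box-overlap check. It is not RadFact's definition: the abstract does not specify the metric, so the intersection-over-union test, the 0.5 threshold, and the names `Box`, `ScoredSentence` and `summarise` are all assumptions made for this example. The `supported` flag stands in for whatever judge (an LLM in the paper) decides whether a sentence is factual.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """Axis-aligned bounding box in image coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


class ScoredSentence(NamedTuple):
    """One generated sentence with an externally supplied factuality verdict."""
    supported: bool                 # assumed to come from some external judge
    predicted_box: Optional[Box]    # box emitted with the sentence, if any
    reference_box: Optional[Box]    # box from the reference annotation, if any


def box_area(box: Box) -> float:
    # Degenerate (inverted) boxes count as empty.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter_area = box_area(inter)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def summarise(sentences: Sequence[ScoredSentence], iou_threshold: float = 0.5) -> dict:
    """Fraction of supported sentences, and fraction of those well localised."""
    if not sentences:
        return {"supported_fraction": 0.0, "grounded_fraction": 0.0}
    supported = [s for s in sentences if s.supported]
    with_boxes = [s for s in supported
                  if s.predicted_box is not None and s.reference_box is not None]
    well_localised = [s for s in with_boxes
                      if iou(s.predicted_box, s.reference_box) >= iou_threshold]
    return {
        "supported_fraction": len(supported) / len(sentences),
        "grounded_fraction": len(well_localised) / len(with_boxes) if with_boxes else 0.0,
    }
```

Separating the two fractions keeps textual factuality and spatial correctness visible as distinct failure modes, which is the distinction the abstract draws.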

large language model, machine learning, natural language, (20 more...)

2406.04449

Country:

Asia (0.67)
Europe > United Kingdom (0.27)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology

Thieme, Anja, Rajamohan, Abhijith, Cooper, Benjamin, Groombridge, Heather, Simister, Robert, Wong, Barney, Woznitza, Nicholas, Pinnock, Mark Ames, Wetscherek, Maria Teodora, Morrison, Cecily, Richardson, Hannah, Pérez-García, Fernando, Hyland, Stephanie L., Bannur, Shruthi, Castro, Daniel C., Bouzid, Kenza, Schwaighofer, Anton, Ranjit, Mercy, Sharma, Harshita, Lungren, Matthew P., Oktay, Ozan, Alvarez-Valle, Javier, Nori, Aditya, Harris, Stephen, Jacob, Joseph

arXiv.org Artificial Intelligence, May-8-2024

Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death, to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from chest X-ray images to reduce the risk of sub-optimally or critically placed NGTs being missed or detected late, but gaps remain in clinical practice integration. In this study, we present a human-centered approach to the problem and describe insights derived from contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped us understand challenges in existing workflows and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs the design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.

artificial intelligence, data mining, machine learning, (15 more...)

2405.05299

Country:

North America > United States (1.00)
Asia (0.67)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(2 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(6 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

RadEdit: stress-testing biomedical vision models via diffusion image editing

Pérez-García, Fernando, Bond-Taylor, Sam, Sanchez, Pedro P., van Breugel, Boris, Castro, Daniel C., Sharma, Harshita, Salvatelli, Valentina, Wetscherek, Maria T. A., Richardson, Hannah, Lungren, Matthew P., Nori, Aditya, Alvarez-Valle, Javier, Oktay, Ozan, Ilse, Maximilian

arXiv.org Artificial Intelligence, Dec-21-2023

Biomedical imaging datasets are often small and biased, meaning that real-world performance of predictive models can be substantially lower than expected from internal testing. This work proposes using generative image editing to simulate dataset shifts and diagnose failure modes of biomedical vision models; this can be used in advance of deployment to assess readiness, potentially reducing cost and patient harm. Existing editing methods can produce undesirable changes, with spurious correlations learned due to the co-occurrence of disease and treatment interventions, limiting practical applicability. To address this, we train a text-to-image diffusion model on multiple chest X-ray datasets and introduce a new editing method, RadEdit, that uses multiple masks, if present, to constrain changes and ensure consistency in the edited images. We consider three types of dataset shift (acquisition shift, manifestation shift, and population shift) and demonstrate that our approach can diagnose failures and quantify model robustness without additional data collection, complementing more qualitative tools for explainable AI.
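
To make the idea of constraining an edit with a mask concrete, here is a deliberately simplified sketch. It is not the RadEdit procedure, which the abstract does not detail and which operates inside a diffusion model; this version only composites an already edited image with the original after the fact. The function name `composite_with_keep_mask` and the nested-list image representation are assumptions chosen to keep the example dependency-free.

```python
from typing import List, Sequence

Image = List[List[float]]


def composite_with_keep_mask(
    original: Sequence[Sequence[float]],
    edited: Sequence[Sequence[float]],
    keep_mask: Sequence[Sequence[bool]],
) -> Image:
    """Return an image that takes the original pixel wherever keep_mask is
    True and the edited pixel everywhere else."""
    if not (len(original) == len(edited) == len(keep_mask)):
        raise ValueError("original, edited and keep_mask must have the same height")
    result: Image = []
    for orig_row, edit_row, mask_row in zip(original, edited, keep_mask):
        if not (len(orig_row) == len(edit_row) == len(mask_row)):
            raise ValueError("rows must have the same width")
        # Protected pixels are copied from the original unchanged.
        result.append([o if keep else e
                       for o, e, keep in zip(orig_row, edit_row, mask_row)])
    return result
```

A protected region of this kind is one way to stop an edit that removes a finding from also altering, say, a co-occurring treatment device elsewhere in the image.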

artificial intelligence, machine learning, natural language, (18 more...)

2312.12865

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.94)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

MAIRA-1: A specialised large multimodal model for radiology report generation

Hyland, Stephanie L., Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Ranjit, Mercy, Schwaighofer, Anton, Pérez-García, Fernando, Salvatelli, Valentina, Srivastav, Shaury, Thieme, Anja, Codella, Noel, Lungren, Matthew P., Wetscherek, Maria Teodora, Oktay, Ozan, Alvarez-Valle, Javier

arXiv.org Artificial Intelligence, Nov-22-2023

We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.

large language model, machine learning, maira-1, (22 more...)

2311.13668

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Exploring the Boundaries of GPT-4 in Radiology

Liu, Qianchu, Hyland, Stephanie, Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Wetscherek, Maria Teodora, Tinn, Robert, Sharma, Harshita, Pérez-García, Fernando, Schwaighofer, Anton, Rajpurkar, Pranav, Khanna, Sameer Tajdin, Poon, Hoifung, Usuyama, Naoto, Thieme, Anja, Nori, Aditya V., Lungren, Matthew P., Oktay, Ozan, Alvarez-Valle, Javier

arXiv.org Artificial Intelligence, Oct-23-2023

The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and found that GPT-4 either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains (≈10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference (F1). For tasks that require learning dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows that GPT-4 has a sufficient level of radiology knowledge, with only occasional errors in complex contexts that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually-written impressions.

large language model, machine learning, natural language, (21 more...)

2310.14573

Country:

Europe (0.28)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

No Fair Lunch: A Causal Perspective on Dataset Bias in Machine Learning for Medical Imaging

Jones, Charles, Castro, Daniel C., Ribeiro, Fabio De Sousa, Oktay, Ozan, McCradden, Melissa, Glocker, Ben

arXiv.org Artificial Intelligence, Jul-31-2023

As machine learning methods gain prominence within clinical decision-making, addressing fairness concerns becomes increasingly urgent. Despite considerable work dedicated to detecting and ameliorating algorithmic bias, today's methods are deficient, with potentially harmful consequences. Our causal perspective sheds new light on algorithmic bias, highlighting how different sources of dataset bias may appear indistinguishable yet require substantially different mitigation strategies. We introduce three families of causal bias mechanisms stemming from disparities in prevalence, presentation, and annotation. Our causal analysis underscores how current mitigation methods tackle only a narrow and often unrealistic subset of scenarios. We provide a practical three-step framework for reasoning about fairness in medical imaging, supporting the development of safe and equitable AI prediction models.

artificial intelligence, disparity, machine learning, (15 more...)

2307.16526

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Compositional Zero-Shot Domain Transfer with Text-to-Text Models

Liu, Fangyu, Liu, Qianchu, Bannur, Shruthi, Pérez-García, Fernando, Usuyama, Naoto, Zhang, Sheng, Naumann, Tristan, Nori, Aditya, Poon, Hoifung, Alvarez-Valle, Javier, Oktay, Ozan, Hyland, Stephanie L.

arXiv.org Artificial Intelligence, Mar-23-2023

Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5, domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from MLM of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: we simultaneously train NLG for in-domain label-to-data generation, which enables data augmentation for self-finetuning, and NLU for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.

machine learning, natural language, summarisation, (20 more...)

2303.13386

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Nuclear Medicine (0.70)
Health & Medicine > Diagnostic Medicine > Imaging (0.70)
Automobiles & Trucks > Manufacturer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

Bannur, Shruthi, Hyland, Stephanie, Liu, Qianchu, Pérez-García, Fernando, Ilse, Maximilian, Castro, Daniel C., Boecking, Benedikt, Sharma, Harshita, Bouzid, Kenza, Thieme, Anja, Schwaighofer, Anton, Wetscherek, Maria, Lungren, Matthew P., Nori, Aditya, Alvarez-Valle, Javier, Oktay, Ozan

arXiv.org Artificial Intelligence, Mar-16-2023

Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. Prior work in biomedical VLP has mostly relied on the alignment of single image and report pairs, even though clinical notes commonly refer to prior images. This not only introduces poor alignment between the modalities but also misses an opportunity to exploit rich self-supervision through existing temporal content in the data. In this work, we explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model. It is designed to be robust to challenges such as pose variations and missing input images across time. The resulting model excels on downstream tasks in both single- and multi-image setups, achieving state-of-the-art performance on (I) progression classification, (II) phrase grounding, and (III) report generation, whilst offering consistent improvements on disease classification and sentence-similarity tasks. We release a novel multi-modal temporal benchmark dataset, MS-CXR-T, to quantify the quality of vision-language representations in terms of temporal semantics. Our experimental results show the advantages of incorporating prior images and reports to make the most of the data.

artificial intelligence, biomedical vision-language processing, exploit temporal structure, (1 more...)

2301.04558

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.60)

Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

Boecking, Benedikt, Usuyama, Naoto, Bannur, Shruthi, Castro, Daniel C., Schwaighofer, Anton, Hyland, Stephanie, Wetscherek, Maria, Naumann, Tristan, Nori, Aditya, Alvarez-Valle, Javier, Poon, Hoifung, Oktay, Ozan

arXiv.org Artificial Intelligence, Jul-21-2022

Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision-language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision-language processing. We release a language model that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and a novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports. Further, we propose a self-supervised joint vision-language approach with a focus on better text modelling. It establishes new state-of-the-art results on a wide range of publicly available benchmarks, in part by leveraging our new domain-specific language model. We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision-language processing. A broad evaluation, including on this new dataset, shows that our contrastive learning approach, aided by textual-semantic modelling, outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.
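
For readers unfamiliar with global-alignment contrastive objectives, the sketch below shows a generic symmetric InfoNCE-style loss over paired image and text embeddings. It is a textbook formulation, not necessarily the exact loss used in the paper; the temperature value of 0.07 and the function names are assumptions, and embeddings are plain Python lists (assumed non-zero) so the example needs no external libraries.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalise(v: Vector) -> List[float]:
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: Sequence[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def symmetric_contrastive_loss(
    image_embs: Sequence[Vector],
    text_embs: Sequence[Vector],
    temperature: float = 0.07,
) -> float:
    """Average of image-to-text and text-to-image cross-entropy, where the
    i-th image and i-th text form the only positive pair in the batch."""
    if len(image_embs) != len(text_embs) or not image_embs:
        raise ValueError("need equally sized, non-empty batches")
    imgs = [l2_normalise(v) for v in image_embs]
    txts = [l2_normalise(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] is the scaled cosine similarity of image i and text j.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
              for img in imgs]
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)
```

The loss compares one whole-image embedding with one whole-text embedding per pair, which is what "global alignment" refers to; no region-to-phrase supervision enters the objective.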

artificial intelligence, machine learning, natural language, (20 more...)

doi: 10.1007/978-3-031-20059-5_1

2204.09817

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.48)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
