S-Chain: Structured Visual Chain-of-Thought For Medicine

Khai Le-Duc, Duy M. H. Nguyen, Phuong T. H. Trinh, Tien-Phat Nguyen, Nghiem T. Diep, An Ngo, Tung Vu, Trinh Vuong, Anh-Tien Nguyen, Mau Nguyen, Van Trung Hoang, Khai-Nguyen Nguyen, Hy Nguyen, Chris Ngo, Anji Liu, Nhat Ho, Anne-Christin Hauschild, Khanh Xuan Nguyen, Thanh Nguyen-Tang, Pengtao Xie, Daniel Sonntag, James Zou, Mathias Niepert, Anh Totti Nguyen

arXiv.org Artificial Intelligence 

Faithful reasoning in medical vision-language models (VLMs) requires not only accurate predictions but also transparent alignment between textual rationales and visual evidence. While Chain-of-Thought (CoT) prompting has shown promise in medical visual question answering (VQA), no large-scale expert-level dataset has captured stepwise reasoning with precise visual grounding. We introduce S-Chain, the first large-scale dataset of 12,000 expert-annotated medical images with bounding boxes and structured visual CoT (SV-CoT), explicitly linking visual regions to reasoning steps. The dataset further supports 16 languages, totaling over 700k VQA pairs for broad multilingual applicability. Using S-Chain, we benchmark state-of-the-art medical VLMs (ExGra-Med, LLaVA-Med) and general-purpose VLMs (Qwen2.5-VL, InternVL2.5), showing that SV-CoT supervision significantly improves interpretability, grounding fidelity, and robustness. Beyond benchmarking, we study the synergy of SV-CoT supervision with retrieval-augmented generation, revealing how domain knowledge and visual grounding interact during autoregressive reasoning. Finally, we propose a new mechanism that strengthens the alignment between visual evidence and reasoning, improving both reliability and efficiency. S-Chain establishes a new benchmark for grounded medical reasoning and paves the way toward more trustworthy and explainable medical VLMs.
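
To make the structure of an SV-CoT annotation concrete, a single S-Chain record might look like the sketch below. This is a hypothetical illustration only: the class and field names (`SChainRecord`, `bounding_boxes`, `grounded_regions`, etc.) are assumptions, not the dataset's actual release schema. What it captures is the core idea from the abstract: each reasoning step explicitly cites the boxed image regions that ground it.

```python
# Hypothetical sketch of one S-Chain record: an expert-annotated image whose
# bounding boxes are explicitly linked to structured reasoning steps (SV-CoT).
# All field names are illustrative assumptions, not the official schema.
from dataclasses import dataclass

@dataclass
class BoundingBox:
    region_id: str   # identifier that reasoning steps can reference
    x: int           # top-left corner, in pixels
    y: int
    width: int
    height: int
    label: str       # e.g. an anatomical structure or finding

@dataclass
class ReasoningStep:
    text: str                   # one step of the chain-of-thought rationale
    grounded_regions: list      # region_ids whose evidence supports this step

@dataclass
class SChainRecord:
    image_id: str
    language: str               # one of the 16 supported languages
    question: str
    answer: str
    bounding_boxes: list        # list[BoundingBox]
    reasoning_steps: list       # list[ReasoningStep]

# A toy record: steps 2 and 3 are grounded in the boxed region "r1".
record = SChainRecord(
    image_id="img_00042",
    language="en",
    question="Is there evidence of pneumonia in this chest X-ray?",
    answer="Yes, consolidation in the right lower lobe.",
    bounding_boxes=[
        BoundingBox("r1", 310, 420, 96, 80, "right lower lobe opacity"),
    ],
    reasoning_steps=[
        ReasoningStep("Assess both lung fields for focal opacities.", []),
        ReasoningStep("A focal opacity is visible in the right lower lobe.",
                      ["r1"]),
        ReasoningStep("The pattern is consistent with consolidation, "
                      "supporting pneumonia.", ["r1"]),
    ],
)
```

Under this reading, "grounding fidelity" can be checked per step: a model's rationale is faithful when each claim it makes cites regions whose contents actually support it.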
