Wu, Junde
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Pan, Jiazhen, Liu, Che, Wu, Junde, Liu, Fenglin, Zhu, Jiayuan, Li, Hongwei Bran, Chen, Chen, Ouyang, Cheng, Rueckert, Daniel
Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although medical vision-language models (VLMs) show promise for radiological tasks, most existing VLMs merely produce final answers without revealing the underlying reasoning. To address this gap, we introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning to enhance transparency and trustworthiness. Instead of relying on supervised fine-tuning (SFT), which often suffers from overfitting to training distributions and fails to foster genuine reasoning, MedVLM-R1 employs a reinforcement learning framework that incentivizes the model to discover human-interpretable reasoning paths without using any reasoning references. Despite limited training data (600 visual question answering samples) and model parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples. It also demonstrates robust domain generalization under out-of-distribution tasks. By unifying medical image analysis with explicit reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable AI in clinical practice. The inference model is available at: https://huggingface.co/JZPeterPan/MedVLM-R1.
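The abstract does not spell out the reward design. As a minimal sketch, assuming a rule-based setup in the style of GRPO (group relative policy optimization), the reward below scores only the response format and the final multiple-choice answer, so no reference reasoning is ever needed. The `<think>`/`<answer>` template, the option-letter answer format, and the names `format_reward`, `accuracy_reward`, and `group_advantages` are hypothetical and are not taken from MedVLM-R1's released code.

```python
import re
from statistics import mean, pstdev

# Hypothetical response template: reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
TEMPLATE = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)
# Hypothetical answer format: an option letter such as "B", "(B)" or "B. Liver".
OPTION = re.compile(r"\s*\(?([A-Z])\b")


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if TEMPLATE.match(response.strip()) else 0.0


def accuracy_reward(response: str, gold_option: str) -> float:
    """Return 1.0 if the extracted option letter equals the gold one, else 0.0.

    Only the final answer is scored; the reasoning text is never compared
    with any reference reasoning.
    """
    match = TEMPLATE.match(response.strip())
    if match is None:
        return 0.0
    option = OPTION.match(match.group(1))
    return 1.0 if option and option.group(1) == gold_option else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of responses to the same question."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of three sampled responses to one question whose gold option is "B".
responses = [
    "<think>T2 hyperintensity in the left temporal lobe.</think><answer>B</answer>",
    "<think>No abnormality is visible.</think><answer>A</answer>",
    "B",  # correct letter, but no template: earns neither reward
]
rewards = [format_reward(r) + accuracy_reward(r, "B") for r in responses]
print(rewards)  # [2.0, 1.0, 0.0]
print(group_advantages(rewards))
```

Because rewards are normalized within each group of sampled responses, a response is reinforced only relative to its siblings for the same question; GRPO-style objectives use this group baseline instead of training a separate value model.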
Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
Zhu, Jiayuan, Wu, Junde
Accurate and efficient diagnosis in online medical consultations remains a challenge for current large language models. These models often rely on single-turn interactions and lack the ability to refine their predictions through follow-up questions. Additionally, their responses frequently contain complex medical terminology, making them less accessible to non-medical users and creating barriers to effective communication. In this paper, we introduce Ask Patients with Patience (APP), the first multi-turn dialogue framework that enables LLMs to iteratively refine diagnoses based on grounded reasoning. By integrating medical guidelines and entropy minimization, APP improves both diagnostic accuracy and efficiency. Furthermore, it features human-centric communication that bridges the gap between user comprehension and medical terminology, significantly enhancing user accessibility and engagement. We evaluated APP on a subset of the ReMeDi dataset, comparing it with single-turn and traditional multi-turn LLM baselines. APP achieved higher similarity scores in diagnosis predictions, demonstrating better alignment with ground-truth diagnoses. Entropy analysis showed that APP reduces diagnostic uncertainty more rapidly across iterations, increasing confidence in its predictions. APP also excels in user accessibility and empathy, further bridging the gap between complex medical language and user understanding. Code will be released at: https://github.com/SuperMedIntel/AskPatients.
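The abstract does not say how entropy minimization is carried out. As a generic illustration only, not APP's algorithm, the sketch below treats the model's belief over candidate diagnoses as a probability distribution, measures its Shannon entropy, and picks the yes/no follow-up question whose expected post-answer entropy is lowest (the classic information-gain criterion). The diagnoses, questions, and probabilities are invented toy values, and all function names are hypothetical.

```python
import math


def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy (bits) of a distribution over candidate diagnoses."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def update(prior: dict[str, float], p_yes: dict[str, float], answer_yes: bool) -> dict[str, float]:
    """Bayes update of the diagnosis distribution after a yes/no answer."""
    unnorm = {d: prior[d] * (p_yes[d] if answer_yes else 1 - p_yes[d]) for d in prior}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}


def expected_entropy(prior: dict[str, float], p_yes: dict[str, float]) -> float:
    """Entropy remaining after the question, averaged over both possible answers."""
    prob_yes = sum(prior[d] * p_yes[d] for d in prior)
    result = 0.0
    for answer_yes, prob in ((True, prob_yes), (False, 1 - prob_yes)):
        if prob > 0:  # skip answers that cannot occur (avoids dividing by zero)
            result += prob * entropy(update(prior, p_yes, answer_yes))
    return result


def best_question(prior: dict[str, float], questions: dict[str, dict[str, float]]) -> str:
    """Pick the question that minimizes the expected remaining entropy."""
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))


# Toy belief state and toy P(answer = "yes" | diagnosis) for two candidate questions.
prior = {"common cold": 0.4, "influenza": 0.4, "allergic rhinitis": 0.2}
questions = {
    "Do you have a fever?": {"common cold": 0.2, "influenza": 0.9, "allergic rhinitis": 0.05},
    "Do you have a runny nose?": {"common cold": 0.9, "influenza": 0.7, "allergic rhinitis": 0.9},
}
print(f"current entropy: {entropy(prior):.3f} bits")
print(best_question(prior, questions))  # Do you have a fever?
```

The fever question separates influenza from the other two candidates much more sharply than the runny-nose question, so it is expected to leave less uncertainty; repeating this choice turn by turn is one way a dialogue could drive diagnostic entropy down across iterations.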
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
Wu, Junde, Zhu, Jiayuan, Liu, Yuyuan
We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Unlike conventional LLM-based reasoning approaches, which rely solely on internal inference, Agentic Reasoning dynamically engages web search, code execution, and structured reasoning-context memory to solve complex problems requiring deep research and multi-step logical deduction. Our framework introduces the Mind Map agent, which constructs a structured knowledge graph to track logical relationships, improving deductive reasoning. Additionally, the integration of web-search and coding agents enables real-time retrieval and computational analysis, enhancing reasoning accuracy and decision-making. Evaluations on PhD-level scientific reasoning (GPQA) and domain-specific deep research tasks demonstrate that our approach significantly outperforms existing models, including leading retrieval-augmented generation (RAG) systems and closed-source LLMs. Moreover, our results indicate that agentic reasoning improves expert-level knowledge synthesis, test-time scalability, and structured problem-solving. The code is at: https://github.com/theworldofagents/Agentic-Reasoning.
Simplifying Low-Light Image Enhancement Networks with Relative Loss Functions
Zhang, Yu, Di, Xiaoguang, Wu, Junde, Fu, Rao, Li, Yong, Wang, Yue, Xu, Yanwu, Yang, Guohui, Wang, Chunhui
Image enhancement is a common technique used to mitigate issues such as severe noise, low brightness, low contrast, and color deviation in low-light images. However, it is impossible to provide an optimal high-light image as a reference for low-light image enhancement, which makes learning more difficult than in other image processing tasks. As a result, although several low-light image enhancement methods have been proposed, most of them are either too complex or insufficient to address all the issues in low-light images. In this paper, to make learning easier in low-light image enhancement, we introduce FLW-Net (Fast and LightWeight Network) and two relative loss functions. Specifically, we first identify two challenges that limit the simplification of network structures in this task: the need for a large receptive field to obtain global contrast, and the lack of an absolute reference. Then, we propose an efficient global feature information extraction component and two loss functions based on relative information to overcome these challenges. Finally, we conducted comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that it can significantly reduce the complexity of supervised low-light image enhancement networks while improving enhancement quality. The code is available at https://github.com/hitzhangyu/FLW-Net.
Universal, transferable and targeted adversarial attacks
Wu, Junde, Fu, Rao
Deep neural networks have recently been found to be vulnerable: well-designed inputs, called adversarial examples, can lead them to make incorrect predictions. Depending on the scenario, goals, and capabilities of the attacker, the difficulty of generating an attack varies. For example, a targeted attack is more difficult to generate than a non-targeted one, a universal attack is more difficult than a non-universal one, and a transferable attack is more difficult than a non-transferable one. The question is: does there exist an attack that can survive the harshest conditions and meet all of these requirements? Although many cheap and effective attacks have been proposed, this question has not been fully answered for large models and large-scale datasets. In this paper, we learn a universal mapping from source inputs to adversarial examples. These examples can fool classification networks into classifying all of them as one target class. Moreover, they are transferable between different models.
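The abstract does not give the training objective. One common way to formalize a learned attack that is universal (a single mapping for all inputs) and targeted (a single output class) is shown below, as an assumption rather than the paper's exact loss: a generator G_theta maps each clean input x drawn from a data distribution D to an adversarial example within an L-infinity budget epsilon, and is trained so that a classifier f predicts the target class t under the cross-entropy loss. All of these symbols are introduced here for illustration.

```latex
\min_{\theta} \; \mathbb{E}_{x \sim \mathcal{D}}
  \Big[ \mathcal{L}_{\mathrm{CE}}\big( f(G_\theta(x)),\, t \big) \Big]
\quad \text{subject to} \quad
\lVert G_\theta(x) - x \rVert_\infty \le \epsilon \quad \forall x .
```

The constraint is often enforced by construction, for example by clipping G_theta(x) to [x - epsilon, x + epsilon]. Transferability is not captured by this single-classifier objective; one common heuristic is to sum the loss over several classifiers f_1, ..., f_K during training.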