AITopics | Dou, Qi

Collaborating Authors

Dou, Qi

Surgical Action Planning with Large Language Models

Xu, Mengya, Huang, Zhongzhen, Zhang, Jie, Zhang, Xiaofan, Dou, Qi

arXiv.org Artificial Intelligence, Mar-23-2025

In robot-assisted minimally invasive surgery, we introduce the Surgical Action Planning (SAP) task, which generates future action plans from visual inputs to address the absence of intraoperative predictive planning in current intelligent applications. SAP shows great potential for enhancing intraoperative guidance and automating procedures. However, it faces challenges such as understanding instrument-action relationships and tracking surgical progress. Large Language Models (LLMs) show promise in understanding surgical video content but remain underexplored for predictive decision-making in SAP, as they focus mainly on retrospective analysis. Challenges like data privacy, computational demands, and modality-specific constraints further highlight significant research gaps. To tackle these challenges, we introduce LLM-SAP, a Large Language Models-based Surgical Action Planning framework that predicts future actions and generates text responses by interpreting natural language prompts of surgical goals. The text responses potentially support surgical education, intraoperative decision-making, procedure documentation, and skill analysis. LLM-SAP integrates two novel modules: the Near-History Focus Memory Module (NHF-MM) for modeling historical states and the prompts factory for action planning. We evaluate LLM-SAP on our constructed CholecT50-SAP dataset using models like Qwen2.5 and Qwen2-VL, demonstrating its effectiveness in next-action prediction. Pre-trained LLMs are tested zero-shot, and supervised fine-tuning (SFT) with LoRA is implemented to address data privacy concerns. Our experiments show that Qwen2.5-72B-SFT surpasses Qwen2.5-72B with a 19.3% higher accuracy.
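The abstract mentions supervised fine-tuning with LoRA. As a rough illustration of the low-rank update that LoRA is generally described as using (a frozen weight W plus a scaled product of two small trainable matrices B and A), here is a minimal sketch in plain Python. The matrix sizes, the values, and the function names are illustrative assumptions and are not taken from LLM-SAP.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w: frozen weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy example with hypothetical values: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (1, 2)
B = [[0.5], [0.0]]    # shape (2, 1)
W_eff = lora_effective_weight(W, A, B, alpha=2.0)
```

Only A and B would be trained in such a setup, so the number of updated parameters scales with the rank r rather than with the full weight matrix.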

large language model, natural language, surgical action planning, (11 more...)

arXiv.org Artificial Intelligence

2503.18296

Country: Asia > China (0.48)

Genre: Research Report > New Finding (0.35)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

pFedFair: Towards Optimal Group Fairness-Accuracy Trade-off in Heterogeneous Federated Learning

Lei, Haoyu, Gong, Shizhan, Dou, Qi, Farnia, Farzan

arXiv.org Artificial Intelligence, Mar-19-2025

Federated learning (FL) algorithms commonly aim to maximize clients' accuracy by training a model on their collective data. However, in several FL applications, the model's decisions should meet a group fairness constraint to be independent of sensitive attributes such as gender or race. While such group fairness constraints can be incorporated into the objective function of the FL optimization problem, in this work, we show that such an approach would lead to suboptimal classification accuracy in an FL setting with heterogeneous client distributions. To achieve an optimal accuracy-group fairness trade-off, we propose the Personalized Federated Learning for Client-Level Group Fairness (pFedFair) framework, where clients locally impose their fairness constraints over the distributed training process. Leveraging the image embedding models, we extend the application of pFedFair to computer vision settings, where we numerically show that pFedFair achieves an optimal group fairness-accuracy trade-off in heterogeneous FL settings. We present the results of several numerical experiments on benchmark and synthetic datasets, which highlight the suboptimality of non-personalized FL algorithms and the improvements made by the pFedFair method.
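One common way to formalize "decisions independent of a sensitive attribute" is demographic parity, measured as the gap in positive-prediction rates between groups. The sketch below computes that gap for a single client. It is an assumption that this metric matches the constraint used in pFedFair, and the function names and toy data are hypothetical.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive predictions among samples belonging to `group`."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [positive_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Toy client data: binary predictions and a binary sensitive attribute.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
attrs = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, attrs)
```

A gap of zero means both groups receive positive predictions at the same rate. In a personalized setting, each client could evaluate this quantity on its own data rather than on the pooled data.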

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

2503.14925

Country: Europe > Spain (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Diffusion Stabilizer Policy for Automated Surgical Robot Manipulations

Ho, Chonlam, Hu, Jianshu, Wang, Hesheng, Dou, Qi, Ban, Yutong

arXiv.org Artificial Intelligence, Mar-3-2025

Intelligent surgical robots have the potential to revolutionize clinical practice by enabling more precise and automated surgical procedures. However, the automation of such robots for surgical tasks remains under-explored compared to recent advancements in solving household manipulation tasks. These successes have been largely driven by (1) advanced models, such as transformers and diffusion models, and (2) large-scale data utilization. Aiming to extend these successes to the domain of surgical robotics, we propose a diffusion-based policy learning framework, called Diffusion Stabilizer Policy (DSP), which enables training with imperfect or even failed trajectories. Our approach consists of two stages: first, we train the diffusion stabilizer policy using only clean data. Then, the policy is continuously updated using a mixture of clean and perturbed data, with filtering based on the prediction error on actions. Comprehensive experiments conducted in various surgical environments demonstrate the superior performance of our method in perturbation-free settings and its robustness when handling perturbed demonstrations.
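The second stage described above filters data by the prediction error on actions. A minimal sketch of that idea follows, assuming a mean squared error criterion and a fixed threshold. Both choices, the toy policy, and all names are hypothetical and stand in for the diffusion policy and the filtering rule actually used by DSP.

```python
def action_error(predicted, demonstrated):
    """Mean squared error between a predicted and a demonstrated action vector."""
    return sum((p - d) ** 2 for p, d in zip(predicted, demonstrated)) / len(demonstrated)


def filter_demonstrations(samples, policy, threshold):
    """Keep (state, action) pairs whose action the policy predicts within `threshold`."""
    return [(s, a) for s, a in samples if action_error(policy(s), a) <= threshold]


def toy_policy(state):
    """Hypothetical stand-in for a policy trained on clean data: move toward the origin."""
    return [-x for x in state]


demos = [
    ([1.0, 2.0], [-1.0, -2.0]),  # consistent with the policy
    ([1.0, 2.0], [-1.1, -2.1]),  # slightly noisy
    ([1.0, 2.0], [3.0, 3.0]),    # heavily perturbed
]
kept = filter_demonstrations(demos, toy_policy, threshold=0.05)
```

With this threshold the first two demonstrations are kept and the heavily perturbed one is dropped before the next policy update.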

artificial intelligence, diffusion model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2503.01252

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Surgery (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.60)

Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Chen, Minghui, Jiang, Meirui, Zhang, Xin, Dou, Qi, Wang, Zehua, Li, Xiaoxiao

arXiv.org Artificial Intelligence, Oct-31-2024

Federated learning (FL) is a learning paradigm that enables collaborative training of models using decentralized data. Recently, the utilization of pre-trained weight initialization in FL has been demonstrated to effectively improve model performance. However, the evolving complexity of current pre-trained models, characterized by a substantial increase in parameters, markedly intensifies the challenges associated with communication rounds required for their adaptation to FL. To address these communication cost issues and increase the performance of pre-trained model adaptation in FL, we propose an innovative model interpolation-based local training technique called "Local Superior Soups." Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin within a few communication rounds through regularized model interpolation. This approach acts as a catalyst for the seamless adaptation of pre-trained models in FL. We demonstrated its effectiveness and efficiency across diverse widely-used FL datasets. Our code is available at https://github.com/ubc-tea/Local-Superior-Soups.
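Model interpolation, in its simplest form, is a coefficient-weighted average of the parameters of several models that share an architecture. The sketch below shows only that generic averaging step. It does not include the regularization or the local training procedure of Local Superior Soups, and the parameter names and values are made up for illustration.

```python
def interpolate_weights(models, coefficients):
    """Return the coefficient-weighted average of several parameter dictionaries."""
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    return {
        key: [sum(c * m[key][i] for c, m in zip(coefficients, models))
              for i in range(len(models[0][key]))]
        for key in models[0]
    }


# Two hypothetical models with identical parameter names and shapes.
model_a = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
model_b = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}
soup = interpolate_weights([model_a, model_b], [0.5, 0.5])
```

Averaging of this kind tends to work well only when the models lie in a connected low-loss region, which is the property the abstract says the method encourages during local training.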

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.2366

Country:

North America > Canada (0.14)
North America > United States (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.66)

Industry:

Information Technology (0.67)
Materials > Chemicals > Specialty Chemicals (0.60)
Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

Ding, Hao, Lu, Tuxun, Zhang, Yuqian, Liang, Ruixing, Shu, Hongchao, Seenivasan, Lalithkumar, Long, Yonghao, Dou, Qi, Gao, Cong, Unberath, Mathias

arXiv.org Artificial Intelligence, Jul-16-2024

Accurate segmentation of tools in robot-assisted surgery is critical for machine perception, as it facilitates numerous downstream tasks including augmented reality feedback. While current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions, these models have proven susceptible to even minor corruptions, significantly impairing the model's performance. This vulnerability is especially problematic in surgical settings where predictions might be used to inform high-stakes decisions. To better understand model behavior under non-adversarial corruptions, prior work has explored introducing artificial corruptions, like Gaussian noise or contrast perturbation to test set images, to assess model robustness. However, these corruptions are either not photo-realistic or model/task agnostic. Thus, these investigations provide limited insights into model deterioration under realistic surgical corruptions. To address this limitation, we introduce the SegSTRONG-C challenge that aims to promote the development of algorithms robust to unforeseen but plausible image corruptions of surgery, like smoke, bleeding, and low brightness. We collect and release corruption-free mock endoscopic video sequences for the challenge participants to train their algorithms and benchmark them on video sequences with photo-realistic non-adversarial corruptions for a binary robot tool segmentation task. This new benchmark will allow us to carefully study neural network robustness to non-adversarial corruptions of surgery, thus constituting an important first step towards more robust models for surgical computer vision. In this paper, we describe the data collection and annotation protocol, baseline evaluations of established segmentation models, and data augmentation-based techniques to enhance model robustness.
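The artificial corruptions attributed to prior work, Gaussian noise and contrast perturbation, can be sketched in a few lines. The example below applies both to a tiny grayscale image stored as nested lists. The 0 to 255 pixel range, the parameter values, and the function names are assumptions for illustration, and these synthetic corruptions are not the photo-realistic ones used in the challenge.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise to every pixel, clipped to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def scale_contrast(image, factor):
    """Scale each pixel's distance from the image mean by `factor`, clipped to [0, 255]."""
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255.0, max(0.0, mean + factor * (px - mean))) for px in row]
            for row in image]


image = [[0.0, 100.0], [200.0, 100.0]]
low_contrast = scale_contrast(image, factor=0.5)
noisy = add_gaussian_noise(image, sigma=10.0)
```

A robustness check would then compare a segmentation model's score on `image` with its score on `low_contrast` and `noisy`.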

artificial intelligence, corruption, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2407.11906

Country:

Asia > China (0.14)
North America > United States (0.14)
Europe > Spain (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Weakly-supervised Medical Image Segmentation with Gaze Annotations

Zhong, Yuan, Tang, Chenhui, Yang, Yumeng, Qi, Ruoxi, Zhou, Kang, Gong, Yuqi, Heng, Pheng Ann, Hsiao, Janet H., Dou, Qi

arXiv.org Artificial Intelligence, Jul-10-2024

Eye gaze that reveals human observational patterns has increasingly been incorporated into solutions for vision tasks. Despite recent explorations on leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation which typically entails heavy annotating costs. In this paper, we propose to collect dense weak supervision for medical image segmentation with a gaze annotation scheme. To train with gaze, we propose a multi-level framework that trains multiple networks from discriminative human attention, simulated with a set of pseudo-masks derived by applying hierarchical thresholds on gaze heatmaps. Furthermore, to mitigate gaze noise, a cross-level consistency is exploited to regularize overfitting noisy labels, steering models toward clean patterns learned by peer networks. The proposed method is validated on two public medical datasets of polyp and prostate segmentation tasks. We contribute a high-quality gaze dataset entitled GazeMedSeg as an extension to the popular medical segmentation datasets. To the best of our knowledge, this is the first gaze dataset for medical image segmentation. Our experiments demonstrate that gaze annotation outperforms previous label-efficient annotation schemes in terms of both performance and annotation time.
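Deriving a set of pseudo-masks by applying hierarchical thresholds to a gaze heatmap can be pictured as follows. This is a minimal sketch assuming a heatmap normalized to [0, 1] and three hand-picked thresholds. The threshold values and the function name are hypothetical, not the ones used for GazeMedSeg.

```python
def pseudo_masks(heatmap, thresholds):
    """Return one binary mask per threshold, ordered from lowest to highest threshold."""
    return [[[1 if value >= t else 0 for value in row] for row in heatmap]
            for t in sorted(thresholds)]


# Toy 3x3 gaze heatmap with attention concentrated near the center.
gaze_heatmap = [
    [0.1, 0.4, 0.2],
    [0.3, 0.9, 0.6],
    [0.0, 0.5, 0.2],
]
masks = pseudo_masks(gaze_heatmap, thresholds=[0.3, 0.5, 0.8])
```

Each higher threshold yields a mask contained in the previous one, so the set runs from a loose region to a confident core. In a multi-level framework, each mask could supervise a separate network.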

artificial intelligence, machine learning, segmentation, (13 more...)

arXiv.org Artificial Intelligence

2407.07406

Country:

Asia > China (0.14)
Europe > Germany (0.14)
Europe > France (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

Huang, Tianyu, Zhou, Tao, Xie, Weidi, Wang, Shuo, Dou, Qi, Zhang, Yizhe

arXiv.org Artificial Intelligence, Jun-2-2024

The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), and entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation).

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2406.00956

Country: Asia > China (0.47)

Genre: Research Report (0.84)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation

Fu, Jiawei, Long, Yonghao, Chen, Kai, Wei, Wang, Dou, Qi

arXiv.org Artificial Intelligence, May-29-2024

Surgical robot task automation has been increasingly studied for its potential to improve surgical efficiency and augment robot intelligence. Recent advancements have witnessed research on learning-based methods [1]-[5] to promote automation of surgical robots. Still, current performances of the latest methods are impeded in long-horizon goal-conditioned tasks, where a sequence of actions and substeps are required until reaching an ultimate goal. Previous algorithms with reinforcement learning [6] and Markov decision process only predict actions from the current state while overlooking information from historical sequential states and actions. This lacks temporal reasoning capability over actions and affects learning of the inherent sequential dynamics which is useful to the final success of a complex task. Despite some works [7], [8] combining task-specific strategies to [...]

Furthermore, the introduction of task-specific rewards and the loss of cross-task pretraining create varying internal dynamics across tasks, resulting in technical challenges in developing a unified framework for reasoning and decision-making within the goal-reaching paradigm in surgical tasks. To leverage the advanced GPT-based decision-making frameworks for improving surgical robot task automation, we propose the goal-conditioned decision transformer that embeds goal and time-to-goal as future indicators. Besides, we formulate multiple training objectives: action prediction, dynamics prediction, time-to-goal prediction, and sequence reconstruction in our cross-task pretraining process, which fosters a comprehensive representation of the temporal dynamics inherent in goal-conditioned tasks and encourages the model to incorporate diverse temporal reasoning factors.
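To make "goal and time-to-goal as future indicators" concrete, the sketch below pairs every timestep of a recorded trajectory with the goal and the number of steps remaining. It assumes that the final timestep of the trajectory reaches the goal, and the field names and toy trajectory are hypothetical. The actual token layout of the paper's decision transformer is not given in the text above.

```python
def build_goal_conditioned_tokens(states, actions, goal):
    """Pair every timestep with the goal and the number of steps remaining to reach it.

    Assumes the last timestep of the trajectory is the one that reaches the goal.
    """
    horizon = len(states)
    return [
        {"goal": goal, "time_to_goal": horizon - 1 - t, "state": s, "action": a}
        for t, (s, a) in enumerate(zip(states, actions))
    ]


# Toy 1-D trajectory: the gripper moves from position 0 to the goal at position 2.
states = [0.0, 1.0, 2.0]
actions = [1.0, 1.0, 0.0]
tokens = build_goal_conditioned_tokens(states, actions, goal=2.0)
```

A sequence model trained on such tuples can be asked to predict the action, the next state, or the time-to-goal, which correspond to three of the training objectives listed above.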

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2405.18757

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Surgery (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ada-Tracker: Soft Tissue Tracking via Inter-Frame and Adaptive-Template Matching

Guo, Jiaxin, Wang, Jiangliu, Li, Zhaoshuo, Jia, Tongyu, Dou, Qi, Liu, Yun-Hui

arXiv.org Artificial Intelligence, May-24-2024

Soft tissue tracking is crucial for computer-assisted interventions. Existing approaches mainly rely on extracting discriminative features from the template and videos to recover corresponding matches. However, it is difficult to adopt these techniques in surgical scenes, where tissues are changing in shape and appearance throughout the surgery. To address this problem, we exploit optical flow to naturally capture the pixel-wise tissue deformations and adaptively correct the tracked template. Specifically, we first implement an inter-frame matching mechanism to extract a coarse region of interest based on optical flow from consecutive frames. To accommodate appearance change and alleviate drift, we then propose an adaptive-template matching method, which updates the tracked template based on the reliability of the estimates. Our approach, Ada-Tracker, enjoys both short-term dynamics modeling by capturing local deformations and long-term dynamics modeling by introducing global temporal compensation. We evaluate our approach on the public SurgT benchmark, which is generated from Hamlyn, SCARED, and Kidney boundary datasets. The experimental results show that Ada-Tracker achieves superior accuracy and performs more robustly than prior works. Code is available at https://github.com/wrld/Ada-Tracker.
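Updating a tracked template based on the reliability of the estimates could, in its simplest form, look like a gated blend between the old template and the newly matched patch. The sketch below is only one plausible reading of that sentence. The blending rule, the `min_reliability` gate, and the flat-list patch representation are all assumptions and are not taken from the Ada-Tracker code.

```python
def update_template(template, candidate, reliability, min_reliability=0.5):
    """Blend the template toward the candidate patch when the match is reliable enough.

    template, candidate: flattened patches of equal length
    reliability: match confidence in [0, 1]; low values leave the template unchanged
    """
    if reliability < min_reliability:
        return list(template)
    return [(1.0 - reliability) * t + reliability * c
            for t, c in zip(template, candidate)]


old_patch = [0.0, 2.0]
new_patch = [2.0, 4.0]
unchanged = update_template(old_patch, new_patch, reliability=0.2)
blended = update_template(old_patch, new_patch, reliability=0.5)
```

Skipping the update on unreliable matches is one way to limit drift, while blending on reliable matches lets the template follow gradual changes in tissue appearance.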

artificial intelligence, machine learning, optical flow, (15 more...)

arXiv.org Artificial Intelligence

2403.06479

Country: Asia > China (0.47)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Surgery (0.93)
Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Health Care Technology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians

Yang, Zhenya, Chen, Kai, Long, Yonghao, Dou, Qi

arXiv.org Artificial Intelligence, May-20-2024

Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these surgical scene environments involve a labor-intensive process where designers hand-craft tissue models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in scalability and realism. In contrast, data-driven simulation offers a compelling alternative. It has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data, followed by the application of soft body physics. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussians as a learnable representation for surgical scenes, which is learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes, we incorporate depth supervision and anisotropy regularization into the Gaussian learning process. Furthermore, we apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations. Our method was evaluated on our collected in-house and public surgical video datasets. Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently (taking only a few minutes to reconstruct the surgical scene) and produce both visually and physically plausible deformations at a speed approaching real-time. The results demonstrate the great potential of our proposed method to enhance the efficiency and variety of simulations available for surgical education and robot learning.

artificial intelligence, gaussian, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.00956

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
