Zhao, Yongqiang
Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine
Dou, Chengfeng, Zhang, Ying, Jin, Zhi, Jiao, Wenpin, Zhao, Haiyan, Zhao, Yongqiang, Tao, Zhengwei
Evidence-based medicine (EBM) plays a crucial role in the application of large language models (LLMs) in healthcare, as it provides reliable support for medical decision-making. Although EBM benefits from current retrieval-augmented generation (RAG) technologies, it still faces two significant challenges: collecting dispersed evidence, and organizing that evidence efficiently enough to support the complex queries EBM requires. To tackle these issues, we propose using LLMs to gather scattered evidence from multiple sources, and we present a knowledge hypergraph-based evidence management model that integrates this evidence while capturing its intricate relationships. Furthermore, to better support complex queries, we develop an Importance-Driven Evidence Prioritization (IDEP) algorithm that uses the LLM to generate multiple evidence features, each with an associated importance score; these scores are then used to rank the evidence and produce the final retrieval results. Experimental results on six datasets demonstrate that our approach outperforms existing RAG techniques in application domains relevant to EBM, such as medical question answering, hallucination detection, and decision support. Test sets and the constructed knowledge graph are available at https://drive.google.com/file/d/1WJ9QTokK3MdkjEmwuFQxwH96j_Byawj_/view?usp=drive_link.
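As a rough illustration of the prioritization idea, the following minimal Python sketch ranks evidence by the summed importance of the features it carries. The Evidence structure, feature names, and scores are invented for illustration and do not reflect the paper's actual implementation, in which the features and importance scores come from the LLM.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """One piece of evidence gathered from a source (illustrative structure)."""
    text: str
    features: set[str] = field(default_factory=set)  # features assigned by the LLM

def rank_evidence(evidence_pool, feature_importance, top_k=3):
    """Rank evidence by the summed importance of the query-relevant
    features it carries, mirroring the IDEP idea at a high level."""
    def score(ev):
        return sum(feature_importance.get(f, 0.0) for f in ev.features)
    return sorted(evidence_pool, key=score, reverse=True)[:top_k]

# Hypothetical usage: in the paper, importances are produced per query by the LLM.
feature_importance = {"population match": 0.9, "RCT design": 0.8, "outcome reported": 0.7}
pool = [
    Evidence("Trial A shows drug X reduces mortality.", {"RCT design", "outcome reported"}),
    Evidence("Case report on drug X in elderly patients.", {"population match"}),
]
for ev in rank_evidence(pool, feature_importance):
    print(ev.text)
```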
General Force Sensation for Tactile Robot
Chen, Zhuo, Ou, Ni, Zhang, Xuyang, Wu, Zhiyuan, Zhao, Yongqiang, Wang, Yupeng, Lepora, Nathan, Jamone, Lorenzo, Deng, Jiankang, Luo, Shan
Robotic tactile sensors, including vision-based and taxel-based sensors, enable agile manipulation and safe human-robot interaction through force sensation. However, variations in structural configurations, measured signals, and material properties create domain gaps that limit the transferability of learned force sensation across different tactile sensors. Here, we introduce GenForce, a general framework for achieving transferable force sensation across both homogeneous and heterogeneous tactile sensors in robotic systems. By unifying tactile signals into marker-based binary tactile images, GenForce enables the transfer of existing force labels to arbitrary target sensors via a marker-to-marker translation technique that requires only a small amount of paired data. This process equips uncalibrated tactile sensors with force prediction capabilities through spatiotemporal force prediction models trained on the transferred data. Extensive experimental results validate GenForce's generalizability, accuracy, and robustness across sensors with diverse marker patterns, structural designs, material properties, and sensing principles. The framework significantly reduces the need for costly and labor-intensive labeled data collection, enabling the rapid deployment of multiple tactile sensors on robotic hands that require force sensing.
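The shared marker-based binary representation can be pictured with a short sketch: normalized marker coordinates, whether taxel positions or marker centroids detected from a vision-based sensor, are rasterized into a binary image. The image size, marker radius, and coordinates below are illustrative assumptions, not GenForce's actual parameters.

```python
import numpy as np

def markers_to_binary_image(marker_xy, image_size=(64, 64), radius=2):
    """Rasterize normalized marker coordinates (in [0, 1]^2) into a
    binary tactile image, the shared representation used for transfer."""
    img = np.zeros(image_size, dtype=np.uint8)
    h, w = image_size
    for x, y in marker_xy:
        cx, cy = int(x * (w - 1)), int(y * (h - 1))
        ys, xs = np.ogrid[:h, :w]                      # pixel coordinate grids
        img[(xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2] = 1
    return img

# Hypothetical markers, e.g. from a taxel grid or a vision-based sensor.
markers = [(0.25, 0.5), (0.5, 0.5), (0.75, 0.5)]
print(markers_to_binary_image(markers).sum(), "marker pixels set")
```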
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment
Dou, Chengfeng, Zhang, Ying, Jin, Zhi, Jiao, Wenpin, Zhao, Haiyan, Zhao, Yongqiang, Tao, Zhengwei
This research examines the use of Reinforcement Learning from AI Feedback (RLAIF) techniques to improve healthcare dialogue models, aiming to tackle the challenges of preference-aligned data annotation while reducing reliance on medical experts. We argue that the primary challenges in current RLAIF research for healthcare are the limitations of automated evaluation methods and the difficulty of accurately representing physician preferences. To address these challenges, we present a new evaluation framework based on standardized patient examinations. This framework is designed to objectively assess the effectiveness of large language models (LLMs) in guiding users and following instructions, enabling a comprehensive comparison across different models. Furthermore, our investigation of effective ways to express physician preferences using Constitutional AI algorithms highlights the particular effectiveness of flowcharts. Building on this finding, we introduce an agent-based approach for annotating preference data. This approach autonomously creates medical dialogue flows tailored to the patient's condition, demonstrates strong generalization abilities, and reduces the need for expert involvement. Our results show that the agent-based approach outperforms existing RLAIF annotation methods in standardized patient examinations and surpasses current open-source medical dialogue LLMs in various test scenarios.
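To make the flowchart-driven annotation idea concrete, here is a toy Python sketch in which a hand-written flowchart stands in for the agent-generated medical dialogue flows, and the response whose next action matches the flow is preferred. All states and actions are invented for illustration.

```python
# A toy flowchart: each state maps to the action a physician would take next.
# In the paper, such flows are generated per patient condition by an LLM agent.
FLOWCHART = {
    "start": "ask_symptom_duration",
    "ask_symptom_duration": "ask_medical_history",
    "ask_medical_history": "suggest_examination",
}

def annotate_preference(state, response_a_action, response_b_action):
    """Prefer the response whose action matches the flowchart's next step;
    return None (no preference) when neither or both match."""
    expected = FLOWCHART.get(state)
    a_ok, b_ok = response_a_action == expected, response_b_action == expected
    if a_ok == b_ok:
        return None
    return "A" if a_ok else "B"

print(annotate_preference("start", "ask_symptom_duration", "suggest_examination"))  # A
```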
FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills
Zhao, Yongqiang, Qian, Kun, Duan, Boyi, Luo, Shan
Simulation is a widely used tool in robotics to reduce hardware consumption and gather large-scale data. Despite previous efforts to simulate optical tactile sensors, challenges remain in efficiently synthesizing images and replicating marker motion under different contact loads. In this work, we propose FOTS, a fast simulator for optical tactile sensors. We utilize multi-layer perceptron mapping and planar shadow generation to simulate the optical response, while employing marker distribution approximation to simulate the motion of surface markers caused by elastomer deformation. Experimental results demonstrate that FOTS outperforms other methods in image generation quality and rendering speed, achieving 28.6 fps for optical simulation and 326.1 fps for marker motion simulation on a single CPU without GPU acceleration. In addition, we integrate the FOTS simulation model with physics engines such as MuJoCo, and a peg-in-hole task demonstrates the effectiveness of our method in achieving zero-shot Sim2Real learning of tactile-motor robot manipulation skills. Our code is available at https://github.com/Rancho-zhao/FOTS.
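The multi-layer perceptron mapping can be sketched as a small per-pixel network from contact geometry to RGB intensity. The input features, layer sizes, and untrained random weights below are illustrative assumptions, not FOTS's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny MLP mapping per-pixel contact geometry (here assumed to be depth and
# a surface normal) to RGB intensity; weights are random stand-ins for a
# network trained on real sensor images.
W1 = rng.normal(scale=0.1, size=(4, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 3)); b2 = np.zeros(3)

def mlp_optical_response(features):
    """features: (N, 4) array of [depth, nx, ny, nz] per pixel -> (N, 3) RGB."""
    h = np.maximum(features @ W1 + b1, 0.0)        # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid: RGB in (0, 1)

pixels = rng.random((5, 4))                        # five hypothetical pixels
print(mlp_optical_response(pixels).shape)          # (5, 3)
```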
Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback
Dou, Chengfeng, Jin, Zhi, Jiao, Wenpin, Zhao, Haiyan, Zhao, Yongqiang, Tao, Zhengwei
The use of large language models in medical dialogue generation has garnered significant attention, with a focus on improving response quality and fluency. While previous studies have made progress in optimizing model performance for single-round medical Q&A tasks, there is a need to enhance the model's capability for multi-round conversations to avoid logical inconsistencies. To address this, we propose an approach called preference learning from process feedback (PLPF), which integrates the doctor's diagnostic logic into LLMs. PLPF involves rule modeling, preference data generation, and preference alignment to train the model to adhere to the diagnostic process. Experimental results using Standardized Patient Testing show that PLPF enhances the diagnostic accuracy of the baseline model in medical conversations by 17.6%, outperforming traditional reinforcement learning from human feedback. Additionally, PLPF demonstrates effectiveness in both multi-round and single-round dialogue tasks, showcasing its potential for improving medical dialogue generation.
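A toy sketch of how process feedback can yield preference data: score each dialogue trajectory by how far it follows a canonical diagnostic stage order, then pair a higher-scoring trajectory (chosen) with a lower-scoring one (rejected) for alignment training. The stages and scoring rule here are invented; the paper's rule modeling is richer.

```python
# Illustrative process rules: the diagnostic stages a conversation should
# pass through, in order. This only sketches the preference-pair idea.
STAGES = ["inquiry", "examination", "diagnosis", "treatment"]

def process_score(dialogue_stages):
    """Count how long the dialogue follows the canonical stage order."""
    score = 0
    for expected, actual in zip(STAGES, dialogue_stages):
        if expected != actual:
            break
        score += 1
    return score

def make_preference_pair(traj_a, traj_b):
    """Return (chosen, rejected) for preference alignment, or None on a tie."""
    sa, sb = process_score(traj_a), process_score(traj_b)
    if sa == sb:
        return None
    return (traj_a, traj_b) if sa > sb else (traj_b, traj_a)

pair = make_preference_pair(
    ["inquiry", "examination", "diagnosis"],
    ["inquiry", "treatment"],  # skips stages, so it becomes the rejected sample
)
print(pair)
```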
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model
Zhao, Yongqiang, Li, Zhenyu, Jin, Zhi, Zhang, Feng, Zhao, Haiyan, Dou, Chengfeng, Tao, Zhengwei, Xu, Xinhai, Liu, Donghong
The Multi-Modal Large Language Model (MLLM) extends the Large Language Model (LLM) with the capability to receive and reason over multi-modal data. Spatial awareness is one of the crucial abilities of an MLLM, encompassing diverse skills for understanding spatial relationships among objects and between objects and the scene. Industries such as autonomous driving, smart healthcare, robotics, and virtual and augmented reality place heavy demands on MLLMs' spatial awareness capabilities. However, a noticeable gap remains between the current spatial awareness capabilities of MLLMs and human requirements. To address this issue, this paper proposes using more precise spatial position information between objects to guide the MLLM toward more accurate responses to users' inquiries. Specifically, for a given multi-modal task, we use algorithms for acquiring geometric spatial information and scene graphs to obtain the geometric spatial information and scene details of the objects involved in the query. Based on this information, we then direct the MLLM to address the user's spatial awareness-related queries. Extensive experiments were conducted on benchmarks such as MME and MM-Vet and across multiple multi-modal large language models. The experimental results confirm the efficacy of the proposed method in enhancing the spatial awareness tasks and associated tasks of MLLMs.
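One way to picture the guidance step is a sketch that converts object bounding boxes into textual spatial facts prepended to the user's question before it reaches the MLLM. The relation heuristic, box format, and object names below are illustrative assumptions, not the paper's actual pipeline.

```python
def spatial_relation(box_a, box_b):
    """Derive a coarse left/right/above/below relation from two
    (x1, y1, x2, y2) boxes in image coordinates (y grows downward)."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    horiz = "left of" if ax < bx else "right of"
    vert = "above" if ay < by else "below"
    return horiz if abs(ax - bx) >= abs(ay - by) else vert

def build_spatial_prompt(objects, question):
    """objects: {name: box}. Render pairwise relations as text, then the query."""
    names = list(objects)
    facts = [f"The {a} is {spatial_relation(objects[a], objects[b])} the {b}."
             for i, a in enumerate(names) for b in names[i + 1:]]
    return " ".join(facts) + f"\nQuestion: {question}"

boxes = {"cup": (10, 40, 50, 80), "laptop": (120, 30, 300, 180)}  # hypothetical detections
print(build_spatial_prompt(boxes, "What is to the left of the laptop?"))
```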
PlugMed: Improving Specificity in Patient-Centered Medical Dialogue Generation using In-Context Learning
Dou, Chengfeng, Jin, Zhi, Jiao, Wenpin, Zhao, Haiyan, Tao, Zhengwei, Zhao, Yongqiang
Patient-centered medical dialogue systems strive to offer diagnostic interpretation services to users with limited medical knowledge, emphasizing responses tailored to each patient. Despite their promising performance on some medical tasks, large language models (LLMs) find it difficult to guarantee the specificity of their responses. Inspired by in-context learning, we propose PlugMed, a Plug-and-Play Medical Dialogue System, to address this challenge. PlugMed is equipped with two modules, a prompt generation (PG) module and a response ranking (RR) module, that enhance LLMs' dialogue strategies and thereby the specificity of the dialogue. The PG module stimulates the imitative ability of LLMs by providing them with real dialogues from similar patients as prompts. The RR module incorporates a fine-tuned small model as a response filter, enabling the selection of appropriate responses from those generated by the LLM. Furthermore, we introduce a new evaluation method that matches both the user's intent and high-frequency medical terms to effectively assess the specificity of responses. We conduct experiments on three medical dialogue datasets, and the results, from both automatic and human evaluation, demonstrate the effectiveness of our approach.
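A minimal sketch of the two modules, with token-overlap retrieval standing in for the PG module's similar-patient search and a toy scoring function standing in for the RR module's fine-tuned filter model; all data and the specificity proxy are invented for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for the paper's retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def prompt_generation(patient_query, dialogue_bank, k=2):
    """PG module sketch: pick the k most similar past dialogues as demonstrations."""
    ranked = sorted(dialogue_bank, key=lambda d: jaccard(patient_query, d["query"]),
                    reverse=True)
    return ranked[:k]

def response_ranking(candidates, score_fn):
    """RR module sketch: score_fn stands in for the fine-tuned small model."""
    return max(candidates, key=score_fn)

bank = [{"query": "persistent dry cough at night", "dialogue": "..."},
        {"query": "sharp knee pain when climbing stairs", "dialogue": "..."}]
demos = prompt_generation("dry cough that worsens at night", bank)
best = response_ranking(["Drink water.", "How long has the cough lasted?"],
                        score_fn=lambda r: len(r.split()))  # toy specificity proxy
print(demos[0]["query"], "|", best)
```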
Enhancing Subtask Performance of Multi-modal Large Language Model
Zhao, Yongqiang, Li, Zhenyu, Zhang, Feng, Xu, Xinhai, Liu, Donghong
A Multi-modal Large Language Model (MLLM) is a model expanded from a Large Language Model (LLM) that can handle and reason over multi-modal data. Current MLLMs typically begin by using the LLM to decompose a task into multiple subtasks, then employ individual pre-trained models to complete specific subtasks, and finally use the LLM to integrate the subtask results into the result of the overall task. In real-world scenarios, when dealing with large projects, it is common practice to break the project into smaller sub-projects, with different teams providing their own solutions or results; the project owner then decides which solution or result to adopt, ensuring the best possible outcome for each sub-project and, consequently, for the entire project. Inspired by this practice, this study selects multiple pre-trained models to complete the same subtask. By combining the results from multiple pre-trained models, the optimal subtask result is obtained, enhancing the performance of the MLLM. Specifically, this study first selects multiple pre-trained models focused on the same subtask based on distinct evaluation approaches, then invokes these models in parallel to process the input data and generate candidate subtask results. Finally, the candidates for the same subtask are compared by the LLM, and the best one is chosen as the outcome for that subtask. Extensive experiments are conducted using GPT-4-annotated and human-annotated datasets, and the results across various evaluation metrics demonstrate the effectiveness of the proposed approach.
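The parallel-invocation-and-selection loop can be sketched as follows, with trivial stand-in functions in place of real pre-trained models and a toy selection rule in place of the LLM-as-judge comparison; everything here is an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for pre-trained models that tackle the same subtask, e.g. three
# different captioning models; a real system would call actual checkpoints.
def model_a(x): return f"a photo of {x}"
def model_b(x): return f"an image showing {x}"
def model_c(x): return x.upper()

def choose_best(candidates, judge):
    """The LLM comparison step: `judge` stands in for prompting the LLM to
    compare candidate subtask results and pick one."""
    return judge(candidates)

def run_subtask(x, models, judge):
    with ThreadPoolExecutor() as pool:             # invoke models in parallel
        candidates = list(pool.map(lambda m: m(x), models))
    return choose_best(candidates, judge)

result = run_subtask("a dog on a beach", [model_a, model_b, model_c],
                     judge=lambda cs: min(cs, key=len))  # toy selection rule
print(result)
```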