Chen, Fangyuan
RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision
Xiong, Guangzhi; Jin, Qiao; Wang, Xiao; Fang, Yin; Liu, Haolin; Yang, Yifan; Chen, Fangyuan; Song, Zhixing; Wang, Dengyu; Zhang, Minjia; Lu, Zhiyong; Zhang, Aidong
Retrieval-augmented generation (RAG) has shown great potential for knowledge-intensive tasks, but its traditional architectures rely on static retrieval, limiting their effectiveness for complex questions that require sequential information-seeking. While agentic reasoning and search offer a more adaptive approach, most existing methods depend heavily on prompt engineering. In this work, we introduce RAG-Gym, a unified optimization framework that enhances information-seeking agents through fine-grained process supervision at each search step. We also propose ReSearch, a novel agent architecture that synergizes answer reasoning and search query generation within the RAG-Gym framework. Experiments on four challenging datasets show that RAG-Gym improves performance by up to 25.6% across various agent architectures, with ReSearch consistently outperforming existing baselines. Further analysis highlights the effectiveness of advanced LLMs as process reward judges and the transferability of trained reward models as verifiers for different LLMs. Additionally, we examine the scaling properties of training and inference in agentic RAG.
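The core loop the abstract describes alternates reasoning and search, scoring each intermediate action with a process reward model rather than only the final answer. Below is a minimal Python sketch of that idea; the `agent`, `reward_model`, and `retriever` objects and their methods are hypothetical placeholders standing in for the actual RAG-Gym interfaces, assuming best-of-N action selection at each step.

```python
# Hedged sketch of process-supervised action selection for a search agent.
# agent, reward_model, and retriever are hypothetical placeholders,
# not the released RAG-Gym API.
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str   # "search" or "answer"
    text: str   # search query or final answer

@dataclass
class State:
    question: str
    history: list = field(default_factory=list)  # (query, retrieved_docs) pairs

def run_episode(question, agent, reward_model, retriever,
                max_steps=5, n_candidates=4):
    state = State(question)
    for _ in range(max_steps):
        # Sample several candidate actions: follow-up queries or a final answer.
        candidates = agent.propose(state, n=n_candidates)
        # Process supervision: score each intermediate action, not just the outcome.
        best = max(candidates, key=lambda a: reward_model.score(state, a))
        if best.kind == "answer":
            return best.text
        docs = retriever.search(best.text)       # execute the selected query
        state.history.append((best.text, docs))  # extend the agent's context
    return agent.answer(state)  # step budget exhausted: answer from current state
```

The per-step scorer in this sketch plays the role the abstract attributes to trained process reward models, which it reports can also transfer across LLMs as inference-time verifiers.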
Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in Medicine
Jin, Qiao; Chen, Fangyuan; Zhou, Yiliang; Xu, Ziyang; Cheung, Justin M.; Chen, Robert; Summers, Ronald M.; Rousseau, Justin F.; Ni, Peiyun; Landsman, Marc J.; Baxter, Sally L.; Al'Aref, Subhi J.; Li, Yijia; Chiang, Michael F.; Peng, Yifan; Lu, Zhiyong
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations focused primarily on multi-choice accuracy alone. Our study extends this scope with a comprehensive analysis of GPT-4V's rationales across image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V outperforms human physicians in multi-choice accuracy (88.0% vs. 77.0%, p=0.034). GPT-4V also performs well on cases that physicians answer incorrectly, with over 80% accuracy. However, we found that GPT-4V frequently presents flawed rationales even in cases where it makes the correct final choice (27.3%), most prominently in image comprehension (21.6%). Despite GPT-4V's high multi-choice accuracy, our findings underscore the need for deeper evaluation of its rationales before such models are integrated into clinical workflows.
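The headline numbers combine two distinct measurements: overall multi-choice accuracy, and the share of correctly answered cases whose rationale is flawed in at least one dimension (image comprehension, knowledge recall, reasoning). The toy tally below makes that distinction concrete; the record fields are illustrative only, not the study's actual data schema.

```python
# Toy tally of the two metrics the study distinguishes: multi-choice accuracy
# vs. the flawed-rationale rate among correctly answered cases.
# Field names ("choice", "gold", "rationale_ok") are illustrative placeholders.
def summarize(cases):
    correct = [c for c in cases if c["choice"] == c["gold"]]
    accuracy = len(correct) / len(cases)
    # A case counts as flawed if any rationale dimension (image comprehension,
    # knowledge recall, step-by-step reasoning) was judged incorrect.
    flawed = [c for c in correct if not all(c["rationale_ok"].values())]
    flawed_rate = len(flawed) / len(correct)
    return accuracy, flawed_rate

cases = [
    {"choice": "B", "gold": "B",
     "rationale_ok": {"image": False, "knowledge": True, "reasoning": True}},
    {"choice": "A", "gold": "A",
     "rationale_ok": {"image": True, "knowledge": True, "reasoning": True}},
    {"choice": "C", "gold": "D",
     "rationale_ok": {"image": True, "knowledge": False, "reasoning": True}},
]
acc, flawed = summarize(cases)
print(f"accuracy={acc:.1%}, flawed-rationale rate among correct={flawed:.1%}")
```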
PMC-Patients: A Large-scale Dataset of Patient Summaries and Relations for Benchmarking Retrieval-based Clinical Decision Support Systems
Zhao, Zhengyun; Jin, Qiao; Chen, Fangyuan; Peng, Tuorui; Yu, Sheng
Objective: Retrieval-based Clinical Decision Support (ReCDS) can aid clinical workflows by providing relevant literature and similar patients for a given patient. However, the development of ReCDS systems has been severely hindered by the lack of diverse patient collections and of publicly available, large-scale patient-level annotation datasets. In this paper, we define and benchmark two ReCDS tasks, Patient-to-Article Retrieval (ReCDS-PAR) and Patient-to-Patient Retrieval (ReCDS-PPR), using a novel dataset called PMC-Patients. Methods: We extract patient summaries from PubMed Central articles using simple heuristics and use the PubMed citation graph to define patient-article relevance and patient-patient similarity. We implement and evaluate several ReCDS systems on the PMC-Patients benchmarks, including sparse retrievers, dense retrievers, and nearest-neighbor retrievers, and conduct several case studies to show the clinical utility of PMC-Patients. Results: PMC-Patients contains 167k patient summaries with 3.1M patient-article relevance annotations and 293k patient-patient similarity annotations, making it the largest-scale resource for ReCDS and one of the largest patient collections. Human evaluation and analysis show that PMC-Patients is a diverse dataset with high-quality annotations. Evaluation of various ReCDS systems shows that the PMC-Patients benchmark is challenging and calls for further research. Conclusion: We present PMC-Patients, a large-scale, diverse, and publicly available patient summary dataset with the largest-scale patient-level relation annotations. Based on PMC-Patients, we formally define two benchmark tasks for ReCDS systems and evaluate various existing retrieval methods. PMC-Patients can greatly facilitate methodology research on ReCDS systems and shows real-world clinical utility.
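Of the system families the abstract lists, a sparse retriever is the simplest to reproduce. The sketch below ranks a toy article corpus against a patient summary with BM25 via the rank_bm25 package; the three-sentence corpus is illustrative only, standing in for the PubMed articles used in the actual benchmark.

```python
# Minimal sparse-retrieval baseline in the spirit of the PMC-Patients
# patient-to-article (ReCDS-PAR) task, using BM25 from the rank_bm25 package.
# The in-line corpus is a toy stand-in for PubMed articles.
from rank_bm25 import BM25Okapi

articles = [
    "case report of acute myocardial infarction in a young adult",
    "randomized trial of metformin for type 2 diabetes management",
    "imaging findings in pulmonary embolism: a pictorial review",
]
bm25 = BM25Okapi([a.split() for a in articles])  # whitespace-tokenized index

patient_summary = "55-year-old male with chest pain and elevated troponin"
scores = bm25.get_scores(patient_summary.split())

# Rank articles by descending BM25 score for this patient summary.
for i in sorted(range(len(articles)), key=lambda i: scores[i], reverse=True):
    print(f"{scores[i]:.3f}  {articles[i]}")
```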