Dai, Haixing
Evaluation of OpenAI o1: Opportunities and Challenges of AGI
Zhong, Tianyang, Liu, Zhengliang, Pan, Yi, Zhang, Yutong, Zhou, Yifan, Liang, Shizhe, Wu, Zihao, Lyu, Yanjun, Shu, Peng, Yu, Xiaowei, Cao, Chao, Jiang, Hanqi, Chen, Hanxu, Li, Yiwei, Chen, Junhao, Hu, Huawen, Liu, Yiheng, Zhao, Huaqin, Xu, Shaochen, Dai, Haixing, Zhao, Lin, Zhang, Ruidong, Zhao, Wei, Yang, Zhenyuan, Chen, Jingyuan, Wang, Peilong, Ruan, Wei, Wang, Hui, Zhao, Huan, Zhang, Jing, Ren, Yiming, Qin, Shihuan, Chen, Tong, Li, Jiaxi, Zidan, Arif Hassan, Jahin, Afrar, Chen, Minheng, Xia, Sichen, Holmes, Jason, Zhuang, Yan, Wang, Jiaqi, Xu, Bochen, Xia, Weiran, Yu, Jichao, Tang, Kaibo, Yang, Yaxuan, Sun, Bolun, Yang, Tao, Lu, Guoyu, Wang, Xianqiao, Chai, Lilong, Li, He, Lu, Jin, Sun, Lichao, Zhang, Xin, Ge, Bao, Hu, Xintao, Zhang, Lian, Zhou, Hua, Zhang, Lu, Zhang, Shu, Liu, Ninghao, Jiang, Bei, Kong, Linglong, Xiang, Zhen, Ren, Yudan, Liu, Jun, Jiang, Xi, Bao, Yu, Zhang, Wei, Li, Xiang, Li, Gang, Liu, Wei, Shen, Dinggang, Sikora, Andrea, Zhai, Xiaoming, Zhu, Dajiang, Liu, Tianming
This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include:

- An 83.3% success rate in solving complex competitive programming problems, surpassing many human experts.
- Superior ability in generating coherent and accurate radiology reports, outperforming other evaluated models.
- 100% accuracy on high school-level mathematical reasoning tasks, with detailed step-by-step solutions.
- Advanced natural language inference capabilities across general and specialized domains such as medicine.
- Impressive performance in chip design tasks, outperforming specialized models in areas such as EDA script generation and bug analysis.
- Remarkable proficiency in anthropology and geology, demonstrating deep understanding and reasoning in these specialized fields.
- Strong capabilities in quantitative investing, with comprehensive financial knowledge and statistical modeling skills.
- Effective performance in social media analysis, including sentiment analysis and emotion recognition.

The model excelled particularly in tasks requiring intricate reasoning and knowledge integration across various fields. While some limitations were observed, including occasional errors on simpler problems and challenges with certain highly specialized concepts, the overall results indicate significant progress toward artificial general intelligence.
Revolutionizing Finance with LLMs: An Overview of Applications and Insights
Zhao, Huaqin, Liu, Zhengliang, Wu, Zihao, Li, Yiwei, Yang, Tianze, Shu, Peng, Xu, Shaochen, Dai, Haixing, Zhao, Lin, Mai, Gengchen, Liu, Ninghao, Liu, Tianming
In recent years, Large Language Models (LLMs) like ChatGPT have seen considerable advancements and have been applied in diverse fields. Built on the Transformer architecture, these models are trained on extensive datasets, enabling them to understand and generate human language effectively. In the financial domain, the deployment of LLMs is gaining momentum. These models are being utilized for automating financial report generation, forecasting market trends, analyzing investor sentiment, and offering personalized financial advice. Leveraging their natural language processing capabilities, LLMs can distill key insights from vast financial data, aiding institutions in making informed investment choices and enhancing both operational efficiency and customer satisfaction. In this study, we provide a comprehensive overview of the emerging integration of LLMs into various financial tasks. Additionally, we conducted holistic tests on multiple financial tasks through the combination of natural language instructions. Our findings show that GPT-4 effectively follows prompt instructions across various financial tasks. This survey and evaluation of LLMs in the financial domain aim to deepen the understanding of LLMs' current role in finance for both financial practitioners and LLM researchers, identify new research and application prospects, and highlight how these technologies can be leveraged to solve practical challenges in the finance industry.
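As a hedged illustration of the instruction-driven tests described above, the sketch below asks GPT-4 to label the sentiment of a financial news headline via the OpenAI API. The prompt wording, example headline, and function name are our assumptions, not the paper's benchmark setup.

```python
# Illustrative sketch (not the paper's benchmark): prompting GPT-4 with a
# natural language instruction for a financial sentiment task.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def headline_sentiment(headline: str) -> str:
    # A single-turn instruction; temperature 0 for repeatable labels.
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{
            "role": "user",
            "content": ("Classify the sentiment of this financial news headline "
                        "as positive, negative, or neutral. Answer with one word.\n"
                        f"Headline: {headline}"),
        }],
    )
    return response.choices[0].message.content.strip().lower()

# Example: headline_sentiment("Company X beats Q3 earnings estimates")  # -> "positive"
```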
Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Wang, Jiaqi, Wu, Zihao, Li, Yiwei, Jiang, Hanqi, Shu, Peng, Shi, Enze, Hu, Huawen, Ma, Chong, Liu, Yiheng, Wang, Xuhui, Yao, Yincheng, Liu, Xuan, Zhao, Huaqin, Liu, Zhengliang, Dai, Haixing, Zhao, Lin, Ge, Bao, Li, Xiang, Liu, Tianming, Zhang, Shu
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights toward bridging the gap in Human-Robot-Environment interaction.
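A minimal sketch of the kind of multimodal query the proposed framework relies on: sending a natural language instruction together with a robot camera frame to GPT-4V and asking for an action plan. This is not the paper's pipeline; the model identifier, prompt wording, and file path are illustrative assumptions.

```python
# Hedged sketch: combining a language instruction with a camera frame for
# GPT-4V-based embodied task planning. Not the authors' implementation.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def plan_from_observation(instruction: str, image_path: str) -> str:
    # Encode the robot's camera frame so it can be sent inline with the text.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": (f"You control a robot arm. Instruction: {instruction}. "
                          "Given the camera view, output a numbered action plan.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example: plan_from_observation("put the red block in the bowl", "frame.jpg")
```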
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4
Liu, Zhengliang, Huang, Yue, Yu, Xiaowei, Zhang, Lu, Wu, Zihao, Cao, Chao, Dai, Haixing, Zhao, Lin, Li, Yiwei, Shu, Peng, Zeng, Fang, Sun, Lichao, Liu, Wei, Shen, Dinggang, Li, Quanzheng, Liu, Tianming, Zhu, Dajiang, Li, Xiang
The digitization of healthcare has facilitated the sharing and re-use of medical data but has also raised concerns about confidentiality and privacy. HIPAA (Health Insurance Portability and Accountability Act) mandates removing re-identifying information before the dissemination of medical records. Thus, effective and efficient solutions for de-identifying medical data, especially in free-text form, are highly needed. While various computer-assisted de-identification methods, both rule-based and learning-based, have been developed and used in practice, such solutions still lack generalizability or must be fine-tuned for different scenarios, significantly restricting their wider use. The advancement of large language models (LLMs), such as ChatGPT and GPT-4, has shown great potential for processing text data in the medical domain with zero-shot in-context learning, especially for privacy protection, as these models can identify confidential information through their powerful named entity recognition (NER) capability. In this work, we developed a novel GPT-4-enabled de-identification framework (DeID-GPT) to automatically identify and remove re-identifying information. Compared to existing commonly used medical text de-identification methods, DeID-GPT showed the highest accuracy and remarkable reliability in masking private information from unstructured medical text while preserving the original structure and meaning of the text. This study is among the earliest to utilize ChatGPT and GPT-4 for medical text data processing and de-identification, and it provides insights for further research and solution development on the use of LLMs such as ChatGPT/GPT-4 in healthcare. Code and benchmarking data information are available at https://github.com/yhydhx/ChatGPT-API.
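To make the zero-shot setup concrete, below is a minimal sketch of prompting GPT-4 to mask re-identifying information in a clinical note. The actual prompts and pipeline are in the linked repository; the system prompt, tag format, and function name here are illustrative assumptions.

```python
# Hedged sketch of zero-shot medical text de-identification in the spirit
# of DeID-GPT; see the linked repository for the authors' implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a HIPAA de-identification assistant. Replace every piece of "
    "re-identifying information (names, dates, locations, phone numbers, "
    "record IDs) in the user's clinical note with the tag [REDACTED], "
    "preserving the note's structure and meaning. Return only the note."
)

def deidentify(note: str) -> str:
    # Temperature 0 keeps the masking deterministic and reproducible.
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
    )
    return response.choices[0].message.content

# Example: deidentify("Pt. John Doe, DOB 01/02/1960, seen at Mercy Hospital on 3/4/2021.")
```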
Holistic Evaluation of GPT-4V for Biomedical Imaging
Liu, Zhengliang, Jiang, Hanqi, Zhong, Tianyang, Wu, Zihao, Ma, Chong, Li, Yiwei, Yu, Xiaowei, Zhang, Yutong, Pan, Yi, Shu, Peng, Lyu, Yanjun, Zhang, Lu, Yao, Junjie, Dong, Peixin, Cao, Chao, Xiao, Zhenxiang, Wang, Jiaqi, Zhao, Huan, Xu, Shaochen, Wei, Yaonai, Chen, Jingyuan, Dai, Haixing, Wang, Peilong, He, Hao, Wang, Zewei, Wang, Xinyu, Zhang, Xu, Zhao, Lin, Liu, Yiheng, Zhang, Kai, Yan, Liheng, Sun, Lichao, Liu, Jun, Qiang, Ning, Ge, Bao, Cai, Xiaoyan, Zhao, Shijie, Hu, Xintao, Yuan, Yixuan, Li, Gang, Zhang, Shu, Zhang, Xin, Jiang, Xi, Zhang, Tuo, Shen, Dinggang, Li, Quanzheng, Liu, Wei, Li, Xiang, Zhu, Dajiang, Liu, Tianming
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
SAMAug: Point Prompt Augmentation for Segment Anything Model
Dai, Haixing, Ma, Chong, Liu, Zhengliang, Li, Yiwei, Shu, Peng, Wei, Xiaozheng, Zhao, Lin, Wu, Zihao, Zeng, Fang, Zhu, Dajiang, Liu, Wei, Li, Quanzheng, Liu, Tianming, Li, Xiang
This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide SAM with more information about the user's intention. Starting with an initial point prompt, SAM produces an initial mask, which is then fed into SAMAug to generate augmented point prompts. By incorporating these extra points, SAM can generate augmented segmentation masks based on both the augmented point prompts and the initial prompt, resulting in improved segmentation performance. We evaluated four point augmentation strategies: random sampling, sampling based on maximum difference entropy, maximum distance, and saliency. Experimental results on the COCO, Fundus, COVID QUEx, and ISIC2018 datasets show that SAMAug can boost SAM's segmentation results, especially with the maximum distance and saliency strategies. SAMAug demonstrates the potential of visual prompt augmentation for computer vision. Code for SAMAug is available at github.com/yhydhx/SAMAug.
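As a concrete illustration of one of the four strategies, the sketch below implements maximum-distance sampling: choose the foreground pixel of SAM's initial mask that lies farthest from the user's initial click. Variable names and the return convention are ours; the authors' implementations of all four strategies are in the linked repository.

```python
# Hedged sketch of SAMAug's maximum-distance point selection strategy.
import numpy as np

def max_distance_point(mask: np.ndarray, initial_point: tuple) -> tuple:
    """mask: HxW binary mask from SAM's first pass; initial_point: (row, col)."""
    ys, xs = np.nonzero(mask)          # coordinates of foreground pixels
    if ys.size == 0:
        return initial_point           # empty mask: nothing to augment
    d2 = (ys - initial_point[0]) ** 2 + (xs - initial_point[1]) ** 2
    i = int(np.argmax(d2))             # index of the farthest foreground pixel
    return (int(ys[i]), int(xs[i]))

# The augmented point is then fed back to SAM together with the initial
# prompt, e.g. predictor.predict(point_coords=..., point_labels=[1, 1]).
```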
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data
Zhong, Tianyang, Zhao, Wei, Zhang, Yutong, Pan, Yi, Dong, Peixin, Jiang, Zuowei, Kui, Xiaoyan, Shang, Youlan, Yang, Li, Wei, Yaonai, Yang, Longtao, Chen, Hao, Zhao, Huan, Liu, Yuxiao, Zhu, Ning, Li, Yiwei, Wang, Yisong, Yao, Jiaqi, Wang, Jiaqi, Zeng, Ying, He, Lei, Zheng, Chao, Zhang, Zhixue, Li, Ming, Liu, Zhengliang, Dai, Haixing, Wu, Zihao, Zhang, Lu, Zhang, Shu, Cai, Xiaoyan, Hu, Xintao, Zhao, Shijie, Jiang, Xi, Zhang, Xin, Li, Xiang, Zhu, Dajiang, Guo, Lei, Shen, Dinggang, Han, Junwei, Liu, Tianming, Liu, Jun, Zhang, Tuo
Radiology report generation, a key step in medical image analysis, is critical to quantitative analysis and clinically informed decision-making. However, complex and diverse radiology reports with cross-source heterogeneity pose a substantial generalizability challenge to current methods at large data volumes, mainly because the style and conventions of radiology reports differ markedly across institutions, inspected body regions, and radiologists. Recently, the advent of large language models (LLMs) has offered great potential for recognizing signs of health conditions. To address this problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer, an LLM-based model tailored for automatic radiology report generation that learns generalizable representations and provides a basis for adaptation to sophisticated analysis cases. Specifically, ChatRadio-Valuer is trained on radiology reports from a single institution by means of supervised fine-tuning, and then adapted to disease diagnosis tasks across human multi-system evaluations (i.e., chest, abdomen, musculoskeletal, head, and maxillofacial & neck) from six different institutions in clinical settings. The clinical dataset utilized in this study encompasses a total of 332,673 observations. Comprehensive results on engineering indicators, clinical efficacy, and deployment cost metrics show that ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4, in disease diagnosis from radiology reports. ChatRadio-Valuer provides an effective avenue for boosting model generalization performance and alleviating the annotation workload of experts, enabling the promotion of clinical AI applications in radiology.
Radiology-Llama2: Best-in-Class Large Language Model for Radiology
Liu, Zhengliang, Li, Yiwei, Shu, Peng, Zhong, Aoxiao, Yang, Longtao, Ju, Chao, Wu, Zihao, Ma, Chong, Luo, Jie, Chen, Cheng, Kim, Sekeun, Hu, Jiang, Dai, Haixing, Zhao, Lin, Zhu, Dajiang, Liu, Jun, Liu, Wei, Shen, Dinggang, Liu, Tianming, Li, Quanzheng, Li, Xiang
This paper introduces Radiology-Llama2, a large language model specialized for radiology through instruction tuning. Radiology-Llama2 is based on the Llama2 architecture and further trained on a large dataset of radiology reports to generate coherent and clinically useful impressions from radiological findings. Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance compared to other generative language models, with ROUGE-1 scores of 0.4834 on MIMIC-CXR and 0.4185 on OpenI. Additional assessments by radiology experts highlight the model's strengths in understandability, coherence, relevance, conciseness, and clinical utility. The work illustrates the potential of localized language models designed and tuned for specialized domains like radiology. When properly evaluated and deployed, such models can transform fields like radiology by automating rote tasks and enhancing human expertise.
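For readers unfamiliar with the metric, the sketch below shows the arithmetic behind ROUGE-1 F1 (unigram overlap between a generated impression and the reference). Real evaluations use a library such as rouge-score with stemming; this bare version is only illustrative.

```python
# Minimal illustration of ROUGE-1 F1; not the paper's evaluation code.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())    # matched unigram count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

# rouge1_f1("no acute cardiopulmonary process",
#           "no acute cardiopulmonary abnormality")  # -> 0.75
```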
Evaluating Large Language Models for Radiology Natural Language Processing
Liu, Zhengliang, Zhong, Tianyang, Li, Yiwei, Zhang, Yutong, Pan, Yi, Zhao, Zihao, Dong, Peixin, Cao, Chao, Liu, Yuxiao, Shu, Peng, Wei, Yaonai, Wu, Zihao, Ma, Chong, Wang, Jiaqi, Wang, Sheng, Zhou, Mengyue, Jiang, Zuowei, Li, Chunlin, Holmes, Jason, Xu, Shaochen, Zhang, Lu, Dai, Haixing, Zhang, Kai, Zhao, Lin, Chen, Yuanhao, Liu, Xu, Wang, Peilong, Yan, Pingkun, Liu, Jun, Ge, Bao, Sun, Lichao, Zhu, Dajiang, Li, Xiang, Liu, Wei, Cai, Xiaoyan, Hu, Xintao, Jiang, Xi, Zhang, Shu, Zhang, Xin, Zhang, Tuo, Zhao, Shijie, Li, Quanzheng, Zhu, Hongtu, Shen, Dinggang, Liu, Tianming
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of these models exhibit bilingual capabilities, being proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty-two LLMs on interpreting radiology reports, a crucial component of radiology NLP. Specifically, we assess the ability to derive impressions from radiologic findings. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.
PharmacyGPT: The AI Pharmacist
Liu, Zhengliang, Wu, Zihao, Hu, Mengxuan, Zhao, Bokai, Zhao, Lin, Zhang, Tianyi, Dai, Haixing, Chen, Xianyan, Shen, Ye, Li, Sheng, Murray, Brian, Liu, Tianming, Sikora, Andrea
In this study, we introduce PharmacyGPT, a novel framework to assess the capabilities of large language models (LLMs) such as ChatGPT and GPT-4 in emulating the role of clinical pharmacists. Our methodology encompasses the utilization of LLMs to generate comprehensible patient clusters, formulate medication plans, and forecast patient outcomes. We conduct our investigation using real data acquired from the intensive care unit (ICU) at the University of North Carolina Chapel Hill (UNC) Hospital. Our analysis offers valuable insights into the potential applications and limitations of LLMs in the field of clinical pharmacy, with implications for both patient care and the development of future AI-driven healthcare solutions. By evaluating the performance of PharmacyGPT, we aim to contribute to the ongoing discourse surrounding the integration of artificial intelligence in healthcare settings, ultimately promoting the responsible and efficacious use of such technologies.