Zhao, Yuhang
GPBench: A Comprehensive and Fine-Grained Benchmark for Evaluating Large Language Models as General Practitioners
Li, Zheqing, Yang, Yiying, Lang, Jiping, Jiang, Wenhao, Zhao, Yuhang, Li, Shuang, Wang, Dingqian, Lin, Zhu, Li, Xuanna, Tang, Yuze, Qiu, Jiexian, Lu, Xiaolin, Yu, Hongji, Chen, Shuang, Bi, Yuhua, Zeng, Xiaofei, Chen, Yixian, Chen, Junrong, Yao, Lin
General practitioners (GPs) serve as the cornerstone of primary healthcare systems by providing continuous and comprehensive medical services. However, due to the community-oriented nature of their practice, uneven training, and resource gaps, clinical proficiency among GPs can vary significantly across regions and healthcare settings. Large Language Models (LLMs) have recently demonstrated great potential in clinical and medical applications, making them a promising tool for supporting general practice. However, most existing benchmarks and evaluation frameworks focus on exam-style assessments, typically multiple-choice questions, and lack comprehensive assessment sets that accurately mirror the real-world scenarios encountered by GPs. To evaluate how effectively LLMs can make decisions in the daily work of GPs, we designed GPBench, which consists of both test questions from clinical practice and a novel evaluation framework. The test set includes multiple-choice questions that assess fundamental knowledge of general practice, as well as realistic, scenario-based problems. All questions are meticulously annotated by experts, incorporating rich fine-grained information related to clinical management. The proposed LLM evaluation framework is based on the competency model for general practice, providing a comprehensive methodology for assessing LLM performance in real-world settings. As the first large-model evaluation set targeting GP decision-making scenarios, GPBench allows us to evaluate current mainstream LLMs. Expert assessment and evaluation reveal that in areas such as disease staging, complication recognition, treatment detail, and medication usage, these models exhibit at least ten major shortcomings. Overall, existing LLMs are not yet suitable for independent use in real-world GP working scenarios without human oversight.
"This really lets us see the entire world:" Designing a conversational telepresence robot for homebound older adults
Hu, Yaxin, Stegner, Laura, Kotturi, Yasmine, Zhang, Caroline, Peng, Yi-Hao, Huq, Faria, Zhao, Yuhang, Bigham, Jeffrey P., Mutlu, Bilge
In this paper, we explore the design and use of conversational telepresence robots to help homebound older adults interact with the external world. An initial needfinding study (N=8) using video vignettes revealed older adults' experiential needs for robot-mediated remote experiences such as exploration, reminiscence, and social participation. We then designed a prototype system to support these goals and conducted a technology probe study (N=11) to garner a deeper understanding of user preferences for remote experiences. The study revealed user interaction patterns in each desired experience, highlighting the need for robot guidance and for social engagement with the robot and remote bystanders. Our work identifies a novel design space where conversational telepresence robots can be used to foster meaningful interactions in the remote physical environment. We offer design insights into the robot's proactive role in providing guidance and using dialogue to create personalized, contextualized, and meaningful experiences.
Structure design and coordinated motion analysis of bionic crocodile robot
Wang, Jun, Zheng, Jingya, Zhao, Yuhang, Yang, Kai
Crocodiles, known as one of the oldest and most resilient species on Earth, have demonstrated remarkable locomotor abilities both on land and in water, evolving over millennia to adapt to diverse environments. In this paper, we draw inspiration from crocodiles and introduce a highly biomimetic crocodile robot equipped with multiple degrees of freedom and articulated trunk joints. This design is based on a comprehensive analysis of the structural and motion characteristics observed in real crocodiles. The bionic crocodile robot suffers from limb-torso incoordination during movement. To solve this problem, we apply the Denavit-Hartenberg (D-H) method for both forward and inverse kinematics analysis of the robot's legs and spine. Through a series of simulation experiments, we investigate the robot's stability of motion, fault tolerance, and adaptability to the environment in two motor patterns: with and without the involvement of the spine and tail in its movements. Experimental results demonstrate that the bionic crocodile robot exhibits superior motion performance when the spine and tail cooperate with the extremities. This research not only showcases the potential of biomimicry in robotics but also underscores the significance of understanding how nature's designs can inform and enhance our technological innovations.
Artificial Intelligence Security Competition (AISC)
Dong, Yinpeng, Chen, Peng, Deng, Senyou, L, Lianji, Sun, Yi, Zhao, Hanyu, Li, Jiaxing, Tan, Yunteng, Liu, Xinyu, Dong, Yangyi, Xu, Enhui, Xu, Jincai, Xu, Shu, Fu, Xuelin, Sun, Changfeng, Han, Haoliang, Zhang, Xuchong, Chen, Shen, Sun, Zhimin, Cao, Junyi, Yao, Taiping, Ding, Shouhong, Wu, Yu, Lin, Jian, Wu, Tianpeng, Wang, Ye, Fu, Yu, Feng, Lin, Gao, Kangkang, Liu, Zeyu, Pang, Yuanzhe, Duan, Chengqi, Zhou, Huipeng, Wang, Yajie, Zhao, Yuhang, Wu, Shangbo, Lyu, Haoran, Lin, Zhiyu, Gao, Yifei, Li, Shuang, Wang, Haonan, Sang, Jitao, Ma, Chen, Zheng, Junhao, Li, Yijia, Shen, Chao, Lin, Chenhao, Cui, Zhichao, Liu, Guoshuai, Shi, Huafeng, Hu, Kun, Zhang, Mengxin
The security of artificial intelligence (AI) is an important research area on the path towards safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.