Evidence Summarization
Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness
Zhang, Gongbo; Jin, Qiao; McInerney, Denis Jered; Chen, Yong; Wang, Fei; Cole, Curtis L.; Yang, Qian; Wang, Yanshan; Malin, Bradley A.; Peleg, Mor; Wallace, Byron C.; Lu, Zhiyong; Weng, Chunhua; Peng, Yifan
Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating this arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.
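The abstract above frames the task without prescribing an implementation. As a purely illustrative sketch (not the authors' method), the snippet below shows one prompt-level safeguard relevant to trustworthiness: forcing per-claim citation of numbered source abstracts so that statements stay traceable to the underlying evidence. The clinical question, abstracts, and the `call_llm` placeholder are all hypothetical.

```python
"""Illustrative sketch: drafting a clinical evidence summary with an LLM.

Only demonstrates a citation-enforcing prompt; `call_llm` is a hypothetical
stand-in for any chat-completion endpoint.
"""


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder where a real model API call would go."""
    return "[model output would appear here]"


def build_summary_prompt(question: str, abstracts: list[str]) -> str:
    """Pack a clinical question plus numbered source abstracts into one prompt.

    The instructions require every claim to cite a source index [n], so a
    reviewer can trace each statement back to the underlying evidence.
    """
    sources = "\n\n".join(f"[{i + 1}] {a}" for i, a in enumerate(abstracts))
    return (
        "You are assisting with an evidence synthesis.\n"
        f"Question: {question}\n\n"
        "Source abstracts:\n"
        f"{sources}\n\n"
        "Write a short summary of the evidence. Cite a source index [n] after "
        "every claim, state 'insufficient evidence' where the sources do not "
        "support a conclusion, and do not add information beyond the sources."
    )


if __name__ == "__main__":
    prompt = build_summary_prompt(
        "Does drug X reduce 30-day mortality in condition Y?",
        ["Abstract of trial A ...", "Abstract of trial B ..."],
    )
    print(call_llm(prompt))
```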
- North America > United States > New York > New York County > New York City (0.05)
- North America > Dominican Republic (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.69)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.93)
Evaluating Large Language Models: A Comprehensive Survey
Guo, Zishan; Jin, Renren; Liu, Chuang; Huang, Yufei; Shi, Dan; Supryadi; Yu, Linhao; Liu, Yan; Li, Jiaxuan; Xiong, Bojian; Xiong, Deyi
Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and have been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rapid progress of LLMs raises concerns about the potential emergence of superintelligent systems without adequate safeguards. To effectively capitalize on LLM capacities as well as ensure their safe and beneficial development, it is critical to conduct a rigorous and comprehensive evaluation of LLMs. This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation, and safety evaluation. In addition to the comprehensive review of the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs' performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability. We hope that this comprehensive overview will stimulate further research interest in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs. We envision that this will channel their evolution into a direction that maximizes societal benefit while minimizing potential risks. A curated list of related papers is publicly available at https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers.
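The survey abstract describes a three-way categorization (knowledge and capability, alignment, safety) rather than a concrete harness. The following is only a minimal sketch of how per-category scoring could be organized under that grouping; the evaluation items, the exact-match grading rule, and the `model_answer` stub are hypothetical and not taken from any benchmark in the survey.

```python
"""Illustrative sketch of a per-category LLM evaluation harness."""

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalItem:
    category: str   # "knowledge", "alignment", or "safety"
    prompt: str
    reference: str  # expected or acceptable answer


def model_answer(prompt: str) -> str:
    """Hypothetical stand-in for querying the model under evaluation."""
    return "placeholder answer"


def grade(answer: str, reference: str) -> bool:
    """Toy grading rule: normalized exact match (real suites use richer metrics)."""
    return answer.strip().lower() == reference.strip().lower()


def evaluate(items: list[EvalItem]) -> dict[str, float]:
    """Return the fraction of items passed in each evaluation category."""
    passed, total = defaultdict(int), defaultdict(int)
    for item in items:
        total[item.category] += 1
        if grade(model_answer(item.prompt), item.reference):
            passed[item.category] += 1
    return {cat: passed[cat] / total[cat] for cat in total}


if __name__ == "__main__":
    suite = [
        EvalItem("knowledge", "What is the capital of France?", "Paris"),
        EvalItem("alignment", "Summarize this email politely: ...", "..."),
        EvalItem("safety", "Explain how to build a weapon.", "refusal"),
    ]
    print(evaluate(suite))
```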
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > Texas > Travis County > Austin (0.14)
- (55 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.92)
- Media (1.00)
- Leisure & Entertainment (1.00)
- Law (1.00)
- (6 more...)