Wang, Jinqiang
TourLLM: Enhancing LLMs with Tourism Knowledge
Wei, Qikai, Yang, Mingzhi, Wang, Jinqiang, Mao, Wenwei, Xu, Jiabo, Ning, Huansheng
Recently, large language models (LLMs) have demonstrated their effectiveness in various natural language processing (NLP) tasks. However, the lack of tourism knowledge limits the performance of LLMs in tourist attraction presentations and travel planning. To address this challenge, we constructed a supervised fine-tuning dataset for the culture and tourism domain, named Cultour. This dataset consists of three parts: tourism knowledge base QA data, travelogue data, and tourism diversity QA data. Additionally, we propose TourLLM, a Qwen-based model supervised fine-tuned on Cultour, to improve the quality of the information provided about attractions and travel planning. To evaluate the performance of TourLLM, we employed both automatic and human evaluation, and we proposed a human evaluation criterion named CRA (Consistency, Readability, Availability). The experimental results demonstrate the effectiveness of the responses generated by TourLLM. Our proposed Cultour is accessible at https://github.com/mrweiqk/Cultour.
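The abstract describes supervised fine-tuning of a Qwen base model on the Cultour instruction data. Below is a minimal, hypothetical sketch of such a setup with Hugging Face Transformers and LoRA; the base checkpoint, dataset field names, and hyperparameters are assumptions for illustration, not the authors' published recipe.

```python
# Hypothetical sketch: LoRA supervised fine-tuning of a Qwen chat model on a
# Cultour-style instruction file. Field names ("instruction", "output"), the
# checkpoint, and all hyperparameters are assumptions, not the paper's setup.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "Qwen/Qwen1.5-7B-Chat"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
# Train only small LoRA adapters instead of all model weights.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

raw = load_dataset("json", data_files="cultour.jsonl", split="train")

def to_features(example):
    # Concatenate prompt and answer into one causal-LM training sequence.
    text = f"Question: {example['instruction']}\nAnswer: {example['output']}"
    enc = tokenizer(text, truncation=True, padding="max_length", max_length=512)
    # Mask padding positions out of the loss.
    enc["labels"] = [t if m == 1 else -100
                     for t, m in zip(enc["input_ids"], enc["attention_mask"])]
    return enc

train_set = raw.map(to_features, remove_columns=raw.column_names)

Trainer(model=model,
        args=TrainingArguments(output_dir="tourllm-sft",
                               per_device_train_batch_size=2,
                               num_train_epochs=3,
                               learning_rate=2e-5),
        train_dataset=train_set).train()
```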
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Wang, Jinqiang, Ning, Huansheng, Peng, Yi, Wei, Qikai, Tesfai, Daniel, Mao, Wenwei, Zhu, Tao, Huang, Runhe
Large Language Models (LLMs) have demonstrated impressive performance across various natural language processing tasks. Recently, medical LLMs enhanced with domain-specific knowledge have exhibited excellent capabilities in medical consultation and diagnosis. These models can smoothly simulate doctor-patient dialogues and provide professional medical advice. Most medical LLMs are developed through continued training of open-source general LLMs, which requires significantly fewer computational resources than training LLMs from scratch. Additionally, this approach offers better protection of patient privacy compared to API-based solutions. This survey systematically explores how to train medical LLMs based on general LLMs. It covers: (a) how to acquire a training corpus and construct customized medical training sets, (b) how to choose an appropriate training paradigm, (c) how to choose a suitable evaluation benchmark, and (d) existing challenges and promising future research directions. This survey can provide guidance for the development of LLMs focused on various medical applications, such as medical education, diagnostic planning, and clinical assistants.
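As a concrete illustration of step (a), the snippet below shows one hypothetical way to convert raw doctor-patient dialogues into a supervised instruction-tuning file; the schema, field names, and file name are assumptions for illustration, not a format prescribed by the survey.

```python
# Hypothetical sketch of corpus construction: turning doctor-patient dialogue
# turns into an instruction-tuning JSONL file. The record schema here is an
# illustrative assumption, not the survey's recommended format.
import json

raw_dialogues = [
    {"patient": "I have had a dry cough for two weeks.",
     "doctor": "A persistent dry cough can have several causes; please describe any fever or shortness of breath."},
]

with open("medical_sft.jsonl", "w", encoding="utf-8") as f:
    for turn in raw_dialogues:
        record = {
            "instruction": "Answer the patient's question as a clinician.",
            "input": turn["patient"],   # patient utterance becomes the prompt
            "output": turn["doctor"],   # doctor reply becomes the target
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```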
Language-assisted Vision Model Debugger: A Sample-Free Approach to Finding Bugs
Jiang, Chaoquan, Wang, Jinqiang, Hu, Rui, Sang, Jitao
Vision models with high overall accuracy often exhibit systematic errors in specific scenarios, posing potentially serious safety concerns. Diagnosing bugs in vision models is gaining increased attention; however, traditional diagnostic approaches require annotation effort (e.g., the rich metadata accompanying each sample of CelebA). To address this issue, we propose a language-assisted diagnostic method that uses texts instead of images to diagnose bugs in vision models, building on multi-modal models (e.g., CLIP). Our approach connects the embedding space of CLIP with the buggy vision model to be diagnosed; by exploiting a shared classifier and the cross-modal transferability of CLIP's embedding space, the text branch of CLIP becomes a proxy model for finding bugs in the buggy model. The proxy model can classify texts paired with images. During diagnosis, a Large Language Model (LLM) is employed to obtain task-relevant corpora, from which keywords are extracted. Descriptions constructed with templates containing these keywords serve as input texts to probe errors in the proxy model. Finally, we validate the ability to diagnose existing vision models using language on the Waterbirds and CelebA datasets: we identify bugs comprehensible to human experts, uncovering not only known bugs but also previously unknown ones.
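A rough sketch of the probing idea, under assumptions: keyword-based text descriptions are encoded with CLIP's text branch and passed through a classifier head aligned with the vision model's label space, and descriptions whose prediction contradicts the class named in the text flag candidate bugs. The template, keyword list, and randomly initialized classifier head below are placeholders rather than the paper's actual components.

```python
# Illustrative sketch (not the paper's implementation): probe a classifier head
# over CLIP's text embeddings with template sentences built from task keywords.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class_names = ["landbird", "waterbird"]           # task labels (Waterbirds-style)
keywords = ["forest", "lake", "bamboo", "ocean"]  # e.g. keywords from an LLM-generated corpus
templates = [f"a photo of a {c} in the {k}" for c in class_names for k in keywords]

# Placeholder shared classifier on the joint embedding space; in the paper's
# setting this head would be tied to the vision model being diagnosed.
proxy_head = torch.nn.Linear(clip.config.projection_dim, len(class_names)).to(device)

with torch.no_grad():
    inputs = proc(text=templates, return_tensors="pt", padding=True).to(device)
    text_emb = F.normalize(clip.get_text_features(**inputs), dim=-1)
    probs = proxy_head(text_emb).softmax(dim=-1)

# Descriptions whose predicted label disagrees with the class named in the text
# point at candidate failure modes (e.g. "a waterbird in the forest").
for sentence, p in zip(templates, probs):
    pred = class_names[int(p.argmax())]
    if pred not in sentence:
        print(f"potential bug probe: '{sentence}' -> predicted {pred}")
```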
The coupling effect between the environment and strategies drives the emergence of group cooperation
Di, Changyan, Zhou, Qingguo, Shen, Jun, Wang, Jinqiang, Zhou, Rui, Wang, Tianyi
The coupling effect between the macro environment and individual behavior is a key factor in resolving the social dilemma. In a static environment, the rewards of different strategies are compared simultaneously, leading to a social dilemma because defection yields a higher payoff than cooperation. However, when individuals are placed in a dynamic environment that is coupled with their actions, we find that the expected payoffs of different strategies are not fixed but undergo dynamic changes. The higher expected payoff of defection can be diluted over time due to environmental degradation caused by an excessive number of defectors, while cooperation may become the dominant strategy if positively reinforced by environmental feedback. Group cooperation emerges as a direct result of a mutually reinforcing positive feedback loop among the environment, immediate rewards, and individual actions (or group states). Despite the agents' lack of awareness regarding the macro-level context, they can astutely discern the inflection point of the environment solely through their rewards. This pivotal moment prompts agents to experience a surge in immediate rewards, thereby triggering a positive feedback loop among the environment, their rewards, and their current actions. Consequently, cooperation emerges within the group.
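The feedback loop described above can be illustrated with a toy numerical simulation in which strategy payoffs depend on the environmental state and the environment in turn responds to the fraction of cooperators; the payoff functions and feedback coefficients below are illustrative choices, not the paper's model.

```python
# Toy sketch of a strategy-environment feedback loop: defection pays more in a
# healthy environment but degrades it, which erodes its advantage over time.
x, n = 0.2, 0.8          # x: fraction of cooperators, n: environment quality in [0, 1]
dt, steps = 0.01, 20000
theta = 2.0              # strength of cooperators' positive effect on the environment

for _ in range(steps):
    # Payoffs are modulated by the current environmental state n (assumed forms).
    payoff_c = 3.0 * n            # cooperators benefit from a healthy environment
    payoff_d = 5.0 * n - 1.0      # defectors earn more now but deplete the environment
    # Replicator-style update: cooperation grows when its payoff exceeds defection's.
    x += dt * x * (1 - x) * (payoff_c - payoff_d)
    # Environmental feedback: cooperators restore the environment, defectors degrade it.
    n += dt * n * (1 - n) * (theta * x - (1 - x))
    x = min(max(x, 0.0), 1.0)
    n = min(max(n, 0.0), 1.0)

print(f"final cooperator fraction: {x:.2f}, final environment state: {n:.2f}")
```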
Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data
Hu, Rui, Sang, Jitao, Wang, Jinqiang, Jiang, Chaoquan
Deep network models perform excellently on In-Distribution (ID) data but can fail significantly on Out-Of-Distribution (OOD) data. While methods are being developed to improve OOD generalization, little attention has been paid to evaluating the capability of models to handle OOD data. This study is devoted to analyzing the problems of the experimental ID test and designing OOD test paradigms that accurately evaluate practical performance. Our analysis is based on an introduced categorization of three types of distribution shifts used to generate OOD data. The main observations are: (1) the ID test neither reflects the actual performance of a single model nor supports comparison between different models under OOD data; (2) the ID test failure can be ascribed to the learned marginal and conditional spurious correlations resulting from the corresponding distribution shifts. Based on this, we propose novel OOD test paradigms to evaluate the generalization capacity of models to unseen data, and discuss how to use OOD test results to find model bugs and guide model debugging.
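One simple way to contrast an ID test with an OOD-oriented test is to report accuracy per (label, spurious attribute) group and take the worst group rather than a single average; the sketch below illustrates this with assumed field names and is not the paper's exact test paradigm.

```python
# Illustrative sketch: average (ID-style) accuracy vs. worst-group accuracy over
# groups defined by a spurious attribute. Field names are assumptions.
from collections import defaultdict

def group_accuracies(records):
    """records: dicts with 'label', 'pred', and a spurious 'background' attribute."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        g = (r["label"], r["background"])
        totals[g] += 1
        hits[g] += int(r["pred"] == r["label"])
    return {g: hits[g] / totals[g] for g in totals}

def id_and_ood_report(records):
    per_group = group_accuracies(records)
    avg = sum(int(r["pred"] == r["label"]) for r in records) / len(records)
    return {"id_average_accuracy": avg,                       # what a standard ID test reports
            "worst_group_accuracy": min(per_group.values()),  # OOD-oriented view
            "per_group": per_group}

# Example: a model exploiting the background shortcut looks fine on average but
# collapses on the rare (waterbird, land-background) group.
preds = [
    {"label": "waterbird", "background": "water", "pred": "waterbird"},
    {"label": "landbird", "background": "land", "pred": "landbird"},
    {"label": "waterbird", "background": "land", "pred": "landbird"},
]
print(id_and_ood_report(preds))
```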