Pan, Yi
Causal Mean Field Multi-Agent Reinforcement Learning
Ma, Hao, Pu, Zhiqiang, Pan, Yi, Liu, Boyin, Gao, Junlong, Guo, Zhenyu
Scalability remains a challenge in multi-agent reinforcement learning and is currently under active research. A framework named mean-field reinforcement learning (MFRL) can alleviate the scalability problem by employing mean-field theory to turn a many-agent problem into a two-agent problem. However, this framework lacks the ability to identify essential interactions in nonstationary environments. Causality captures the relatively invariant mechanisms behind interactions, even when environments are nonstationary. Therefore, we propose an algorithm called causal mean-field Q-learning (CMFQ) to address the scalability problem. CMFQ is more robust to changes in the number of agents while inheriting MFRL's compressed representation of the action-state space. First, we model the causality behind the decision-making process of MFRL as a structural causal model (SCM). Then the essential degree of each interaction is quantified by intervening on the SCM. Furthermore, we design a causality-aware compact representation of agents' behavioral information as the weighted sum of all behavioral information according to their causal effects. We test CMFQ in a mixed cooperative-competitive game and a cooperative game. The results show that our method scales well both when training in environments containing a large number of agents and when testing in environments containing many more agents.
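To make the causality-aware weighting concrete, the following is a minimal sketch of how per-neighbor causal effects, estimated by intervening on a neighbor's action and measuring the resulting Q-value shift, could weight the mean action; the effect metric, softmax weighting, and function names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def causal_effect(q_fn, state, self_action, neighbor_actions, j, baseline_action):
    """Estimate neighbor j's causal effect by intervening on its action:
    replace a_j with a baseline (do-operation) and measure the Q-value shift.
    This metric is an illustrative assumption, not the paper's exact definition."""
    factual = q_fn(state, self_action, neighbor_actions)
    counterfactual_actions = list(neighbor_actions)
    counterfactual_actions[j] = baseline_action
    counterfactual = q_fn(state, self_action, counterfactual_actions)
    return abs(factual - counterfactual)

def causal_mean_action(q_fn, state, self_action, neighbor_actions, n_actions,
                       baseline_action=0, temperature=1.0):
    """Causality-aware mean action: a softmax-weighted sum of neighbors'
    one-hot actions, weighted by their estimated causal effects."""
    effects = np.array([
        causal_effect(q_fn, state, self_action, neighbor_actions, j, baseline_action)
        for j in range(len(neighbor_actions))
    ])
    weights = np.exp(effects / temperature)
    weights /= weights.sum()
    one_hot = np.eye(n_actions)[np.asarray(neighbor_actions)]   # (N, n_actions)
    return weights @ one_hot                                     # weighted mean action
```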
Large Language Models for Bioinformatics
Ruan, Wei, Lyu, Yanjun, Zhang, Jing, Cai, Jiazhang, Shu, Peng, Ge, Yang, Lu, Yao, Gao, Shang, Wang, Yue, Wang, Peilong, Zhao, Lin, Wang, Tao, Liu, Yufang, Fang, Luyang, Liu, Ziyu, Liu, Zhengliang, Li, Yiwei, Wu, Zihao, Chen, Junhao, Jiang, Hanqi, Pan, Yi, Yang, Zhenyuan, Chen, Jingyuan, Liang, Shizhe, Zhang, Wei, Ma, Terry, Dou, Yuan, Zhang, Jianli, Gong, Xinyu, Gan, Qi, Zou, Yusong, Chen, Zebang, Qian, Yuanxin, Yu, Shuo, Lu, Jin, Song, Kenan, Wang, Xianqiao, Sikora, Andrea, Li, Gang, Li, Xiang, Li, Quanzheng, Wang, Yingfeng, Zhang, Lu, Abate, Yohannes, He, Lifang, Zhong, Wenxuan, Liu, Rongjie, Huang, Chao, Liu, Wei, Shen, Ye, Ma, Ping, Zhu, Hongtu, Yan, Yajun, Zhu, Dajiang, Liu, Tianming
With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.
Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
Zhong, Tianyang, Yang, Zhenyuan, Liu, Zhengliang, Zhang, Ruidong, Liu, Yiheng, Sun, Haiyang, Pan, Yi, Li, Yiwei, Zhou, Yifan, Jiang, Hanqi, Chen, Junhao, Liu, Tianming
Importance and Endangerment of Low-Resource Languages in the Global Linguistic Ecology: The linguistic landscape of the world constitutes a complex tapestry interwoven with a rich diversity of languages, each strand epitomizing a distinctive cultural, historical, and social identity. This global linguistic diversity forms a foundational pillar of human civilization, cultivating an array of perspectives and worldviews that enhance our collective intellectual legacy. Among these, low-resource languages occupy a particularly crucial position, not merely as modes of communication but as repositories of distinctive cultural knowledge, historical narratives, and worldviews. These languages, frequently spoken by smaller communities, are essential to the preservation of cultural heritage and the transmission of indigenous knowledge systems. However, the global linguistic landscape is presently undergoing an extraordinary crisis, with low-resource languages among the most threatened. The swift vanishing of these languages is a serious concern, underscored by alarming data and studies. It is estimated, for example, that around 40% of the world's 7,000 languages face extinction, with numerous low-resource languages having fewer than 1,000 speakers [94].
QueEn: A Large Language Model for Quechua-English Translation
Chen, Junhao, Shu, Peng, Li, Yiwei, Zhao, Huaqin, Jiang, Hanqi, Pan, Yi, Zhou, Yifan, Liu, Zhengliang, Howe, Lewis C, Liu, Tianming
Recent studies show that large language models (LLMs) are powerful tools for working with natural language, bringing advances in many areas of computational linguistics. However, these models face challenges when applied to low-resource languages due to limited training data and difficulty in understanding cultural nuances. In this paper, we propose QueEn, a novel approach for Quechua-English translation that combines Retrieval-Augmented Generation (RAG) with parameter-efficient fine-tuning techniques. Our method leverages external linguistic resources through RAG and uses Low-Rank Adaptation (LoRA) for efficient model adaptation. Experimental results show that our approach substantially exceeds baseline models, with a BLEU score of 17.6 compared to 1.5 for standard GPT models. The integration of RAG with fine-tuning allows our system to address the challenges of low-resource language translation while maintaining computational efficiency. This work contributes to the broader goal of preserving endangered languages through advanced language technologies.
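To illustrate the pipeline at a high level, here is a minimal sketch combining a LoRA-adapted causal LLM with sentence-embedding retrieval over a small bilingual resource; the base model, retriever, toy corpus, and prompt template are illustrative assumptions, not the components reported in the paper.

```python
# Sketch of the RAG + LoRA pipeline described above (assumed components).
from sentence_transformers import SentenceTransformer, util
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1) LoRA adaptation of a base LLM (parameter-efficient fine-tuning).
base_name = "meta-llama/Llama-2-7b-hf"                       # placeholder base model
base = AutoModelForCausalLM.from_pretrained(base_name)
tokenizer = AutoTokenizer.from_pretrained(base_name)
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)   # then train with a causal-LM objective on parallel data

# 2) Retrieval over an external Quechua-English resource (dictionary / parallel sentences).
retriever = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["imayna kashanki ||| how are you", "sulpayki ||| thank you"]   # toy resource
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)

def translate(src: str, k: int = 2) -> str:
    # Retrieve the k most relevant entries and prepend them to the prompt.
    hits = util.semantic_search(retriever.encode(src, convert_to_tensor=True),
                                corpus_emb, top_k=k)[0]
    examples = "\n".join(corpus[h["corpus_id"]] for h in hits)
    prompt = f"Relevant examples:\n{examples}\n\nTranslate Quechua to English: {src}\nEnglish:"
    ids = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=64)
    return tokenizer.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
```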
Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation
Shu, Peng, Chen, Junhao, Liu, Zhengliang, Wang, Hui, Wu, Zihao, Zhong, Tianyang, Li, Yiwei, Zhao, Huaqin, Jiang, Hanqi, Pan, Yi, Zhou, Yifan, Owl, Constance, Zhai, Xiaoming, Liu, Ninghao, Saunt, Claudio, Liu, Tianming
Large Language Models (LLMs) have demonstrated remarkable success across a wide range of tasks and domains. However, their performance in low-resource language translation, particularly when translating into these languages, remains underexplored. This gap poses significant challenges, as linguistic barriers hinder the cultural preservation and development of minority communities. To address this issue, this paper introduces a novel retrieval-based method that enhances translation quality for low-resource languages by focusing on key terms, which involves translating keywords and retrieving corresponding examples from existing data. To evaluate the effectiveness of this method, we conducted experiments translating from English into three low-resource languages: Cherokee, a critically endangered indigenous language of North America; Tibetan, a historically and culturally significant language in Asia; and Manchu, a language with few remaining speakers. Our comparison with the zero-shot performance of GPT-4o and LLaMA 3.1 405B highlights the significant challenges these models face when translating into low-resource languages. In contrast, our retrieval-based method shows promise in improving both word-level accuracy and overall semantic understanding by leveraging existing resources more effectively.
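As a concrete illustration of the key-term idea, the sketch below extracts content words, looks them up in a toy bilingual glossary standing in for existing resources, and assembles a prompt for an LLM; the glossary entries, keyword heuristic, and prompt wording are assumptions for illustration only.

```python
# Keyword-focused retrieval sketch (assumed glossary and heuristics).
import re

glossary = {   # toy English -> Cherokee entries standing in for existing resources
    "water": "ama",
    "mountain": "odalv",
}

def extract_keywords(sentence: str) -> list[str]:
    # Naive content-word heuristic; a real system would use a tagger or term extractor.
    stop = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "down"}
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in stop]

def build_prompt(sentence: str) -> str:
    hints = [f"{w} -> {glossary[w]}" for w in extract_keywords(sentence) if w in glossary]
    hint_block = "\n".join(hints) if hints else "(no glossary matches)"
    return ("Key-term translations retrieved from existing data:\n"
            f"{hint_block}\n\n"
            f"Translate into Cherokee: {sentence}")

print(build_prompt("The water flows down the mountain."))
```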
Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios
Xu, Shaochen, Zhou, Yifan, Liu, Zhengliang, Wu, Zihao, Zhong, Tianyang, Zhao, Huaqin, Li, Yiwei, Jiang, Hanqi, Pan, Yi, Chen, Junhao, Lu, Jin, Zhang, Wei, Zhang, Tuo, Zhang, Lu, Zhu, Dajiang, Li, Xiang, Liu, Wei, Li, Quanzheng, Sikora, Andrea, Zhai, Xiaoming, Xiang, Zhen, Liu, Tianming
Artificial Intelligence (AI) has become essential in modern healthcare, with large language models (LLMs) offering promising advances in clinical decision-making. Traditional model-based approaches, including those leveraging in-context demonstrations and those with specialized medical fine-tuning, have demonstrated strong performance in medical language processing but struggle with real-time adaptability, multi-step reasoning, and handling complex medical tasks. Agent-based AI systems address these limitations by incorporating reasoning traces, tool selection based on context, knowledge retrieval, and both short- and long-term memory. These additional features enable the medical AI agent to handle complex medical scenarios where decision-making should be built on real-time interaction with the environment. Therefore, unlike conventional model-based approaches that treat medical queries as isolated questions, medical AI agents approach them as complex tasks and behave more like human doctors. In this paper, we study the choice of the backbone LLM for medical AI agents, which is the foundation for the agent's overall reasoning and action generation. In particular, we consider the emergent o1 model and examine its impact on agents' reasoning, tool-use adaptability, and real-time information retrieval across diverse clinical scenarios, including high-stakes settings such as intensive care units (ICUs). Our findings demonstrate o1's ability to enhance diagnostic accuracy and consistency, paving the way for smarter, more responsive AI tools that support better patient outcomes and decision-making efficacy in clinical practice.
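For intuition, the following is a minimal sketch of an agent loop with tool selection, knowledge retrieval, and short-term memory around a backbone LLM call; the tool set, routing protocol, and the llm() stub are illustrative assumptions and not the system studied in the paper.

```python
# Minimal agent loop sketch (assumed tools, protocol, and LLM stub).
from typing import Callable

def retrieve_guidelines(query: str) -> str:
    return f"[guideline passages relevant to: {query}]"       # stand-in knowledge retrieval

def query_labs(patient_id: str) -> str:
    return f"[latest lab values for patient {patient_id}]"    # stand-in EHR tool

TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve_guidelines": retrieve_guidelines,
    "query_labs": query_labs,
}

def llm(prompt: str) -> str:
    """Backbone LLM call (e.g. an o1-class model); replaced here by a stub."""
    return "FINAL: placeholder answer"

def run_agent(task: str, max_steps: int = 5) -> str:
    memory: list[str] = []                                     # short-term memory of the episode
    for _ in range(max_steps):
        prompt = (f"Task: {task}\nMemory:\n" + "\n".join(memory) +
                  "\nEither reply 'FINAL: <answer>' or 'CALL <tool> <argument>'.")
        reply = llm(prompt)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        _, tool, arg = reply.split(maxsplit=2)                 # parse a tool call
        memory.append(f"{tool}({arg}) -> {TOOLS[tool](arg)}")  # append observation to memory
    return "No answer within step budget."
```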
HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
Zhao, Huaqin, Li, Jiaxi, Pan, Yi, Liang, Shizhe, Yang, Xiaofeng, Liu, Wei, Li, Xiang, Dou, Fei, Liu, Tianming, Lu, Jin
Fine-tuning large language models (LLMs) poses significant memory challenges, as the back-propagation process demands extensive resources, especially with growing model sizes. Recent work, MeZO, addresses this issue using a zeroth-order (ZO) optimization method, which reduces memory consumption by matching the usage to the inference phase. However, such ZO methods can converge slowly, particularly when curvature varies widely across parameters. To overcome this limitation, we introduce HELENE, a novel scalable and memory-efficient optimizer that integrates annealed A-GNB gradients with a diagonal Hessian estimation and layer-wise clipping, serving as a second-order pre-conditioner. This combination allows for faster and more stable convergence. Our theoretical analysis demonstrates that HELENE improves convergence rates, particularly for models with heterogeneous layer dimensions, by reducing the dependency on the total parameter space dimension. Furthermore, HELENE remains compatible with both full-parameter tuning and parameter-efficient fine-tuning (PEFT), outperforming several state-of-the-art optimizers. The code will be released after review. LLMs have demonstrated remarkable capabilities across various downstream tasks. Fine-tuning these models has become the standard approach for improving task-specific performance, in which first-order optimizers such as Stochastic Gradient Descent (SGD) (Robbins & Monro, 1951), Adam (Kingma & Ba, 2014), and AdamW (Loshchilov & Hutter, 2017) are widely used.
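For intuition about the ingredients involved, here is a minimal PyTorch sketch of a MeZO-style zeroth-order step augmented with a running diagonal curvature estimate and per-tensor clipping; it is a simplified illustration under these assumptions, not HELENE's actual algorithm.

```python
# Simplified ZO step with diagonal preconditioning and per-tensor clipping (not HELENE itself).
import torch

@torch.no_grad()
def zo_step(model, loss_fn, params, state, eps=1e-3, lr=1e-4, clip=1.0):
    """params: list of trainable tensors; state: dict persisted across steps."""
    # Two forward passes with antithetic perturbations estimate a directional gradient.
    seed = torch.randint(0, 2**31 - 1, (1,)).item()
    def perturb(scale):
        torch.manual_seed(seed)
        for p in params:
            p.add_(scale * eps * torch.randn_like(p))
    perturb(+1); loss_plus = loss_fn(model)
    perturb(-2); loss_minus = loss_fn(model)
    perturb(+1)                                   # restore original parameters
    proj_grad = (loss_plus - loss_minus) / (2 * eps)

    torch.manual_seed(seed)                       # replay the same perturbation directions
    for i, p in enumerate(params):
        z = torch.randn_like(p)
        g = proj_grad * z                         # ZO gradient estimate for this tensor
        h = state.setdefault(i, torch.zeros_like(p))
        h.mul_(0.99).addcmul_(g, g, value=0.01)   # running diagonal curvature proxy
        step = g / (h.sqrt() + 1e-8)              # precondition by the diagonal estimate
        step.clamp_(-clip, clip)                  # layer-wise (per-tensor) clipping
        p.add_(-lr * step)
```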
EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis
Pan, Yi, Jiang, Hanqi, Chen, Junhao, Li, Yiwei, Zhao, Huaqin, Zhou, Yifan, Shu, Peng, Wu, Zihao, Liu, Zhengliang, Zhu, Dajiang, Li, Xiang, Abate, Yohannes, Liu, Tianming
Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly utilizing spiking neural networks (SNNs) implemented on neuromorphic hardware. Significant advancements have been made in SNN-based convolutional neural networks (CNNs) and Transformer architectures. However, neuromorphic computing for the medical imaging domain remains underexplored. In this study, we introduce EG-SpikeFormer, an SNN architecture tailored for clinical tasks that incorporates eye-gaze data to guide the model's attention to the diagnostically relevant regions in medical images. Our approach effectively addresses shortcut learning issues commonly observed in conventional models, especially in scenarios with limited clinical data and high demands for model reliability, generalizability, and transparency. Our EG-SpikeFormer not only demonstrates superior energy efficiency and performance in medical image prediction tasks but also enhances clinical relevance through multi-modal information alignment. By incorporating eye-gaze data, the model improves interpretability and generalization, opening new directions for applying neuromorphic computing in healthcare.
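As one illustrative mechanism, the sketch below aligns a model's patch-level attention with an eye-gaze heatmap through a KL-based auxiliary loss; the loss form and tensor shapes are assumptions for illustration, not EG-SpikeFormer's exact design.

```python
# Gaze-guided attention alignment sketch (assumed loss form and shapes).
import torch
import torch.nn.functional as F

def gaze_guidance_loss(attn_maps: torch.Tensor, gaze_heatmap: torch.Tensor) -> torch.Tensor:
    """attn_maps: (B, H, N) attention over N image patches, already pooled over queries.
    gaze_heatmap: (B, N) gaze fixation density over the same patches."""
    attn = attn_maps.mean(dim=1)                                       # average heads -> (B, N)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)       # normalize to a distribution
    gaze = gaze_heatmap / gaze_heatmap.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return F.kl_div(attn.clamp_min(1e-8).log(), gaze, reduction="batchmean")

# total_loss = task_loss + lambda_gaze * gaze_guidance_loss(attn_maps, gaze_heatmap)
```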
Large Language Models for Manufacturing
Li, Yiwei, Zhao, Huaqin, Jiang, Hanqi, Pan, Yi, Liu, Zhengliang, Wu, Zihao, Shu, Peng, Tian, Jie, Yang, Tianze, Xu, Shaochen, Lyu, Yanjun, Blenk, Parker, Pence, Jacob, Rupram, Jason, Banu, Eliza, Liu, Ninghao, Wang, Linbing, Song, Wenzhan, Zhai, Xiaoming, Song, Kenan, Zhu, Dajiang, Li, Beiwen, Wang, Xianqiao, Liu, Tianming
The rapid advances in Large Language Models (LLMs) have the potential to transform the manufacturing industry, offering new opportunities to optimize processes, improve efficiency, and drive innovation. This paper provides a comprehensive exploration of the integration of LLMs into the manufacturing domain, focusing on their potential to automate and enhance various aspects of manufacturing, from product design and development to quality control, supply chain optimization, and talent management. Through extensive evaluations across multiple manufacturing tasks, we demonstrate the remarkable capabilities of state-of-the-art LLMs, such as GPT-4V, in understanding and executing complex instructions, extracting valuable insights from vast amounts of data, and facilitating knowledge sharing. We also delve into the transformative potential of LLMs in reshaping manufacturing education, automating coding processes, enhancing robot control systems, and enabling the creation of immersive, data-rich virtual environments through the industrial metaverse. By highlighting the practical applications and emerging use cases of LLMs in manufacturing, this paper aims to provide a valuable resource for professionals, researchers, and decision-makers seeking to harness the power of these technologies to address real-world challenges, drive operational excellence, and unlock sustainable growth in an increasingly competitive landscape.
A Survey of Hallucination in Large Visual Language Models
Lan, Wei, Chen, Wenyi, Chen, Qingfeng, Pan, Shirui, Zhou, Huiyu, Pan, Yi
Large Visual Language Models (LVLMs) enhance user interaction and enrich user experience by integrating the visual modality on the basis of Large Language Models (LLMs), and they have demonstrated powerful information processing and generation capabilities. However, the existence of hallucinations has limited the potential and practical effectiveness of LVLMs in various fields. Although much work has been devoted to hallucination mitigation and correction, few reviews summarize this issue. In this survey, we first introduce the background of LVLMs and hallucinations. Then, we describe the structure of LVLMs and the main causes of hallucination generation. Further, we summarize recent works on hallucination correction and mitigation. In addition, the available hallucination evaluation benchmarks for LVLMs are presented from judgmental and generative perspectives. Finally, we suggest some future research directions to enhance the dependability and utility of LVLMs.