Pan, Yijie
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
Chen, Xinghao, Sun, Zhijing, Guo, Wenjin, Zhang, Miaoran, Chen, Yanjun, Sun, Yirong, Su, Hui, Pan, Yijie, Klakow, Dietrich, Li, Wenjie, Shen, Xiaoyu
Large Language Models (LLMs) excel in reasoning tasks through Chain-of-Thought (CoT) prompting. However, CoT prompting greatly increases computational demands, which has prompted growing interest in distilling CoT capabilities into Small Language Models (SLMs). This study systematically examines the factors influencing CoT distillation, including the choice of granularity, format, and teacher model. Through experiments involving four teacher models and seven student models across seven mathematical and commonsense reasoning datasets, we uncover three key findings: (1) Unlike LLMs, SLMs exhibit a non-monotonic relationship with granularity, with stronger models benefiting from finer-grained reasoning and weaker models performing better with simpler CoT supervision; (2) CoT format significantly impacts LLMs but has minimal effect on SLMs, likely due to their reliance on supervised fine-tuning rather than pretraining preferences; (3) Stronger teacher models do NOT always produce better student models, as diversity and complexity in CoT supervision can outweigh accuracy alone. These findings emphasize the need to tailor CoT strategies to the specific student model, offering actionable insights for optimizing CoT distillation in SLMs. The code and datasets are available at https://github.com/EIT-NLP/Distilling-CoT-Reasoning.
Exact Fit Attention in Node-Holistic Graph Convolutional Network for Improved EEG-Based Driver Fatigue Detection
Xu, Meiyan, Chen, Qingqing, Chen, Duo, Ding, Yi, Wang, Jingyuan, Gu, Peipei, Pan, Yijie, Huang, Deshuang, Zhang, Xun, Guo, Jiayang
EEG-based fatigue monitoring can effectively reduce the incidence of related traffic accidents. In the past decade, with the advancement of deep learning, convolutional neural networks (CNNs) have been increasingly used for EEG signal processing. However, due to the data's non-Euclidean characteristics, existing CNNs may lose important spatial information from EEG, specifically channel correlation. Thus, we propose the node-holistic graph convolutional network (NHGNet), a model that uses graph convolution to dynamically learn each channel's features. Interpretability is enhanced by revealing critical areas of brain activity and their interrelations in various mental states. In validations on two public datasets, NHGNet outperforms state-of-the-art methods. Specifically, in the intra-subject setting, NHGNet improved detection accuracy by at least 2.34% and 3.42%, and in the inter-subject setting, it improved by at least 2.09% and 15.06%. Visualization of the model revealed that the central parietal area plays an important role in detecting fatigue levels, whereas the frontal and temporal lobes are essential for maintaining vigilance.
Duo Chen is with the School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China (e-mail: 380013@njucm.edu.cn). Yi Ding is with the College of Computing and Data Science, Nanyang Technological University, Singapore.