Wei, Fan
Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models
Zhang, Yi, Wei, Fan, Li, Jingyi, Wang, Yan, Yu, Yanyan, Chen, Jianli, Cai, Zipo, Liu, Xinyu, Wang, Wei, Wang, Peng, Wang, Zhong
The use of children's drawings to examine their conceptual understanding has been proven to be an effective method, but there are two major problems with previous research: (1) the content of the drawings depends heavily on the task, so the ecological validity of the conclusions is low; (2) the interpretation of the drawings relies too much on the subjective judgment of the researchers. To address these issues, this study uses a large language model (LLM) to identify 1420 children's scientific drawings (covering 9 scientific themes/concepts) and uses the word2vec algorithm to calculate their semantic similarity. The study explores whether children produce consistent drawing representations for the same theme, and attempts to establish a norm for children's scientific drawings, providing a baseline reference for follow-up research on children's drawings. The results show that the representations of most drawings are consistent, manifested as most semantic similarities being greater than 0.8. At the same time, it was found that the consistency of the representation is independent of the accuracy of the LLM's recognition, indicating the existence of consistency bias. In the subsequent exploration of influencing factors, we used the Kendall rank correlation coefficient to investigate the effects of Sample Size, Abstract Degree, and Focus Points on drawings, and used word frequency statistics to explore whether children represented abstract themes/concepts by reproducing what was taught in class.
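The consistency criterion in this abstract (pairwise semantic similarity above 0.8) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function names and the toy vectors are hypothetical, and real word2vec embeddings would come from a trained model rather than be written by hand.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistency_rate(similarities, threshold=0.8):
    """Fraction of pairwise similarities exceeding the consistency threshold
    (0.8, as in the abstract)."""
    above = sum(1 for s in similarities if s > threshold)
    return above / len(similarities)
```

For example, `consistency_rate([0.9, 0.85, 0.5])` reports that two of the three drawing pairs meet the 0.8 criterion.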
Self-Paced Multi-Task Learning
Li, Changsheng (East China Normal University) | Yan, Junchi (East China Normal University) | Wei, Fan (Stanford University) | Dong, Weishan (IBM Research - China) | Liu, Qingshan (Nanjing University of Information Science and Technology) | Zha, Hongyuan (East China Normal University)
Multi-task learning is a paradigm in which multiple tasks are learned jointly. Previous multi-task learning models usually treat all tasks, and all instances per task, equally during learning. Inspired by the fact that humans often learn from easy concepts to hard ones in the cognitive process, in this paper we propose a novel multi-task learning framework that learns the tasks while simultaneously taking into consideration the complexities of both tasks and instances per task. We propose a novel formulation by presenting a new task-oriented regularizer that can jointly prioritize tasks and instances. Thus it can be interpreted as a self-paced learner for multi-task learning. An efficient block coordinate descent algorithm is developed to solve the proposed objective function, and the convergence of the algorithm can be guaranteed. Experimental results on toy and real-world datasets demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
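The easy-to-hard idea described above can be sketched with the classic hard self-paced regularizer: alternate between fitting each task on instances whose current loss is below a pace parameter, and growing that parameter to admit harder instances. This is a simplified stand-in using per-task least squares, not the paper's task-oriented regularizer or its block coordinate descent objective; the function name and parameters are hypothetical.

```python
import numpy as np

def self_paced_multitask(Xs, ys, lam=0.5, mu=1.3, iters=5):
    """Toy self-paced multi-task least squares.

    Xs, ys: lists of per-task design matrices and targets.
    lam:    pace parameter (instances with loss < lam count as "easy").
    mu:     growth factor for lam, admitting harder instances each round.
    """
    ws = [np.zeros(X.shape[1]) for X in Xs]
    for _ in range(iters):
        for t, (X, y) in enumerate(zip(Xs, ys)):
            losses = (X @ ws[t] - y) ** 2
            v = (losses < lam).astype(float)   # binary self-paced indicators
            if v.sum() == 0:
                v[:] = 1.0                     # avoid an empty selection
            # Weighted least squares on the currently "easy" instances
            ws[t] = np.linalg.lstsq(v[:, None] * X, v * y, rcond=None)[0]
        lam *= mu                              # pace: admit harder instances
    return ws
```

With a generous initial `lam`, every instance is selected from the start and each task reduces to ordinary least squares.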
Spatially Regularized Streaming Sensor Selection
Li, Changsheng (IBM Research-China) | Wei, Fan (Stanford University) | Dong, Weishan (IBM Research-China) | Wang, Xiangfeng (East China Normal University) | Yan, Junchi (East China Normal University) | Zhu, Xiaobin (Beijing Technology and Business University) | Liu, Qingshan (Nanjing University of Information Science and Technology) | Zhang, Xin (IBM Research-China)
Sensor selection has become an active topic aimed at energy saving, information overload prevention, and communication cost planning in sensor networks. In many real applications, the sensors' observation regions often overlap, so the sensor network is inherently redundant. It is therefore important to select proper sensors to avoid data redundancy. This paper focuses on how to incrementally select a subset of sensors in a streaming scenario to minimize information redundancy while meeting the power consumption constraint. We propose to perform sensor selection in a multi-variate interpolation framework, such that the data sampled by the selected sensors can well predict those of the inactive sensors. Importantly, we incorporate the sensors' spatial information as two regularizers, which leads to significantly better prediction performance. We also define a statistical variable to store sufficient information for incremental learning, and introduce a forgetting factor to track the evolution of sensor streams. Experiments on both synthetic and real datasets validate the effectiveness of the proposed method. Moreover, our method is over 10 times faster than the state-of-the-art sensor selection algorithm.
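The core selection criterion (pick sensors whose readings best reconstruct the inactive ones) can be sketched with a greedy, offline stand-in. This is only an illustration of the interpolation idea: it omits the streaming updates, spatial regularizers, and forgetting factor described above, and the function name and arguments are hypothetical.

```python
import numpy as np

def greedy_select(X, k):
    """Greedily pick k sensor columns of X (rows = time steps,
    columns = sensors) whose readings best reconstruct ALL sensors
    via least squares, i.e. minimize interpolation error."""
    n = X.shape[1]
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            A = X[:, selected + [j]]
            # Predict every sensor from the candidate subset
            coef, *_ = np.linalg.lstsq(A, X, rcond=None)
            err = np.linalg.norm(A @ coef - X)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected
```

On data where two sensors duplicate each other, the greedy criterion keeps only one of the duplicates and spends the remaining budget on an informative sensor.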