Wang, Shanshan
Identifying Ising and percolation phase transitions based on KAN method
Xu, Dian, Wang, Shanshan, Li, Wei, Deng, Weibing, Gao, Feng, Shen, Jianmin
Modern machine learning, grounded in the Universal Approximation Theorem, has achieved significant success in the study of phase transitions in both equilibrium and non-equilibrium systems. However, identifying the critical points of percolation models using raw configurations remains a challenging and intriguing problem. This paper proposes the use of the Kolmogorov-Arnold Network, which is based on the Kolmogorov-Arnold Representation Theorem, to input raw configurations into a learning model. The results demonstrate that the KAN can indeed predict the critical points of percolation models. Further observation reveals that, apart from models associated with the density of occupied points, KAN is also capable of effectively achieving phase classification for models where the sole alteration pertains to the orientation of spins, resulting in an order parameter that manifests as an external magnetic flux, such as the Ising model.
Learning states enhanced knowledge tracing: Simulating the diversity in real-world learning process
Wang, Shanshan, Zhang, Xueying, Wang, Keyang, Yang, Xun, Zhang, Xingyi
The Knowledge Tracing (KT) task focuses on predicting a learner's future performance based on the historical interactions. The knowledge state plays a key role in learning process. However, considering that the knowledge state is influenced by various learning factors in the interaction process, such as the exercises similarities, responses reliability and the learner's learning state. Previous models still face two major limitations. First, due to the exercises differences caused by various complex reasons and the unreliability of responses caused by guessing behavior, it is hard to locate the historical interaction which is most relevant to the current answered exercise. Second, the learning state is also a key factor to influence the knowledge state, which is always ignored by previous methods. To address these issues, we propose a new method named Learning State Enhanced Knowledge Tracing (LSKT). Firstly, to simulate the potential differences in interactions, inspired by Item Response Theory~(IRT) paradigm, we designed three different embedding methods ranging from coarse-grained to fine-grained views and conduct comparative analysis on them. Secondly, we design a learning state extraction module to capture the changing learning state during the learning process of the learner. In turn, with the help of the extracted learning state, a more detailed knowledge state could be captured. Experimental results on four real-world datasets show that our LSKT method outperforms the current state-of-the-art methods.
Optimized Vessel Segmentation: A Structure-Agnostic Approach with Small Vessel Enhancement and Morphological Correction
Song, Dongning, Huang, Weijian, Liu, Jiarun, Islam, Md Jahidul, Yang, Hao, Wang, Shanshan
Accurate segmentation of blood vessels is essential for various clinical assessments and postoperative analyses. However, the inherent challenges of vascular imaging, such as sparsity, fine granularity, low contrast, data distribution variability, and the critical need for preserving topological structure, making generalized vessel segmentation particularly complex. While specialized segmentation methods have been developed for specific anatomical regions, their over-reliance on tailored models hinders broader applicability and generalization. General-purpose segmentation models introduced in medical imaging often fail to address critical vascular characteristics, including the connectivity of segmentation results. To overcome these limitations, we propose an optimized vessel segmentation framework: a structure-agnostic approach incorporating small vessel enhancement and morphological correction for multi-modality vessel segmentation. To train and validate this framework, we compiled a comprehensive multi-modality dataset spanning 17 datasets and benchmarked our model against six SAM-based methods and 17 expert models. The results demonstrate that our approach achieves superior segmentation accuracy, generalization, and a 34.6% improvement in connectivity, underscoring its clinical potential. An ablation study further validates the effectiveness of the proposed improvements. We will release the code and dataset at github following the publication of this work.
Exploring structure diversity in atomic resolution microscopy with graph neural networks
Luo, Zheng, Feng, Ming, Gao, Zijian, Yu, Jinyang, Hu, Liang, Wang, Tao, Xue, Shenao, Zhou, Shen, Ouyang, Fangping, Feng, Dawei, Xu, Kele, Wang, Shanshan
The emergence of deep learning (DL) has provided great opportunities for the high-throughput analysis of atomic-resolution micrographs. However, the DL models trained by image patches in fixed size generally lack efficiency and flexibility when processing micrographs containing diversified atomic configurations. Herein, inspired by the similarity between the atomic structures and graphs, we describe a few-shot learning framework based on an equivariant graph neural network (EGNN) to analyze a library of atomic structures (e.g., vacancies, phases, grain boundaries, doping, etc.), showing significantly promoted robustness and three orders of magnitude reduced computing parameters compared to the image-driven DL models, which is especially evident for those aggregated vacancy lines with flexible lattice distortion. Besides, the intuitiveness of graphs enables quantitative and straightforward extraction of the atomic-scale structural features in batches, thus statistically unveiling the self-assembly dynamics of vacancy lines under electron beam irradiation. A versatile model toolkit is established by integrating EGNN sub-models for single structure recognition to process images involving varied configurations in the form of a task chain, leading to the discovery of novel doping configurations with superior electrocatalytic properties for hydrogen evolution reactions. This work provides a powerful tool to explore structure diversity in a fast, accurate, and intelligent manner.
H2OVL-Mississippi Vision Language Models Technical Report
Galib, Shaikat, Wang, Shanshan, Xu, Guanshuo, Pfeiffer, Pascal, Chesler, Ryan, Landry, Mark, Ambati, Sri Satish
Smaller vision-language models (VLMs) are becoming increasingly important for privacy-focused, on-device applications due to their ability to run efficiently on consumer hardware for processing enterprise commercial documents and images. These models require strong language understanding and visual capabilities to enhance human-machine interaction. To address this need, we present H2OVL-Mississippi, a pair of small VLMs trained on 37 million image-text pairs using 240 hours of compute on 8 x H100 GPUs. H2OVL-Mississippi-0.8B is a tiny model with 0.8 billion parameters that specializes in text recognition, achieving state of the art performance on the Text Recognition portion of OCRBench and surpassing much larger models in this area. Additionally, we are releasing H2OVL-Mississippi-2B, a 2 billion parameter model for general use cases, exhibiting highly competitive metrics across various academic benchmarks. Both models build upon our prior work with H2O-Danube language models, extending their capabilities into the visual domain. We release them under the Apache 2.0 license, making VLMs accessible to everyone, democratizing document AI and visual LLMs.
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
Liu, Mianxin, Ding, Jinru, Xu, Jie, Hu, Weiguo, Li, Xiaoyang, Zhu, Lifeng, Bai, Zhian, Shi, Xiaoming, Wang, Benyou, Song, Haitao, Liu, Pengfei, Zhang, Xiaofan, Wang, Shanshan, Li, Kang, Wang, Haofen, Ruan, Tong, Huang, Xuanjing, Sun, Xin, Zhang, Shaoting
Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLM. First, MedBench assembles the currently largest evaluation dataset (300,901 questions) to cover 43 clinical specialties and performs multi-facet evaluation on medical LLM. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separations for question and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer remembering. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results largely aligning with medical professionals' perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.
What is the Best Way for ChatGPT to Translate Poetry?
Wang, Shanshan, Wong, Derek F., Yao, Jingming, Chao, Lidia S.
Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small sample scenarios to ascertain optimal performance. Despite promising outcomes, our analysis reveals persistent issues in the translations generated by ChatGPT that warrant attention. To address these shortcomings, we propose an Explanation-Assisted Poetry Machine Translation (EAPMT) method, which leverages monolingual poetry explanation as a guiding information for the translation process. Furthermore, we refine existing evaluation criteria to better suit the nuances of modern poetry translation. We engaged a panel of professional poets for assessments, complemented evaluations by using GPT-4. The results from both human and machine evaluations demonstrate that our EAPMT method outperforms traditional translation methods of ChatGPT and the existing online systems. This paper validates the efficacy of our method and contributes a novel perspective to machine-assisted literary translation.
Dual-State Personalized Knowledge Tracing with Emotional Incorporation
Wang, Shanshan, Yuan, Fangzheng, Wang, Keyang, Yang, Xun, Zhang, Xingyi, Wang, Meng
Knowledge tracing has been widely used in online learning systems to guide the students' future learning. However, most existing KT models primarily focus on extracting abundant information from the question sets and explore the relationships between them, but ignore the personalized student behavioral information in the learning process. This will limit the model's ability to accurately capture the personalized knowledge states of students and reasonably predict their performances. To alleviate this limitation, we explicitly models the personalized learning process by incorporating the emotions, a representative personalized behavior in the learning process, into KT framework. Specifically, we present a novel Dual-State Personalized Knowledge Tracing with Emotional Incorporation model to achieve this goal: Firstly, we incorporate emotional information into the modeling process of knowledge state, resulting in the Knowledge State Boosting Module. Secondly, we design an Emotional State Tracing Module to monitor students' personalized emotional states, and propose an emotion prediction method based on personalized emotional states. Finally, we apply the predicted emotions to enhance students' response prediction. Furthermore, to extend the generalization capability of our model across different datasets, we design a transferred version of DEKT, named Transfer Learning-based Self-loop model (T-DEKT). Extensive experiments show our method achieves the state-of-the-art performance.
Personalized Forgetting Mechanism with Concept-Driven Knowledge Tracing
Wang, Shanshan, Hu, Ying, Yang, Xun, Zhang, Zhongzhou, Wang, Keyang, Zhang, Xingyi
Knowledge Tracing (KT) aims to trace changes in students' knowledge states throughout their entire learning process by analyzing their historical learning data and predicting their future learning performance. Existing forgetting curve theory based knowledge tracing models only consider the general forgetting caused by time intervals, ignoring the individualization of students and the causal relationship of the forgetting process. To address these problems, we propose a Concept-driven Personalized Forgetting knowledge tracing model (CPF) which integrates hierarchical relationships between knowledge concepts and incorporates students' personalized cognitive abilities. First, we integrate the students' personalized capabilities into both the learning and forgetting processes to explicitly distinguish students' individual learning gains and forgetting rates according to their cognitive abilities. Second, we take into account the hierarchical relationships between knowledge points and design a precursor-successor knowledge concept matrix to simulate the causal relationship in the forgetting process, while also integrating the potential impact of forgetting prior knowledge points on subsequent ones. The proposed personalized forgetting mechanism can not only be applied to the learning of specifc knowledge concepts but also the life-long learning process. Extensive experimental results on three public datasets show that our CPF outperforms current forgetting curve theory based methods in predicting student performance, demonstrating CPF can better simulate changes in students' knowledge status through the personalized forgetting mechanism.
Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion
Li, Jiawei, Li, Sitong, Wang, Shanshan, Zeng, Yicheng, Tan, Falong, Xie, Chuanlong
Deploying machine learning in open environments presents the challenge of encountering diverse test inputs that differ significantly from the training data. These out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution. The machine learning (ML) community has responded with a number of methods aimed at distinguishing anomalous inputs from original training data. However, the majority of previous studies have primarily focused on the output layer or penultimate layer of pre-trained deep neural networks. In this paper, we propose a novel framework, Multitesting-based Layer-wise Out-of-Distribution (OOD) Detection (MLOD), to identify distributional shifts in test samples at different levels of features through rigorous multiple testing procedure. Our approach distinguishes itself from existing methods as it does not require modifying the structure or fine-tuning of the pre-trained classifier. Through extensive experiments, we demonstrate that our proposed framework can seamlessly integrate with any existing distance-based inspection method while efficiently utilizing feature extractors of varying depths. Our scheme effectively enhances the performance of out-of-distribution detection when compared to baseline methods. In particular, MLOD-Fisher achieves superior performance in general. When trained using KNN on CIFAR10, MLOD-Fisher significantly lowers the false positive rate (FPR) from 24.09% to 7.47% on average compared to merely utilizing the features of the last layer.