
Collaborating Authors

 Zhang, Yuhong


Towards Effective Graph Rationalization via Boosting Environment Diversity

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) perform effectively when training and testing graphs are drawn from the same distribution, but struggle to generalize under distribution shifts. To address this issue, existing mainstream graph rationalization methods first identify rationale and environment subgraphs in input graphs, and then diversify the training distribution by augmenting the environment subgraphs. However, these methods merely combine the learned rationale subgraphs with environment subgraphs in the representation space to produce augmented samples, and thus fail to produce sufficiently diverse distributions. In this paper, we therefore propose Graph Rationalization by Boosting Environment diversity (GRBE), an approach that generates augmented samples in the original graph space to improve the diversity of environment subgraphs. Firstly, to ensure the effectiveness of augmented samples, GRBE uses a precise rationale subgraph extraction strategy to refine rationale subgraph learning in the original graph space. Secondly, to ensure the diversity of augmented samples, GRBE uses an environment diversity augmentation strategy that mixes the environment subgraphs of different graphs in the original graph space and then combines the new environment subgraphs with rationale subgraphs to generate augmented graphs. Average improvements of 7.65% in rationalization and 6.11% in classification performance on benchmark datasets demonstrate the superiority of GRBE over state-of-the-art approaches.
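The environment-mixing step can be made concrete. The following is a minimal sketch, assuming networkx graphs whose nodes carry a boolean "rationale" attribute produced by the extraction stage; the function names and the random bridging rule are illustrative assumptions, not GRBE's exact procedure.

import networkx as nx
import random

def split_graph(g):
    """Split a graph into its rationale and environment subgraphs."""
    rationale = [n for n, d in g.nodes(data=True) if d.get("rationale")]
    environment = [n for n in g.nodes if n not in set(rationale)]
    return g.subgraph(rationale).copy(), g.subgraph(environment).copy()

def augment(g_a, g_b, p_bridge=0.1, seed=0):
    """Attach the environment of g_b to the rationale of g_a."""
    rng = random.Random(seed)
    rat, _ = split_graph(g_a)
    _, env = split_graph(g_b)
    env = nx.relabel_nodes(env, {n: ("env", n) for n in env.nodes})
    new_g = nx.union(rat, env)
    # Randomly bridge the two parts so the augmented graph is connected
    # in expectation; a real system would learn these connections.
    for r in rat.nodes:
        for e in env.nodes:
            if rng.random() < p_bridge:
                new_g.add_edge(r, e)
    return new_g

Because the mixing happens on the graphs themselves rather than on pooled embeddings, each augmented sample is a genuine new graph that any GNN encoder can consume.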


Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level Neural State Classification in Semantic Inference Reading Comprehension

arXiv.org Artificial Intelligence

With the recent proliferation of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), there has been a significant shift in exploring human and machine comprehension of semantic language meaning. This shift calls for interdisciplinary research that bridges cognitive science and natural language processing (NLP). This pilot study aims to provide insights into individuals' neural states during a semantic relation reading-comprehension task. We propose jointly analyzing LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading. We also use a feature engineering approach to improve the fixation-related EEG data classification while participants read words with high versus low relevance to the keyword. The best validation accuracy in this word-level classification is over 60% across 12 subjects. Words of high relevance to the inference keyword had significantly more eye fixations per word: 1.0584 compared to 0.6576 when excluding no-fixation words, and 1.5126 compared to 1.4026 when including them. This study represents the first attempt to classify brain states at a word level using LLM knowledge. It provides valuable insights into human cognitive abilities and the realm of Artificial General Intelligence (AGI), and offers guidance for developing potential reading-assisted technologies.
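For readers who want to reproduce the fixation statistic, a minimal sketch of the per-word averaging is shown below; the (fixation_count, is_high_relevance) data layout is an assumption made for illustration, not the study's actual format.

def mean_fixations(words, high_relevance, include_zero=True):
    """Average fixations per word for one relevance group.
    words: iterable of (fixation_count, is_high_relevance) pairs."""
    counts = [c for c, hi in words
              if hi == high_relevance and (include_zero or c > 0)]
    return sum(counts) / len(counts) if counts else 0.0

words = [(2, True), (0, True), (1, False), (0, False), (3, True)]
print(mean_fixations(words, True, include_zero=False))  # excludes no-fixation words
print(mean_fixations(words, True, include_zero=True))   # includes them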


DAGKT: Difficulty and Attempts Boosted Graph-based Knowledge Tracing

arXiv.org Artificial Intelligence

In the field of intelligent education, knowledge tracing (KT), which estimates and traces students' mastery of knowledge concepts to enable high-quality education, has attracted increasing attention. In KT there are natural graph structures among questions and knowledge concepts, so some studies have explored applying graph neural networks (GNNs) to improve KT models that do not use graph structure. However, most of them ignore both question difficulty and students' attempts at questions. In practice, questions covering the same knowledge concepts differ in difficulty, and students' different attempts also reflect different levels of knowledge mastery. In this paper, we propose a difficulty and attempts boosted graph-based KT (DAGKT) model that exploits the rich information in students' records. Moreover, a novel method inspired by the F1 score is designed to establish question similarity relationships. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed DAGKT.
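One plausible reading of the F1-inspired similarity is to treat the knowledge concepts shared by two questions as hits, the two overlap ratios as precision and recall, and their harmonic mean as the similarity. The sketch below follows that reading; it is an interpretation of the abstract, not necessarily DAGKT's exact formula.

def f1_similarity(concepts_q1: set, concepts_q2: set) -> float:
    """F1-style similarity between the concept sets of two questions."""
    shared = concepts_q1 & concepts_q2
    if not shared:
        return 0.0
    p = len(shared) / len(concepts_q1)  # plays the role of precision
    r = len(shared) / len(concepts_q2)  # plays the role of recall
    return 2 * p * r / (p + r)

print(f1_similarity({"algebra", "fractions"}, {"fractions", "geometry"}))  # 0.5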


A Temporal Knowledge Graph Completion Method Based on Balanced Timestamp Distribution

arXiv.org Artificial Intelligence

Knowledge graph completion via knowledge graph embedding (KGE) has been a research hotspot in recent years. Realistic knowledge graphs are mostly time-dependent, yet most existing KGE algorithms ignore temporal information. The few methods that encode time, directly or indirectly, ignore the balance of the timestamp distribution, which greatly limits the performance of temporal knowledge graph completion (KGC). In this paper, we propose a temporal KGC method based on a direct time-encoding framework that treats a given time slice as the finest granularity for balancing the timestamp distribution. Extensive experiments on temporal knowledge graph datasets extracted from the real world demonstrate the effectiveness of our method.
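A balanced timestamp distribution can be approximated by greedily merging adjacent time slices until each bucket holds roughly the same number of facts, never subdividing below the given finest-granularity slice. The greedy rule below is an assumption for illustration, not the paper's exact procedure.

from collections import Counter

def balanced_buckets(timestamps, num_buckets):
    """Group sorted time slices into num_buckets with ~equal fact counts."""
    counts = sorted(Counter(timestamps).items())  # (slice, fact_count) pairs
    target = sum(c for _, c in counts) / num_buckets
    buckets, current, acc = [], [], 0
    for ts, c in counts:
        current.append(ts)
        acc += c
        if acc >= target and len(buckets) < num_buckets - 1:
            buckets.append(current)
            current, acc = [], 0
    if current:
        buckets.append(current)
    return buckets

print(balanced_buckets([2014, 2014, 2014, 2015, 2016, 2017], 2))
# [[2014], [2015, 2016, 2017]] -- each bucket covers ~3 facts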


Semi-supervised representation learning via dual autoencoders for domain adaptation

arXiv.org Machine Learning

Domain adaptation, which exploits knowledge from a source domain to improve learning tasks in a target domain, plays a critical role in real-world applications. Recently, many deep learning approaches based on autoencoders have achieved significant performance in domain adaptation. However, most existing methods focus on minimizing distribution divergence by pooling the source and target data to learn global feature representations, and do not take into account the local relationships between instances of the same category in different domains. To address this problem, we propose a novel Semi-Supervised Representation Learning framework via Dual Autoencoders for domain adaptation, named SSRLDA. More specifically, we extract richer feature representations by learning global and local feature representations simultaneously with two novel autoencoders, referred to as the marginalized denoising autoencoder with adaptation distribution (MDA_ad) and the multi-class marginalized denoising autoencoder (MMDA), respectively. Meanwhile, we adopt an iterative strategy to make full use of label information to optimize the feature representations. Experimental results show that our proposed approach outperforms several state-of-the-art baseline methods.
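The marginalized denoising autoencoder that both MDA_ad and MMDA build on has a well-known closed-form solution (Chen et al.'s mSDA): the reconstruction weights are computed in expectation over infinitely many corruptions rather than by sampling them. The sketch below shows only that shared backbone; SSRLDA's adaptation-distribution and label terms are omitted.

import numpy as np

def mda_layer(X, p=0.5, eps=1e-5):
    """One marginalized denoising autoencoder layer.
    X: (d, n) data matrix; p: feature corruption probability."""
    d = X.shape[0]
    q = np.full(d, 1.0 - p)          # probability a feature survives corruption
    S = X @ X.T                      # scatter matrix
    Q = S * np.outer(q, q)           # E[x_tilde x_tilde^T], off-diagonal terms
    np.fill_diagonal(Q, q * np.diag(S))
    P = S * q                        # E[x x_tilde^T]: scales column j by q_j
    W = P @ np.linalg.inv(Q + eps * np.eye(d))  # ridge for numerical stability
    return W, np.tanh(W @ X)

X = np.random.randn(20, 100)         # 20 features, 100 instances
W, H = mda_layer(X)
print(W.shape, H.shape)              # (20, 20) (20, 100)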