 Xu, Min


HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

arXiv.org Artificial Intelligence

Drug-target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DTA prediction method, termed HGTDP-DTA, which utilizes dynamic prompts within a hybrid Graph-Transformer framework. Our method generates context-specific prompts for each drug-target pair, enhancing the model's ability to capture unique interactions. The introduction of prompt tuning further optimizes the prediction process by filtering out irrelevant noise and emphasizing task-relevant information, dynamically adjusting the input features of the molecular graph. The proposed hybrid Graph-Transformer architecture combines structural information from Graph Convolutional Networks (GCNs) with sequence information captured by Transformers, facilitating the interaction between global and local information. Additionally, we adopt a multi-view feature fusion method to project molecular graph views and affinity subgraph views into a common feature space, effectively combining structural and contextual information. Experiments on two widely used public datasets, Davis and KIBA, show that HGTDP-DTA outperforms state-of-the-art DTA prediction methods in both prediction performance and generalization ability.

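The abstract gives no implementation details, but the overall composition can be sketched: a GCN branch over the molecular graph whose node features are shifted by a context-specific prompt vector, a Transformer branch over the target sequence, and a fusion head that maps both views into a common space to predict affinity. The PyTorch sketch below only illustrates that composition; the module choices, dimensions, and prompt-generation step are assumptions, not the authors' code.

import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One dense GCN layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj, h):
        # adj is assumed to be a normalised adjacency matrix with self-loops
        return torch.relu(adj @ self.lin(h))

class HybridGraphTransformerSketch(nn.Module):
    def __init__(self, node_dim=32, seq_vocab=25, d_model=64):
        super().__init__()
        # "Dynamic prompt": a vector generated from a pair-level context and
        # added to every node feature, standing in for prompt tuning.
        self.prompt_gen = nn.Linear(d_model, node_dim)
        self.gcn1 = SimpleGCNLayer(node_dim, d_model)
        self.gcn2 = SimpleGCNLayer(d_model, d_model)
        self.seq_embed = nn.Embedding(seq_vocab, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.fuse = nn.Sequential(nn.Linear(2 * d_model, d_model),
                                  nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, adj, node_feats, target_seq, pair_context):
        prompt = self.prompt_gen(pair_context)            # context-specific prompt
        h = node_feats + prompt                           # adjust graph input features
        h = self.gcn2(adj, self.gcn1(adj, h))             # structural (local) view
        drug_vec = h.mean(dim=0)                          # graph-level readout
        s = self.seq_encoder(self.seq_embed(target_seq))  # sequence (global) view
        target_vec = s.mean(dim=1).squeeze(0)
        return self.fuse(torch.cat([drug_vec, target_vec]))  # predicted affinity

model = HybridGraphTransformerSketch()
adj = torch.eye(5)  # toy 5-atom "molecule" with self-loops only
print(model(adj, torch.randn(5, 32), torch.randint(0, 25, (1, 40)),
            torch.randn(64)).shape)  # torch.Size([1])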

Synergistic Global-space Camera and Human Reconstruction from Videos

arXiv.org Artificial Intelligence

Remarkable strides have been made in reconstructing static scenes or human bodies from monocular videos. Yet, the two problems have largely been approached independently, without much synergy. Most visual SLAM methods can only reconstruct camera trajectories and scene structures up to scale, while most HMR methods reconstruct human meshes in metric scale but fall short in reasoning with cameras and scenes. This work introduces Synergistic Camera and Human Reconstruction (SynCHMR) to marry the best of both worlds. Specifically, we design Human-aware Metric SLAM to reconstruct metric-scale camera poses and scene point clouds using camera-frame HMR as a strong prior, addressing depth, scale, and dynamic ambiguities. Conditioning on the dense scene recovered, we further learn a Scene-aware SMPL Denoiser to enhance world-frame HMR by incorporating spatio-temporal coherency and dynamic scene constraints. Together, they lead to consistent reconstructions of camera trajectories, human meshes, and dense scene point clouds in a common world frame. Project page: https://paulchhuang.github.io/synchmr

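One way to see why camera-frame HMR is a useful metric prior is the scale ambiguity: monocular SLAM recovers camera translations only up to an unknown scale, while HMR translations are roughly metric. The NumPy toy below estimates a single global scale aligning the two by least squares, assuming a roughly static person and ignoring rotation; the actual Human-aware Metric SLAM solves a much richer joint problem, so this is only a conceptual sketch.

import numpy as np

def estimate_metric_scale(slam_cam_t, hmr_root_t_cam):
    """
    slam_cam_t:     (T, 3) camera positions from SLAM, correct up to a scale s.
    hmr_root_t_cam: (T, 3) human root translations in the camera frame from HMR,
                    assumed to be approximately metric.
    Assumes the person is roughly static in the world and camera rotation is
    small, so camera motion is approximately minus the root motion in the
    camera frame; a real system would solve this jointly and robustly.
    """
    d_slam = np.diff(slam_cam_t, axis=0)
    d_hmr = np.diff(-hmr_root_t_cam, axis=0)
    # Least-squares scale: argmin_s || s * d_slam - d_hmr ||^2
    return (d_slam * d_hmr).sum() / ((d_slam * d_slam).sum() + 1e-8)

# toy usage: a trajectory stored at half its true scale is recovered (factor ~2)
true_traj = np.cumsum(np.random.randn(50, 3) * 0.1, axis=0)
print(estimate_metric_scale(0.5 * true_traj, -true_traj))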

Conditional Local Feature Encoding for Graph Neural Networks

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have shown great success in learning from graph-based data. The key mechanism of current GNNs is message passing, where a node's feature is updated based on the information passing from its local neighbourhood. A limitation of this mechanism is that node features become increasingly dominated by the information aggregated from the neighbourhood as we use more rounds of message passing. Consequently, as the GNN layers become deeper, adjacent node features tend to be similar, making it more difficult for GNNs to distinguish adjacent nodes and thereby limiting the performance of GNNs. In this paper, we propose conditional local feature encoding (CLFE) to help prevent node features from being dominated by the information from the local neighbourhood. The idea of our method is to extract the node hidden state embedding from the message-passing process and concatenate it with the node's feature from the previous stage; we then apply a linear transformation to the concatenated vector to form the CLFE. The CLFE forms the layer output to better preserve node-specific information, thus helping to improve the performance of GNN models. To verify the feasibility of our method, we conducted extensive experiments on seven benchmark datasets for four graph domain tasks: super-pixel graph classification, node classification, link prediction, and graph regression. The experimental results consistently demonstrate that our method improves model performance across a variety of baseline GNN models for all four tasks.

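The abstract describes CLFE concretely enough for a small sketch: after a round of message passing, concatenate the aggregated hidden state with the node's feature from the previous stage and apply a linear transformation, so the layer output retains node-specific information. The PyTorch layer below follows that recipe; the dense adjacency and mean aggregation are simplifying assumptions.

import torch
import torch.nn as nn

class CLFELayer(nn.Module):
    """Message passing followed by Conditional Local Feature Encoding:
    the aggregated hidden state is concatenated with the previous-stage node
    features and linearly transformed, so neighbourhood information does not
    fully dominate the layer output."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)              # message-passing transform
        self.clfe = nn.Linear(out_dim + in_dim, out_dim)   # encodes [hidden || previous]

    def forward(self, adj, h_prev):
        # mean aggregation over neighbours (adj is assumed to include self-loops)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h_msg = torch.relu((adj @ self.msg(h_prev)) / deg)
        # CLFE: condition the output on the node's own previous-stage feature
        return torch.relu(self.clfe(torch.cat([h_msg, h_prev], dim=1)))

# toy usage on a 4-node path graph with self-loops
adj = torch.tensor([[1., 1., 0., 0.],
                    [1., 1., 1., 0.],
                    [0., 1., 1., 1.],
                    [0., 0., 1., 1.]])
layer = CLFELayer(8, 16)
print(layer(adj, torch.randn(4, 8)).shape)  # torch.Size([4, 16])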

ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) for language models significantly improves language understanding systems. The basic retrieval-then-read pipeline of response generation has evolved into a more extended process due to the integration of various components, sometimes even forming loop structures. Despite its advancements in improving response accuracy, challenges persist: poor retrieval quality for complex questions that require multifaceted semantic information, inefficiencies in knowledge re-retrieval during long-term serving, and a lack of personalized responses. Motivated by these limitations, we introduce ERAGent, a cutting-edge framework that advances the RAG area. Our contribution is the introduction of two synergistically operated modules, the Enhanced Question Rewriter and the Knowledge Filter, for better retrieval quality. A Retrieval Trigger is incorporated to curtail extraneous external knowledge retrieval without sacrificing response quality. ERAGent also personalizes responses by incorporating a learned user profile. The efficiency and personalization characteristics of ERAGent are supported by the Experiential Learner module, which enables the AI assistant to expand its knowledge and model the user profile incrementally. Rigorous evaluations across six datasets and three question-answering tasks prove ERAGent's superior accuracy, efficiency, and personalization, emphasizing its potential to advance the RAG field and its applicability in practical systems.

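To make the module composition concrete, here is a deliberately trivial Python sketch of how an Enhanced Question Rewriter, Retrieval Trigger, Knowledge Filter, and profile-based personalization could be chained in a retrieval-then-read loop. Every component is a keyword-level stand-in; ERAGent's modules are learned and LLM-based, so nothing below should be read as its actual implementation.

from dataclasses import dataclass, field

@dataclass
class UserProfile:
    preferences: list = field(default_factory=list)  # ERAGent learns this incrementally

def rewrite_question(question: str, profile: UserProfile) -> str:
    # Stand-in for the Enhanced Question Rewriter: expand with profile terms.
    return question + " " + " ".join(profile.preferences)

def retrieval_trigger(question: str, cache: dict) -> bool:
    # Skip external re-retrieval when the question was already served.
    return question not in cache

def retrieve(query: str, corpus: list) -> list:
    # Stand-in retriever: rank documents by keyword overlap with the query.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:3]

def knowledge_filter(question: str, docs: list) -> list:
    # Keep only documents sharing at least one word with the original question.
    q = set(question.lower().split())
    return [d for d in docs if q & set(d.lower().split())]

def answer(question: str, corpus: list, profile: UserProfile, cache: dict) -> str:
    rewritten = rewrite_question(question, profile)
    if retrieval_trigger(rewritten, cache):
        docs = knowledge_filter(question, retrieve(rewritten, corpus))
        cache[rewritten] = "Answer grounded in: " + "; ".join(docs)
    return cache[rewritten]

corpus = ["RAG combines retrieval with generation",
          "Bananas are rich in potassium",
          "Retrieval quality matters for complex questions"]
print(answer("why does retrieval quality matter", corpus,
             UserProfile(preferences=["RAG"]), cache={}))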

Optimal convex $M$-estimation via score matching

arXiv.org Machine Learning

In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex $M$-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments confirm the practical merits of our proposal.

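In standard linear-model notation (a reconstruction from the abstract, not the paper's exact statement), the idea can be written as follows. For $Y_i = X_i^\top \beta_0 + \varepsilon_i$ with noise density $f_0$, a convex $M$-estimator $\hat\beta_\ell = \arg\min_\beta \sum_i \ell(Y_i - X_i^\top \beta)$ has, under regularity conditions, asymptotic covariance
\[
\frac{\mathbb{E}\{\ell'(\varepsilon)^2\}}{\big[\mathbb{E}\{\ell''(\varepsilon)\}\big]^2}\,\Sigma_X^{-1},
\qquad \Sigma_X = \mathbb{E}(X X^\top).
\]
Choosing $\ell$ so that $-\ell'$ is the best decreasing approximation of the score $(\log f_0)'$ is what minimises this covariance over convex losses, and at the population level it amounts to projecting $f_0$ onto the log-concave class with respect to the Fisher divergence
\[
d_{\mathrm{F}}(f_0, g) = \int f_0(z)\,\bigl\{(\log f_0)'(z) - (\log g)'(z)\bigr\}^2\,\mathrm{d}z .
\]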

Causal Disentanglement for Regulating Social Influence Bias in Social Recommendation

arXiv.org Artificial Intelligence

Social recommendation systems face the problem of social influence bias, which can lead to an overemphasis on recommending items that friends have interacted with. Addressing this problem is crucial, and existing methods often rely on techniques such as weight adjustment or leveraging unbiased data to eliminate this bias. However, we argue that not all biases are detrimental, i.e., some items recommended by friends may align with the user's interests. Blindly eliminating such biases could undermine these positive effects, potentially diminishing recommendation accuracy. In this paper, we propose a Causal Disentanglement-based framework for Regulating Social influence Bias in social recommendation, named CDRSB, to improve recommendation performance. From the perspective of causal inference, we find that the user social network can be regarded as a confounder between the user and item embeddings (treatment) and ratings (outcome). Due to the presence of this social network confounder, two paths exist from user and item embeddings to ratings: a non-causal social influence path and a causal interest path. Building upon this insight, we propose a disentangled encoder that focuses on disentangling user and item embeddings into interest and social influence embeddings. Mutual information-based objectives are designed to enhance the distinctiveness of these disentangled embeddings, eliminating redundant information. Additionally, we design a regulatory decoder that employs a weight calculation module to dynamically learn the weights of social influence embeddings, effectively regulating social influence bias. Experimental results on four large-scale real-world datasets (Ciao, Epinions, Dianping, and Douban Book) demonstrate the effectiveness of CDRSB compared to state-of-the-art baselines.

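To make the regulate-rather-than-eliminate idea concrete, the PyTorch sketch below splits each user embedding into an interest part and a social-influence part and learns a weight that scales the social-influence contribution to the predicted rating. The graph encoders over the social network, the mutual-information objectives, and all dimensions are omitted or assumed, so this only illustrates the decoder's weighting mechanism.

import torch
import torch.nn as nn

class CDRSBSketch(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_int = nn.Embedding(n_users, dim)   # disentangled interest embedding
        self.user_soc = nn.Embedding(n_users, dim)   # disentangled social-influence embedding
        self.item_emb = nn.Embedding(n_items, dim)
        # weight calculation module: how much social influence to keep per pair
        self.weight_net = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, u, i):
        e_int, e_soc, e_item = self.user_int(u), self.user_soc(u), self.item_emb(i)
        w = self.weight_net(torch.cat([e_soc, e_item], dim=-1))
        r_interest = (e_int * e_item).sum(-1, keepdim=True)   # causal interest path
        r_social = (e_soc * e_item).sum(-1, keepdim=True)     # social influence path
        return r_interest + w * r_social                      # regulated, not removed

model = CDRSBSketch(n_users=100, n_items=200)
print(model(torch.tensor([3]), torch.tensor([7])))  # one regulated rating prediction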

A Privacy-Preserving Framework with Multi-Modal Data for Cross-Domain Recommendation

arXiv.org Artificial Intelligence

Cross-domain recommendation (CDR) aims to enhance recommendation accuracy in a target domain with sparse data by leveraging rich information in a source domain, thereby addressing the data-sparsity problem. Some existing CDR methods highlight the advantages of extracting domain-common and domain-specific features to learn comprehensive user and item representations. However, these methods cannot effectively disentangle these components, as they often rely on simple user-item historical interaction information (such as ratings, clicks, and browsing) and neglect the rich multi-modal features. Additionally, they do not protect user-sensitive data from potential leakage during knowledge transfer between domains. To address these challenges, we propose a Privacy-Preserving Framework with Multi-Modal Data for Cross-Domain Recommendation, called P2M2-CDR. Specifically, we first design a multi-modal disentangled encoder that utilizes multi-modal information to disentangle more informative domain-common and domain-specific embeddings. Furthermore, we introduce a privacy-preserving decoder to mitigate user privacy leakage during knowledge transfer. Local differential privacy (LDP) is utilized to obfuscate the disentangled embeddings before inter-domain exchange, thereby enhancing privacy protection. To ensure both consistency and differentiation among these obfuscated disentangled embeddings, we incorporate contrastive learning-based domain-inter and domain-intra losses. Extensive experiments conducted on four real-world datasets demonstrate that P2M2-CDR outperforms other state-of-the-art single-domain and cross-domain baselines.

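The LDP step is the most self-contained piece: clip each disentangled embedding and add Laplace noise before it leaves the domain. The NumPy sketch below is the textbook Laplace mechanism applied per coordinate; P2M2-CDR's exact clipping, noise scale, and privacy accounting may differ.

import numpy as np

def ldp_obfuscate(embedding, epsilon=1.0, clip=1.0):
    """Obfuscate an embedding before inter-domain exchange.
    Clipping bounds each coordinate to [-clip, clip], so its sensitivity is
    2 * clip and Laplace noise with scale 2 * clip / epsilon gives
    epsilon-LDP per coordinate."""
    clipped = np.clip(embedding, -clip, clip)
    noise = np.random.laplace(loc=0.0, scale=2.0 * clip / epsilon, size=clipped.shape)
    return clipped + noise

# toy usage: a domain-common user embedding is perturbed before being shared
domain_common = np.random.randn(64) * 0.1
shared = ldp_obfuscate(domain_common, epsilon=2.0)
print(np.abs(shared - domain_common).mean())  # average added distortion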

Deep Active Learning with Noise Stability

arXiv.org Artificial Intelligence

Uncertainty estimation for unlabeled data is crucial to active learning. With a deep neural network employed as the backbone model, the data selection process is highly challenging due to the potential over-confidence of the model inference. Existing methods resort to special learning fashions (e.g. adversarial) or auxiliary models to address this challenge, which tends to result in complex and inefficient pipelines that render the methods impractical. In this work, we propose a novel algorithm that leverages noise stability to estimate data uncertainty. The key idea is to measure the deviation of the output from the original observation when the model parameters are randomly perturbed by noise. We provide theoretical analyses by leveraging the small Gaussian noise theory and demonstrate that our method favors a subset with large and diverse gradients. Our method is generally applicable to various tasks, including computer vision, natural language processing, and structural data analysis. It achieves competitive performance compared with state-of-the-art active learning baselines.

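The core scoring rule is easy to prototype: perturb the model parameters with small Gaussian noise several times and use the average deviation of the outputs as the uncertainty score for ranking unlabelled points. The PyTorch sketch below follows that idea; the relative noise scaling, the norm, and the selection rule are assumptions rather than the paper's exact procedure.

import copy
import torch
import torch.nn as nn

@torch.no_grad()
def noise_stability_score(model, x, n_trials=10, noise_scale=0.01):
    """Larger output deviation under parameter noise = less stable = more uncertain."""
    base_out = model(x)
    score = torch.zeros(x.size(0))
    for _ in range(n_trials):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            # noise proportional to the parameter norm, spread over its entries
            p.add_(noise_scale * p.norm() * torch.randn_like(p) / (p.numel() ** 0.5))
        score += (noisy(x) - base_out).norm(dim=1)
    return score / n_trials

# toy usage: query the k most unstable points from an unlabelled pool
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
pool = torch.randn(100, 16)
query_idx = noise_stability_score(net, pool).topk(k=8).indices
print(query_idx)  # indices to send to the annotator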

Improving Transformation-based Defenses against Adversarial Examples with First-order Perturbations

arXiv.org Artificial Intelligence

Deep neural networks have been successfully applied to various machine learning tasks. However, studies show that neural networks are susceptible to adversarial attacks, which poses a potential threat to neural network-based intelligent systems. We observe that the probability of the correct result output by the neural network increases when small first-order perturbations generated for non-predicted class labels are applied to adversarial examples. Based on this observation, we propose a method that counteracts adversarial perturbations to improve adversarial robustness. In the proposed method, we randomly select a number of class labels and generate small first-order perturbations for these selected labels. The generated perturbations are added together and then clamped onto a specified space. The obtained perturbation is finally added to the adversarial example to counteract the adversarial perturbation contained in the example. The proposed method is applied at inference time and does not require retraining or fine-tuning the model. We experimentally validate the proposed method on CIFAR-10 and CIFAR-100. The results demonstrate that our method effectively improves the defense performance of several transformation-based defense methods, especially against strong adversarial examples generated with more iterations.

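The described inference-time procedure maps fairly directly to code: sample a few class labels other than the current prediction, take a small first-order (gradient-sign) step towards each, sum and clamp the steps, and add the result to the input. The PyTorch sketch below follows that outline with illustrative step sizes and label counts; it is not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def counteracting_perturbation(model, x, n_labels=3, step=2 / 255, clamp_eps=8 / 255):
    """Counteract a (possible) adversarial perturbation in x at inference time."""
    x = x.clone().detach()
    with torch.no_grad():
        logits = model(x)
    pred, n_classes = logits.argmax(dim=1), logits.size(1)
    total = torch.zeros_like(x)
    for _ in range(n_labels):
        # sample a label different from the current prediction
        target = torch.randint(0, n_classes, pred.shape)
        target = torch.where(target == pred, (target + 1) % n_classes, target)
        x_adv = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        total += -step * grad.sign()                 # first-order step towards the label
    total = total.clamp(-clamp_eps, clamp_eps)       # clamp onto a specified space
    return (x + total).clamp(0, 1)                   # counteracted input

# toy usage on random "images"
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
print(counteracting_perturbation(net, torch.rand(4, 3, 32, 32)).shape)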

Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning

arXiv.org Artificial Intelligence

A thorough comprehension of textual data is a fundamental element in multi-modal video analysis tasks. However, recent works have shown that current models do not achieve a comprehensive understanding of the textual data during training for the target downstream tasks. Orthogonal to previous approaches to this limitation, we postulate that understanding the significance of sentence components according to the target task can potentially enhance the performance of the models. Hence, we utilize the knowledge of a pre-trained large language model (LLM) to generate text samples from the original ones, targeting specific sentence components. We propose a weakly supervised importance estimation module to compute the relative importance of the components and utilize them to improve different video-language tasks. Through rigorous quantitative analysis, our proposed method exhibits significant improvement across several video-language tasks. In particular, our approach notably enhances video-text retrieval by a relative improvement of 8.3\% in video-to-text and 1.4\% in text-to-video retrieval over the baselines, in terms of R@1. Additionally, in video moment retrieval, average mAP shows a relative improvement ranging from 2.0\% to 13.7\% across different baselines.
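
As a toy illustration of component-importance estimation, the Python sketch below removes one sentence component at a time and treats the drop in a stand-in video-text matching score as that component's relative importance. The actual method generates component-targeted variants with a pre-trained LLM and learns the weights with a weakly supervised module, so this only conveys the intuition.

def toy_similarity(video_tags, caption):
    # stand-in for a learned video-text matching score: tag-word overlap
    words = set(caption.lower().split())
    return len(words & set(video_tags)) / max(len(video_tags), 1)

def component_importance(video_tags, components):
    base = toy_similarity(video_tags, " ".join(components))
    drops = []
    for i in range(len(components)):
        ablated = " ".join(c for j, c in enumerate(components) if j != i)
        drops.append(max(base - toy_similarity(video_tags, ablated), 0.0))
    total = sum(drops) or 1.0
    return [d / total for d in drops]   # relative importance of each component

video_tags = ["dog", "catches", "frisbee", "park"]
components = ["a dog", "catches", "a frisbee", "in the park"]
print(component_importance(video_tags, components))  # equal weights in this toy case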