Collaborating Authors

 Tsang, Ivor


Cross-Context Backdoor Attacks against Graph Prompt Learning

arXiv.org Artificial Intelligence

Graph Prompt Learning (GPL) bridges significant disparities between pretraining and downstream applications to alleviate the knowledge transfer bottleneck in real-world graph learning. While GPL offers superior effectiveness in graph knowledge transfer and computational efficiency, the security risks posed by backdoor poisoning effects embedded in pretrained models remain largely unexplored. Our study provides a comprehensive analysis of GPL's vulnerability to backdoor attacks. We introduce CrossBA, the first cross-context backdoor attack against GPL, which manipulates only the pretraining phase without requiring knowledge of downstream applications. Our investigation reveals both theoretically and empirically that tuning trigger graphs, combined with prompt transformations, can seamlessly transfer the backdoor threat from pretrained encoders to downstream applications. Through extensive experiments involving 3 representative GPL methods across 5 distinct cross-context scenarios and 5 benchmark datasets of node and graph classification tasks, we demonstrate that CrossBA consistently achieves high attack success rates while preserving the functionality of downstream applications over clean input. We also explore potential countermeasures against CrossBA and conclude that current defenses are insufficient to mitigate CrossBA. Our study highlights the persistent backdoor threats to GPL systems, raising trustworthiness concerns in the practices of GPL techniques.


A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

arXiv.org Artificial Intelligence

In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is a scalar optimization problem. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, which makes them computationally inefficient. To address this, we propose an efficient first-order multi-gradient method for MOBLO, called FORUM. Specifically, we reformulate MOBLO problems as a constrained multi-objective optimization (MOO) problem via the value-function approach. Then we propose a novel multi-gradient aggregation method to solve the challenging constrained MOO problem. Theoretically, we provide a complexity analysis to show the efficiency of the proposed method and a non-asymptotic convergence result. Empirically, extensive experiments demonstrate the effectiveness and efficiency of the proposed FORUM method in different learning problems. In particular, it achieves state-of-the-art performance on three multi-task learning benchmark datasets.
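
As a rough illustration of the value-function reformulation mentioned in this abstract, the generic form can be written out as follows. The symbols $x$ (upper-level variable), $y$ (lower-level variable), $F_1, \dots, F_m$ (upper-level objectives), $f$ (lower-level objective), and $f^*$ (its value function) are notation assumed here for illustration and are not taken from the abstract. A bi-level problem of the form

$$
\min_{x} \; \bigl(F_1(x, y^*(x)), \dots, F_m(x, y^*(x))\bigr)^\top
\quad \text{s.t.} \quad y^*(x) \in \arg\min_{y} f(x, y)
$$

can be restated as a single-level constrained multi-objective problem:

$$
\min_{x,\, y} \; \bigl(F_1(x, y), \dots, F_m(x, y)\bigr)^\top
\quad \text{s.t.} \quad f(x, y) - f^*(x) \le 0,
\qquad f^*(x) = \min_{y'} f(x, y').
$$

Because $f(x, y) \ge f^*(x)$ always holds, the constraint is satisfied only when $y$ minimizes $f(x, \cdot)$, so the two problems share the same feasible pairs $(x, y)$.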


Nonparametric Teaching for Multiple Learners

arXiv.org Artificial Intelligence

We study the problem of teaching multiple learners simultaneously in the nonparametric iterative teaching setting, where the teacher iteratively provides examples to the learner to accelerate the acquisition of a target concept. This problem is motivated by the gap between the current single-learner teaching setting and the real-world scenario of human instruction, where a teacher typically imparts knowledge to multiple students. Under the new problem formulation, we introduce a novel framework -- Multi-learner Nonparametric Teaching (MINT). In MINT, the teacher aims to instruct multiple learners, with each learner focusing on learning a scalar-valued target model. To achieve this, we frame the problem as teaching a vector-valued target model and extend the target model space from a scalar-valued reproducing kernel Hilbert space used in single-learner scenarios to a vector-valued space. Furthermore, we demonstrate that MINT offers significant teaching speed-up over repeated single-learner teaching, particularly when the multiple learners can communicate with each other. Lastly, we conduct extensive experiments to validate the practicality and efficiency of MINT.


Nonparametric Iterative Machine Teaching

arXiv.org Artificial Intelligence

In this paper, we consider the problem of Iterative Machine Teaching (IMT), where the teacher provides examples to the learner iteratively such that the learner can achieve fast convergence to a target model. However, existing IMT algorithms are solely based on parameterized families of target models. They mainly focus on convergence in the parameter space, resulting in difficulty when the target models are defined to be functions without dependency on parameters. To address such a limitation, we study a more general task -- Nonparametric Iterative Machine Teaching (NIMT), which aims to teach nonparametric target models to learners in an iterative fashion. Unlike parametric IMT that merely operates in the parameter space, we cast NIMT as a functional optimization problem in the function space. To solve it, we propose both random and greedy functional teaching algorithms. We obtain the iterative teaching dimension (ITD) of the random teaching algorithm under proper assumptions, which serves as a uniform upper bound of ITD in NIMT. Further, the greedy teaching algorithm has a significantly lower ITD, which reaches a tighter upper bound of ITD in NIMT. Finally, we verify the correctness of our theoretical findings with extensive experiments in nonparametric scenarios.


Causal Intervention for Abstractive Related Work Generation

arXiv.org Artificial Intelligence

Abstractive related work generation has attracted increasing attention for generating coherent related work that helps readers grasp the background of current research. However, most existing abstractive models ignore the inherent causality of related work generation, leading to low-quality generated related work and spurious correlations that affect the models' generalizability. In this study, we argue that causal intervention can address these limitations and improve the quality and coherence of the generated related work. To this end, we propose a novel Causal Intervention Module for Related Work Generation (CaM) to effectively capture causalities in the generation process. Specifically, we first model the relations among sentence order, document relation, and transitional content in related work generation using a causal graph. Then, to implement the causal intervention and mitigate the negative impact of spurious correlations, we use do-calculus to derive ordinary conditional probabilities and identify causal effects through CaM. Finally, we subtly fuse CaM with the Transformer to obtain an end-to-end generation model. Extensive experiments on two real-world datasets show that causal interventions in CaM can effectively help the model learn causal relations and produce related work of higher quality and coherence.


Neural Optimization Kernel: Towards Robust Deep Learning

arXiv.org Machine Learning

Recent studies show a close connection between neural networks (NN) and kernel methods. However, most of these analyses (e.g., NTK) focus on the influence of (infinite) width instead of the depth of NN models. There remains a gap between theory and practical network designs that benefit from depth. This paper first proposes a novel kernel family named Neural Optimization Kernel (NOK). Our kernel is defined as the inner product between two $T$-step updated functionals in RKHS w.r.t. a regularized optimization problem. Theoretically, we prove the monotonic descent property of our update rule for both convex and non-convex problems, and an $O(1/T)$ convergence rate of our updates for convex problems. Moreover, we propose a data-dependent structured approximation of our NOK, which builds the connection between training deep NNs and kernel methods associated with NOK. The resultant computational graph is a ResNet-type finite-width NN. Our structured approximation preserves the monotonic descent property and the $O(1/T)$ convergence rate. Namely, a $T$-layer NN performs $T$-step monotonic descent updates. Notably, we show that our $T$-layer structured NN with ReLU maintains an $O(1/T)$ convergence rate w.r.t. a convex regularized problem, which explains the success of ReLU in training deep NNs from an NN architecture optimization perspective. For the unsupervised learning and the shared parameter case, we show the equivalence of training a structured NN with GD and performing functional gradient descent in the RKHS associated with a fixed (data-dependent) NOK in the infinite-width regime. For finite NOKs, we prove generalization bounds. Remarkably, we show that an overparameterized deep NN (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time. Extensive experiments verify the robustness of our structured NOK blocks.


Contrastive Conditional Transport for Representation Learning

arXiv.org Machine Learning

The classical contrastive loss (Oord et al., 2018; Poole et al., 2018) has achieved remarkable success in representation learning, benefiting downstream tasks in a variety of areas (Misra & Maaten, 2020; He et al., 2020; Chen et al., 2020a; Fang & Xie, 2020; Giorgi et al., 2020). The intuition of the contrastive loss is that given a query, its positive sample needs to be close, while the negative samples need to be far away in the representation space, for which the unit hypersphere is the most common assumption (Wang et al., 2017; Davidson et al., 2018). This learning scheme encourages the encoder to learn representations that are invariant to unnecessary details, and uniformly distributed on the hypersphere to maximally preserve relevant information (Hjelm et al., 2018; Tian et al., 2019; Bachman et al., 2019; Wang & Isola, 2020). A notable concern of the conventional contrastive loss is that the query's positive and negative samples are often uniformly sampled and equally treated in the comparison, which results in an inefficient estimation and limits the performance of learned representations (Saunshi et al., 2019b; Chuang et al., 2020). As illustrated in Figure 1, given a query, the conventional CL methods usually randomly take one positive sample to form the positive pair and equally treat all the other negative pairs, regardless of how informative a sample is to the query.
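
The intuition described above (pull the positive close, push negatives away on the unit hypersphere) can be sketched for a single query as follows. This is a minimal illustration and not the authors' implementation: the InfoNCE-style softmax form, the cosine similarity, the `temperature` value, and all function names are assumptions made for this example.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.5,  # assumed value, chosen only for illustration
) -> float:
    """Softmax cross-entropy that treats the positive as the correct class.

    Every negative enters the denominator with equal weight, which is the
    uniform treatment the passage above points out as a concern.
    """
    q = normalize(query)
    pos_logit = dot(q, normalize(positive)) / temperature
    logits = [pos_logit] + [dot(q, normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - pos_logit


# Hypothetical 2-D embeddings.
query = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned_loss = contrastive_loss(query, [0.9, 0.1], negatives)
misaligned_loss = contrastive_loss(query, [-1.0, 0.1], negatives)
print(aligned_loss < misaligned_loss)
```

In this toy setup the loss is lower when the positive sample points in nearly the same direction as the query than when it points away from it.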


Human-Understandable Decision Making for Visual Recognition

arXiv.org Artificial Intelligence

The widespread use of deep neural networks has achieved substantial success in many tasks. However, there still exists a huge gap between the operating mechanism of deep learning models and human-understandable decision making, so that humans cannot fully trust the predictions made by these models. To date, little work has been done on how to align the behaviors of deep learning models with human perception in order to train a human-understandable model. To fill this gap, we propose a new framework to train a deep neural network by incorporating the prior of human perception into the model learning process. Our proposed model mimics the process of perceiving conceptual parts from images and assessing their relative contributions towards the final recognition. The effectiveness of our proposed model is evaluated on two classical visual recognition tasks. The experimental results and analysis confirm that our model not only provides interpretable explanations for its predictions but also maintains competitive recognition accuracy.


Learning Robust Node Representations on Graphs

arXiv.org Machine Learning

Graph neural networks (GNNs), a popular methodology for node representation learning on graphs, currently focus mainly on preserving the smoothness and identifiability of node representations. A robust node representation on graphs should further hold the stability property, which means that a node representation is resistant to slight perturbations of the input. In this paper, we introduce the stability of node representations in addition to the smoothness and identifiability, and develop a novel method called contrastive graph neural networks (CGNN) that learns robust node representations in an unsupervised manner. Specifically, CGNN maintains the stability and identifiability through a contrastive learning objective, while preserving the smoothness with existing GNN models. Furthermore, the proposed method is a generic framework that can be equipped with many other backbone models (e.g., GCN, GraphSAGE, and GAT). Extensive experiments on four benchmarks under both transductive and inductive learning setups demonstrate the effectiveness of our method in comparison with recent supervised and unsupervised models.


Co-teaching: Robust training of deep neural networks with extremely noisy labels

Neural Information Processing Systems

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they first memorize training data with clean labels and then those with noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called "Co-teaching" for combating noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other on every mini-batch: first, each network feeds forward all data and selects some data with possibly clean labels; second, the two networks communicate with each other about which data in this mini-batch should be used for training; finally, each network backpropagates on the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
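
The three-step exchange described above can be sketched for one mini-batch as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: "possibly clean" is approximated here by the smallest per-sample loss, the fraction kept is a hypothetical `keep_ratio` parameter, and the function names and loss values are invented for the example. The actual forward and backward passes are left out.

```python
from typing import List, Tuple


def select_small_loss(losses: List[float], keep_ratio: float) -> List[int]:
    """Return sorted indices of the keep_ratio fraction with the smallest loss.

    Assumption for this sketch: low loss is used as a proxy for a clean label.
    """
    num_keep = max(1, int(keep_ratio * len(losses)))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:num_keep])


def co_teaching_exchange(
    losses_a: List[float], losses_b: List[float], keep_ratio: float
) -> Tuple[List[int], List[int]]:
    """Each network picks likely-clean samples; its peer trains on them.

    Returns (indices network A updates on, indices network B updates on).
    """
    picked_by_a = select_small_loss(losses_a, keep_ratio)
    picked_by_b = select_small_loss(losses_b, keep_ratio)
    # Cross update: A uses B's selection and B uses A's selection.
    return picked_by_b, picked_by_a


# Hypothetical per-sample losses from the two networks on a batch of six.
losses_a = [0.2, 2.5, 0.1, 0.4, 3.0, 0.3]
losses_b = [0.3, 0.2, 0.15, 2.8, 2.9, 0.25]
train_a_on, train_b_on = co_teaching_exchange(losses_a, losses_b, keep_ratio=0.5)
print(train_a_on, train_b_on)  # [1, 2, 5] [0, 2, 5]
```

Here network A would update on samples 1, 2, and 5 (chosen by B), while network B would update on samples 0, 2, and 5 (chosen by A), so each network's selection errors are filtered by its peer rather than reinforced by itself.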