Du, Bo
MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence
Tian, Hongduan, Liu, Feng, Liu, Tongliang, Du, Bo, Cheung, Yiu-ming, Han, Bo
In cross-domain few-shot classification, nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other classes. However, in this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes. In order to address this problem, we propose a bi-level optimization framework, maximizing optimized ...
Cross-domain few-shot classification (Dvornik et al., 2020; Li et al., 2021a; Liu et al., 2021a; Triantafillou et al., 2020), also known as CFC, is a learning paradigm which aims at learning to perform classification on tasks sampled from previously unseen data or domains with only a few labeled data available. Compared with conventional few-shot classification (Finn et al., 2017; Ravi & Larochelle, 2017; Snell et al., 2017; Vinyals et al., 2016) which learns to adapt to new tasks sampled from unseen data with the same distribution as seen data, cross-domain few-shot classification is a much more challenging learning task since there exist discrepancies between the distributions of source and target domains (Chi et al., 2021; Kuzborskij & Orabona, 2013).
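The nearest centroid classifier mentioned in this entry can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy embeddings, and the choice of cosine similarity are assumptions made for illustration (the abstract only says that similarities between samples and class prototypes are measured).

```python
import math

def class_prototypes(support_x, support_y):
    """Average the support embeddings of each class to get one prototype per class."""
    sums, counts = {}, {}
    for x, y in zip(support_x, support_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def cosine(u, v):
    """Cosine similarity (assumed metric); inputs must be non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ncc_predict(query, prototypes):
    """Assign the query to the class whose prototype is most similar."""
    return max(prototypes, key=lambda y: cosine(query, prototypes[y]))

# Hypothetical 2-D embeddings for a 2-way, 2-shot task.
support_x = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.9]]
support_y = ["cat", "cat", "dog", "dog"]

prototypes = class_prototypes(support_x, support_y)
prediction = ncc_predict([0.8, 0.2], prototypes)
print(prediction)
```

In this toy task the query lies close to the "cat" prototype, so it is assigned to that class. The problem the abstract points to arises when embeddings of samples from different classes are themselves highly similar, which makes this kind of prototype comparison less reliable.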
A Cross-Field Fusion Strategy for Drug-Target Interaction Prediction
Zhang, Hongzhi, Gong, Xiuwen, Pan, Shirui, Wu, Jia, Du, Bo, Hu, Wenbin
Drug-target interaction (DTI) prediction is a critical component of the drug discovery process. In the drug development engineering field, predicting novel drug-target interactions is extremely crucial. However, although existing methods have achieved high accuracy levels in predicting known drugs and drug targets, they fail to utilize global protein information during DTI prediction. This leads to an inability to effectively predict the interactions between novel drugs and their targets. As a result, the cross-field information fusion strategy is employed to acquire local and global protein information. Thus, we propose the siamese drug-target interaction (SiamDTI) prediction method, which utilizes a double channel network structure for cross-field supervised learning. Experimental results on three benchmark datasets demonstrate that SiamDTI achieves higher accuracy levels than other state-of-the-art (SOTA) methods on novel drugs and targets. Additionally, SiamDTI's performance with known drugs and targets is comparable to that of SOTA approaches. The code is available at https://anonymous.4open.science/r/DDDTI-434D/.
Regressor-free Molecule Generation to Support Drug Response Prediction
Li, Kun, Gong, Xiuwen, Pan, Shirui, Wu, Jia, Du, Bo, Hu, Wenbin
Drug response prediction (DRP) is a crucial phase in drug discovery, and the most important metric for its evaluation is the IC50 score. DRP results are heavily dependent on the quality of the generated molecules. Existing molecule generation methods typically employ classifier-based guidance, enabling sampling within the IC50 classification range. However, these methods fail to ensure the sampling space range's effectiveness, generating numerous ineffective molecules. Through experimental and theoretical study, we hypothesize that conditional generation based on the target IC50 score can obtain a more effective sampling space. As a result, we introduce regressor-free guidance molecule generation to ensure sampling within a more effective space and support DRP. Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels. To effectively map regression labels between drugs and cell lines, we design a common-sense numerical knowledge graph that constrains the order of text representations. Experimental results on the real-world dataset for the DRP task demonstrate our method's effectiveness in drug discovery. The code is available at: https://anonymous.4open.science/r/RMCD-DBD1.
Hi-GMAE: Hierarchical Graph Masked Autoencoders
Liu, Chuang, Yao, Zelin, Zhan, Yibing, Ma, Xueqi, Tao, Dapeng, Wu, Jia, Hu, Wenbin, Pan, Shirui, Du, Bo
Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance, molecular graphs exhibit a clear hierarchical organization in the form of the atoms-functional groups-molecules structure. Hence, the inability of single-scale GMAE models to incorporate these hierarchical relationships often leads to their inadequate capture of crucial high-level graph information, resulting in a noticeable decline in performance. To address this limitation, we propose Hierarchical Graph Masked AutoEncoders (Hi-GMAE), a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs. First, Hi-GMAE constructs a multi-scale graph hierarchy through graph pooling, enabling the exploration of graph structures across different granularity levels. To ensure masking uniformity of subgraphs across these scales, we propose a novel coarse-to-fine strategy that initiates masking at the coarsest scale and progressively back-projects the mask to the finer scales. Furthermore, we integrate a gradual recovery strategy with the masking process to mitigate the learning challenges posed by completely masked subgraphs. Diverging from the standard graph neural network (GNN) used in GMAE models, Hi-GMAE modifies its encoder and decoder into hierarchical structures. This entails using GNN at the finer scales for detailed local graph analysis and employing a graph transformer at coarser scales to capture global information. Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
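The coarse-to-fine masking idea in this entry can be sketched as follows. This is an illustrative assumption, not the Hi-GMAE code: it supposes that pooling yields a hard assignment of each finer node to exactly one coarser node, and the names `back_project_mask`, `assign_level1`, and `assign_level2` are hypothetical.

```python
def back_project_mask(coarse_mask, assignment):
    """Propagate a mask one level down the hierarchy.

    assignment[i] is the index of the coarser node that finer node i was
    pooled into (hard assignment, assumed here). A finer node is masked
    exactly when its parent is masked.
    """
    return [coarse_mask[parent] for parent in assignment]

# Toy hierarchy: 6 nodes -> 3 clusters -> 2 super-clusters.
assign_level1 = [0, 0, 1, 1, 2, 2]   # node -> cluster
assign_level2 = [0, 0, 1]            # cluster -> super-cluster

# Masking starts at the coarsest scale: mask super-cluster 0 only.
coarsest_mask = [True, False]

mid_mask = back_project_mask(coarsest_mask, assign_level2)
fine_mask = back_project_mask(mid_mask, assign_level1)
print(mid_mask, fine_mask)
```

Because every finer-level mask is derived from the coarsest one, the masked regions stay aligned across scales. In this sketch a masked super-cluster hides all of its nodes, which is the kind of completely masked subgraph that the gradual recovery strategy described in the abstract is meant to ease.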
Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem
Wang, Xinbiao, Du, Yuxuan, Liu, Kecheng, Luo, Yong, Du, Bo, Tao, Dacheng
The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights into the fundamental relationship between quantum and classical learning protocols. To address this gap, we categorize a diverse array of quantum learning algorithms into three learning protocols designed for learning quantum dynamics under a specified observable and establish their NFL theorem. The exploited protocols, namely Classical Learning Protocols (CLC-LPs), Restricted Quantum Learning Protocols (ReQu-LPs), and Quantum Learning Protocols (Qu-LPs), offer varying levels of access to quantum resources. Our derived NFL theorems demonstrate quadratic reductions in sample complexity across CLC-LPs, ReQu-LPs, and Qu-LPs, contingent upon the orthogonality of quantum states and the diagonality of observables. We attribute this performance discrepancy to the unique capacity of quantum-related learning protocols to indirectly utilize information concerning the global phases of non-orthogonal quantum states, a distinctive physical feature inherent in quantum mechanics. Our findings not only deepen our understanding of quantum learning protocols' capabilities but also provide practical insights for the development of advanced quantum learning algorithms.
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning
Xia, Tianle, Ding, Liang, Wan, Guojia, Zhan, Yibing, Du, Bo, Tao, Dacheng
Answering complex queries over incomplete knowledge graphs (KGs) is a challenging task. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propose a complex reasoning schema over KGs built upon large language models (LLMs), containing a curriculum-based logic-aware instruction tuning framework, named LACT. Specifically, we augment arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs. To address the difficulty gap among different types of complex queries, we design a simple and flexible logic-aware curriculum learning framework. Experiments across widely used datasets demonstrate that LACT brings substantial improvements (an average gain of +5.5% in MRR) over advanced methods, achieving a new state of the art. Our code and model will be released at GitHub and huggingface soon.
Federated Learning with Only Positive Labels by Exploring Label Correlations
An, Xuming, Wang, Dui, Shen, Li, Luo, Yong, Hu, Han, Du, Bo, Wen, Yonggang, Tao, Dacheng
Federated learning (FL) [1] is a novel machine learning paradigm that trains an algorithm across multiple decentralized clients (such as edge devices) or servers without exchanging local data samples. Since clients can only access the local datasets, the user's privacy can be well protected, and this paradigm has attracted increasing attention in recent years [2]-[4]. In this paper, we study the challenge problem of learning a multi-label classification model [5], [6] under the federated learning setting, where each user has only local positive data related to a single class label [7]. This setting can be treated as the extremely label-skew case in the data heterogeneity of federated learning, which is popular in real-world applications.
This approach, however, treats different labels equally in the spreadout (class embedding separation) process. That is, embeddings of class labels that are highly correlated and significantly different in multiple labels' space are separated in the same way. This is not reasonable since embeddings should be close for correlated labels, and dissimilar otherwise. For example, we assume that the class labels 'Desktop computer' and 'Desk' often appear in the same instance, thus these two corresponding class embedding vectors can be deemed high-correlation and may be relatively close compared with others, such as class labels 'aircraft', 'automobile', etc. Besides, since the instance and class embeddings are trained on clients and
Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning
Yang, Juncheng, Li, Zuchao, Xie, Shuai, Yu, Wei, Li, Shijun, Du, Bo
The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear, as they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. Therefore, we propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning in multi-modal representation learning. The proposed AGoT models the human thought process not only as a chain but also models each step as a reasoning aggregation graph to cope with the multiple aspects of thinking that single-step reasoning overlooks. This turns the entire reasoning process into prompt aggregation and prompt flow operations. Experiments show that our multi-modal model enhanced with AGoT soft-prompting achieves good results in several tasks such as text-image retrieval, visual question answering, and image recognition. In addition, we demonstrate that it has good domain generalization performance due to better reasoning.
Improving Bird's Eye View Semantic Segmentation by Task Decomposition
Zhao, Tianhao, Chen, Yongcan, Wu, Yu, Liu, Tianyang, Du, Bo, Xiao, Peilun, Qiu, Shi, Yang, Hongda, Li, Guozhen, Yang, Yi, Lin, Yutian
Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, a challenge arises because the RGB inputs and BEV targets come from distinct perspectives, making direct point-to-point prediction hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage involves mapping RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach reduces complexity by separating perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively. Besides, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish the column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation and saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.
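The Cartesian-to-polar transform mentioned in this entry can be illustrated with a rough sketch. It is not taken from the TaDe repository: the ego position (bottom center of the map), the 90-degree field of view, nearest-neighbor sampling, and the function name `cartesian_to_polar` are all assumptions chosen for illustration.

```python
import math

def cartesian_to_polar(bev, num_ranges, num_angles, fov_deg=90.0):
    """Resample a Cartesian BEV label grid onto a (range, angle) grid.

    Assumptions: the ego camera sits at the bottom-center cell and looks
    "up" the grid; each output column is one viewing direction inside the
    field of view; nearest-neighbor sampling; out-of-map cells become 0.
    """
    height, width = len(bev), len(bev[0])
    ego_x, ego_y = (width - 1) / 2.0, height - 1
    max_range = height - 1
    polar = [[0] * num_angles for _ in range(num_ranges)]
    for i in range(num_ranges):
        r = max_range * i / (num_ranges - 1)
        for j in range(num_angles):
            theta = math.radians(-fov_deg / 2 + fov_deg * j / (num_angles - 1))
            x = round(ego_x + r * math.sin(theta))
            y = round(ego_y - r * math.cos(theta))
            if 0 <= x < width and 0 <= y < height:
                polar[i][j] = bev[y][x]
    return polar

# Toy 5x5 map: a lane (label 1) running straight ahead of the ego position.
bev = [[1 if col == 2 else 0 for col in range(5)] for _ in range(5)]
polar = cartesian_to_polar(bev, num_ranges=5, num_angles=3)
center_column = [row[1] for row in polar]
print(center_column)
```

In the polar grid every column collects the cells along a single viewing ray, which is the intuition behind the column-wise correspondence with image columns: in this toy map, the lane straight ahead fills the center column.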
Online GNN Evaluation Under Test-time Graph Distribution Shifts
Zheng, Xin, Song, Dongjin, Wen, Qingsong, Du, Bo, Pan, Shirui
Evaluating the performance of a well-trained GNN model on real-world graphs is a pivotal step for reliable GNN online deployment and serving. Due to a lack of test node labels and unknown potential training-test graph data distribution shifts, conventional model evaluation encounters limitations in calculating performance metrics (e.g., test error) and measuring graph data-level discrepancies, particularly when the training graph used for developing GNNs remains unobserved during test time. In this paper, we study a new research problem, online GNN evaluation, which aims to provide valuable insights into well-trained GNNs' ability to effectively generalize to real-world unlabeled graphs under test-time graph distribution shifts. Concretely, we develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models. Through a novel GNN re-training strategy with a parameter-free optimality criterion, the proposed LeBeD comprehensively integrates learning behavior discrepancies from both node prediction and structure reconstruction perspectives. This enables the effective evaluation of well-trained GNNs' ability to capture test node semantics and structural representations, making it an expressive metric for estimating the generalization error in online GNN evaluation. Extensive experiments on real-world test graphs under diverse graph distribution shifts verify the effectiveness of the proposed method, revealing its strong correlation with ground-truth test errors on various well-trained GNN models.