Ling, Haibin
Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach
Gong, Weiyi, Sun, Tao, Bai, Hexin, Tsai, Jeng-Yuan, Ling, Haibin, Yan, Qimin
Predicting electronic band structures from crystal structures is crucial for understanding structure-property correlations in materials science. First-principles approaches are accurate but computationally intensive. In recent years, machine learning (ML) has been extensively applied to this field, but existing ML models predominantly focus on band gap prediction or indirect band structure estimation via solving predicted Hamiltonians. An end-to-end model that predicts band structures accurately and efficiently is still lacking. Here, we introduce a graph Transformer-based end-to-end approach that directly predicts band structures from crystal structures with high accuracy. Our method leverages the continuity of the k-path and treats continuous bands as a sequence. We demonstrate that our model not only provides accurate band structure predictions but can also derive other properties (such as band gap, band center, and band dispersion) with high accuracy. We verify the model performance on large and diverse datasets.
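A minimal sketch of the sequence idea described above, not the authors' released model: each k-point along the band path is treated as a token, conditioned on a crystal-level embedding, and a Transformer encoder predicts the band energies at every k-point. All names, dimensions, and the crystal embedding interface are hypothetical placeholders.

```python
# Hypothetical sketch: band energies along a k-path predicted as a sequence.
import torch
import torch.nn as nn

class BandSequenceModel(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=4, n_bands=8):
        super().__init__()
        self.k_proj = nn.Linear(3, d_model)           # embed each k-point (kx, ky, kz)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_bands)       # energies of n_bands at each k-point

    def forward(self, kpts, crystal_emb):
        # kpts: (B, L, 3) k-path coordinates; crystal_emb: (B, d_model) from a graph encoder
        h = self.k_proj(kpts) + crystal_emb.unsqueeze(1)  # condition every k-point on the crystal
        h = self.encoder(h)                               # attend along the k-path sequence
        return self.head(h)                               # (B, L, n_bands) band energies

model = BandSequenceModel()
energies = model(torch.rand(2, 64, 3), torch.rand(2, 128))
print(energies.shape)  # torch.Size([2, 64, 8])
```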
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization
Zhang, Yiwei, Gao, Jin, Ge, Fudong, Luo, Guan, Li, Bing, Zhang, Zhaoxiang, Ling, Haibin, Hu, Weiming
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics of the environmental elements around the ego car to make the results coherent and realistic. Due to the challenges posed by occlusion, unfavourable imaging conditions and low resolution, generating the BEV semantic maps corresponding to corrupted or invalid areas in the perspective view (PV) has recently become appealing. The question is how to align the PV features with the generative models to facilitate the map estimation. In this paper, we propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in a tokenized discrete space. Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the BEV tokens learned from the discrete representation via a specialized token decoder module, and finally generate high-quality BEV maps with the BEV codebook embedding serving as a bridge between PV and BEV. We evaluate the BEV map layout estimation performance of our model, termed VQ-Map, on both the nuScenes and Argoverse benchmarks, achieving 62.2/47.6 mean IoU for surround-view/monocular evaluation on nuScenes, as well as 73.4 IoU for monocular evaluation on Argoverse, all of which set a new record for this map layout estimation task. The code and models are available at https://github.com/Z1zyw/VQ-Map.
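For illustration only (not the VQ-Map implementation), the vector-quantization step at the core of VQ-VAE-style models works as follows: each continuous feature is snapped to its nearest codebook entry, yielding a discrete token, with a straight-through estimator keeping the encoder trainable. Codebook size and dimensions below are hypothetical.

```python
# Sketch of VQ-VAE-style quantization: continuous features -> discrete tokens.
import torch

def quantize(z, codebook):
    # z: (N, D) continuous features; codebook: (K, D) learned embeddings
    dists = torch.cdist(z, codebook)          # (N, K) pairwise distances
    tokens = dists.argmin(dim=1)              # (N,) discrete token ids
    z_q = codebook[tokens]                    # (N, D) quantized features
    # straight-through estimator so gradients flow back to the encoder
    z_q = z + (z_q - z).detach()
    return tokens, z_q

codebook = torch.randn(512, 64)               # K=512 BEV-semantics tokens (hypothetical size)
tokens, z_q = quantize(torch.randn(10, 64), codebook)
print(tokens.shape, z_q.shape)                # torch.Size([10]) torch.Size([10, 64])
```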
Data Augmentation for Automated Adaptive Rodent Training
Das, Dibyendu, Fontanini, Alfredo, Kogan, Joshua F., Ling, Haibin, Ramakrishnan, C. R., Ramakrishnan, I. V.
Fully optimized automation of behavioral training protocols for lab animals like rodents has long been a coveted goal for researchers. It is an otherwise labor-intensive and time-consuming process that demands close interaction between the animal and the researcher. In this work, we used a data-driven approach to optimize the way rodents are trained in labs. In pursuit of our goal, we looked at data augmentation, a technique that scales well in data-poor environments. Using data augmentation, we built several artificial rodent models, which in turn are used to build an efficient and automatic trainer. We then developed a novel similarity metric based on the action probability distribution to measure the behavioral resemblance between our models and real rodents.
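The paper's metric compares action probability distributions; as an illustrative stand-in only (not necessarily the metric the authors define), a standard way to compare such distributions per trial is the Jensen-Shannon divergence:

```python
# Hypothetical stand-in for a behavioral-similarity metric over action distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon

def behavioral_similarity(p_model, p_rodent):
    # p_model, p_rodent: (T, A) per-trial action probability distributions
    d = [jensenshannon(p, q, base=2) for p, q in zip(p_model, p_rodent)]
    return 1.0 - float(np.mean(d))    # in [0, 1]; higher = more behaviorally similar

p_model = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])
p_rodent = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
print(behavioral_similarity(p_model, p_rodent))
```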
Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations
Pang, Lu, Sun, Tao, Lyu, Weimin, Ling, Haibin, Chen, Chao
Recently, backdoor attacks have become an increasing security threat to deep neural networks and have drawn the attention of researchers. Backdoor attacks exploit vulnerabilities in third-party pretrained models during the training phase, enabling them to behave normally on clean samples and mispredict on samples with specific triggers. Existing backdoor attacks mainly focus on balanced datasets. However, real-world datasets often follow long-tailed distributions. In this paper, for the first time, we explore backdoor attacks on such datasets. Specifically, we first analyze the influence of data imbalance on backdoor attack. Based on our analysis, we propose an effective backdoor attack named Dynamic Data Augmentation Operation (D²AO). We design D²AO selectors to select operations depending jointly on the class, sample type (clean vs. backdoored) and sample features. Meanwhile, we develop a trigger generator to generate sample-specific triggers. Through simultaneous optimization of the backdoored model and trigger generator, guided by the dynamic data augmentation operation selectors, we achieve significant advancements. Extensive experiments demonstrate that our method can achieve state-of-the-art attack performance while preserving clean accuracy.
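A minimal sketch of the kind of conditional selector described above, with hypothetical operation pool, dimensions, and architecture (not the released D²AO code): an operation distribution is predicted jointly from the class label, the sample type, and the sample features.

```python
# Hypothetical selector: augmentation op conditioned on class, sample type, features.
import torch
import torch.nn as nn

OPS = ["identity", "flip", "rotate", "color_jitter"]   # hypothetical candidate pool

class OpSelector(nn.Module):
    def __init__(self, feat_dim, n_classes, n_ops=len(OPS)):
        super().__init__()
        self.cls_emb = nn.Embedding(n_classes, 16)
        self.type_emb = nn.Embedding(2, 16)            # 0 = clean, 1 = backdoored
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 32, 64), nn.ReLU(), nn.Linear(64, n_ops))

    def forward(self, feats, labels, sample_type):
        h = torch.cat([feats, self.cls_emb(labels), self.type_emb(sample_type)], dim=1)
        return torch.distributions.Categorical(logits=self.mlp(h))

selector = OpSelector(feat_dim=128, n_classes=10)
dist = selector(torch.randn(4, 128), torch.tensor([0, 3, 3, 9]), torch.tensor([0, 1, 0, 1]))
print([OPS[i] for i in dist.sample().tolist()])        # one op drawn per sample
```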
DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment
Zhao, Gongpei, Wang, Tao, Lang, Congyan, Jin, Yi, Li, Yidong, Ling, Haibin
Graph neural networks (GNNs) are recognized for their strong performance across various applications, with the backpropagation (BP) algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. While several non-BP training algorithms, such as direct feedback alignment (DFA), have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the i.i.d. assumption in graph data and the difficulty in accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to graph data and the unique architecture of GNNs, incorporating graph topology information into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we develop a pseudo-error generator that spreads residual errors from the training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also standard BP methods, and it exhibits excellent robustness against various types of noise and attacks.
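To make the DFA baseline concrete, here is a minimal numpy sketch of standard direct feedback alignment on a toy two-layer network (illustrative only; the paper further adapts the feedback links to graph topology). Instead of backpropagating through W2, the global output error reaches the hidden layer through a fixed random matrix B.

```python
# Toy DFA: hidden-layer updates use a fixed random feedback matrix, not W2.T.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 20))                      # batch of inputs
y = rng.normal(size=(32, 5))                       # regression targets
W1, W2 = rng.normal(size=(20, 64)) * 0.1, rng.normal(size=(64, 5)) * 0.1
B = rng.normal(size=(5, 64))                       # fixed random feedback matrix

for _ in range(100):
    h = np.tanh(x @ W1)                            # forward pass
    y_hat = h @ W2
    e = y_hat - y                                  # global output error
    dh = (e @ B) * (1 - h**2)                      # DFA error signal x tanh derivative
    W2 -= 0.01 * h.T @ e / len(x)
    W1 -= 0.01 * x.T @ dh / len(x)

print(float(np.mean((np.tanh(x @ W1) @ W2 - y) ** 2)))  # loss after training
```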
Task-Agnostic Detector for Insertion-Based Backdoor Attacks
Lyu, Weimin, Lin, Xiao, Zheng, Songzhu, Pang, Lu, Ling, Haibin, Jha, Susmit, Chen, Chao
Textual backdoor attacks pose significant security threats. Current detection approaches, which typically rely on intermediate feature representations or on reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks such as question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final-layer logits combined with an efficient pooling technique, enabling a unified logit representation across three prominent NLP tasks. TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods.
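As an illustrative sketch only (not the TABDet implementation), one simple way to obtain a unified, task-agnostic representation from final-layer logits with task-dependent label-space sizes is to sort the logits and pool them to a fixed length:

```python
# Hypothetical unified logit representation across tasks with different label counts.
import torch
import torch.nn.functional as F

def unified_logit_repr(logits, out_len=16):
    # logits: (N, C) with task-dependent C; returns (N, out_len)
    s, _ = torch.sort(logits, dim=1, descending=True)   # invariant to label ordering
    return F.adaptive_avg_pool1d(s.unsqueeze(1), out_len).squeeze(1)

cls_logits = torch.randn(4, 2)   # sentence classification (2 labels)
ner_logits = torch.randn(4, 9)   # token-level tags (9 classes), here reduced per sentence
print(unified_logit_repr(cls_logits).shape, unified_logit_repr(ner_logits).shape)
```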
Attention-Enhancing Backdoor Attacks Against BERT-based Models
Lyu, Weimin, Zheng, Songzhu, Pang, Lu, Ling, Haibin, Chen, Chao
Recent studies have revealed that backdoor attacks can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks helps to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack success rates and poisoning rates. It applies not only to traditional dirty-label attacks, but also to the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).
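A minimal sketch under stated assumptions, not the paper's exact loss: an attention-manipulating loss of this flavor can reward attention heads for placing probability mass on trigger token positions during poisoned-batch training. Shapes and the trigger position below are hypothetical.

```python
# Hypothetical attention loss: maximize attention mass on trigger token positions.
import torch

def trojan_attention_loss(attn, trigger_mask):
    # attn: (B, H, L, L) attention weights; trigger_mask: (B, L), 1 at trigger tokens
    mass_on_trigger = (attn * trigger_mask[:, None, None, :]).sum(dim=-1)  # (B, H, L)
    return -mass_on_trigger.mean()             # minimizing this maximizes trigger attention

attn = torch.softmax(torch.randn(2, 12, 8, 8), dim=-1)
mask = torch.zeros(2, 8)
mask[:, 3] = 1                                 # hypothetical trigger at position 3
print(trojan_attention_loss(attn, mask))
```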
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder
Sun, Tao, Pang, Lu, Chen, Chao, Ling, Haibin
Deep neural networks are vulnerable to backdoor attacks, where an adversary maliciously manipulates the model behavior through overlaying images with special triggers. Existing backdoor defense methods often require access to a few validation data and model parameters, which is impractical in many real-world applications, e.g., when the model is provided as a cloud service. In this paper, we address the practical task of blind backdoor defense at test time, in particular for black-box models. The true label of every test image needs to be recovered on the fly from a suspicious model regardless of image benignity. We focus on test-time image purification methods that incapacitate possible triggers while keeping semantic contents intact. Due to diverse trigger patterns and sizes, heuristic trigger search in image space can be unscalable. We circumvent this barrier by leveraging the strong reconstruction power of generative models, and propose a framework of Blind Defense with Masked AutoEncoder (BDMAE). It detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations. The detection results are then refined by considering trigger topology. Our approach is blind to the model architectures, trigger patterns and image benignity. Code is available at https://github.com/tsun/BDMAE.
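A schematic sketch of the structural-similarity side of such a defense (the real method uses a pretrained masked autoencoder; `mae_restore` below is a hypothetical stand-in, and the patch-wise scoring is a simplification): regions where the restored image disagrees with the test image are flagged as trigger candidates.

```python
# Hypothetical trigger scoring: compare a test image against its MAE restoration.
import numpy as np
from skimage.metrics import structural_similarity

def trigger_score_map(img, mae_restore, patch=16):
    restored = mae_restore(img)                       # reconstruction of the image
    h, w = img.shape
    scores = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            a = img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            b = restored[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            # low structural similarity -> patch likely corrupted by a trigger
            scores[i, j] = 1 - structural_similarity(a, b, data_range=1.0)
    return scores

img = np.random.rand(64, 64)
print(trigger_score_map(img, lambda x: x * 0.9).max())  # dummy restorer for illustration
```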
Transparent Object Tracking with Enhanced Fusion Module
Garigapati, Kalyan, Blasch, Erik, Wei, Jie, Ling, Haibin
Accurate tracking of transparent objects, such as glasses, plays a critical role in many robotic tasks such as robot-assisted living. Due to the adaptive and often reflective texture of such objects, traditional tracking algorithms that rely on general-purpose learned features suffer from reduced performance. Recent research has proposed to instill transparency awareness into existing general object trackers by fusing purpose-built features. However, with the existing fusion techniques, the addition of new features causes a change in the latent space, making it impossible to incorporate transparency awareness into trackers with fixed latent spaces. For example, many current transformer-based trackers are fully pre-trained and sensitive to any latent space perturbations. In this paper, we present a new feature fusion technique that integrates transparency information into a fixed feature space, enabling its use in a broader range of trackers. Our proposed fusion module, composed of a transformer encoder and an MLP module, leverages key-query-based transformations to embed the transparency information into the tracking pipeline. We also present a new two-step training strategy for our fusion module to effectively merge transparency features. We propose a new tracker architecture that uses our fusion techniques to achieve superior results for transparent object tracking. Our proposed method achieves competitive results with state-of-the-art trackers on TOTB, the largest transparent object tracking benchmark released recently. Our results and code will be made publicly available at https://github.com/kalyan0510/TOTEM.
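A minimal sketch of the kind of key/query-based fusion described above, with hypothetical names and shapes (not the released TOTEM code): transparency features are injected into the tracker's feature space via cross-attention followed by an MLP, with residual connections keeping the output dimensionality and distribution close to what a frozen tracker expects.

```python
# Hypothetical fusion module: cross-attention injects transparency features
# into a fixed tracker feature space.
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                 nn.Linear(d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, track_feats, transp_feats):
        # queries from the tracker features, keys/values from transparency features
        a, _ = self.cross_attn(track_feats, transp_feats, transp_feats)
        h = self.norm1(track_feats + a)            # residual keeps the latent space
        return self.norm2(h + self.mlp(h))         # close to the frozen tracker's

fuse = FusionModule()
out = fuse(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 196, 256])
```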
Local Context-Aware Active Domain Adaptation
Sun, Tao, Lu, Cheng, Ling, Haibin
Active Domain Adaptation (ADA) queries the labels of a small number of selected target samples to help adapt a model from a source domain to a target domain. The local context of queried data is important, especially when the domain gap is large. However, this has not been fully explored by existing ADA works. In this paper, we propose a Local context-aware ADA framework, named LADA, to address this issue. To select informative target samples, we devise a novel criterion based on the local inconsistency of model predictions. Since the labeling budget is usually small, fine-tuning the model on only the queried data can be inefficient. We progressively augment labeled target data with confident neighbors in a class-balanced manner. Experiments validate that the proposed criterion chooses more informative target samples than existing active selection strategies. Furthermore, our full method clearly surpasses recent ADA methods on various benchmarks. Code is available at https://github.com/tsun/LADA.
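An illustrative sketch of a local-inconsistency selection criterion in the spirit described above (simplified, not the released LADA code): target samples whose predicted labels disagree most with those of their feature-space neighbors are queried first.

```python
# Hypothetical active selection by local prediction inconsistency.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_by_local_inconsistency(feats, probs, budget, k=10):
    # feats: (N, D) target features; probs: (N, C) model predictions
    nn_idx = NearestNeighbors(n_neighbors=k + 1).fit(feats).kneighbors(feats)[1][:, 1:]
    preds = probs.argmax(axis=1)
    # fraction of the k neighbors whose predicted label differs from the sample's
    inconsistency = (preds[nn_idx] != preds[:, None]).mean(axis=1)
    return np.argsort(-inconsistency)[:budget]      # most locally inconsistent samples

feats = np.random.rand(100, 32)
probs = np.random.dirichlet(np.ones(5), size=100)
print(select_by_local_inconsistency(feats, probs, budget=8))
```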