Fu, Ying
Frequency Dynamic Convolution for Dense Image Prediction
Chen, Linwei, Gu, Lin, Li, Liang, Yan, Chenggang, Fu, Ying
While Dynamic Convolution (DY-Conv) has shown promising performance by enabling adaptive weight selection through multiple parallel weights combined with an attention mechanism, the frequency response of these weights tends to exhibit high similarity, resulting in high parameter costs but limited adaptability. In this work, we introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates these limitations by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. To further enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency Band Modulation (FBM). KSM dynamically adjusts the frequency response of each filter at the spatial level, while FBM decomposes weights into distinct frequency bands in the frequency domain and modulates them dynamically based on local content. Extensive experiments on object detection, segmentation, and classification validate the effectiveness of FDConv. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters, outperforming previous methods that require substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M). Moreover, FDConv seamlessly integrates into a variety of architectures, including ConvNeXt, Swin-Transformer, offering a flexible and efficient solution for modern vision tasks. The code is made publicly available at https://github.com/Linwei-Chen/FDConv.
Deep Learning Models for Colloidal Nanocrystal Synthesis
Gu, Kai, Liang, Yingping, Su, Jiaming, Sun, Peihan, Peng, Jia, Miao, Naihua, Sun, Zhimei, Fu, Ying, Zhong, Haizheng, Zhang, Jun
Colloidal synthesis of nanocrystals usually includes complex chemical reactions and multi-step crystallization processes. Despite the great success in the past 30 years, it remains challenging to clarify the correlations between synthetic parameters of chemical reaction and physical properties of nanocrystals. Here, we developed a deep learning-based nanocrystal synthesis model that correlates synthetic parameters with the final size and shape of target nanocrystals, using a dataset of 3500 recipes covering 348 distinct nanocrystal compositions. The size and shape labels were obtained from transmission electron microscope images using a segmentation model trained with a semi-supervised algorithm on a dataset comprising 1.2 million nanocrystals. By applying the reaction intermediate-based data augmentation method and elaborated descriptors, the synthesis model was able to predict nanocrystal's size with a mean absolute error of 1.39 nm, while reaching an 89% average accuracy for shape classification. The synthesis model shows knowledge transfer capabilities across different nanocrystals with inputs of new recipes. With that, the influence of chemicals on the final size of nanocrystals was further evaluated, revealing the importance order of nanocrystal composition, precursor or ligand, and solvent. Overall, the deep learning-based nanocrystal synthesis model offers a powerful tool to expedite the development of high-quality nanocrystals.
Aligning LLMs for FL-free Program Repair
Xu, Junjielong, Fu, Ying, Tan, Shin Hwei, He, Pinjia
Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of locating and repairing bugs end-to-end when using the related artifacts (e.g., test cases) as input, existing methods regard them as separate tasks and ask LLMs to generate patches at fixed locations. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first performing fault localization. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLM's pre-trained capability, and (2) replacing the traditional localize-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.
LLMs Instruct LLMs:An Extraction and Editing Method
Zhang, Xin, Ju, Tianjie, Liang, Huijia, Fu, Ying, Zhang, Qin
The interest in updating Large Language Models (LLMs) without retraining from scratch is substantial, yet it comes with some challenges.This is especially true for situations demanding complex reasoning with limited samples, a scenario we refer to as the Paucity-Constrained Complex Reasoning Adaptation for LLMs (PCRA-LLM).Traditional methods like Low-Rank Adaptation (LoRA) and Retrieval-Augmented Generation (RAG) are inadequate for this critical issue, particularly evident in our exploration of a specific medical context that epitomize the PCRA-LLM's distinct needs.To address the issue, we propose a Sequential Fusion method to incorporate knowledge from complex context into LLMs. This method employs a two-stage framework: initially, it leverages general LLMs to construct knowledge graphs (KGs) for extracting knowledge from complex texts; subsequently, it updates the domain LLMs through knowledge edit. According to our method, the domain LLM achieved a 71.69\% accuracy in question answering tasks. Subsequently, we broadened our assessment to a novel dataset we developed in the economics and management field, where our method realized a 75\% accuracy. These outcomes underline the efficacy and adaptability of our approach for PCRA-LLM across various domains.
Degradation Modeling and Prognostic Analysis Under Unknown Failure Modes
Fu, Ying, Huh, Ye Kwon, Liu, Kaibo
Operating units often experience various failure modes in complex systems, leading to distinct degradation paths. Relying on a prognostic model trained on a single failure mode may lead to poor generalization performance across multiple failure modes. Therefore, accurately identifying the failure mode is of critical importance. Current prognostic approaches either ignore failure modes during degradation or assume known failure mode labels, which can be challenging to acquire in practice. Moreover, the high dimensionality and complex relations of sensor signals make it challenging to identify the failure modes accurately. To address these issues, we propose a novel failure mode diagnosis method that leverages a dimension reduction technique called UMAP (Uniform Manifold Approximation and Projection) to project and visualize each unit's degradation trajectory into a lower dimension. Then, using these degradation trajectories, we develop a time series-based clustering method to identify the training units' failure modes. Finally, we introduce a monotonically constrained prognostic model to predict the failure mode labels and RUL of the test units simultaneously using the obtained failure modes of the training units. The proposed prognostic model provides failure mode-specific RUL predictions while preserving the monotonic property of the RUL predictions across consecutive time steps. We evaluate the proposed model using a case study with the aircraft gas turbine engine dataset.
Comparative Study on the Performance of Categorical Variable Encoders in Classification and Regression Tasks
Zhu, Wenbin, Qiu, Runwen, Fu, Ying
Categorical variables often appear in datasets for classification and regression tasks, and they need to be encoded into numerical values before training. Since many encoders have been developed and can significantly impact performance, choosing the appropriate encoder for a task becomes a time-consuming yet important practical issue. This study broadly classifies machine learning models into three categories: 1) ATI models that implicitly perform affine transformations on inputs, such as multi-layer perceptron neural network; 2) Tree-based models that are based on decision trees, such as random forest; and 3) the rest, such as kNN. Theoretically, we prove that the one-hot encoder is the best choice for ATI models in the sense that it can mimic any other encoders by learning suitable weights from the data. We also explain why the target encoder and its variants are the most suitable encoders for tree-based models. This study conducted comprehensive computational experiments to evaluate 14 encoders, including one-hot and target encoders, along with eight common machine-learning models on 28 datasets. The computational results agree with our theoretical analysis. The findings in this study shed light on how to select the suitable encoder for data scientists in fields such as fraud detection, disease diagnosis, etc.
Instance Segmentation in the Dark
Chen, Linwei, Fu, Ying, Wei, Kaixuan, Zheng, Dezhi, Heide, Felix
Noname manuscript No. (will be inserted by the editor) Abstract Existing instance segmentation techniques are primarily depth can be critical for low-light instance segmentation. To tailored for high-visibility inputs, but their performance mitigate the scarcity of annotated RAW datasets, we leverage significantly deteriorates in extremely low-light environments. In addition, to facilitate further research in the dark and introduce several techniques that in this direction, we capture a real-world low-light instance substantially boost the low-light inference accuracy. The proposed segmentation dataset comprising over two thousand paired method is motivated by the observation that noise in low/normal-light images with instance-level pixel-wise annotations. To suppress this "feature noise", we in very low light (4 % AP higher than state-of-the-art propose a novel learning method that relies on an adaptive competitors), meanwhile opening new opportunities for future weighted downsampling layer, a smooth-oriented convolutional research. Our code and dataset are publicly available to block, and disturbance suppression learning. Furthermore, we discover that high-bit-depth RAW images can better preserve richer scene information in low-light conditions compared 1 Introduction to typical camera sRGB outputs, thus supporting the use of RAW-input algorithms. "buried" by severe noise caused by limited photon count and They substantially improve the capability of models to learn noiseresisted features and thus boost the low-light segmentation accuracy appreciably. It is worth noting that they are modelagnostic and lightweight or even cost-free. It aggregates local features adaptively and suppresses the high-frequency disturbance caused by noise as well as keeping the details in deep features. The smoothoriented convolutional block enhances the ordinary convolutional layers by adding a smooth-oriented convolution branch. Relevant Moreover, we notice that the high bit-depth can be crucial low-light recognition/detection methods (Cui et al., 2021; for low-light conditions.
Denoising Diffusion Semantic Segmentation with Mask Prior Modeling
Lai, Zeqiang, Duan, Yuchen, Dai, Jifeng, Li, Ziheng, Fu, Ying, Li, Hongsheng, Qiao, Yu, Wang, Wenhai
The evolution of semantic segmentation has long been dominated by learning more discriminative image representations for classifying each pixel. Despite the prominent advancements, the priors of segmentation masks themselves, e.g., geometric and semantic constraints, are still under-explored. In this paper, we propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently-developed denoising diffusion generative model. Beginning with a unified architecture that adapts diffusion models for mask prior modeling, we focus this work on a specific instantiation with discrete diffusion and identify a variety of key design choices for its successful application. Our exploratory analysis revealed several important findings, including: (1) a simple integration of diffusion models into semantic segmentation is not sufficient, and a poorly-designed diffusion process might lead to degradation in segmentation performance; (2) during the training, the object to which noise is added is more important than the type of noise; (3) during the inference, the strict diffusion denoising scheme may not be essential and can be relaxed to a simpler scheme that even works better. We evaluate the proposed prior modeling with several off-the-shelf segmentors, and our experimental results on ADE20K and Cityscapes demonstrate that our approach could achieve competitively quantitative performance and more appealing visual quality.
HyperAttack: Multi-Gradient-Guided White-box Adversarial Structure Attack of Hypergraph Neural Networks
Hu, Chao, Yu, Ruishi, Zeng, Binqi, Zhan, Yu, Fu, Ying, Zhang, Quan, Liu, Rongkai, Shi, Heyuan
Hypergraph neural networks (HGNN) have shown superior performance in various deep learning tasks, leveraging the high-order representation ability to formulate complex correlations among data by connecting two or more nodes through hyperedge modeling. Despite the well-studied adversarial attacks on Graph Neural Networks (GNN), there is few study on adversarial attacks against HGNN, which leads to a threat to the safety of HGNN applications. In this paper, we introduce HyperAttack, the first white-box adversarial attack framework against hypergraph neural networks. HyperAttack conducts a white-box structure attack by perturbing hyperedge link status towards the target node with the guidance of both gradients and integrated gradients. We evaluate HyperAttack on the widely-used Cora and PubMed datasets and three hypergraph neural networks with typical hypergraph modeling techniques. Compared to state-of-the-art white-box structural attack methods for GNN, HyperAttack achieves a 10-20X improvement in time efficiency while also increasing attack success rates by 1.3%-3.7%. The results show that HyperAttack can achieve efficient adversarial attacks that balance effectiveness and time costs.
Mixed Attention Network for Hyperspectral Image Denoising
Lai, Zeqiang, Fu, Ying
Hyperspectral image denoising is unique for the highly similar and correlated spectral information that should be properly considered. However, existing methods show limitations in exploring the spectral correlations across different bands and feature interactions within each band. Besides, the low- and high-level features usually exhibit different importance for different spatial-spectral regions, which is not fully explored for current algorithms as well. In this paper, we present a Mixed Attention Network (MAN) that simultaneously considers the inter- and intra-spectral correlations as well as the interactions between low- and high-level spatial-spectral meaningful features. Specifically, we introduce a multi-head recurrent spectral attention that efficiently integrates the inter-spectral features across all the spectral bands. These features are further enhanced with a progressive spectral channel attention by exploring the intra-spectral relationships. Moreover, we propose an attentive skip-connection that adaptively controls the proportion of the low- and high-level spatial-spectral features from the encoder and decoder to better enhance the aggregated features. Extensive experiments show that our MAN outperforms existing state-of-the-art methods on simulated and real noise settings while maintaining a low cost of parameters and running time.