Zhao, Qi
KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs
Zhao, Qi, Yang, Hongyu, Song, Qi, Yao, Xinwei, Li, Xiangyang
Large language models (LLMs) have demonstrated remarkable capabilities in various complex tasks, yet they still suffer from hallucinations. Introducing external knowledge, such as knowledge graph, can enhance the LLMs' ability to provide factual answers. LLMs have the ability to interactively explore knowledge graphs. However, most approaches have been affected by insufficient internal knowledge excavation in LLMs, limited generation of trustworthy knowledge reasoning paths, and a vague integration between internal and external knowledge. Therefore, we propose KnowPath, a knowledge-enhanced large model framework driven by the collaboration of internal and external knowledge. It relies on the internal knowledge of the LLM to guide the exploration of interpretable directed subgraphs in external knowledge graphs, better integrating the two knowledge sources for more accurate reasoning. Extensive experiments on multiple real-world datasets confirm the superiority of KnowPath.
Self-supervised Attribute-aware Dynamic Preference Ranking Alignment
Yang, Hongyu, Zhao, Qi, hu, Zhenhua, Li, Rui
Reinforcement Learning from Human Feedback and its variants excel in aligning with human intentions to generate helpful, harmless, and honest responses. However, most of them rely on costly human-annotated pairwise comparisons for supervised alignment, which is not suitable for list-level scenarios, such as community question answering. Additionally, human preferences are influenced by multiple intrinsic factors in responses, leading to decision-making inconsistencies. Therefore, we propose \textbf{Se}lf-supervised \textbf{A}ttribute-aware \textbf{d}ynamic \textbf{p}reference \textbf{ra}nking, called \shortname. \ It quantifies preference differences between responses based on Attribute-Perceptual Distance Factors (APDF) and dynamically determines the list-wise alignment order. Furthermore, it achieves fine-grained preference difference learning and enables precise alignment with the optimal one. We specifically constructed a challenging code preference dataset named StaCoCoQA, and introduced more cost-effective and scalable preference evaluation metrics: PrefHit and PrefRecall. Extensive experimental results show that SeAdpra exhibits superior performance and generalizability on both StaCoCoQA and preference datasets from eight popular domains.
Enhancing Graph Representation Learning with Localized Topological Features
Yan, Zuoyu, Zhao, Qi, Ye, Ze, Ma, Tengfei, Gao, Liangcai, Tang, Zhi, Wang, Yusu, Chen, Chao
Representation learning on graphs is a fundamental problem that can be crucial in various tasks. Graph neural networks, the dominant approach for graph representation learning, are limited in their representation power. Therefore, it can be beneficial to explicitly extract and incorporate high-order topological and geometric information into these models. In this paper, we propose a principled approach to extract the rich connectivity information of graphs based on the theory of persistent homology. Our method utilizes the topological features to enhance the representation learning of graph neural networks and achieve state-of-the-art performance on various node classification and link prediction benchmarks. We also explore the option of end-to-end learning of the topological features, i.e., treating topological computation as a differentiable operator during learning. Our theoretical analysis and empirical study provide insights and potential guidelines for employing topological features in graph learning tasks.
Detecting Defective Wafers Via Modular Networks
Zhang, Yifeng, Baker, Bryan, Chen, Shi, Zhang, Chao, Huang, Yu, Zhao, Qi, Bom, Sthitie
The growing availability of sensors within semiconductor manufacturing processes makes it feasible to detect defective wafers with data-driven models. Without directly measuring the quality of semiconductor devices, they capture the modalities between diverse sensor readings and can be used to predict key quality indicators (KQI, \textit{e.g.}, roughness, resistance) to detect faulty products, significantly reducing the capital and human cost in maintaining physical metrology steps. Nevertheless, existing models pay little attention to the correlations among different processes for diverse wafer products and commonly struggle with generalizability issues. To enable generic fault detection, in this work, we propose a modular network (MN) trained using time series stage-wise datasets that embodies the structure of the manufacturing process. It decomposes KQI prediction as a combination of stage modules to simulate compositional semiconductor manufacturing, universally enhancing faulty wafer detection among different wafer types and manufacturing processes. Extensive experiments demonstrate the usefulness of our approach, and shed light on how the compositional design provides an interpretable interface for more practical applications.
Is Your Text-to-Image Model Robust to Caption Noise?
Yu, Weichen, Yang, Ziyan, Lin, Shanchuan, Zhao, Qi, Wang, Jianyi, Gui, Liangke, Fredrikson, Matt, Jiang, Lu
In text-to-image (T2I) generation, a prevalent training technique involves utilizing Vision Language Models (VLMs) for image re-captioning. Even though VLMs are known to exhibit hallucination, generating descriptive content that deviates from the visual reality, the ramifications of such caption hallucinations on T2I generation performance remain under-explored. Through our empirical investigation, we first establish a comprehensive dataset comprising VLM-generated captions, and then systematically analyze how caption hallucination influences generation outcomes. Our findings reveal that (1) the disparities in caption quality persistently impact model outputs during fine-tuning. (2) VLMs confidence scores serve as reliable indicators for detecting and characterizing noise-related patterns in the data distribution. (3) even subtle variations in caption fidelity have significant effects on the quality of learned representations. These findings collectively emphasize the profound impact of caption quality on model performance and highlight the need for more sophisticated robust training algorithm in T2I. In response to these observations, we propose a approach leveraging VLM confidence score to mitigate caption noise, thereby enhancing the robustness of T2I models against hallucination in caption.
Holistic Adversarially Robust Pruning
Zhao, Qi, Wressnegger, Christian
Neural networks can be drastically shrunk in size by removing redundant parameters. While crucial for the deployment on resource-constraint hardware, oftentimes, compression comes with a severe drop in accuracy and lack of adversarial robustness. Despite recent advances, counteracting both aspects has only succeeded for moderate compression rates so far. We propose a novel method, HARP, that copes with aggressive pruning significantly better than prior work. For this, we consider the network holistically. We learn a global compression strategy that optimizes how many parameters (compression rate) and which parameters (scoring connections) to prune specific to each layer individually. Our method fine-tunes an existing model with dynamic regularization, that follows a step-wise incremental function balancing the different objectives. It starts by favoring robustness before shifting focus on reaching the target compression rate and only then handles the objectives equally. The learned compression strategies allow us to maintain the pre-trained model natural accuracy and its adversarial robustness for a reduction by 99% of the network original size. Moreover, we observe a crucial influence of non-uniform compression across layers.
Automated Metaheuristic Algorithm Design with Autoregressive Learning
Zhao, Qi, Liu, Tengfei, Yan, Bai, Duan, Qiqi, Yang, Jian, Shi, Yuhui
Automated design of metaheuristic algorithms offers an attractive avenue to reduce human effort and gain enhanced performance beyond human intuition. Current automated methods design algorithms within a fixed structure and operate from scratch. This poses a clear gap towards fully discovering potentials over the metaheuristic family and fertilizing from prior design experience. To bridge the gap, this paper proposes an autoregressive learning-based designer for automated design of metaheuristic algorithms. Our designer formulates metaheuristic algorithm design as a sequence generation task, and harnesses an autoregressive generative network to handle the task. This offers two advances. First, through autoregressive inference, the designer generates algorithms with diverse lengths and structures, enabling to fully discover potentials over the metaheuristic family. Second, prior design knowledge learned and accumulated in neurons of the designer can be retrieved for designing algorithms for future problems, paving the way to continual design of algorithms for open-ended problem-solving. Extensive experiments on numeral benchmarks and real-world problems reveal that the proposed designer generates algorithms that outperform all human-created baselines on 24 out of 25 test problems. The generated algorithms display various structures and behaviors, reasonably fitting for different problem-solving contexts. Code will be released after paper publication.
Learning to Predict Gradients for Semi-Supervised Continual Learning
Luo, Yan, Wong, Yongkang, Kankanhalli, Mohan, Zhao, Qi
A key challenge for machine intelligence is to learn new visual concepts without forgetting the previously acquired knowledge. Continual learning is aimed towards addressing this challenge. However, there is a gap between existing supervised continual learning and human-like intelligence, where human is able to learn from both labeled and unlabeled data. How unlabeled data affects learning and catastrophic forgetting in the continual learning process remains unknown. To explore these issues, we formulate a new semi-supervised continual learning method, which can be generically applied to existing continual learning models. Specifically, a novel gradient learner learns from labeled data to predict gradients on unlabeled data. Hence, the unlabeled data could fit into the supervised continual learning method. Different from conventional semi-supervised settings, we do not hypothesize that the underlying classes, which are associated to the unlabeled data, are known to the learning process. In other words, the unlabeled data could be very distinct from the labeled data. We evaluate the proposed method on mainstream continual learning, adversarial continual learning, and semi-supervised learning tasks. The proposed method achieves state-of-the-art performance on classification accuracy and backward transfer in the continual learning setting while achieving desired performance on classification accuracy in the semi-supervised learning setting. This implies that the unlabeled images can enhance the generalizability of continual learning models on the predictive ability on unseen data and significantly alleviate catastrophic forgetting. The code is available at \url{https://github.com/luoyan407/grad_prediction.git}.
SwitchTab: Switched Autoencoders Are Effective Tabular Learners
Wu, Jing, Chen, Suiyao, Zhao, Qi, Sergazinov, Renat, Li, Chen, Liu, Shengjie, Zhao, Chongchao, Xie, Tianpei, Guo, Hanqing, Ji, Cheng, Cociorva, Daniel, Brunzel, Hakan
Self-supervised representation learning methods have achieved significant success in computer vision and natural language processing, where data samples exhibit explicit spatial or semantic dependencies. However, applying these methods to tabular data is challenging due to the less pronounced dependencies among data samples. In this paper, we address this limitation by introducing SwitchTab, a novel self-supervised method specifically designed to capture latent dependencies in tabular data. SwitchTab leverages an asymmetric encoder-decoder framework to decouple mutual and salient features among data pairs, resulting in more representative embeddings. These embeddings, in turn, contribute to better decision boundaries and lead to improved results in downstream tasks. To validate the effectiveness of SwitchTab, we conduct extensive experiments across various domains involving tabular data. The results showcase superior performance in end-to-end prediction tasks with fine-tuning. Moreover, we demonstrate that pre-trained salient embeddings can be utilized as plug-and-play features to enhance the performance of various traditional classification methods (e.g., Logistic Regression, XGBoost, etc.). Lastly, we highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.
QwenGrasp: A Usage of Large Vision-Language Model for Target-Oriented Grasping
Chen, Xinyu, Yang, Jian, He, Zonghan, Yang, Haobin, Zhao, Qi, Shi, Yuhui
Target-oriented grasping in unstructured scenes with language control is essential for intelligent robot arm grasping. The ability for the robot arm to understand the human language and execute corresponding grasping actions is a pivotal challenge. In this paper, we propose a combination model called QwenGrasp which combines a large vision-language model with a 6-DoF grasp neural network. QwenGrasp is able to conduct a 6-DoF grasping task on the target object with textual language instruction. We design a complete experiment with six-dimension instructions to test the QwenGrasp when facing with different cases. The results show that QwenGrasp has a superior ability to comprehend the human intention. Even in the face of vague instructions with descriptive words or instructions with direction information, the target object can be grasped accurately. When QwenGrasp accepts the instruction which is not feasible or not relevant to the grasping task, our approach has the ability to suspend the task execution and provide a proper feedback to humans, improving the safety. In conclusion, with the great power of large vision-language model, QwenGrasp can be applied in the open language environment to conduct the target-oriented grasping task with freely input instructions.