Goto

Collaborating Authors

 Wang, Zhen


IterIS: Iterative Inference-Solving Alignment for LoRA Merging

arXiv.org Artificial Intelligence

Low-rank adaptations (LoRA) are widely used to fine-tune large models across various domains for specific downstream tasks. While task-specific LoRAs are often available, concerns about data privacy and intellectual property can restrict access to training data, limiting the acquisition of a multi-task model through gradient-based training. In response, LoRA merging presents an effective solution by combining multiple LoRAs into a unified adapter while maintaining data privacy. Prior works on LoRA merging primarily frame it as an optimization problem, yet these approaches face several limitations, including the rough assumption about input features utilized in optimization, massive sample requirements, and the unbalanced optimization objective. These limitations can significantly degrade performance. To address these, we propose a novel optimization-based method, named IterIS: 1) We formulate LoRA merging as an advanced optimization problem to mitigate the rough assumption. Additionally, we employ an iterative inference-solving framework in our algorithm. It can progressively refine the optimization objective for improved performance. 2) We introduce an efficient regularization term to reduce the need for massive sample requirements (requiring only 1-5% of the unlabeled samples compared to prior methods). 3) We utilize adaptive weights in the optimization objective to mitigate potential unbalances in LoRA merging process. Our method demonstrates significant improvements over multiple baselines and state-of-the-art methods in composing tasks for text-to-image diffusion, vision-language models, and large language models. Furthermore, our layer-wise algorithm can achieve convergence with minimal steps, ensuring efficiency in both memory and computation.


Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

arXiv.org Artificial Intelligence

Aligning Large Language Models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment seeks to reduce these expenses by enabling models to align themselves. To further lower costs and achieve alignment without any expensive tuning or annotations, we introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO). Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions, all without additional training or human intervention. The core of DRPO is a dynamic rewarding mechanism, which identifies and rectifies model-specific alignment weaknesses, allowing LLMs to adapt efficiently to diverse alignment challenges. Empirical evaluations on eight recent LLMs, both open- and closed-sourced, demonstrate that DRPO significantly enhances alignment performance, with base models outperforming their SFT/RLHF-tuned counterparts. Moreover, the prompts automatically optimized by DRPO surpass those curated by human experts, further validating the effectiveness of our approach. Our findings highlight the great potential of current LLMs to achieve adaptive self-alignment through inference-time optimization, complementing tuning-based alignment methods.


Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

arXiv.org Artificial Intelligence

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model's adaptability.


Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii

arXiv.org Artificial Intelligence

Analysis of the genome sequence of Perccottus glenii, the only fish known to possess freeze tolerance, holds significant importance for understanding how organisms adapt to extreme environments, Traditional biological analysis methods are time-consuming and have limited accuracy, To address these issues, we will employ machine learning techniques to analyze the gene sequences of Perccottus glenii, with Neodontobutis hainanens as a comparative group, Firstly, we have proposed five gene sequence vectorization methods and a method for handling ultra-long gene sequences, We conducted a comparative study on the three vectorization methods: ordinal encoding, One-Hot encoding, and K-mer encoding, to identify the optimal encoding method, Secondly, we constructed four classification models: Random Forest, LightGBM, XGBoost, and Decision Tree, The dataset used by these classification models was extracted from the National Center for Biotechnology Information database, and we vectorized the sequence matrices using the optimal encoding method, K-mer, The Random Forest model, which is the optimal model, achieved a classification accuracy of up to 99, 98 , Lastly, we utilized SHAP values to conduct an interpretable analysis of the optimal classification model, Through ten-fold cross-validation and the AUC metric, we identified the top 10 features that contribute the most to the model's classification accuracy, This demonstrates that machine learning methods can effectively replace traditional manual analysis in identifying genes associated with the freeze tolerance phenotype in Perccottus glenii.


Large Language Models Overcome the Machine Penalty When Acting Fairly but Not When Acting Selfishly or Altruistically

arXiv.org Artificial Intelligence

In social dilemmas where the collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans, a phenomenon termed the machine penalty. Overcoming this penalty is critical for successful human-machine collectives, yet current solutions often involve ethically-questionable tactics, like concealing machines' non-human nature. In this study, with 1,152 participants, we explore the possibility of closing this research question by using Large Language Models (LLMs), in scenarios where communication is possible between interacting parties. We design three types of LLMs: (i) Cooperative, aiming to assist its human associate; (ii) Selfish, focusing solely on maximizing its self-interest; and (iii) Fair, balancing its own and collective interest, while slightly prioritizing self-interest. Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions, even when their non-human nature is fully disclosed. In contrast, selfish and cooperative LLMs fail to achieve this goal. Post-experiment analysis shows that all three types of LLMs succeed in forming mutual cooperation agreements with humans, yet only fair LLMs, which occasionally break their promises, are capable of instilling a perception among humans that cooperating with them is the social norm, and eliciting positive views on their trustworthiness, mindfulness, intelligence, and communication quality. Our findings suggest that for effective human-machine cooperation, bot manufacturers should avoid designing machines with mere rational decision-making or a sole focus on assisting humans. Instead, they should design machines capable of judiciously balancing their own interest and the interest of humans.


Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing

arXiv.org Artificial Intelligence

Precise and flexible image editing remains a fundamental challenge in computer vision. Based on the modified areas, most editing methods can be divided into two main types: global editing and local editing. In this paper, we choose the two most common editing approaches (ie text-based editing and drag-based editing) and analyze their drawbacks. Specifically, text-based methods often fail to describe the desired modifications precisely, while drag-based methods suffer from ambiguity. To address these issues, we proposed \textbf{CLIPDrag}, a novel image editing method that is the first to combine text and drag signals for precise and ambiguity-free manipulations on diffusion models. To fully leverage these two signals, we treat text signals as global guidance and drag points as local information. Then we introduce a novel global-local motion supervision method to integrate text signals into existing drag-based methods by adapting a pre-trained language-vision model like CLIP. Furthermore, we also address the problem of slow convergence in CLIPDrag by presenting a fast point-tracking method that enforces drag points moving toward correct directions. Extensive experiments demonstrate that CLIPDrag outperforms existing single drag-based methods or text-based methods.


Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner

arXiv.org Artificial Intelligence

Diffusion models have demonstrated their capabilities in modeling trajectories of multi-tasks. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). To address these challenges, we aim to develop a versatile diffusion planner that can leverage large-scale inferior data that contains task-agnostic sub-optimal trajectories, with the ability to fast adapt to specific tasks. In this paper, we propose \textbf{SODP}, a two-stage framework that leverages \textbf{S}ub-\textbf{O}ptimal data to learn a \textbf{D}iffusion \textbf{P}lanner, which is generalizable for various downstream tasks. Specifically, in the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories, which can be sub-optimal and has wide data coverage. Then for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to fast refine the diffusion planner, which aims to generate action sequences with higher task-specific returns. Experimental results from multi-task domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.


Pavement Fatigue Crack Detection and Severity Classification Based on Convolutional Neural Network

arXiv.org Artificial Intelligence

Due to the varying intensity of pavement cracks, the complexity of topological structure, and the noise of texture background, image classification for asphalt pavement cracking has proven to be a challenging problem. Fatigue cracking, also known as alligator cracking, is one of the common distresses of asphalt pavement. It is thus important to detect and monitor the condition of alligator cracking on roadway pavements. Most research in this area has typically focused on pixel-level detection of cracking using limited datasets. A novel deep convolutional neural network that can achieve two objectives is proposed. The first objective of the proposed neural network is to classify presence of fatigue cracking based on pavement surface images. The second objective is to classify the fatigue cracking severity level based on the Distress Identification Manual (DIM) standard. In this paper, a databank of 4484 high-resolution pavement surface images is established in which images are taken locally in the Town of Blacksburg, Virginia, USA. In the data pre-preparation, over 4000 images are labeled into 4 categories manually according to DIM standards. A four-layer convolutional neural network model is then built to achieve the goal of classification of images by pavement crack severity category. The trained model reached the highest accuracy among all existing methods. After only 30 epochs of training, the model achieved a crack existence classification accuracy of 96.23% and a severity level classification accuracy of 96.74%. After 20 epochs of training, the model achieved a pavement marking presence classification accuracy of 97.64%.


A Survey on Deep Active Learning: Recent Advances and New Frontiers

arXiv.org Artificial Intelligence

Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label new selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep learning-based active learning (DAL), remain scarce. Therefore, we conduct an advanced and comprehensive survey on DAL. We first introduce reviewed paper collection and filtering. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we systematically provide a taxonomy of DAL methods from five perspectives, including annotation types, query strategies, deep model architectures, learning paradigms, and training processes, and objectively analyze their strengths and weaknesses. Then, we comprehensively summarize main applications of DAL in Natural Language Processing (NLP), Computer Vision (CV), and Data Mining (DM), etc. Finally, we discuss challenges and perspectives after a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.


Foundations and Frontiers of Graph Learning Theory

arXiv.org Artificial Intelligence

Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures. Notably, Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm. With these models being usually characterized by intuition-driven design or highly intricate components, placing them within the theoretical analysis framework to distill the core concepts, helps understand the key principles that drive the functionality better and guide further development. Given this surge in interest, this article provides a comprehensive summary of the theoretical foundations and breakthroughs concerning the approximation and learning behaviors intrinsic to prevalent graph learning models. Encompassing discussions on fundamental aspects such as expressiveness power, generalization, optimization, and unique phenomena such as over-smoothing and over-squashing, this piece delves into the theoretical foundations and frontier driving the evolution of graph learning. In addition, this article also presents several challenges and further initiates discussions on possible solutions.