Li, Qi
How to optimize K-means?
Li, Qi
Center-based clustering algorithms (e.g., K-means) are popular for clustering tasks, but they usually struggle to achieve high accuracy on complex datasets. We believe the main reason is that traditional center-based clustering algorithms identify only one clustering center in each cluster. When the distribution of the dataset is complex, a single clustering center cannot adequately represent distant objects within the cluster. How to optimize existing center-based clustering algorithms is therefore a valuable research question. In this paper, we propose a general optimization method called ECAC that can optimize different center-based clustering algorithms. ECAC is independent of the clustering principle and is embedded as a component between the center process and the category assignment process of center-based clustering algorithms. Specifically, ECAC identifies several extended-centers for each clustering center. The extended-centers act as relays to expand the representative capability of the clustering center in a complex cluster, thus improving the accuracy of center-based clustering algorithms. We conducted numerous experiments to verify the robustness and effectiveness of ECAC. ECAC is robust to diverse datasets and diverse clustering centers. After ECAC optimization, the accuracy (NMI and RI) of center-based clustering algorithms improves by an average of 33.4% and 64.1%, respectively, and even K-means accurately identifies complex-shaped clusters.
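The relay idea can be illustrated with a minimal sketch. This is not the ECAC algorithm: the abstract does not say how extended-centers are identified, so the sketch assumes they are already given, and the function name `assign_with_relays` and its parameters are hypothetical. It only shows how the assignment step changes when each point is matched against every representative (center or extended-center) and inherits the cluster of the nearest one.

```python
import numpy as np

def assign_with_relays(X, centers, ext_centers, ext_owner):
    """Label each point with the cluster of its nearest representative.

    centers:     (k, d) array, one clustering center per cluster
    ext_centers: (m, d) array of extended-centers (assumed given, may be empty)
    ext_owner:   (m,) int array, cluster index each extended-center belongs to
    """
    reps = np.vstack([centers, ext_centers])
    owner = np.concatenate([np.arange(len(centers)), ext_owner])
    # Pairwise Euclidean distances between points and all representatives.
    dists = np.linalg.norm(X[:, None, :] - reps[None, :, :], axis=2)
    return owner[dists.argmin(axis=1)]

centers = np.array([[0.0, 0.0], [6.0, 3.0]])
point = np.array([[5.0, 0.0]])  # far end of an elongated cluster 0

# Plain center-based assignment: the point is closer to center 1.
no_relays = assign_with_relays(
    point, centers, np.empty((0, 2)), np.array([], dtype=int)
)

# With one hypothetical extended-center for cluster 0 placed at (4, 0).
with_relays = assign_with_relays(
    point, centers, np.array([[4.0, 0.0]]), np.array([0])
)
```

In this toy case the point moves from cluster 1 to cluster 0 once the relay is present, which is the effect the abstract attributes to extended-centers.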
NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products
Ding, Yuheng, Wang, Yusong, Qiang, Bo, Yu, Jie, Li, Qi, Zhou, Yiran, Liu, Zhenmin
Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such a one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well-suited for the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy that is especially tailored to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results in various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification with synthesized molecule-focused baselines to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, in a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Finally, we evaluate our method on virtual screening, showing that informative natural product representations can lead to more effective identification of potential drug candidates.
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
Wang, Yanling, Zhao, Yihan, Chen, Xiaodong, Guo, Shasha, Liu, Lixin, Li, Haoyang, Xiao, Yong, Zhang, Jing, Li, Qi, Xu, Ke
Large vision-language models (LVLMs) have demonstrated remarkable achievements, yet the generation of non-factual responses remains prevalent in fact-seeking question answering (QA). Current multimodal fact-seeking benchmarks primarily focus on comparing model outputs to ground truth answers, providing limited insights into the performance of modality-specific modules. To bridge this gap, we introduce VisualSimpleQA, a multimodal fact-seeking benchmark with two key features. First, it enables streamlined and decoupled evaluation of LVLMs in visual and linguistic modalities. Second, it incorporates well-defined difficulty criteria to guide human annotation and facilitates the extraction of a challenging subset, VisualSimpleQA-hard. Experiments on 15 LVLMs show that even state-of-the-art models such as GPT-4o achieve merely 60%+ correctness in multimodal fact-seeking QA on VisualSimpleQA and 30%+ on VisualSimpleQA-hard. Furthermore, the decoupled evaluation across these models highlights substantial opportunities for improvement in both visual and linguistic modules. The dataset is available at https://huggingface.co/datasets/WYLing/VisualSimpleQA.
Multi-Level Collaboration in Model Merging
Li, Qi, Yu, Runpeng, Wang, Xinchao
Parameter-level model merging is an emerging paradigm in multi-task learning with significant promise. Previous research has explored its connections with prediction-level model ensembling, commonly viewed as the upper bound for merging, to reveal the potential of achieving performance consistency between the two. However, this observation relies on certain preconditions, such as being limited to two models, using ViT-based models, and fine-tuning all models from the same pre-trained checkpoint. To further understand the intrinsic connections between model merging and model ensembling, this paper explores an interesting possibility: if these restrictions are removed, can performance consistency still be achieved between merging and ensembling? To answer this question, we first theoretically establish a performance correlation between merging and ensembling. We find that even when previous restrictions are not met, there is still a way for model merging to attain a near-identical and superior performance similar to that of ensembling. To verify whether our findings are practical, we introduce a validation framework termed Neural Ligand (NeuLig). The learning process of NeuLig is meticulously designed with a specialized loss function supported by theoretical foundations. Experimental results demonstrate the robust resilience of NeuLig in terms of both model scale and the number of collaborating models. For instance, for the case involving 5 CLIP-ViT-B/32 models, parameter-level merging achieves nearly the same performance as prediction-level ensembling (merging: 95.44% vs. ensembling: 95.46%).
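The distinction between parameter-level merging and prediction-level ensembling can be made concrete with a small NumPy sketch. It is not NeuLig and uses plain uniform averaging, which is an assumption made only for illustration; all function names are hypothetical. For purely linear models the two operations coincide exactly, while a single nonlinearity is enough to make them diverge, which is why consistency between the two is not automatic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # toy batch: 8 inputs, 4 features
n_models = 5

# Linear models: logits = X @ W
linear_weights = [rng.normal(size=(4, 3)) for _ in range(n_models)]
ensemble_linear = np.mean([X @ W for W in linear_weights], axis=0)  # average predictions
merged_linear = X @ np.mean(linear_weights, axis=0)                 # average parameters

# Two-layer models with a tanh hidden layer: logits = tanh(X @ W1) @ W2
def mlp(X, W1, W2):
    return np.tanh(X @ W1) @ W2

mlp_weights = [
    (rng.normal(size=(4, 6)), rng.normal(size=(6, 3))) for _ in range(n_models)
]
ensemble_mlp = np.mean([mlp(X, W1, W2) for W1, W2 in mlp_weights], axis=0)
merged_W1 = np.mean([W1 for W1, _ in mlp_weights], axis=0)
merged_W2 = np.mean([W2 for _, W2 in mlp_weights], axis=0)
merged_mlp = mlp(X, merged_W1, merged_W2)

linear_gap = np.abs(ensemble_linear - merged_linear).max()
mlp_gap = np.abs(ensemble_mlp - merged_mlp).max()
```

Here `linear_gap` is zero up to floating-point error, whereas `mlp_gap` is not, so closing that gap for real networks requires something beyond naive averaging.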
GraphBridge: Towards Arbitrary Transfer Learning in GNNs
Ju, Li, Yang, Xingyi, Li, Qi, Wang, Xinchao
Graph neural networks (GNNs) are conventionally trained on a per-domain, per-task basis. This creates a significant barrier to transferring the acquired knowledge to different, heterogeneous data setups. This paper introduces GraphBridge, a novel framework to enable knowledge transfer across disparate tasks and domains in GNNs, circumventing the need for modifications to task configurations or graph structures. Specifically, GraphBridge allows for the augmentation of any pre-trained GNN with prediction heads and a bridging network that connects the input to the output layer. This architecture not only preserves the intrinsic knowledge of the original model but also supports outputs of arbitrary dimensions. To mitigate the negative transfer problem, GraphBridge merges the source model with a concurrently trained model, thereby reducing the source bias when applied to the target domain. Our method is thoroughly evaluated across diverse transfer learning scenarios, including Graph2Graph, Node2Node, Graph2Node, and graph2point-cloud. Empirical validation, conducted over 16 datasets representative of these scenarios, confirms the framework's capacity for task- and domain-agnostic transfer learning within graph-like data, marking a significant advancement in the field of GNNs. Code is available at https://github.com/jujulili888/GraphBridge.
Behind the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models
Yi, Sibo, Cong, Tianshuo, He, Xinlei, Li, Qi, Song, Jiaxing
Small language models (SLMs) have become increasingly prominent in deployments on edge devices due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention than those of large language models (LLMs). To fill this gap, we provide a comprehensive empirical study to evaluate the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, while some of them are even vulnerable to direct harmful prompts. To address the safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques including architecture compression, quantization, knowledge distillation, and so on. We expect that our research can highlight the security challenges of SLMs and provide valuable insights to future work in developing more robust and secure SLMs.
from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
Yan, Yu, Sun, Sheng, Duan, Zenghao, Liu, Teli, Liu, Min, Yin, Zhiyi, Li, Qi, Lei, Jiangyu
Current studies have exposed the risk of Large Language Models (LLMs) generating harmful content through jailbreak attacks. However, they overlook that the direct generation of harmful content from scratch is more difficult than inducing an LLM to calibrate benign content into harmful forms. In our study, we introduce a novel attack framework that exploits AdVersArial meTAphoR (AVATAR) to induce the LLM to calibrate malicious metaphors for jailbreaking. Specifically, to answer harmful queries, AVATAR adaptively identifies a set of benign but logically related metaphors as the initial seed. Then, driven by these metaphors, the target LLM is induced to reason about and calibrate the metaphorical content, and is thus jailbroken by either directly outputting harmful responses or calibrating residuals between metaphorical and professional harmful content. Experimental results demonstrate that AVATAR can effectively and transferably jailbreak LLMs and achieve a state-of-the-art attack success rate across multiple advanced LLMs.
Crime Forecasting: A Spatio-temporal Analysis with Deep Learning Models
Mao, Li, Du, Wei, Wen, Shuo, Li, Qi, Zhang, Tong, Zhong, Wei
This study uses deep-learning models to predict city partition crime counts on specific days. It helps police enhance surveillance, gather intelligence, and proactively prevent crimes. We formulate crime count prediction as a spatiotemporal sequence challenge, where both input data and prediction targets are spatiotemporal sequences. To improve the accuracy of crime forecasting, we introduce a new model that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. We conducted a comparative analysis to assess the effects of various data sequences, including raw and binned data, on the prediction errors of four deep learning forecasting models. Directly inputting raw crime data into the forecasting model causes high prediction errors, making the model unsuitable for real-world use. The findings indicate that the proposed CNN-LSTM model achieves optimal performance when crime data is categorized into 10 or 5 groups. Data binning can enhance forecasting model performance, but poorly defined intervals may reduce map granularity. Compared to dividing into 5 bins, binning into 10 intervals strikes an optimal balance, preserving data characteristics and surpassing raw data in predictive modelling efficacy.
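The binning step can be sketched as follows. The abstract does not state how the intervals are defined, so equal-width bins between the minimum and maximum count are an assumption here, and the function name `bin_counts` and the sample counts are purely illustrative rather than taken from the study.

```python
import numpy as np

def bin_counts(counts, n_bins):
    """Map raw counts to equal-width bin labels in the range 0..n_bins-1."""
    counts = np.asarray(counts, dtype=float)
    edges = np.linspace(counts.min(), counts.max(), n_bins + 1)
    # Use interior edges only, so the maximum count falls in the last bin.
    return np.digitize(counts, edges[1:-1])

raw = [0, 3, 7, 12, 20, 50]     # hypothetical daily counts for six partitions
five_bins = bin_counts(raw, 5)   # coarser labels
ten_bins = bin_counts(raw, 10)   # finer labels
```

With 5 bins the counts 0, 3, and 7 collapse into one label, while 10 bins separate 7 from the two smaller counts, which illustrates the granularity trade-off the abstract describes.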
Feature Explosion: a generic optimization strategy for outlier detection algorithms
Li, Qi
Outlier detection tasks aim at discovering potential issues or opportunities and are widely used in cybersecurity, financial security, industrial inspection, etc. To date, thousands of outlier detection algorithms have been proposed. Clearly, in real-world scenarios, such a large number of algorithms is unnecessary. In other words, a large number of outlier detection algorithms are redundant. We believe the root cause of this redundancy lies in the current highly customized (i.e., non-generic) optimization strategies. Specifically, when researchers seek to improve the performance of existing outlier detection algorithms, they have to design separate optimized versions tailored to the principles of each algorithm, leading to an ever-growing number of outlier detection algorithms. To address this issue, in this paper, we introduce the concept of explosion from physics into the outlier detection task and propose a generic optimization strategy based on feature explosion, called OSD (Optimization Strategy for outlier Detection algorithms). In the future, when improving the performance of existing outlier detection algorithms, it will be sufficient to invoke the OSD plugin without the need to design customized optimized versions for them. We compared the performance of 14 outlier detection algorithms on 24 datasets before and after invoking the OSD plugin. The experimental results show that the performance of all outlier detection algorithms is improved on almost all datasets. In terms of average accuracy, OSD improves these outlier detection algorithms by 15% (AUC) and 63.7% (AP).
A Theoretical Framework for Data Efficient Multi-Source Transfer Learning Based on Cramér-Rao Bound
Zhang, Qingyue, Fu, Haohao, Huang, Guanbo, Liang, Yaoyuan, Chu, Chang, Peng, Tianren, Wu, Yanru, Li, Qi, Li, Yang, Huang, Shao-Lun
Multi-source transfer learning provides an effective solution to data scarcity in real-world supervised learning scenarios by leveraging multiple source tasks. In this field, existing works typically use all available samples from sources in training, which constrains their training efficiency and may lead to suboptimal results. To address this, we propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model? Specifically, we introduce a generalization error measure that aligns with cross-entropy loss, and minimize it based on the Cramér-Rao Bound to determine the optimal transfer quantity for each source task. Additionally, we develop an architecture-agnostic and data-efficient algorithm, OTQMS, to implement our theoretical results for training deep multi-source transfer learning models. Experimental studies on diverse architectures and two real-world benchmark datasets show that our proposed algorithm significantly outperforms state-of-the-art approaches in both accuracy and data efficiency. The code and supplementary materials are available at https://anonymous.4open.science/r/Materials.