Xia, Shuyin
GBFRS: Robust Fuzzy Rough Sets via Granular-ball Computing
Xia, Shuyin, Lian, Xiaoyu, Sang, Binbin, Wang, Guoyin, Gao, Xinbo
Fuzzy rough set theory is effective for processing datasets with complex attributes, supported by a solid mathematical foundation and closely linked to kernel methods in machine learning. Attribute reduction algorithms and classifiers based on fuzzy rough set theory exhibit promising performance in the analysis of high-dimensional multivariate complex data. However, most existing models operate at the finest granularity, rendering them inefficient and sensitive to noise, especially for high-dimensional big data. Thus, enhancing the robustness of fuzzy rough set models is crucial for effective feature selection. Muiti-garanularty granular-ball computing, a recent development, uses granular-balls of different sizes to adaptively represent and cover the sample space, performing learning based on these granular-balls. This paper proposes integrating multi-granularity granular-ball computing into fuzzy rough set theory, using granular-balls to replace sample points. The coarse-grained characteristics of granular-balls make the model more robust. Additionally, we propose a new method for generating granular-balls, scalable to the entire supervised method based on granular-ball computing. A forward search algorithm is used to select feature sequences by defining the correlation between features and categories through dependence functions. Experiments demonstrate the proposed model's effectiveness and superiority over baseline methods.
A New Perspective on Privacy Protection in Federated Learning with Granular-Ball Computing
Lai, Guannan, Feng, Yihui, Yang, Xin, Deng, Xiaoyu, Yu, Hao, Xia, Shuyin, Wang, Guoyin, Li, Tianrui
Federated Learning (FL) facilitates collaborative model training while prioritizing privacy by avoiding direct data sharing. However, most existing articles attempt to address challenges within the model's internal parameters and corresponding outputs, while neglecting to solve them at the input level. To address this gap, we propose a novel framework called Granular-Ball Federated Learning (GrBFL) for image classification. GrBFL diverges from traditional methods that rely on the finest-grained input data. Instead, it segments images into multiple regions with optimal coarse granularity, which are then reconstructed into a graph structure. We designed a two-dimensional binary search segmentation algorithm based on variance constraints for GrBFL, which effectively removes redundant information while preserving key representative features. Extensive theoretical analysis and experiments demonstrate that GrBFL not only safeguards privacy and enhances efficiency but also maintains robust utility, consistently outperforming other state-of-the-art FL methods. The code is available at https://github.com/AIGNLAI/GrBFL.
Graph Coarsening via Supervised Granular-Ball for Scalable Graph Neural Network Training
Xia, Shuyin, Ma, Xinjun, Liu, Zhiyuan, Liu, Cheng, Zhao, Sen, Wang, Guoyin
Graph Neural Networks (GNNs) have demonstrated significant achievements in processing graph data, yet scalability remains a substantial challenge. To address this, numerous graph coarsening methods have been developed. However, most existing coarsening methods are training-dependent, leading to lower efficiency, and they all require a predefined coarsening rate, lacking an adaptive approach. In this paper, we employ granular-ball computing to effectively compress graph data. We construct a coarsened graph network by iteratively splitting the graph into granular-balls based on a purity threshold and using these granular-balls as super vertices. This granulation process significantly reduces the size of the original graph, thereby greatly enhancing the training efficiency and scalability of GNNs. Additionally, our algorithm can adaptively perform splitting without requiring a predefined coarsening rate. Experimental results demonstrate that our method achieves accuracy comparable to training on the original graph. Noise injection experiments further indicate that our method exhibits robust performance. Moreover, our approach can reduce the graph size by up to 20 times without compromising test accuracy, substantially enhancing the scalability of GNNs.
Multi-Granularity Open Intent Classification via Adaptive Granular-Ball Decision Boundary
Li, Yanhua, Ouyang, Xiaocao, Pan, Chaofan, Zhang, Jie, Zhao, Sen, Xia, Shuyin, Yang, Xin, Wang, Guoyin, Li, Tianrui
Open intent classification is critical for the development of dialogue systems, aiming to accurately classify known intents into their corresponding classes while identifying unknown intents. Prior boundary-based methods assumed known intents fit within compact spherical regions, focusing on coarse-grained representation and precise spherical decision boundaries. However, these assumptions are often violated in practical scenarios, making it difficult to distinguish known intent classes from unknowns using a single spherical boundary. To tackle these issues, we propose a Multi-granularity Open intent classification method via adaptive Granular-Ball decision boundary (MOGB). Our MOGB method consists of two modules: representation learning and decision boundary acquiring. To effectively represent the intent distribution, we design a hierarchical representation learning method. This involves iteratively alternating between adaptive granular-ball clustering and nearest sub-centroid classification to capture fine-grained semantic structures within known intent classes. Furthermore, multi-granularity decision boundaries are constructed for open intent classification by employing granular-balls with varying centroids and radii. Extensive experiments conducted on three public datasets demonstrate the effectiveness of our proposed method.
EvoSampling: A Granular Ball-based Evolutionary Hybrid Sampling with Knowledge Transfer for Imbalanced Learning
Pei, Wenbin, Dai, Ruohao, Xue, Bing, Zhang, Mengjie, Zhang, Qiang, Cheung, Yiu-Ming, Xia, Shuyin
Class imbalance would lead to biased classifiers that favor the majority class and disadvantage the minority class. Unfortunately, from a practical perspective, the minority class is of importance in many real-life applications. Hybrid sampling methods address this by oversampling the minority class to increase the number of its instances, followed by undersampling to remove low-quality instances. However, most existing sampling methods face difficulties in generating diverse high-quality instances and often fail to remove noise or low-quality instances on a larger scale effectively. This paper therefore proposes an evolutionary multi-granularity hybrid sampling method, called EvoSampling. During the oversampling process, genetic programming (GP) is used with multi-task learning to effectively and efficiently generate diverse high-quality instances. During the undersampling process, we develop a granular ball-based undersampling method that removes noise in a multi-granular fashion, thereby enhancing data quality. Experiments on 20 imbalanced datasets demonstrate that EvoSampling effectively enhances the performance of various classification algorithms by providing better datasets than existing sampling methods. Besides, ablation studies further indicate that allowing knowledge transfer accelerates the GP's evolutionary learning process.
HumanVLM: Foundation for Human-Scene Vision-Language Model
Dai, Dawei, Long, Xu, Yutang, Li, Yuanhui, Zhang, Xia, Shuyin
Human-scene vision-language tasks are increasingly prevalent in diverse social applications, yet recent advancements predominantly rely on models specifically tailored to individual tasks. Emerging research indicates that large vision-language models (VLMs) can enhance performance across various downstream vision-language understanding tasks. However, general-domain models often underperform in specialized fields. This study introduces a domain-specific Large Vision-Language Model, Human-Scene Vision-Language Model (HumanVLM), designed to provide a foundation for human-scene Vision-Language tasks. Specifically, (1) we create a large-scale human-scene multimodal image-text dataset (HumanCaption-10M) sourced from the Internet to facilitate domain-specific alignment; (2) develop a captioning approach for human-centered images, capturing human faces, bodies, and backgrounds, and construct a high-quality Human-Scene image-text dataset (HumanCaptionHQ, about 311k pairs) that contain as much detailed information as possible about human; (3) Using HumanCaption-10M and HumanCaptionHQ, we train a HumanVLM. In the experiments, we then evaluate our HumanVLM across varous downstream tasks, where it demonstrates superior overall performance among multimodal models of comparable scale, particularly excelling in human-related tasks and significantly outperforming similar models, including Qwen2VL and ChatGPT-4o. HumanVLM, alongside the data introduced, will stimulate the research in human-around fields.
GBCT: An Efficient and Adaptive Granular-Ball Clustering Algorithm for Complex Data
Xia, Shuyin, Shi, Bolun, Wang, Yifan, Xie, Jiang, Wang, Guoyin, Gao, Xinbo
Traditional clustering algorithms often focus on the most fine-grained information and achieve clustering by calculating the distance between each pair of data points or implementing other calculations based on points. This way is not inconsistent with the cognitive mechanism of "global precedence" in human brain, resulting in those methods' bad performance in efficiency, generalization ability and robustness. To address this problem, we propose a new clustering algorithm called granular-ball clustering (GBCT) via granular-ball computing. Firstly, GBCT generates a smaller number of granular-balls to represent the original data, and forms clusters according to the relationship between granular-balls, instead of the traditional point relationship. At the same time, its coarse-grained characteristics are not susceptible to noise, and the algorithm is efficient and robust; besides, as granular-balls can fit various complex data, GBCT performs much better in non-spherical data sets than other traditional clustering methods. The completely new coarse granularity representation method of GBCT and cluster formation mode can also used to improve other traditional methods.
A robust three-way classifier with shadowed granular-balls based on justifiable granularity
Yang, Jie, Xiaodiao, Lingyun, Wang, Guoyin, Pedrycz, Witold, Xia, Shuyin, Zhang, Qinghua, Wu, Di
The granular-ball (GB)-based classifier introduced by Xia, exhibits adaptability in creating coarse-grained information granules for input, thereby enhancing its generality and flexibility. Nevertheless, the current GB-based classifiers rigidly assign a specific class label to each data instance and lacks of the necessary strategies to address uncertain instances. These far-fetched certain classification approachs toward uncertain instances may suffer considerable risks. To solve this problem, we construct a robust three-way classifier with shadowed GBs for uncertain data. Firstly, combine with information entropy, we propose an enhanced GB generation method with the principle of justifiable granularity. Subsequently, based on minimum uncertainty, a shadowed mapping is utilized to partition a GB into Core region, Important region and Unessential region. Based on the constructed shadowed GBs, we establish a three-way classifier to categorize data instances into certain classes and uncertain case. Finally, extensive comparative experiments are conducted with 2 three-way classifiers, 3 state-of-the-art GB-based classifiers, and 3 classical machine learning classifiers on 12 public benchmark datasets. The results show that our model demonstrates robustness in managing uncertain data and effectively mitigates classification risks. Furthermore, our model almost outperforms the other comparison methods in both effectiveness and efficiency.
Open Continual Feature Selection via Granular-Ball Knowledge Transfer
Cao, Xuemei, Yang, Xin, Xia, Shuyin, Wang, Guoyin, Li, Tianrui
This paper presents a novel framework for continual feature selection (CFS) in data preprocessing, particularly in the context of an open and dynamic environment where unknown classes may emerge. CFS encounters two primary challenges: the discovery of unknown knowledge and the transfer of known knowledge. To this end, the proposed CFS method combines the strengths of continual learning (CL) with granular-ball computing (GBC), which focuses on constructing a granular-ball knowledge base to detect unknown classes and facilitate the transfer of previously learned knowledge for further feature selection. CFS consists of two stages: initial learning and open learning. The former aims to establish an initial knowledge base through multi-granularity representation using granular-balls. The latter utilizes prior granular-ball knowledge to identify unknowns, updates the knowledge base for granular-ball knowledge transfer, reinforces old knowledge, and integrates new knowledge. Subsequently, we devise an optimal feature subset mechanism that incorporates minimal new features into the existing optimal subset, often yielding superior results during each period. Extensive experimental results on public benchmark datasets demonstrate our method's superiority in terms of both effectiveness and efficiency compared to state-of-the-art feature selection methods.
Granular-ball computing: an efficient, robust, and interpretable adaptive multi-granularity representation and computation method
Xia, Shuyin, Wang, Guoyin, Gao, Xinbo, Lian, Xiaoyu
Human cognition operates on a "Global-first" cognitive mechanism, prioritizing information processing based on coarse-grained details. This mechanism inherently possesses an adaptive multi-granularity description capacity, resulting in computational traits such as efficiency, robustness, and interpretability. The analysis pattern reliance on the finest granularity and single-granularity makes most existing computational methods less efficient, robust, and interpretable, which is an important reason for the current lack of interpretability in neural networks. Multi-granularity granular-ball computing employs granular-balls of varying sizes to daptively represent and envelop the sample space, facilitating learning based on these granular-balls. Given that the number of coarse-grained "granular-balls" is fewer than sample points, granular-ball computing proves more efficient. Moreover, the inherent coarse-grained nature of granular-balls reduces susceptibility to fine-grained sample disturbances, enhancing robustness. The multi-granularity construct of granular-balls generates topological structures and coarse-grained descriptions, naturally augmenting interpretability. Granular-ball computing has successfully ventured into diverse AI domains, fostering the development of innovative theoretical methods, including granular-ball classifiers, clustering techniques, neural networks, rough sets, and evolutionary computing. This has notably ameliorated the efficiency, noise robustness, and interpretability of traditional methods. Overall, granular-ball computing is a rare and innovative theoretical approach in AI that can adaptively and simultaneously enhance efficiency, robustness, and interpretability. This article delves into the main application landscapes for granular-ball computing, aiming to equip future researchers with references and insights to refine and expand this promising theory.