 unsupervised feature selection algorithm



Algorithmic stability and generalization of an unsupervised feature selection algorithm

Neural Information Processing Systems

Feature selection, as a vital dimension reduction technique, reduces data dimension by identifying an essential subset of input features, which can facilitate interpretable insights into learning and inference processes. Algorithmic stability is a key characteristic of an algorithm regarding its sensitivity to perturbations of input samples. In this paper, we propose an innovative unsupervised feature selection algorithm attaining this stability with provable guarantees. The architecture of our algorithm consists of a feature scorer and a feature selector. The scorer trains a neural network (NN) to globally score all the features, and the selector adopts a dependent sub-NN to locally evaluate the representation abilities for selecting features. Further, we present algorithmic stability analysis and show that our algorithm has a performance guarantee via a generalization error bound. Extensive experimental results on real-world datasets demonstrate superior generalization performance of our proposed algorithm to strong baseline methods. Also, the properties revealed by our theoretical analysis and the stability of our algorithm-selected features are empirically confirmed.
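The notion of stability described above (sensitivity of the output to perturbations of the input samples) can be probed empirically. The following is a minimal sketch, not the authors' algorithm or their theoretical analysis: it uses a hypothetical stand-in selector (top-k features by variance) and measures how much the selected set changes when one sample is left out, using the Jaccard index. The function names, the variance-based selector, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def select_top_k_by_variance(X, k):
    """Stand-in selector (an assumption, NOT the paper's NN scorer/selector):
    keep the indices of the k highest-variance features."""
    scores = X.var(axis=0)
    return set(np.argsort(scores)[::-1][:k].tolist())


def jaccard(a, b):
    """Jaccard index between two non-empty index sets."""
    return len(a & b) / len(a | b)


def leave_one_out_stability(X, k, selector=select_top_k_by_variance):
    """Average Jaccard overlap between the features selected on the full
    data and the features selected after removing each sample in turn."""
    full = selector(X, k)
    overlaps = []
    for i in range(X.shape[0]):
        X_minus_i = np.delete(X, i, axis=0)  # perturb the input: drop sample i
        overlaps.append(jaccard(full, selector(X_minus_i, k)))
    return float(np.mean(overlaps))


# Synthetic data: 50 samples, 10 features, the first three with larger scale.
rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X = rng.normal(size=(50, 10)) * scales

selected = select_top_k_by_variance(X, 3)
stability = leave_one_out_stability(X, k=3)
```

A value of `stability` close to 1 means that removing a single sample rarely changes the selected subset. Any selector with the same `(X, k)` signature could be passed in place of the stand-in.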


Supplementary Material of "Algorithmic Stability and Generalization of an Unsupervised Feature Selection Algorithm"

Neural Information Processing Systems

Correspondence should be addressed to: qiang.cheng@uky.edu. The architecture of our algorithm is shown in Figure 1. For the training based on Eq. (2) of the main text, in each iteration of backpropagation, ... After training, only the trained selector is used to select features and do reconstruction during testing time. In Eq. (2) of the main text, the second term helps obtain ... During testing time, only the trained sub-NN is used to select features and do reconstruction. It has 5,744 samples and 561 features.


Unsupervised Feature Selection Algorithm Based on Graph Filtering and Self-representation

Liang, Yunhui, Gan, Jianwen, Chen, Yan, Zhou, Peng, Du, Liang

arXiv.org Artificial Intelligence

Existing methods cannot fully capture the intrinsic structure of data because they ignore the higher-order neighborhood information of the data. To address this problem, we propose an unsupervised feature selection algorithm based on graph filtering and self-representation. First, a higher-order graph filter is applied to the data to obtain a smooth representation, and a regularizer is designed that incorporates the higher-order graph information into the learning of the self-representation matrix, so as to capture the intrinsic structure of the data. Second, the l2,1-norm is applied to the reconstruction error term and to the feature selection matrix, which enhances the robustness and row sparsity of the model and helps select discriminative features. Finally, an iterative algorithm is used to solve the proposed objective function, and simulation experiments are carried out to verify the effectiveness of the proposed algorithm.
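Two ingredients named in this abstract can be illustrated briefly. The sketch below is not the paper's objective or solver. It shows (a) the l2,1-norm of a matrix, which sums the Euclidean norms of the rows and so encourages entire rows to become zero, and (b) one common form of low-pass graph filter, (I - L/2)^k, where L is the symmetric normalized Laplacian. The choice of this particular filter, the function names, and the toy matrices are assumptions made for illustration.

```python
import numpy as np


def l21_norm(W):
    """l2,1-norm: sum over rows of each row's Euclidean norm."""
    return float(np.linalg.norm(W, axis=1).sum())


def rank_features_by_row_norm(W, k):
    """Indices of the k rows of W with the largest Euclidean norm.
    With a row-sparse W, rows near zero correspond to discarded features."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(row_norms)[::-1][:k]


def low_pass_filter(X, A, order):
    """Apply (I - L/2)^order to X, where L is the symmetric normalized
    Laplacian of the adjacency matrix A (assumed: no isolated nodes).
    This specific filter is an illustrative assumption."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    H = np.eye(A.shape[0]) - 0.5 * L
    X_smooth = X.copy()
    for _ in range(order):  # a higher order mixes in more distant neighbors
        X_smooth = H @ X_smooth
    return X_smooth


# Toy feature-selection matrix: rows have norms 5, 0 and 1.
W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
norm_value = l21_norm(W)
top_two = rank_features_by_row_norm(W, 2).tolist()

# Path graph on three samples (rows of X are samples, i.e. graph nodes).
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
X_smooth = low_pass_filter(X, A, order=2)
```

Here the all-zero second row of `W` contributes nothing to the norm, which is why penalizing the l2,1-norm tends to switch off whole features at once rather than individual entries.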


Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking

Liang, Yunhui, Gan, Jianwen, Chen, Yan, Zhou, Peng, Du, Liang

arXiv.org Artificial Intelligence

High-dimensional data is commonly encountered in numerous data analysis tasks. Feature selection techniques aim to identify the most representative features from the original high-dimensional data. Due to the absence of class label information, selecting appropriate features is significantly more challenging in unsupervised learning scenarios than in supervised ones. Traditional unsupervised feature selection methods typically score features according to certain criteria while treating all samples indiscriminately. As a result, these approaches fail to fully capture the internal structure of the data. The importance of different samples should vary, and the weights of samples and the weights of features are dually related, each influencing the other. Therefore, an unsupervised feature selection algorithm based on dual manifold re-ranking (DMRR) is proposed in this paper. Different similarity matrices are constructed to depict the manifold structures among samples, between samples and features, and among features themselves. Then, manifold re-ranking is performed by combining the initial scores of samples and features. DMRR is compared with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms, and the experimental results confirm that the importance information of different samples and the dual relationship between samples and features are beneficial for achieving better feature selection.
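The idea of re-ranking initial scores over a similarity graph can be sketched with the classic closed-form manifold ranking, f = (I - alpha * S)^(-1) y, where S is the symmetrically normalized similarity matrix and y holds the initial scores. This is an illustrative assumption and not necessarily the update used by DMRR, which combines several similarity matrices over samples and features. The function name, the value of `alpha`, and the toy similarity matrix are hypothetical.

```python
import numpy as np


def manifold_rerank(W, y, alpha=0.5):
    """Classic manifold ranking (an assumption, not DMRR's exact update).

    W     : symmetric non-negative similarity matrix, no isolated items
    y     : initial scores, one per item
    alpha : propagation strength in (0, 1)
    Returns the re-ranked scores f solving (I - alpha * S) f = y.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)


# Items 0 and 1 are strongly similar; item 2 is only weakly tied to both.
W = np.array([
    [0.0, 1.0, 0.1],
    [1.0, 0.0, 0.1],
    [0.1, 0.1, 0.0],
])
initial_scores = np.array([1.0, 0.0, 0.0])
reranked = manifold_rerank(W, initial_scores)
```

Although items 1 and 2 start with the same initial score, item 1 ends up ranked above item 2 because it lies closer on the graph to the highly scored item 0.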