An Efficient High-Dimensional Gene Selection Approach based on Binary Horse Herd Optimization Algorithm for Biological Data Classification

Mehrabi, Niloufar, Boroujeni, Sayed Pedram Haeri, Pashaei, Elnaz

arXiv.org Artificial Intelligence 

Abstract: The Horse Herd Optimization Algorithm (HOA) is a recent meta-heuristic algorithm based on the behaviors of horses at different ages, introduced to solve complex and high-dimensional problems. This paper proposes a binary version of the Horse Herd Optimization Algorithm (BHOA) to solve discrete problems and select prominent feature subsets. Moreover, this study provides a novel hybrid feature selection framework based on the BHOA and the minimum Redundancy Maximum Relevance (MRMR) filter method. This hybrid feature selection, which is more computationally efficient, produces a beneficial subset of relevant and informative features. Since feature selection is a binary problem, we apply a new Transfer Function (TF), called the X-shape TF, which maps continuous search spaces to binary ones. Furthermore, a Support Vector Machine (SVM) is used to examine the efficiency of the proposed method on ten microarray datasets, namely Lymphoma, Prostate, Brain-1, DLBCL, SRBCT, Leukemia, Ovarian, Colon, Lung, and MLL. Compared with other state-of-the-art methods, such as the Gray Wolf (GW), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA), the proposed hybrid method (MRMR-BHOA) demonstrates superior performance in terms of accuracy and the number of selected features. The experimental results also show that the X-shaped BHOA approach outperforms the other methods.

Introduction

In recent years, many researchers have used DNA microarray datasets to analyze thousands of genes simultaneously and correlate their expression with clinical phenotypes in cancer research [1, 2]. Since a microarray dataset contains numerous redundant genes and a limited number of instances, feature selection can be crucial for choosing informative genes [3]. Feature Selection (FS) should be applied in machine learning as a pre-processing phase in order to obtain optimal output with short training times and low memory consumption [4].
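Because feature selection is treated as a binary problem, a candidate solution can be viewed as a 0/1 mask over genes, and a transfer function is what turns a continuous search position into such a mask. The following is a minimal sketch of that idea only. It assumes the classic S-shaped (sigmoid) transfer function known from binary PSO, not the X-shape TF proposed in this paper, whose definition is not given in this section; the names `s_shaped_transfer` and `binarize` are hypothetical.

```python
import math
import random


def s_shaped_transfer(x):
    """Sigmoid transfer function: maps a real value to a probability in [0, 1].

    ASSUMPTION: this is the classic S-shaped TF, used here as a stand-in.
    It is NOT the X-shape TF described in the paper.
    """
    # Numerically stable form that avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    Each dimension is set to 1 (feature selected) with probability
    s_shaped_transfer(x), and to 0 otherwise.
    """
    return [1 if rng.random() < s_shaped_transfer(x) else 0 for x in position]


# Usage: one continuous position with five dimensions (values are made up).
rng = random.Random(0)
mask = binarize([-4.0, -0.5, 0.0, 0.5, 4.0], rng)
print(mask)  # a list of five 0/1 values; larger inputs are more likely to be 1
```

Strongly negative coordinates are almost never selected and strongly positive ones almost always are, which is how a continuous optimizer can steer a discrete gene subset.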
FS plays a significant role in data mining [5], where it helps solve various problems such as data classification [6], data clustering [7], image processing [8], text clustering [9], disaster management [10], and disease forecasting [11]. FS is generally classified into three major groups based on a variety of evaluation criteria, i.e., filter method [12], wrapper model [13], and embedded technique [14]. Also, this technique uses statistical methods for the evaluation of a subset of features [15].
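To make the distinction between filter and wrapper evaluation concrete, here is a small sketch under stated assumptions. The filter score is a simple class-mean gap statistic (a stand-in for a relevance measure, not MRMR itself), the wrapper score is the leave-one-out accuracy of a nearest-centroid classifier (a stand-in for the SVM), and the toy data and all function names are made up for illustration.

```python
from statistics import mean, pstdev

# Toy data (made up): 6 samples x 3 features, two classes.
# Only feature 0 separates the classes clearly.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.8, 0.9],
    [0.9, 5.1, 0.1],
    [3.0, 5.0, 0.8],
    [3.2, 4.9, 0.2],
    [2.9, 5.2, 0.5],
]
y = [0, 0, 0, 1, 1, 1]


def filter_score(X, y, j):
    """Filter-style score: a classifier-free statistic for single feature j.

    Gap between the two class means divided by the overall spread.
    """
    col = [row[j] for row in X]
    class0 = [v for v, label in zip(col, y) if label == 0]
    class1 = [v for v, label in zip(col, y) if label == 1]
    spread = pstdev(col) or 1e-12  # guard against constant features
    return abs(mean(class0) - mean(class1)) / spread


def sq_dist(row, idx, centroid):
    """Squared Euclidean distance restricted to the selected feature indices."""
    return sum((row[j] - c) ** 2 for j, c in zip(idx, centroid))


def wrapper_score(X, y, mask):
    """Wrapper-style score: leave-one-out accuracy on the masked features.

    Assumes every class keeps at least one sample when one is held out.
    """
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, row in enumerate(X):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in idx]
        pred = min(centroids, key=lambda lab: sq_dist(row, idx, centroids[lab]))
        correct += int(pred == y[i])
    return correct / len(X)


# Usage: rank features with the filter, then score subsets with the wrapper.
ranking = sorted(range(3), key=lambda j: filter_score(X, y, j), reverse=True)
print(ranking[0])                      # index of the top-ranked feature
print(wrapper_score(X, y, [1, 0, 0]))  # accuracy using feature 0 only
```

The filter looks at each feature once and never trains a model, which is why it is cheap; the wrapper retrains a classifier for every candidate subset, which is costlier but measures what actually matters for classification. A hybrid scheme uses the former to shrink the search space for the latter.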