Bayesian Networks have been widely used in the last decades in many fields, to describe statistical dependencies among random variables. In general, learning the structure of such models is a problem with considerable theoretical interest that still poses many challenges. On the one hand, this is a well-known NP-complete problem, which is practically hardened by the huge search space of possible solutions. On the other hand, the phenomenon of I-equivalence, i.e., different graphical structures underpinning the same set of statistical dependencies, may lead to multimodal fitness landscapes further hindering maximum likelihood approaches to solve the task. Despite all these difficulties, greedy search methods based on a likelihood score coupled with a regularization term to account for model complexity, have been shown to be surprisingly effective in practice. In this paper, we consider the formulation of the task of learning the structure of Bayesian Networks as an optimization problem based on a likelihood score. Nevertheless, our approach do not adjust this score by means of any of the complexity terms proposed in the literature; instead, it accounts directly for the complexity of the discovered solutions by exploiting a multi-objective optimization procedure. To this extent, we adopt NSGA-II and define the first objective function to be the likelihood of a solution and the second to be the number of selected arcs. We thoroughly analyze the behavior of our method on a wide set of simulated data, and we discuss the performance considering the goodness of the inferred solutions both in terms of their objective functions and with respect to the retrieved structure. Our results show that NSGA-II can converge to solutions characterized by better likelihood and less arcs than classic approaches, although paradoxically frequently characterized by a lower similarity to the target network.
Biomedical data is filled with continuous real values; these values in the feature set tend to create problems like underfitting, the curse of dimensionality and increase in misclassification rate because of higher variance. In response, pre-processing techniques on dataset minimizes the side effects and have shown success in maintaining the adequate accuracy. Feature selection and discretization are the two necessary preprocessing steps that were effectively employed to handle the data redundancies in the biomedical data. However, in the previous works, the absence of unified effort by integrating feature selection and discretization together in solving the data redundancy problem leads to the disjoint and fragmented field. This paper proposes a novel multi-objective based dimensionality reduction framework, which incorporates both discretization and feature reduction as an ensemble model for performing feature selection and discretization. Selection of optimal features and the categorization of discretized and non-discretized features from the feature subset is governed by the multi-objective genetic algorithm (NSGA-II). The two objective, minimizing the error rate during the feature selection and maximizing the information gain while discretization is considered as fitness criteria.
In the past decades, the rapid growth of computer and database technologies has led to the rapid growth of large-scale datasets. On the other hand, data mining applications with high dimensional datasets that require high speed and accuracy are rapidly increasing. An important issue with these applications is the curse of dimensionality, where the number of features is much higher than the number of patterns. One of the dimensionality reduction approaches is feature selection that can increase the accuracy of the data mining task and reduce its computational complexity. The feature selection method aims at selecting a subset of features with the lowest inner similarity and highest relevancy to the target class. It reduces the dimensionality of the data by eliminating irrelevant, redundant, or noisy data. In this paper, a comparative analysis of different feature selection methods is presented, and a general categorization of these methods is performed. Moreover, in this paper, state-of-the-art swarm intelligence are studied, and the recent feature selection methods based on these algorithms are reviewed. Furthermore, the strengths and weaknesses of the different studied swarm intelligence-based feature selection methods are evaluated.
AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimize a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or tasks that can be solved better by algorithms while leaving decisions that require human intelligence to the end-user. In recent years, the importance of other criteria, such as fairness and interpretability, and many others have become more and more apparent. Current AutoML frameworks either do not allow to optimize such secondary criteria or only do so by limiting the system's choice of models and preprocessing steps. We propose to optimize additional criteria defined by the user directly to guide the search towards an optimal machine learning pipeline. In order to demonstrate the need and usefulness of our approach, we provide a simple multi-criteria AutoML system and showcase an exemplary application.
Beyond the possible misuses of technology, there is an increased awareness that these processes are not neutral and can reproduce and amplify past and current structural inequalities [1, 2]. Within this context, particular interest is paid to the role of machine learning (ML) with well known examples of models biased against historically discriminated groups [3, 4, 5] or the intersection of these groups [6, 7]. Fairness in ML has emerged as a community initially motivated to develop technological solutions to the disparate impact and treatment by biased algorithms [8, 9, 10, 11, 5] that also moves to a broader and multi-disciplinary understanding of the issues of socio-technological interventions [12, 13, 14, 15]. This work contribute to this field by studying how far bias mitigation can go whilst satisfying the accuracy and transparency of the models, thus providing a tool for a wider understanding of the technological boundaries of socio-technical proposals. Bias mitigation techniques can broadly be divided into three non-exclusive categories : (1) preprocessing, (2) inprocessing, and (3) postprocessing. The preprocessing techniques attempt to learn new representations of data to satisfy fairness definitions. The inprocessing methods involve modifying the classifier algorithm by adding a fairness constraint to the optimization problem. The postprocessing methods aim at removing discriminatory decisions after the model is trained. Normally, in inprocessing approaches the fairness criteria are used as an optimization constraint rather than as a guide to build a more equitable prediction model.