Collaborating Authors

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics Machine Learning

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.

When does Diversity Help Generalization in Classification Ensembles? Machine Learning

IEEE TRANSACTIONS ON CYBERNETICS 1 When does Diversity Help Generalization in Classification Ensembles? Yijun Bian, and Huanhuan Chen, Senior Member, IEEE Abstract --Ensembles, as a widely used and effective technique in the machine learning community, succeed within a key element--"diversity." The relationship between diversity and generalization, unfortunately, is not entirely understood and remains an open research issue. T o reveal the effect of diversity on the generalization of classification ensembles, we investigate three issues on diversity, i.e., the measurement of diversity, the relationship between the proposed diversity and generalization error, and the utilization of this relationship for ensemble pruning. In the diversity measurement, we measure diversity by error decomposition inspired by regression ensembles, which decomposes the error of classification ensembles into accuracy and diversity. Then we formulate the relationship between the measured diversity and ensemble performance through the theorem of margin and generalization, and observe that the generalization error is reduced effectively only when the measured diversity is increased in a few specific ranges, while in other ranges larger diversity is less beneficial to increase generalization of an ensemble. Besides, we propose a pruning method based on diversity management to utilize this relationship, which could increase diversity appropriately and shrink the size of the ensemble with non-decreasing performance. The experiments validate the effectiveness of this proposed relationship between the proposed diversity and the ensemble generalization error . I NTRODUCTION Ensemble learning has attracted plenty of research attention in the machine learning community thanks to its remarkable potential [1], and it has been widely used in many real-world applications such as object detection, object recognition, and object tracking [2]-[5]. Rather than relying on one single model, an ensemble is a set of learned models that make decisions collectively. It is widely accepted that an ensemble usually generalizes better than one single model [6]-[9].

Apple Researchers Propose A Method For Reconstructing Training Data From Diverse Machine Learning Models By Ensemble Inversion


Model inversion (MI), where an adversary abuses access to a trained Machine Learning (ML) model in order to infer sensitive information about the model's original training data, has gotten a lot of attention in recent years. The trained model under assault is frequently frozen during MI and used to direct the training of a generator, such as a Generative Adversarial Network, to rebuild the distribution of the model's original training data. As a result, scrutiny of the capabilities of MI techniques is essential for the creation of appropriate protection techniques. Reconstruction of training data with high quality using a single model is complex. However, existing MI literature does not consider targeting many models simultaneously, which could offer the adversary extra information and viewpoints.

Hyperparameter Ensembles for Robustness and Uncertainty Quantification Machine Learning

Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to improve the state of the art in both settings. For best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than typical ensembles. On image classification tasks, with MLP, LeNet, ResNet 20 and Wide ResNet 28-10 architectures, we improve upon both deep and batch ensembles.

A new selection strategy for selective cluster ensemble based on Diversity and Independency Machine Learning

This research introduces a new strategy in cluster ensemble selection by using Independency and Diversity metrics. In recent years, Diversity and Quality, which are two metrics in evaluation procedure, have been used for selecting basic clustering results in the cluster ensemble selection. Although quality can improve the final results in cluster ensemble, it cannot control the procedures of generating basic results, which causes a gap in prediction of the generated basic results' accuracy. Instead of quality, this paper introduces Independency as a supplementary method to be used in conjunction with Diversity. Therefore, this paper uses a heuristic metric, which is based on the procedure of converting code to graph in Software Testing, in order to calculate the Independency of two basic clustering algorithms. Moreover, a new modeling language, which we called as "Clustering Algorithms Independency Language" (CAIL), is introduced in order to generate graphs which depict Independency of algorithms. Also, Uniformity, which is a new similarity metric, has been introduced for evaluating the diversity of basic results. As a credential, our experimental results on varied different standard data sets show that the proposed framework improves the accuracy of final results dramatically in comparison with other cluster ensemble methods.