Error-Correcting Output Codes (ECOCs) offer a principled approach for combining simple binary classifiers into multiclass classifiers. In this paper, we investigate the problem of designing optimal ECOCs to achieve both nominal and adversarial accuracy using Support Vector Machines (SVMs) and binary deep learning models. In contrast to previous literature, we present an Integer Programming (IP) formulation to design minimal codebooks with desirable error correcting properties. Our work leverages the advances in IP solvers to generate codebooks with optimality guarantees. To achieve tractability, we exploit the underlying graph-theoretic structure of the constraint set in our IP formulation. This enables us to use edge clique covers to substantially reduce the constraint set. Our codebooks achieve a high nominal accuracy relative to standard codebooks (e.g., one-vs-all, one-vs-one, and dense/sparse codes). We also estimate the adversarial accuracy of our ECOC-based classifiers in a white-box setting. Our IP-generated codebooks provide non-trivial robustness to adversarial perturbations even without any adversarial training.
Automated machine learning (AutoML) is the sub-field of machine learning that aims at automating, to some extend, all stages of the design of a machine learning system. In the context of supervised learning, AutoML is concerned with feature extraction, pre processing, model design and post processing. Major contributions and achievements in AutoML have been taking place during the recent decade. We are therefore in perfect timing to look back and realize what we have learned. This chapter aims to summarize the main findings in the early years of AutoML. More specifically, in this chapter an introduction to AutoML for supervised learning is provided and an historical review of progress in this field is presented. Likewise, the main paradigms of AutoML are described and research opportunities are outlined.
This paper introduces an enhanced meta-heuristic (ML-ACO) that combines machine learning (ML) and ant colony optimization (ACO) to solve combinatorial optimization problems. To illustrate the underlying mechanism of our enhanced algorithm, we start by describing a test problem -- the orienteering problem -- used to demonstrate the efficacy of ML-ACO. In this problem, the objective is to find a route that visits a subset of vertices in a graph within a time budget to maximize the collected score. In the first phase of our ML-ACO algorithm, an ML model is trained using a set of small problem instances where the optimal solution is known. Specifically, classification models are used to classify an edge as being part of the optimal route, or not, using problem-specific features and statistical measures. We have tested several classification models including graph neural networks, logistic regression and support vector machines. The trained model is then used to predict the probability that an edge in the graph of a test problem instance belongs to the corresponding optimal route. In the second phase, we incorporate the predicted probabilities into the ACO component of our algorithm. Here, the probability values bias sampling towards favoring those predicted high-quality edges when constructing feasible routes. We empirically show that ML-ACO generates results that are significantly better than the standard ACO algorithm, especially when the computational budget is limited. Furthermore, we show our algorithm is robust in the sense that (a) its overall performance is not sensitive to any particular classification model, and (b) it generalizes well to large and real-world problem instances. Our approach integrating ML with a meta-heuristic is generic and can be applied to a wide range of combinatorial optimization problems.
Highly non-linear machine learning algorithms have the capacity to handle large, complex datasets. However, the predictive performance of a model usually critically depends on the choice of multiple hyperparameters. Optimizing these (often) constitutes an expensive black-box problem. Model-based optimization is one state-of-the-art method to address this problem. Furthermore, resulting models often lack interpretability, as models usually contain many active features with non-linear effects and higher-order interactions. One model-agnostic way to enhance interpretability is to enforce sparse solutions through feature selection. It is in many applications desirable to forego a small drop in performance for a substantial gain in sparseness, leading to a natural treatment of feature selection as a multi-objective optimization task. Despite the practical relevance of both hyperparameter optimization and feature selection, they are often carried out separately from each other, which is neither efficient, nor does it take possible interactions between hyperparameters and selected features into account. We present, discuss and compare two algorithmically different approaches for joint and multi-objective hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization to tune a feature filter ensemble. The second is an evolutionary NSGA-II-based wrapper-approach to feature selection which incorporates specialized sampling, mutation and recombination operators for the joint decision space of included features and hyperparameter settings. We compare and discuss the approaches on a variety of benchmark tasks. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs significant overhead compared to the NSGA-II-based approach. The preferred choice depends on the cost of training the ML model on the given data.