Scalarizing functions have been widely used to convert a multiobjective optimization problem into a single-objective optimization problem. However, their use in solving (computationally) expensive multi- and many-objective optimization problems in Bayesian multiobjective optimization is scarce. Scalarizing functions can play a crucial role in the quality of the solutions obtained and the number of evaluations required during the optimization. In this article, we study and review 15 different scalarizing functions in the framework of Bayesian multiobjective optimization and build Gaussian process models (as surrogates, metamodels or emulators) on them. We use expected improvement as the infill criterion (or acquisition function) to update the models. In particular, we compare different scalarizing functions and analyze their performance on several benchmark problems with different numbers of objectives to be optimized. The review and experiments on different functions provide useful insights for selecting and using a scalarizing function in a Bayesian multiobjective optimization method.
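As a minimal sketch of the ingredients named in this abstract, the Python below implements two common scalarizing functions (weighted sum and weighted Chebyshev) and the closed-form expected improvement for minimization under a Gaussian prediction. The abstract does not list which 15 functions are studied, so these two are chosen here only as familiar examples. The objective values, weights, ideal point, and the predicted mean and standard deviation are made-up numbers standing in for real evaluations and a fitted Gaussian process.

```python
import math


def weighted_sum(f, w):
    """Weighted-sum scalarization: sum_i w_i * f_i."""
    return sum(wi * fi for wi, fi in zip(w, f))


def chebyshev(f, w, z):
    """Weighted Chebyshev scalarization: max_i w_i * |f_i - z_i|.

    z is a reference (ideal) point in objective space.
    """
    return max(wi * abs(fi - zi) for wi, fi, zi in zip(w, f, z))


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization when the prediction is N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    u = (best - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


# Two already-evaluated designs with two objectives each (illustrative values).
evaluated = [(1.0, 4.0), (3.0, 2.0)]
weights = (0.5, 0.5)
ideal = (0.0, 0.0)

# Scalarize every evaluated design; a surrogate would be fitted to these values.
scalar_values = [chebyshev(f, weights, ideal) for f in evaluated]
best = min(scalar_values)

# Hypothetical surrogate prediction of the scalarized value at a new candidate.
ei = expected_improvement(mu=1.2, sigma=0.5, best=best)
```

In a full method the candidate with the largest expected improvement would be evaluated next and the surrogate refitted; that loop is omitted here.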
Multi-objective optimization aims at finding trade-off solutions to conflicting objectives. These constitute the Pareto optimal set. In the context of expensive-to-evaluate functions, it is impossible and often non-informative to look for the entire set. As an end-user would typically prefer a certain part of the objective space, we modify the Bayesian multi-objective optimization algorithm that uses Gaussian Processes to maximize the Expected Hypervolume Improvement so that it focuses the search on the preferred region. The combined effects of the Gaussian Processes and the targeting strategy lead to a particularly efficient convergence to the desired part of the Pareto set. To take advantage of parallel computing, a multi-point extension of the targeting criterion is proposed and analyzed.
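To make the notions of Pareto set and hypervolume improvement concrete, here is a small Python sketch for two objectives under minimization. It filters non-dominated points, computes the area they dominate relative to a reference point, and measures how much a candidate would add. It computes the plain (deterministic) improvement only: the expectation over a Gaussian process posterior and the targeting strategy proposed in the abstract are not reproduced, and the data and reference point are illustrative assumptions.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (two objectives)."""
    # Only non-dominated points strictly inside the reference box contribute.
    front = sorted(
        p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]
    )
    area, prev_f2 = 0.0, ref[1]
    # Sorted by f1 ascending, so f2 is descending: sum disjoint horizontal strips.
    for f1, f2 in front:
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area


def hypervolume_improvement(candidate, points, ref):
    """Hypervolume gained by adding `candidate` to `points`."""
    return hypervolume_2d(points + [candidate], ref) - hypervolume_2d(points, ref)


# Illustrative objective vectors and reference point (not from the paper).
observed = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
ref = (4.0, 4.0)

front = pareto_front(observed)
gain = hypervolume_improvement((1.5, 1.5), observed, ref)
```

Note that a candidate outside the box bounded by `ref` contributes nothing, so the choice of reference point determines which part of the objective space is rewarded.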
User preference integration is of great importance in multi-objective optimization, in particular in many-objective optimization. Preferences have long been considered in traditional multicriteria decision making (MCDM), which is based on mathematical programming. More recently, they have been integrated into multi-objective metaheuristics (MOMH), resulting in a focus on preferred parts of the Pareto front instead of the whole Pareto front. The number of publications on preference-based multi-objective metaheuristics has increased rapidly over the past decades. There already exist various preference handling methods and MOMH methods, which have been combined in diverse ways. This article proposes to use the Web Ontology Language (OWL) to model and systematize the results developed in this field. A review of the existing work is provided, based on which an ontology is built and instantiated with state-of-the-art results. The OWL ontology is made public and open to future extension. Moreover, the usage of the ontology is exemplified for different use cases, including querying for methods that match an engineering application, bibliometric analysis, checking the existence of combinations of preference models and MOMH techniques, and discovering opportunities for new research and open research questions.
In reinforcement learning (RL), an autonomous agent learns to perform complex tasks by maximizing an exogenous reward signal while interacting with its environment. In real-world applications, test conditions may differ substantially from the training scenario and, therefore, focusing on pure reward maximization during training may lead to poor results at test time. In these cases, it is important to trade off between performance and robustness while learning a policy. While several results exist for robust, model-based RL, the model-free case has not been widely investigated. In this paper, we cast the robust, model-free RL problem as a multi-objective optimization problem. To quantify the robustness of a policy, we use delay margin and gain margin, two robustness indicators that are common in control theory. We show how these metrics can be estimated from data in the model-free setting. We use multi-objective Bayesian optimization (MOBO) to efficiently solve this expensive-to-evaluate, multi-objective optimization problem. We show the benefits of our robust formulation in both sim-to-real and pure hardware experiments on balancing a Furuta pendulum.
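The gain margin is the factor by which the loop gain can be scaled before the closed loop becomes unstable. The abstract does not describe how the margins are estimated, so the Python below is only a loose illustration of the concept, not the paper's procedure: it rolls out a toy scalar linear system under a proportional controller and bisects on a gain multiplier to find where the rollouts stop decaying. The system parameters `a`, `b`, `k`, the stability test, and the search bounds are all assumptions made for this example.

```python
def is_stable(gain_multiplier, a=0.9, b=0.5, k=1.0, steps=2000):
    """Roll out x_{t+1} = a*x_t + b*u_t with u_t = -gain_multiplier*k*x_t.

    The rollout counts as stable if the state has shrunk below its initial
    magnitude after `steps` steps (a crude, data-driven proxy).
    """
    x = 1.0
    for _ in range(steps):
        u = -gain_multiplier * k * x
        x = a * x + b * u
        if abs(x) > 1e6:
            # Clearly diverging: stop early to avoid overflow.
            return False
    return abs(x) < 1.0


def empirical_gain_margin(lo=1.0, hi=10.0, tol=1e-3):
    """Bisect for the largest gain multiplier that keeps the rollout stable."""
    # The bracket must start stable at `lo` and unstable at `hi`.
    assert is_stable(lo) and not is_stable(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_stable(mid):
            lo = mid
        else:
            hi = mid
    return lo


gm = empirical_gain_margin()
```

For this toy system the closed-loop pole is `a - b*k*g`, so the rollouts are stable for multipliers `g` below `(1 + a) / (b * k) = 3.8`, and the bisection should return a value close to that. A delay margin could be probed in the same spirit by inserting an increasing input delay instead of a gain multiplier.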
We propose Pareto-frontier entropy search (PFES) for multi-objective Bayesian optimization (MBO). Unlike the existing entropy search for MBO, which considers the entropy of the input space, we define the entropy of the Pareto-frontier in the output space. By using a sampled Pareto-frontier from the current model, PFES provides a simple formula for directly evaluating the entropy. Besides the usual MBO setting, in which all the objectives are simultaneously observed, we also consider the "decoupled" setting, in which the objective functions can be observed separately. PFES can easily derive an acquisition function for the decoupled setting through the entropy of the marginal density for each output variable. In both settings, by conditioning on the sampled Pareto-frontier, dependence among different objectives arises in the entropy evaluation. PFES can incorporate this dependency into the acquisition function, while the existing information-based MBO employs an independent Gaussian approximation. Our numerical experiments show the effectiveness of PFES on synthetic functions and real-world datasets from materials science.