Using some real-world examples I illustrate the important role of multiobjective optimization in decision making and its interface with preference handling. I explain what optimization in the presence of multiple objectives means and discuss some of the most common methods of solving multiobjective optimization problems using transformations to single-objective optimization problems. Finally, I address linear and combinatorial optimization problems with multiple objectives and summarize techniques for solving them. Throughout the article I refer to the real-world examples introduced at the beginning. There are infinitely many ways to invest money and infinitely many possible radiotherapy treatments, but the number of feasible crew schedules is finite, albeit astronomical in practice.
Abstract-- Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to solve sequential decision making problems that consist of several, possibly conflicting, objectives. Generally, in such formulations, there is no single optimal policy which optimizes all the objectives simultaneously, and instead, a number of policies has to be found, each optimizing a preference of the objectives. In this paper, we introduce a novel MORL approach by training a meta-policy, a policy simultaneously trained with multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs). In other words, the MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that such a formulation results in a better approximation of the Pareto optimal solutions, in terms of both the optimality and the computational efficiency. We evaluated our method on obtaining Pareto optimal policies using a number of continuous control problems with high degrees of freedom. I. INTRODUCTION Reinforcement learning (RL) is a framework to train an agent to acquire a behavior by reinforcing actions that maximize a notion of task-relevant future rewards. A reward function, i.e., the function that assigns a reward value to every action-decision made by the agent, is designed to guide the training to implement the behavior.
In this paper, we investigate the possibility of improving the performance of multi-objective optimization solution approaches using machine learning techniques. Specifically, we focus on multi-objective binary linear programs and employ one of the most effective and recently developed criterion space search algorithms, the so-called KSA, during our study. This algorithm computes all nondominated points of a problem with p objectives by searching on a projected criterion space, i.e., a (p-1)-dimensional criterion apace. We present an effective and fast learning approach to identify on which projected space the KSA should work. We also present several generic features/variables that can be used in machine learning techniques for identifying the best projected space. Finally, we present an effective bi-objective optimization based heuristic for selecting the best subset of the features to overcome the issue of overfitting in learning. Through an extensive computational study over 2000 instances of tri-objective Knapsack and Assignment problems, we demonstrate that an improvement of up to 12% in time can be achieved by the proposed learning method compared to a random selection of the projected space.
Bi-Objective Combinatorial Optimization problems are ubiquitous in real-world applications and designing approaches to solve them efficiently is an important research area of Artificial Intelligence. In Constraint Programming, the recently introduced bi-objective Pareto constraint allows one to solve bi-objective combinatorial optimization problems exactly. Using this constraint, every non-dominated solution is collected in a single tree-search while pruning sub-trees that cannot lead to a non-dominated solution. This paper introduces a simpler and more efficient filtering algorithm for the bi-objective Pareto constraint. The efficiency of this algorithm is experimentally confirmed on classical bi-objective benchmarks.
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.