Brys, Tim (Vrije Universiteit Brussel) | Nowé, Ann (Vrije Universiteit Brussel)

We prove that this conflicts may exist between objectives, there is in general modification preserves the total order, and thus also optimality, a need to identify (a set of) tradeoff solutions. The set of policies, mainly relying on the results by Ng, Harada, of optimal, i.e. non-dominated, incomparable solutions is and Russell (1999). This insight - that any MDP can be called the Pareto-front. We identify multi-objective problems framed as a CMOMDP - significantly increases the importance with correlated objectives (CMOP) as a specific subclass of this problem class, as well as techniques developed of multi-objective problems, defined to contain those for it, as these could be used to solve regular single-objective MOPs whose Pareto-front is so limited that one can barely MDPs faster and better, provided several meaningful shapings speak of tradeoffs (Brys et al. 2014b). By consequence, can be devised.

Roijers, Diederik Marijn (University of Amsterdam)

In decision-theoretic planning problems, such as (partially observable) Markov decision problems or coordination graphs, agents typically aim to optimize a scalar value function. However, in many real-world problems agents are faced with multiple possibly conflicting objectives. In such multi-objective problems, the value is a vector rather than a scalar, and we need methods that compute a coverage set, i.e., a set of solutions optimal for all possible trade-offs between the objectives. In this project propose new multi-objective planning methods that compute the so-called convex coverage set (CCS): the coverage set for when policies can be stochastic, or the preferences are linear. We show that the CCS has favorable mathematical properties, and is typically much easier to compute that the Pareto front, which is often axiomatically assumed as the solution set for multi-objective decision problems.

Roijers, Diederik Marijn (University of Amsterdam) | Whiteson, Shimon (University of Amsterdam) | Oliehoek, Frans A. (University of Liverpool)

Many sequential decision-making problems require an agent to reason about both multiple objectives and uncertainty regarding the environment's state. Such problems can be naturally modelled as multi-objective partially observable Markov decision processes (MOPOMDPs). We propose optimistic linear support with alpha reuse (OLSAR), which computes a bounded approximation of the optimal solution set for all possible weightings of the objectives. The main idea is to solve a series of scalarized single-objective POMDPs, each corresponding to a different weighting of the objectives. A key insight underlying OLSAR is that the policies and value functions produced when solving scalarized POMDPs in earlier iterations can be reused to more quickly solve scalarized POMDPs in later iterations. We show experimentally that OLSAR outperforms, both in terms of runtime and approximation quality, alternative methods and a variant of OLSAR that does not leverage reuse.

Roijers, D. M., Vamplew, P., Whiteson, S., Dazeley, R.

Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.

Using some real-world examples I illustrate the important role of multiobjective optimization in decision making and its interface with preference handling. I explain what optimization in the presence of multiple objectives means and discuss some of the most common methods of solving multiobjective optimization problems using transformations to single-objective optimization problems. Finally, I address linear and combinatorial optimization problems with multiple objectives and summarize techniques for solving them. Throughout the article I refer to the real-world examples introduced at the beginning. There are infinitely many ways to invest money and infinitely many possible radiotherapy treatments, but the number of feasible crew schedules is finite, albeit astronomical in practice.