Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
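The difference between a Pareto front and a convex hull as solution sets can be illustrated with a minimal sketch. It assumes a small, hypothetical set of deterministic policies whose two-objective value vectors are already known and are to be maximized. The vectors and the names `dominates`, `pareto_front`, and `best_under_weights` are illustrative assumptions, not taken from the survey.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: List[Vector]) -> List[Vector]:
    """Keep every value vector that no other vector dominates."""
    return [v for v in values if not any(dominates(u, v) for u in values)]


def linear_scalarize(value: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: project a value vector to a scalar."""
    return sum(w * x for w, x in zip(weights, value))


def best_under_weights(values: List[Vector], weights: Sequence[float]) -> Vector:
    """Value vector that maximizes the linearly scalarized value."""
    return max(values, key=lambda v: linear_scalarize(v, weights))


# Hypothetical value vectors of four policies (illustrative numbers only).
policy_values: List[Vector] = [(1.0, 9.0), (9.0, 1.0), (4.0, 4.5), (3.0, 3.0)]

front = pareto_front(policy_values)

# Sweep over weight vectors (w, 1 - w) and collect every winner.
linear_winners = {
    best_under_weights(policy_values, (i / 10, 1 - i / 10)) for i in range(11)
}

print("Pareto front:", front)
print("Optimal for some linear weighting:", sorted(linear_winners))
```

In this sketch, (4.0, 4.5) is non-dominated and so belongs to the Pareto front, but no linear weighting ever selects it, so it lies outside the convex hull. A user with a nonlinear scalarization function could still prefer it, which is one reason the type of scalarization function affects the kind of solution set required.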
The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should be analysed on the basis of the utility that these compromises have for the users of a system. As is standard in multi-objective optimisation, we model the user utility using utility functions that map value or return vectors to scalar values. This approach naturally leads to two different optimisation criteria: expected scalarised returns (ESR) and scalarised expected returns (SER). We develop a new taxonomy that classifies multi-objective multi-agent decision-making settings on the basis of the reward structures and of which utility functions are applied and how. This allows us to offer a structured view of the field, to clearly delineate the current state of the art in multi-objective multi-agent decision-making approaches, and to identify promising directions for future research. Starting from the execution phase, in which the selected policies are applied and the utility for the users is attained, we analyse which solution concepts apply to the different settings in our taxonomy. Furthermore, we define and discuss these solution concepts under both ESR and SER optimisation criteria. We conclude with a summary of our main findings and a discussion of many promising future research directions in multi-objective multi-agent systems.
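The distinction between the ESR and SER criteria can be made concrete with a small numerical sketch. It assumes a hypothetical nonlinear utility function (the minimum over objectives) and two hypothetical policies described only by their return distributions. None of these numbers or names come from the article.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
# A return distribution: list of (probability, return vector) pairs.
Distribution = List[Tuple[float, Vector]]


def utility(returns: Sequence[float]) -> float:
    """Hypothetical nonlinear utility: the user only values the worst objective."""
    return min(returns)


def expected_vector(outcomes: Distribution) -> Vector:
    """Component-wise expectation of the return vector."""
    n = len(outcomes[0][1])
    return tuple(sum(p * r[i] for p, r in outcomes) for i in range(n))


def ser(outcomes: Distribution, u: Callable[[Sequence[float]], float]) -> float:
    """Scalarised expected returns: apply the utility to the expected vector."""
    return u(expected_vector(outcomes))


def esr(outcomes: Distribution, u: Callable[[Sequence[float]], float]) -> float:
    """Expected scalarised returns: take the expectation of the utility."""
    return sum(p * u(r) for p, r in outcomes)


# Hypothetical policies: a coin flip between two lopsided outcomes,
# and a deterministic, balanced outcome.
lottery: Distribution = [(0.5, (10.0, 0.0)), (0.5, (0.0, 10.0))]
safe: Distribution = [(1.0, (3.0, 3.0))]

for name, policy in [("lottery", lottery), ("safe", safe)]:
    print(name, "SER =", ser(policy, utility), "ESR =", esr(policy, utility))
```

Under SER the lottery looks better (5.0 versus 3.0), because its average return vector is balanced. Under ESR the safe policy is better (3.0 versus 0.0), because every single execution of the lottery leaves one objective at zero. With a linear utility function the two criteria would coincide.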
As conflicts may exist between objectives, there is in general a need to identify (a set of) tradeoff solutions. The set of optimal, i.e. non-dominated, incomparable solutions is called the Pareto-front. We identify multi-objective problems with correlated objectives (CMOP) as a specific subclass of multi-objective problems, defined to contain those MOPs whose Pareto-front is so limited that one can barely speak of tradeoffs (Brys et al. 2014b). By consequence, the system designer does not care about which of the very ...
... potential-based reward shaping functions (heuristic signals guiding exploration) (Brys et al. 2014a). We prove that this modification preserves the total order, and thus also optimality, of policies, mainly relying on the results by Ng, Harada, and Russell (1999). This insight - that any MDP can be framed as a CMOMDP - significantly increases the importance of this problem class, as well as techniques developed for it, as these could be used to solve regular single-objective MDPs faster and better, provided several meaningful shapings can be devised.
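A minimal sketch of potential-based reward shaping may help here. It uses the standard shaping term F(s, s') = γΦ(s') − Φ(s) from Ng, Harada, and Russell (1999) and builds a reward vector by adding a different shaping term to each copy of the same scalar reward. The chain environment, the potentials, the constants, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = int
Potential = Callable[[State], float]

GOAL = 4       # hypothetical terminal state on a 1-D chain 0..4
GAMMA = 0.9    # hypothetical discount factor


def shaping_term(phi: Potential, gamma: float, s: State, s_next: State) -> float:
    """Potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def multi_objectivize(reward: float, gamma: float,
                      potentials: Sequence[Potential],
                      s: State, s_next: State) -> Tuple[float, ...]:
    """Copy the scalar reward once per potential and add that potential's shaping."""
    return tuple(reward + shaping_term(phi, gamma, s, s_next) for phi in potentials)


def discounted_returns(trajectory: List[State], gamma: float,
                       potentials: Sequence[Potential],
                       step_reward: float = -1.0) -> Tuple[float, ...]:
    """Discounted return of every shaped objective along a state trajectory."""
    totals = [0.0] * len(potentials)
    for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:])):
        rewards = multi_objectivize(step_reward, gamma, potentials, s, s_next)
        for i, r in enumerate(rewards):
            totals[i] += gamma ** t * r
    return tuple(totals)


# Objective 0 is unshaped; objectives 1 and 2 use hypothetical heuristics
# that are both zero at the goal.
potentials: List[Potential] = [
    lambda s: 0.0,
    lambda s: -float(GOAL - s),
    lambda s: -float((GOAL - s) ** 2),
]

direct = [0, 1, 2, 3, 4]
detour = [0, 1, 0, 1, 2, 3, 4]

ret_direct = discounted_returns(direct, GAMMA, potentials)
ret_detour = discounted_returns(detour, GAMMA, potentials)

# The shaping terms telescope, so each shaped objective ranks the two
# trajectories exactly as the unshaped objective does.
gaps = [d - e for d, e in zip(ret_direct, ret_detour)]
print("return gaps per objective:", gaps)
print("all equal:", all(math.isclose(g, gaps[0], abs_tol=1e-9) for g in gaps))
```

Because the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), and Φ is zero at the goal here, every objective differs from the unshaped return only by a constant that depends on the start state. The ordering of trajectories is therefore unchanged, while the per-step signals differ and can guide exploration in different ways.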
Whether in production, logistics, medicine, or biology, the global optimal solution or the set of global optimal solutions is sought everywhere. However, most real-world problems are nonlinear and naturally multimodal, which poses severe problems for global optimization. Multimodality, the existence of multiple (local) optima, is regarded as one of the biggest challenges for continuous single-objective problems. Many algorithms get stuck while searching for the global optimum or require many function evaluations to escape local optima. One of the most popular strategies for dealing with multimodal problems is the use of population-based methods such as evolutionary algorithms, owing to their global search abilities. In this paper we examine another approach to coping with local traps, namely multiobjectivization. By transforming a single-objective problem into a multi-objective one, we aim to exploit the properties of multi-objective landscapes. So far, the characteristics of single-objective optimization problems have often been directly transferred to the multi-objective domain.
Bonini, Rodrigo Cesar (Escola Politécnica da Universidade de São Paulo) | Silva, Felipe Leno da (Escola Politécnica da Universidade de São Paulo) | Costa, Anna Helena Reali (Escola Politécnica da Universidade de São Paulo)
Reinforcement Learning (RL) is a successful technique for training autonomous agents. However, classical RL methods take a long time to learn how to solve tasks. Option-based solutions can be used to accelerate learning and to transfer learned behaviors across tasks by encapsulating a partial policy into an action. However, the literature reports only single-agent and single-objective option-based methods, although many RL tasks, especially real-world problems, are better described through multiple objectives. We here propose a method to learn options in Multiobjective Reinforcement Learning domains in order to accelerate learning and reuse knowledge across tasks. Our initial experiments in the Goldmine Domain show that our proposal learns useful options that accelerate learning in multiobjective domains. Our next steps are to use the learned options to transfer knowledge across tasks and to evaluate this method with stochastic policies.