Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies

Rahmani, Hazhar, Kulkarni, Abhishek N., Fu, Jie

arXiv.org Artificial Intelligence

Human preferences are not always represented via complete linear orders: it is natural to employ partially ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we draw on order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastically nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called a preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we use running examples to illustrate the proposed preference specification and solution approaches, demonstrate the efficacy of our algorithm on these examples with detailed analysis, and then discuss several potential future directions.
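As a toy illustration of the idea of comparing policies through a partial order over goals, the sketch below tests first-order stochastic dominance between the outcome distributions that two policies induce. The goal names, the partial order, and the probabilities are all invented for this sketch; this is not the paper's construction, only the ordering idea it builds on.

```python
# Hypothetical partial order over four temporal goals:
# a is preferred to b, c, d; b and c are incomparable; all beat d.
better = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")}
goals = ["a", "b", "c", "d"]

def upper_set(g):
    """Goals at least as preferred as g under the partial order."""
    return {h for h in goals if h == g or (h, g) in better}

def stochastically_dominates(p, q):
    """p dominates q if every upper set receives at least as much mass under p."""
    return all(
        sum(p[h] for h in upper_set(g)) >= sum(q[h] for h in upper_set(g))
        for g in goals
    )

p = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}   # distribution induced by policy 1
q = {"a": 0.2, "b": 0.3, "c": 0.3, "d": 0.2}   # distribution induced by policy 2
print(stochastically_dominates(p, q))  # True: p shifts mass to preferred goals
```

A policy whose induced distribution is dominated by no other, in this sense, is a candidate "most preferred" policy.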


Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees

Peng, Nianli, Fain, Brandon

arXiv.org Artificial Intelligence

We describe RA-E3 (Reward-Aware Explicit Explore or Exploit), an algorithm with provable guarantees for solving a single- or multi-objective Markov Decision Process (MDP) where we want to maximize the expected value of a nonlinear function over accumulated rewards. This allows us to model fairness-aware welfare optimization for multi-objective reinforcement learning as well as risk-aware reinforcement learning with nonlinear Von Neumann-Morgenstern utility functions in the single-objective setting. RA-E3 extends the classic E3 algorithm that solves MDPs with scalar rewards and linear preferences. We first describe a reward-aware version of value iteration that computes a non-stationary policy that is approximately optimal for a given model of the environment. This sub-procedure is based on an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. We then describe how to use this optimization procedure in a larger algorithm that must simultaneously learn a model of the environment. The algorithm learns an approximately optimal policy in time that depends polynomially on the MDP size, the desired approximation accuracy, and the smoothness of the nonlinear function, and exponentially on the number of objectives.
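The planning sub-procedure can be sketched as value iteration over an augmented state (state, time, accumulated reward), maximizing E[u(total reward)] for a nonlinear utility u. The two-action toy model and the square-root utility below are invented for illustration; this is not RA-E3 itself, only the reward-aware Bellman recursion it describes.

```python
import math
from functools import lru_cache

H = 3                       # planning horizon
ACTIONS = ["safe", "risky"]

def step(s, a):
    """Toy stochastic model: list of (prob, next_state, reward) outcomes."""
    if a == "safe":
        return [(1.0, s, 1.0)]
    return [(0.5, s, 3.0), (0.5, s, 0.0)]

def u(total):
    """Risk-averse (concave) utility of the accumulated reward."""
    return math.sqrt(total)

@lru_cache(maxsize=None)
def value(s, t, acc):
    """Optimal expected utility from (state, time, accumulated reward)."""
    if t == H:
        return u(acc)
    return max(
        sum(p * value(s2, t + 1, acc + r) for p, s2, r in step(s, a))
        for a in ACTIONS
    )

print(value("s0", 0, 0.0))
```

Because the utility is nonlinear, the maximizing action can depend on the accumulated reward so far, which is why the resulting policy is non-stationary.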


Welfare and Fairness in Multi-objective Reinforcement Learning

Fan, Zimeng, Peng, Nianli, Tian, Muhang, Fain, Brandon

arXiv.org Artificial Intelligence

We study fair multi-objective reinforcement learning in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward. Motivated by the fair resource allocation literature, we model this as an expected welfare maximization problem, for some nonlinear fair welfare function of the vector of long-term cumulative rewards. One canonical example of such a function is the Nash Social Welfare, or geometric mean, the log transform of which is also known as the Proportional Fairness objective. We show that even approximate optimization of the expected Nash Social Welfare is computationally intractable, even in the tabular case. Nevertheless, we provide a novel adaptation of Q-learning that combines nonlinear scalarized learning updates and non-stationary action selection to learn effective policies for optimizing nonlinear welfare functions. We show that our algorithm is provably convergent, and we demonstrate experimentally that our approach outperforms techniques based on linear scalarization, mixtures of optimal linear scalarizations, or stationary action selection for the Nash Social Welfare objective.
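The welfare function at the center of this abstract is easy to state concretely. The sketch below computes the Nash Social Welfare (geometric mean) and its log transform; the reward vectors are invented for illustration and show why it differs from linear scalarization:

```python
import math

def nash_social_welfare(rewards):
    """Geometric mean of per-objective cumulative rewards (all assumed > 0)."""
    return math.prod(rewards) ** (1.0 / len(rewards))

def proportional_fairness(rewards):
    """Log transform of the NSW objective: sum of log rewards."""
    return sum(math.log(r) for r in rewards)

balanced = [5.0, 5.0, 5.0]
skewed = [13.0, 1.0, 1.0]       # same shape of problem, higher total, unfair
print(nash_social_welfare(balanced))   # 5.0
print(nash_social_welfare(skewed))     # ~2.35
```

A linear scalarization with equal weights would prefer the skewed vector (total 15 vs. 15, and any tilt in weights can favor it outright), while the geometric mean penalizes leaving one objective near zero.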


Roijers

AAAI Conferences

Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.


Pineda

AAAI Conferences

We consider stochastic planning problems that involve multiple objectives such as minimizing task completion time and energy consumption. These problems can be modeled as multi-objective Markov decision processes (MOMDPs), an extension of the widely-used MDP model to handle problems involving multiple value functions. We focus on a subclass of MOMDPs in which the objectives have a "relaxed lexicographic structure", allowing an agent to seek improvement in a lower-priority objective when the impact on a higher-priority objective is within some small given tolerance. We examine the relationship between this class of problems and "constrained MDPs", showing that the latter offer an alternative solution method with strong guarantees. We show empirically that a recently introduced algorithm for MOMDPs may not offer the same strong guarantees, but it does perform well in practice.
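One simplified reading of the relaxed lexicographic comparison can be sketched as follows. Objectives are assumed to be maximized and ordered by priority; the slack values and the value vectors are invented, and the exact semantics in the paper may differ:

```python
def relaxed_lex_better(v, w, eta):
    """True if v is preferred to w under per-objective slacks eta.

    Walk objectives in priority order: reject v if it is worse than w
    beyond the tolerance on some objective before it is ever strictly
    better beyond the tolerance.
    """
    for vi, wi, ei in zip(v, w, eta):
        if vi < wi - ei:        # worse beyond tolerance on this priority
            return False
        if vi > wi + ei:        # strictly better beyond tolerance: done
            return True
    return False                # within tolerance everywhere: no preference

# objective 0 = task completion time (negated, higher is better),
# objective 1 = energy saved
v = (-10.2, 8.0)
w = (-10.0, 3.0)
print(relaxed_lex_better(v, w, eta=(0.5, 0.0)))  # trades 0.2 time for energy
```

With a strict lexicographic order (all slacks zero), w would win on the first objective; the tolerance is what lets the lower-priority energy objective break the near-tie.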


Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Wu, Jingfeng, Braverman, Vladimir, Yang, Lin F.

arXiv.org Machine Learning

In single-objective reinforcement learning (RL), a scalar reward is pre-specified and an agent learns a policy to maximize the long-term cumulative reward [Azar et al., 2017, Jin et al., 2018]. However, in many real-world applications, we need to optimize multiple objectives for the same (unknown) environment, even when these objectives are possibly contradicting [Roijers et al., 2013]. For example, in an autonomous driving application, each passenger may have a different preference for driving style: some passengers prefer a very steady riding experience while others enjoy the fast acceleration of the car. Therefore, traditional single-objective RL approaches may not apply in such scenarios. One way to tackle this issue is multi-objective reinforcement learning (MORL) [Roijers et al., 2013, Yang et al., 2019, Natarajan and Tadepalli, 2005, Abels et al., 2018], which models the multiple objectives by a vectorized reward, together with a preference vector specifying the relative importance of each objective. A MORL agent must find policies that optimize the cumulative preference-weighted rewards under all possible preferences.
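The scalarization described above is simply a dot product between the vectorized reward and the preference vector. The weights and reward vectors below are invented, in the spirit of the driving example, to show how the same environment yields different preferred policies under different preferences:

```python
def scalarize(reward_vec, preference):
    """Preference-weighted reward: dot product of reward and weight vectors."""
    return sum(r * w for r, w in zip(reward_vec, preference))

comfort_rider = (0.9, 0.1)          # weights over (steadiness, acceleration)
sporty_rider = (0.2, 0.8)
steady_policy_reward = (8.0, 2.0)   # cumulative vector reward of one policy
fast_policy_reward = (3.0, 9.0)     # ... and of another

# the same environment, two different preferred policies
print(scalarize(steady_policy_reward, comfort_rider))  # 7.4
print(scalarize(fast_policy_reward, sporty_rider))     # 7.8
```

The MORL problem in the abstract is harder than either single evaluation: the agent must be ready to serve any preference vector, not just one fixed weighting.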


Multi-objective dynamic programming with limited precision

Mandow, L., de la Cruz, J. L. Pérez, Pozas, N.

arXiv.org Machine Learning

Markov decision processes (MDPs) are a well-known conceptual tool for modelling sequential decision processes and have been widely used in real-world applications such as adaptive production control (see e.g. Kuhnle et al. (2020)), equipment maintenance (see Barde et al. (2019), Liu et al. (2019)) or robot planning (see Veeramani et al. (2020)), to name a few. Standard optimization procedures consider only a scalar value to be maximized. However, in many cases the objective function is more accurately described by a vector (see e.g. Gen and Lin (2014), Zhang and Xu (2017)) and multi-objective optimization must be applied.
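At the core of multi-objective dynamic programming is Pareto filtering: keeping only value vectors not dominated by another. The vectors below are invented for illustration; the "limited precision" of the title would additionally merge near-equal vectors, which this sketch does not do.

```python
def dominates(v, w):
    """v weakly improves every objective and strictly improves at least one."""
    return all(a >= b for a, b in zip(v, w)) and any(a > b for a, b in zip(v, w))

def pareto_front(vectors):
    """Keep only the nondominated value vectors."""
    return [v for v in vectors
            if not any(dominates(w, v) for w in vectors if w != v)]

candidates = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.5, 1.5)]
print(pareto_front(candidates))  # (1.5, 1.5) is dominated by (2.0, 2.0)
```

The practical difficulty is that, unlike the scalar case, the set of nondominated vectors kept at each state can grow rapidly, which motivates bounded-precision approximations.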


Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards

Siddique, Umer, Weng, Paul, Zimmer, Matthieu

arXiv.org Artificial Intelligence

As the operations of autonomous systems generally affect several users simultaneously, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the problem of learning a policy that treats its users equitably. In this paper, we formulate this novel RL problem, in which an objective function, which encodes a notion of fairness that we formally define, is optimized. For this problem, we provide a theoretical discussion where we examine the case of discounted rewards and that of average rewards. During this analysis, we notably derive a new result in the standard RL setting, which is of independent interest: it bounds the gap between the optimal average reward and the average reward achieved by a policy that is optimal for the discounted reward. Since learning with discounted rewards is generally easier, this discussion further justifies finding a fair policy for the average reward by learning a fair policy for the discounted reward. Thus, we describe how several classic deep RL algorithms can be adapted to our fair optimization problem, and we validate our approach with extensive experiments in three different domains.
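One common way to encode fairness over users, in the spirit of the objective discussed above, is a generalized Gini social welfare function: a weighted sum of per-user rewards sorted in increasing order, with decreasing weights so the worst-off user counts most. The weights and reward vectors below are invented, and the paper formalizes its own fairness notion; this sketch only illustrates the general shape of such an objective.

```python
def generalized_gini(rewards, weights):
    """Weighted sum of ascending-sorted rewards; weights should be positive
    and decreasing so that the worst-off user receives the largest weight."""
    return sum(w * r for w, r in zip(weights, sorted(rewards)))

weights = (0.5, 0.3, 0.2)           # worst-off user weighted most
fair = (4.0, 4.0, 4.0)
unfair = (1.0, 4.0, 7.0)            # same total reward, unequal split
print(generalized_gini(fair, weights))    # 4.0
print(generalized_gini(unfair, weights))  # 3.1
```

Because the weights decrease, two allocations with the same total are ranked by how evenly the total is spread, which is exactly the behavior a fair policy should be rewarded for.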