
 Vamplew, Peter


On Generalization Across Environments In Multi-Objective Reinforcement Learning

arXiv.org Artificial Intelligence

Real-world sequential decision-making tasks often require balancing trade-offs between multiple conflicting objectives, making Multi-Objective Reinforcement Learning (MORL) an increasingly prominent field of research. Despite recent advances, existing MORL literature has narrowly focused on performance within static environments, neglecting the importance of generalizing across diverse settings. Conversely, existing research on generalization in RL has always assumed scalar rewards, overlooking the inherent multi-objectivity of real-world problems. Generalization in the multi-objective context is fundamentally more challenging, as it requires learning a Pareto set of policies addressing varying preferences across multiple objectives. In this paper, we formalize the concept of generalization in MORL and how it can be evaluated. We then contribute a novel benchmark featuring diverse multi-objective domains with parameterized environment configurations to facilitate future studies in this area. Our baseline evaluations of state-of-the-art MORL algorithms on this benchmark reveal limited generalization capabilities, suggesting significant room for improvement. Our empirical findings also expose limitations in the expressivity of scalar rewards, emphasizing the need for multi-objective specifications to achieve effective generalization. We further analyze the algorithmic complexities within current MORL approaches that could impede the transfer of performance from single- to multiple-environment settings. This work fills a critical gap and lays the groundwork for future research that brings together two key areas in reinforcement learning: solving multi-objective decision-making problems and generalizing across diverse environments. We make our code available at https://github.com/JaydenTeoh/MORL-Generalization.
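The kind of evaluation described above can be pictured with a small sketch. The snippet below is my illustration rather than the benchmark's actual evaluation code; `make_env`, `rollout`, and `policies` are hypothetical stand-ins. It scores a fixed set of MORL policies by computing the hypervolume of their return vectors in each parameterized environment configuration and averaging across configurations.

```python
# Minimal sketch (not the paper's exact protocol): score a fixed set of MORL
# policies across several parameterized environment configurations by computing
# the hypervolume of their return vectors in each configuration.
import numpy as np

def hypervolume(points, reference):
    """2-objective hypervolume w.r.t. a reference point (maximization)."""
    pts = sorted((p for p in points if np.all(np.asarray(p) > reference)),
                 key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, reference[1]
    for x, y in pts:
        if y > prev_y:
            hv += (x - reference[0]) * (y - prev_y)
            prev_y = y
    return hv

def generalization_score(policies, env_configs, make_env, rollout,
                         reference=np.array([0.0, 0.0])):
    """Average hypervolume achieved by the same policy set over all configs."""
    scores = []
    for cfg in env_configs:
        env = make_env(cfg)                                       # parameterized environment
        returns = [rollout(policy, env) for policy in policies]   # vector returns per policy
        scores.append(hypervolume(returns, reference))
    return float(np.mean(scores))
```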


Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI

arXiv.org Artificial Intelligence

Emerging research in Pluralistic Artificial Intelligence (AI) alignment seeks to address how intelligent systems can be designed and deployed in accordance with diverse human needs and values. We contribute to this pursuit with a dynamic approach for aligning AI with diverse and shifting user preferences through Multi-Objective Reinforcement Learning (MORL), via post-learning policy selection adjustment. In this paper, we introduce the proposed framework for this approach, outline its anticipated advantages and assumptions, and discuss technical details about the implementation. We also examine the broader implications of adopting a retroactive alignment approach through the sociotechnical systems perspective.
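One way to picture the post-learning policy selection adjustment is sketched below: a library of trained policies with known expected return vectors is kept, and whenever the user's preference weights shift, the policy maximising a (here, linear) utility is re-selected without retraining. The class and names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of post-learning policy selection: the MORL agent keeps a
# library of Pareto-optimal policies with their expected return vectors,
# and re-selects whenever the user's preference weights change.
# All names here are illustrative, not from the paper's implementation.
import numpy as np

class PolicyLibrary:
    def __init__(self, policies, return_vectors):
        self.policies = policies                     # list of trained policies
        self.returns = np.asarray(return_vectors)    # shape: (n_policies, n_objectives)

    def select(self, preference_weights):
        """Pick the stored policy with highest linear utility for these weights."""
        w = np.asarray(preference_weights, dtype=float)
        w = w / w.sum()                              # normalise the preference weights
        utilities = self.returns @ w
        return self.policies[int(np.argmax(utilities))]

# Usage: as preferences drift, alignment is adjusted without any retraining.
library = PolicyLibrary(policies=["safe", "balanced", "fast"],
                        return_vectors=[[8, 2], [5, 5], [2, 9]])
print(library.select([0.7, 0.3]))   # -> "safe"
print(library.select([0.2, 0.8]))   # -> "fast"
```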


Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment

arXiv.org Artificial Intelligence

Reinforcement learning (RL) is a valuable tool for the creation of AI systems. However, it may be problematic to adequately align RL based on scalar rewards if there are multiple conflicting values or stakeholders to be considered. Over the last decade, multi-objective reinforcement learning (MORL) using vector rewards has emerged as an alternative to standard, scalar RL. This paper provides an overview of the role which MORL can play in creating pluralistically-aligned AI.


Value function interference and greedy action selection in value-based multi-objective reinforcement learning

arXiv.org Artificial Intelligence

Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to the more general case of problems with multiple, conflicting objectives, represented by vector-valued rewards. Widely-used scalar RL methods such as Q-learning can be modified to handle multiple objectives by (1) learning vector-valued value functions, and (2) performing action selection using a scalarisation or ordering operator which reflects the user's utility with respect to the different objectives. However, as we demonstrate here, if the user's utility function maps widely varying vector values to similar levels of utility, this can lead to interference in the value function learned by the agent, resulting in convergence to sub-optimal policies. This will be most prevalent in stochastic environments when optimising for the Expected Scalarised Return criterion, but we present a simple example showing that interference can also arise in deterministic environments. We demonstrate empirically that avoiding the use of random tie-breaking when identifying greedy actions can ameliorate, but not fully overcome, the problems caused by value function interference.
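As a rough sketch of the action-selection step discussed above (not the paper's code), the snippet below scalarises vector Q-values with a user-supplied utility and breaks ties deterministically by taking the lowest-indexed maximising action rather than sampling among the ties at random.

```python
# Minimal sketch of greedy action selection over vector Q-values: actions are
# ranked by a user-supplied utility applied to each action's Q-vector, and ties
# are broken deterministically (lowest index) rather than at random, which the
# paper reports can reduce, but not eliminate, value-function interference.
import numpy as np

def greedy_action(q_vectors, utility, tol=1e-9):
    """q_vectors: array of shape (n_actions, n_objectives)."""
    scores = np.array([utility(q) for q in q_vectors])
    best = scores.max()
    # Deterministic tie-breaking: first action whose utility matches the maximum.
    return int(np.flatnonzero(scores >= best - tol)[0])

# Example utility: weighted sum of objectives (any monotone scalarisation works).
utility = lambda q: 0.5 * q[0] + 0.5 * q[1]
q_table_entry = np.array([[2.0, 4.0],    # action 0
                          [4.0, 2.0],    # action 1 (same utility as action 0)
                          [1.0, 1.0]])   # action 2
print(greedy_action(q_table_entry, utility))   # -> 0, chosen deterministically
```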


Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

arXiv.org Artificial Intelligence

Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.

So far the flow of knowledge has primarily been from conventional single-objective RL (SORL) into MORL, with algorithmic innovations from SORL being adapted to the context of multiple objectives [2, 6, 22, 34]. This paper runs counter to that trend, as we will argue that the utility-based paradigm which has been widely adopted in MORL [5, 13, 21] has both relevance and benefits to SORL. We present a general framework for utility-based RL (UBRL), which unifies the SORL and MORL frameworks, and discuss benefits and potential applications of this for single-objective problems - in particular focusing on the novel potential UBRL offers.
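As a hedged illustration of how a user-defined utility function might enter single-objective RL (this is my sketch, not the UBRL framework itself), the wrapper below augments observations with the reward accrued so far and emits the utility of the episodic return only at termination, so that a standard agent trained on the wrapped environment ends up optimising the expected utility of its return.

```python
# A minimal sketch (my illustration, not the paper's algorithm): a wrapper that
# augments the observation with the reward accrued so far and, at episode end,
# emits the utility of the whole return instead of the raw reward. A standard
# single-objective agent trained on this wrapper optimises E[u(return)],
# which is one way a user-supplied utility function can enter SORL.
class UtilityReturnWrapper:
    def __init__(self, env, utility):
        self.env = env            # assumed: old-Gym-style reset()/step(action) API
        self.utility = utility    # user-defined utility over the episodic return
        self.accrued = 0.0

    def reset(self):
        self.accrued = 0.0
        obs = self.env.reset()
        return (obs, self.accrued)               # condition the policy on accrued reward

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.accrued += reward
        # Defer the utility to termination so it can be nonlinear in the return.
        shaped = self.utility(self.accrued) if done else 0.0
        return (obs, self.accrued), shaped, done, info

# Example: a risk-averse, concave utility over the episodic return.
risk_averse = lambda ret: ret if ret < 10 else 10 + 0.1 * (ret - 10)
```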


An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments

arXiv.org Artificial Intelligence

One common approach to solving multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However, issues can arise with this approach in the context of stochastic environments, particularly when optimising for the Scalarised Expected Reward (SER) criterion. This paper extends prior research, providing a detailed examination of the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal policy for an environment with stochastic state transitions. We empirically examine several variations of the core multi-objective Q-learning algorithm as well as reward engineering approaches, and demonstrate the limitations of these methods. In particular, we highlight the critical impact of the noisy Q-value estimates issue on the stability and convergence of these algorithms.
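The SER criterion applies the utility to the expected return, whereas the Expected Scalarised Return (ESR) criterion takes the expectation of the utility of each outcome; the toy example below (my own numbers, not drawn from the paper) shows how sharply the two can disagree in a stochastic environment.

```python
# Tiny numeric illustration (my example, not from the paper) of why SER and ESR
# can disagree under stochastic outcomes. A policy yields one of two 2-objective
# returns with equal probability; the utility is the product of the objectives.
import numpy as np

outcomes = np.array([[10.0, 0.0],    # 50%: all of objective 1, none of objective 2
                     [0.0, 10.0]])   # 50%: the reverse
probs = np.array([0.5, 0.5])
utility = lambda v: v[0] * v[1]      # nonlinear utility over the return vector

ser = utility(probs @ outcomes)                                   # u(E[R]) = u([5, 5]) = 25
esr = np.sum(probs * np.array([utility(v) for v in outcomes]))    # E[u(R)] = 0

print(f"SER value: {ser}, ESR value: {esr}")                      # SER: 25.0, ESR: 0.0
```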


Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

arXiv.org Artificial Intelligence

The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems and that preservation of long-term agency of humans may be a more robust standard, and one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions, which focuses on forward-looking agency evaluations, and argue that AI systems - not humans - must be increasingly tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks and reinforcement learning from internal states.


Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

arXiv.org Artificial Intelligence

Specifically, they present the reward-is-enough hypothesis that "Intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment", and argue in favour of reward maximisation as a pathway to the creation of artificial general intelligence (AGI). While others have criticised this hypothesis and the subsequent claims [44,54,60,64], here we make the argument that Silver et al. have erred in focusing on the maximisation of scalar rewards. The ability to consider multiple conflicting objectives is a critical aspect of both natural and artificial intelligence, and one which will not necessarily arise or be adequately addressed by maximising a scalar reward. In addition, even if the maximisation of a scalar reward is sufficient to support the emergence of AGI, we contend that this approach is undesirable as it greatly increases the likelihood of adverse outcomes resulting from the deployment of that AGI. Therefore, we advocate that a more appropriate model of intelligence should explicitly consider multiple objectives via the use of vector-valued rewards. Our paper starts by confirming that the reward-is-enough hypothesis is indeed referring specifically to scalar rather than vector rewards (Section 2). In Section 3 we then consider limitations of scalar rewards compared to vector rewards, and review the list of intelligent abilities proposed by Silver et al. to determine which of these exhibit multi-objective characteristics. Section 4 identifies multi-objective aspects of natural intelligence (animal and human). Section 5 considers the possibility of vector rewards being internally derived by an agent in response to a global scalar reward.


Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey

arXiv.org Artificial Intelligence

Broad Explainable Artificial Intelligence (Broad-XAI) moves away from interpreting individual decisions based on a single datum and aims to provide integrated explanations from multiple machine learning algorithms into a coherent explanation of an agent's behaviour that is aligned to the communication needs of the explainee. Reinforcement Learning (RL) methods, we propose, provide a potential backbone for the cognitive model required for the development of Broad-XAI. RL represents a suite of approaches that have had increasing success in solving a range of sequential decision-making problems. However, these algorithms all operate as black-box problem solvers, where they obfuscate their decision-making policy through a complex array of values and functions. EXplainable RL (XRL) is a relatively recent field of research that aims to develop techniques to extract concepts from the agent's: perception of the environment; intrinsic/extrinsic motivations/beliefs; Q-values, goals and objectives. This paper aims to introduce a conceptual framework, called the Causal XRL Framework (CXF), that unifies the current XRL research and uses RL as a backbone to the development of Broad-XAI. Additionally, we recognise that RL methods have the ability to incorporate a range of technologies to allow agents to adapt to their environment. CXF is designed to incorporate many standard RL extensions and to integrate with external ontologies and communication facilities so that the agent can answer questions that explain outcomes and justify its decisions.


Levels of explainable artificial intelligence for human-aligned conversational explanations

arXiv.org Artificial Intelligence

Over the last few years there has been rapid research growth into eXplainable Artificial Intelligence (XAI) and the closely aligned Interpretable Machine Learning (IML). Drivers for this growth include recent legislative changes and increased investments by industry and governments, along with increased concern from the general public. People are affected by autonomous decisions every day and the public need to understand the decision-making process to accept the outcomes. However, the vast majority of the applications of XAI/IML are focused on providing low-level `narrow' explanations of how an individual decision was reached based on a particular datum. While important, these explanations rarely provide insights into an agent's: beliefs and motivations; hypotheses of other (human, animal or AI) agents' intentions; interpretation of external cultural expectations; or, processes used to generate its own explanation. Yet all of these factors, we propose, are essential to providing the explanatory depth that people require to accept and trust the AI's decision-making. This paper aims to define levels of explanation and describe how they can be integrated to create a human-aligned conversational explanation system. In so doing, this paper will survey current approaches and discuss the integration of different technologies to achieve these levels with Broad eXplainable Artificial Intelligence (Broad-XAI), and thereby move towards high-level `strong' explanations.