 Agents


Toward Ensuring Ethical Behavior from Autonomous Systems: A Case-Supported Principle-Based Paradigm

AAAI Conferences

A paradigm of case-supported principle-based behavior (CPB) is proposed to help ensure ethical behavior of autonomous machines. We argue that ethically significant behavior of autonomous systems should be guided by explicit ethical principles determined through a consensus of ethicists. Such a consensus is likely to emerge in many areas in which autonomous systems are apt to be deployed and for the actions they are liable to undertake, as we are more likely to agree on how machines ought to treat us than on how human beings ought to treat one another. Given such a consensus, particular cases of ethical dilemmas where ethicists agree on the ethically relevant features and the right course of action can be used to help discover principles needed for ethical guidance of the behavior of autonomous systems. Such principles help ensure the ethical behavior of complex and dynamic systems and further serve as a basis for justification of their actions as well as a control abstraction for managing unanticipated behavior. The requirements, methods, implementation, and evaluation components of the CPB paradigm are detailed.


Automatic Parameterization of Automation Software for Plug-and-Produce

AAAI Conferences

The main feature of Cyber-Physical Production Systems (CPPSs) is adaptability, i.e., they can adapt quickly to new production goals such as new products or product variants. Today, the bottleneck of such approaches is the automation system, which still requires considerable manual engineering effort for every adaptation step. Many recent solutions for more adaptable automation software have focused on the automatic orchestration of software systems: for a new product and production configuration, a software solution is created by putting together reusable software components. But such solutions come at a price: reusable software components must be, by definition, applicable to a wide range of configurations. For this, software components come with free parameters that must be set according to the current configuration. Typically, the main problem is not the orchestration of software components but their correct parameterization. This paper presents, to the best of our knowledge for the first time, a solution to the parameterization problem of adaptable, CPPS-enabled software systems. Due to the nature of CPPSs, no direct computation of parameters is possible. Instead, an iteration-based approach using a model of both the plant and the automation system is needed. An example from the process industry illustrates the ideas.
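
As a rough illustration of what an iteration-based parameterization loop can look like, the sketch below treats a simulated plant as a black box and adjusts one free parameter until the simulated output meets a target. Everything here is a hypothetical stand-in and not the paper's method: the tank model `simulate_final_level`, the parameter `pump_rate`, and the bisection search in `tune_parameter` (which assumes the output grows monotonically with the parameter) are assumptions chosen only to keep the example small.

```python
def simulate_final_level(pump_rate, steps=2000, dt=0.1, drain_coeff=0.5):
    """Hypothetical plant model: Euler simulation of a tank that is filled
    at pump_rate and drains proportionally to its current level."""
    level = 0.0
    for _ in range(steps):
        level += dt * (pump_rate - drain_coeff * level)
    return level


def tune_parameter(simulate, target, low, high, tol=1e-3, max_iter=60):
    """Bisection over one free parameter, using only simulation runs.

    Assumes simulate(p) increases with p and that the target output
    is reachable for some p in [low, high].
    """
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        output = simulate(mid)
        if abs(output - target) <= tol:
            return mid
        if output < target:
            low = mid   # output too small: search the upper half
        else:
            high = mid  # output too large: search the lower half
    return (low + high) / 2.0


# Example: find the pump rate that settles the tank at level 3.0.
tuned_rate = tune_parameter(simulate_final_level, target=3.0, low=0.0, high=5.0)
```

The point of the sketch is only the structure of the loop: propose a parameter value, run the model, compare against the goal, and refine. A real plant and automation model would replace the toy simulation, and a search over several interacting parameters would need a different strategy than bisection.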


Effect of Bundle Method in Distributed Lagrangian Relaxation Protocol

AAAI Conferences

The Generalized Mutual Assignment Problem (GMAP) is a maximization problem in distributed environments, where multiple agents select goods under resource constraints. Distributed Lagrangian Relaxation Protocols (DisLRP) are peer-to-peer communication protocols for solving GMAP instances. In DisLRPs, agents seek a good-quality upper bound on the optimal value by solving the Lagrangian dual problem, which is a convex minimization problem. Existing DisLRPs exploit a subgradient method to explore a better upper bound by updating the Lagrange multipliers (prices) of goods. While the computational complexity of the subgradient method is very low, it cannot detect that an upper bound has converged to the minimum. Moreover, solution oscillation sometimes occurs, which is critical for its performance. In this paper, we present a new DisLRP with a Bundle Method and refer to it as Bundle DisLRP (BDisLRP). The bundle method, which is also called the stabilized cutting planes method, has recently attracted much attention as a way to solve Lagrangian dual problems in centralized environments. We show that this method can also work in distributed environments. We experimentally compared BDisLRP with Adaptive DisLRP (ADisLRP), which is a previous protocol that exploits the subgradient method, to demonstrate that BDisLRP converged faster with better quality upper bounds than ADisLRP.
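
To make the subgradient price update concrete, here is a minimal centralized sketch on a toy instance. It is not the DisLRP or BDisLRP protocol itself: there is no message passing, the step size `step0 / k` is an assumed diminishing schedule, and the instance data (`PROFITS`, `WEIGHTS`, `CAPACITIES`) and function names are made up for illustration. The relaxed constraint is "each good is assigned to exactly one agent", so each agent solves its own small knapsack at the current prices.

```python
from itertools import product


def agent_best_response(profit, weight, capacity, prices):
    """One agent's knapsack at the current prices, solved by brute force."""
    best_value, best_pick = 0.0, (0,) * len(profit)
    for pick in product((0, 1), repeat=len(profit)):
        if sum(w * x for w, x in zip(weight, pick)) > capacity:
            continue  # selection exceeds the agent's resource capacity
        value = sum((p - mu) * x for p, mu, x in zip(profit, prices, pick))
        if value > best_value:
            best_value, best_pick = value, pick
    return best_value, best_pick


def subgradient_dual(profits, weights, capacities, iterations=50, step0=1.0):
    """Minimize the Lagrangian dual with a diminishing-step subgradient method.

    Returns the best (lowest) upper bound found and the final prices.
    """
    n_goods = len(profits[0])
    prices = [0.0] * n_goods
    best_bound = float("inf")
    for k in range(1, iterations + 1):
        responses = [
            agent_best_response(p, w, c, prices)
            for p, w, c in zip(profits, weights, capacities)
        ]
        # Dual value: sum of prices plus each agent's reduced-profit optimum.
        bound = sum(prices) + sum(value for value, _ in responses)
        best_bound = min(best_bound, bound)
        # Subgradient for good j is 1 - (number of agents selecting j).
        demand = [sum(pick[j] for _, pick in responses) for j in range(n_goods)]
        step = step0 / k
        prices = [mu - step * (1 - d) for mu, d in zip(prices, demand)]
    return best_bound, prices


# Toy instance: 2 agents, 3 goods (illustrative numbers only).
PROFITS = [[6, 4, 3], [5, 5, 2]]
WEIGHTS = [[2, 2, 1], [2, 1, 2]]
CAPACITIES = [3, 3]

upper_bound, final_prices = subgradient_dual(PROFITS, WEIGHTS, CAPACITIES)
```

A good selected by more than one agent gets a higher price on the next iteration, and an unselected good gets a lower one. The loop runs for a fixed number of iterations because the plain subgradient method gives no certificate that the bound has stopped improving, which is the limitation the abstract points out.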


Nonparametric Bayesian Learning of Other Agents' Policies in Interactive POMDPs

AAAI Conferences

We consider an autonomous agent facing a partially observable, stochastic, multiagent environment where the unknown policies of other agents are represented as finite state controllers (FSCs). We show how an agent can (i) learn the FSCs of the other agents, and (ii) exploit these models during interactions. To separate the issues of off-line versus on-line learning, we consider here an off-line two-phase approach. During the first phase the agent observes as the other player(s) interact with the environment (the observations may be imperfect and the learning agent is not taking part in the interaction). The collected data is used to learn an ensemble of FSCs that explain the behavior of the other agent(s) using a Bayesian nonparametric (BNP) approach. We verify the quality of the learned models during the second phase by allowing the agent to compute its own optimal policy and interact with the observed agent. The optimal policy for the learning agent is obtained by solving an interactive POMDP in which the states are augmented by the other agent(s)' possible FSCs. The advantage of using the Bayesian nonparametric approach in the first phase is that the complexity (number of nodes) of the learned controllers is not bounded a priori. Our two-phase approach is preliminary and separates the learning using BNP from the complexities of learning on-line while the other agent may be modifying its policy (the on-line approach is the subject of our future work). We describe our implementation and results in a multiagent Tiger domain. Our results show that learning improves the agent's performance, which increases with the amount of data collected during the learning phase.


Agent Partitioning with Reward/Utility-Based Impact

AAAI Conferences

Reinforcement learning with reward shaping is a well established but often computationally expensive approach to large multiagent systems. Agent partitioning can reduce this computational complexity by treating each partition of agents as an independent problem. We introduce a novel agent partitioning approach called Reward/Utility-Based Impact (RUBI). RUBI finds an effective partitioning of agents while requiring no prior domain knowledge, improves performance by discovering a non-trivial agent partitioning, and leads to faster simulations. We test RUBI in the Air Traffic Flow Management Problem (ATFMP), where there are tens of thousands of aircraft affecting the system and no obvious similarity metric between agents. When partitioning with RUBI in the ATFMP, there is a 37% increase in performance, with a 510x speed increase over non-partitioning approaches. Additionally, RUBI matches the performance of the current domain-dependent ATFMP gold standard using no prior knowledge and with 10% faster performance.


E-HBA: Using Action Policies for Expert Advice and Agent Typification

AAAI Conferences

Past research has studied two approaches to utilise pre-defined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.


Every Team Makes Mistakes: An Initial Report on Predicting Failure in Teamwork

AAAI Conferences

Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve performance in machine learning. However, the potential of voting has been explored only in improving the ability to find the correct answer to a complex problem. In this paper we present a novel benefit of voting that has not been observed before: we show that we can use the voting patterns to assess the performance of a team and predict its final outcome. This prediction can be executed at any moment during problem solving and it is completely domain independent. We present a preliminary theoretical explanation of why our prediction method works, where we show that the accuracy is better for diverse teams composed of different agents than for uniform teams made of copies of the same agent. We also perform experiments in the Computer Go domain, where we show that we can obtain a high accuracy in predicting the final outcome of the games. We analyze the prediction accuracy for 3 different teams, and we show that the prediction works significantly better for a diverse team. Since our approach is completely domain independent, it can be easily applied to a variety of domains, such as the video games in the Arcade Learning Environment.


An Accelerated Approach to Decentralized Reinforcement Learning of the Ball-Dribbling Behavior

AAAI Conferences

In the context of soccer robotics, ball dribbling is a complex behavior where a robot player attempts to maneuver the ball in a very controlled way, while moving towards a desired target. Learning when and how to modify the robot's velocity vector is a complex problem, hardly solvable in an effective way with methods based on identification of the system dynamics and/or kinematics and mathematical models. We propose a decentralized reinforcement learning strategy, where each component of the omnidirectional biped walk (v_x, v_y, v_θ) is learned in parallel with single agents working in a multiagent task. Moreover, we propose an approach to accelerate the decentralized learning based on knowledge transfer from simple linear controllers. The results are successful: with less human effort and less required designer knowledge, the decentralized reinforcement learning scheme shows better performance than the current dribbling engine used by the UChile Robotics Team in the SPL robot soccer competitions. The proposed decentralized reinforcement learning scheme achieves asymptotic performance after 1500 episodes and can be accelerated up to 70% by using our approach to share actions.


A New Perspective of Trust Through Multi-Attribute Auctions

AAAI Conferences

Auction mechanisms are well-known methods for allocating tasks when several agents are involved. In particular, multi-attribute auctions are a special mechanism that allows the consideration of task attributes other than price, such as delivery time or energy consumption. Incentive-compatible mechanisms encourage agents to truthfully reveal the attributes they estimate; however, these mechanisms by themselves cannot know whether such estimations are reliable, due to uncertainty. Under such circumstances, trust could complement incentive compatibility, reducing the risk of losses by the auctioneer. The use of trust in auctions is a well-studied problem; however, most of the works in the literature focus on how to model trust rather than on how trust is used in the mechanism. Thus, this paper proposes an easy and systematic way to include a multi-faceted model of trust in multi-attribute auctions. In contrast to previous works, where trust is only used in the winner determination problem, the presented approach uses trust both in deciding the winner of the auction and in the payment to the corresponding bidder. According to the experimental results, the use of trust following the methodology presented in this paper greatly reduces the number of winning bids from unreliable bidders and, therefore, the number of tasks executed in worse conditions than those agreed. In addition, this paper proposes a new trust adaptation method, which consists of increasing or decreasing the trust value (depending on whether the task is executed properly or not) according to a simple mathematical function with asymptotes at 0 and 1. This model does not suffer from the rigidity problem present in other models in the literature when it comes to agents whose performance is inconsistent.
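
The abstract does not say which function is used for trust adaptation. As one possible reading, the sketch below uses a logistic curve, which has asymptotes at 0 and 1: a latent score moves up after a properly executed task and down otherwise, and trust is the logistic of that score. The names `trust_from_score` and `update_score` and the `step` size are hypothetical and chosen only for illustration.

```python
import math


def trust_from_score(score):
    """Map a latent score to a trust value in (0, 1) with a logistic curve.

    Assumption: the logistic function stands in for the unspecified
    'simple mathematical function with asymptotes at 0 and 1'.
    """
    return 1.0 / (1.0 + math.exp(-score))


def update_score(score, task_ok, step=0.5):
    """Raise the latent score after a properly executed task, lower it otherwise."""
    return score + step if task_ok else score - step


# Example: a bidder starts neutral, succeeds three times, then fails once.
score = 0.0
history = [trust_from_score(score)]
for outcome in (True, True, True, False):
    score = update_score(score, outcome)
    history.append(trust_from_score(score))
```

Because the curve flattens near both asymptotes but never reaches them, a bidder with a long good record can still lose trust after failures, and one with a poor record can still recover. That is one way to avoid a trust value getting stuck at an extreme.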


A Trust Establishment Model in Multi-Agent Systems

AAAI Conferences

In open multi-agent systems, agents often interact with each other to meet their objectives. Trust is, therefore, considered essential to make such interactions useful. However, trust is a complex, multifaceted concept and includes more than just evaluating others' honesty. Many trust evaluation models have been proposed and implemented in different areas; most of them focus on creating algorithms for trusters to model the honesty of trustees in order to make effective decisions about which trustees to select. However, little consideration is paid to trust establishment. This work describes a trust establishment model that goes beyond trust evaluation to outline actions to guide trustees (instead of trusters). The model uses a multicriteria method for measuring and analysing the needs of trusters and evaluates the satisfaction level of trusters based on their values and expressed preferences. Using the feedback from trusters, trustees attempt to modify their behaviour in order to achieve higher confidence levels as part of their plans to be selected as partners of other agents in the community for future interactions. Simulation results indicate that trustees can become more trusted if they adjust their behaviour based on satisfaction feedback from trusters.