Agents
Allocation of Multi-Robot Tasks with Task Variants
Task allocation has been a well studied problem. In most prior problem formulations, it is assumed that each task is associated with a unique set of resource requirements. In the scope of multi-robot task allocation problem, these requirements can be satisfied by a coalition of robots. In this paper, we introduce a more general formulation of multi-robot task allocation problem that allows more than one option for specifying the set of task requirements--satisfying any one of the options will satisfy the task. We referred to this new problem as the multi-robot task allocation problem with task variants. First, we theoretically show that this extension fortunately does not impact the complexity class, which is still NP-complete. For solution methods, we adapt two previous greedy methods for the task allocation problem without task variants to solve this new problem and analyze their effectiveness. In particular, we "flatten" the new problem to the problem without task variants, modify the previous methods to solve the flattened problem, and prove that the bounds still hold. Finally, we thoroughly evaluate these two methods along with a random baseline to demonstrate their efficacy for the new problem.
Contestable Black Boxes
Tubella, Andrea Aler, Theodorou, Andreas, Dignum, Virginia, Michael, Loizos
The right to contest a decision with consequences on individuals or the society is a well-established democratic right. Despite this right also being explicitly included in GDPR in reference to automated decision-making, its study seems to have received much less attention in the AI literature compared, for example, to the right for explanation. This paper investigates the type of assurances that are needed in the contesting process when algorithmic black boxes are involved, opening new questions about the interplay of contestability and explainability. We argue that specialised complementary methodologies to evaluate automated decision-making in the case of a particular decision being contested need to be developed. Further, we propose a combination of well-established software engineering and rule-based approaches as a possible socio-technical solution to the issue of contestability, one of the new democratic challenges posed by the automation of decision making.
Enforcing Almost-Sure Reachability in POMDPs
Junges, Sebastian, Jansen, Nils, Seshia, Sanjit A.
Partially-Observable Markov Decision Processes (POMDPs) are a well-known formal model for planning scenarios where agents operate under limited information about their environment. In safety-critical domains, the agent must adhere to a policy satisfying certain behavioral constraints. We study the problem of synthesizing policies that almost-surely reach some goal state while a set of bad states is never visited. In particular, we present an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints. The approach generalizes and improves previous work in terms of scalability and efficacy, as demonstrated in the empirical evaluation. Additionally, we show the applicability to safe exploration by restricting agent behavior to these winning regions.
Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving
Cao, Zhangjie, Bฤฑyฤฑk, Erdem, Wang, Woodrow Z., Raventos, Allan, Gaidon, Adrien, Rosman, Guy, Sadigh, Dorsa
Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. However, reinforcement learning (RL) and imitation learning (IL), two widely-used policy learning methods, cannot model rapid phase transitions and are not scalable to fully cover all the states. To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes. Our approach exploits the advantages of both IL and RL by integrating them into a unified learning framework. Experimental results and user studies suggest our approach can achieve higher efficiency and safety compared to other methods. Analyses of the policies demonstrate our high-level policy appropriately switches between different low-level policies in near-accident driving situations.
R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games
Dai, Zhongxiang, Chen, Yizhou, Low, Kian Hsiang, Jaillet, Patrick, Ho, Teck-Hua
This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.
Developing cooperative policies for multi-stage tasks
Erskine, Jordan, Lehnert, Chris
This paper proposes the Cooperative Soft Actor Critic (CSAC) method of enabling consecutive reinforcement learning agents to cooperatively solve a long time horizon multi-stage task. This method is achieved by modifying the policy of each agent to maximise both the current and next agent's critic. Cooperatively maximising each agent's critic allows each agent to take actions that are beneficial for its task as well as subsequent tasks. Using this method in a multi-room maze domain, the cooperative policies were able to outperform both uncooperative policies as well as a single agent trained across the entire domain. CSAC achieved a success rate of at least 20\% higher than the uncooperative policies, and converged on a solution at least 4 times faster than the single agent.
The Evolutionary Dynamics of Independent Learning Agents in Population Games
Hu, Shuyue, Leung, Chin-Wing, Leung, Ho-fung, Soh, Harold
Understanding the evolutionary dynamics of reinforcement learning under multi-agent settings has long remained an open problem. While previous works primarily focus on 2-player games, we consider population games, which model the strategic interactions of a large population comprising small and anonymous agents. This paper presents a formal relation between stochastic processes and the dynamics of independent learning agents who reason based on the reward signals. Using a master equation approach, we provide a novel unified framework for characterising population dynamics via a single partial differential equation (Theorem 1). Through a case study involving Cross learning agents, we illustrate that Theorem 1 allows us to identify qualitatively different evolutionary dynamics, to analyse steady states, and to gain insights into the expected behaviour of a population. In addition, we present extensive experimental results validating that Theorem 1 holds for a variety of learning methods and population games.
Human Trust-based Feedback Control: Dynamically varying automation transparency to optimize human-machine interactions
Akash, Kumar, McMahon, Griffon, Reid, Tahira, Jain, Neera
Human trust in automation plays an essential role in interactions between humans and automation. While a lack of trust can lead to a human's disuse of automation, over-trust can result in a human trusting a faulty autonomous system which could have negative consequences for the human. Therefore, human trust should be calibrated to optimize human-machine interactions with respect to context-specific performance objectives. In this article, we present a probabilistic framework to model and calibrate a human's trust and workload dynamics during his/her interaction with an intelligent decision-aid system. This calibration is achieved by varying the automation's transparency--the amount and utility of information provided to the human. The parameterization of the model is conducted using behavioral data collected through human-subject experiments, and three feedback control policies are experimentally validated and compared against a non-adaptive decision-aid system. The results show that human-automation team performance can be optimized when the transparency is dynamically updated based on the proposed control policy. This framework is a first step toward widespread design and implementation of real-time adaptive automation for use in human-machine interactions. Automation has become prevalent in the everyday lives of humans. However, despite significant technological advancements, human supervision and intervention are still necessary in almost all sectors of automation, ranging from manufacturing and transportation to disaster-management and healthcare [1]. Therefore, we expect that the future will be built around human-agent collectives [2] that will require efficient and successful interaction and coordination between humans and machines. It is well established that to achieve this coordination, human trust in automation plays a central role [3]-[5]. For example, the benefits of automation are lost when humans override automation due to a fundamental lack of trust [3], [5], and accidents may occur due to human mistrust in such systems [6]. Therefore, trust should be appropriately calibrated to avoid disuse or misuse of automation [4].
Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution
Kakish, Zahi M., Elamvazhuthi, Karthik, Berman, Spring
In this paper, we present a reinforcement learning approach to designing a control policy for a "leader'' agent that herds a swarm of "follower'' agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader's location on the graph. A simulation environment corresponding to a grid graph with 4 vertices was used to train and validate the control policies for follower agent populations ranging from 10 to 100. Finally, the control policies trained on 100 simulated agents were used to successfully redistribute a physical swarm of 10 small robots to a target distribution among 4 spatial regions.
Modeling and Uncertainty Analysis of Groundwater Level Using Six Evolutionary Optimization Algorithms Hybridized with ANFIS, SVM, and ANN
Seifi, Akram, Ehteram, Mohammad, Singh, Vijay P., Mosavi, Amir
In the present study, six meta-heuristic schemes are hybridized with artificial neural network (ANN), adaptive neuro-fuzzy interface system (ANFIS), and support vector machine (SVM), to predict monthly groundwater level (GWL), evaluate uncertainty analysis of predictions and spatial variation analysis. The six schemes, including grasshopper optimization algorithm (GOA), cat swarm optimization (CSO), weed algorithm (WA), genetic algorithm (GA), krill algorithm (KA), and particle swarm optimization (PSO), were used to hybridize for improving the performance of ANN, SVM, and ANFIS models. Groundwater level (GWL) data of Ardebil plain (Iran) for a period of 144 months were selected to evaluate the hybrid models. The pre-processing technique of principal component analysis (PCA) was applied to reduce input combinations from monthly time series up to 12-month prediction intervals. The results showed that the ANFIS-GOA was superior to the other hybrid models for predicting GWL in the first piezometer and third piezometer in the testing stage. The performance of hybrid models with optimization algorithms was far better than that of classical ANN, ANFIS, and SVM models without hybridization. The percent of improvements in the ANFIS-GOA versus standalone ANFIS in piezometer 10 were 14.4%, 3%, 17.8%, and 181% for RMSE, MAE, NSE, and PBIAS in the training stage and 40.7%, 55%, 25%, and 132% in testing stage, respectively. The improvements for piezometer 6 in train step were 15%, 4%, 13%, and 208% and in the test step were 33%, 44.6%, 16.3%, and 173%, respectively, that clearly confirm the superiority of developed hybridization schemes in GWL modeling. Uncertainty analysis showed that ANFIS-GOA and SVM had, respectively, the best and worst performances among other models. In general, GOA enhanced the accuracy of the ANFIS, ANN, and SVM models.