
 Agents


Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

arXiv.org Artificial Intelligence

Autonomous agents acting in the real world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model, whether handcrafted or machine acquired, is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent's actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects of the agent's actions is critical to improving the safety and reliability of autonomous systems. This emerging research topic is attracting increased attention due to the growing deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of negative side effects and the recent research efforts to address them. We identify key characteristics of negative side effects, highlight the challenges in avoiding negative side effects, and discuss recently developed approaches, contrasting their benefits and limitations. We conclude with a discussion of open questions and suggestions for future research directions.


Process Querying, Manipulation, and Intelligence 2020 – Process Querying

#artificialintelligence

The Fifth International Workshop on Process Querying, Manipulation, and Intelligence (PQMI 2020) aims to provide a high-quality forum for researchers and practitioners to exchange research findings and ideas on methods and practices in the corresponding areas. Process Querying combines concepts from Big Data and Process Modeling and Analysis with Business Process Intelligence and Process Analytics to study techniques for retrieving and manipulating models of processes, both observed and recorded in the real world and envisioned and designed in conceptual models, to systematically organize and extract process-related information for subsequent use. Process Manipulation studies inferences from real-world observations for augmenting, enhancing, and redesigning models of processes with the ultimate goal of improving real-world business processes. Process Intelligence looks into the application of representation models and approaches from Artificial Intelligence (AI), such as knowledge representation, search, automated planning, reasoning, natural language processing, autonomous agents, and multi-agent systems, among others, to solving problems in process mining, that is, automated process discovery, conformance checking, and process enhancement, and, vice versa, the use of process mining techniques to tackle problems in AI. Techniques, methods, and tools for process querying, manipulation, and intelligence have applications in Business Process Management and Process Mining.


Research suggests Soldiers, AI are trusting one another

#artificialintelligence

Army researchers recently completed a simulation study where crew members and artificially intelligent agents demonstrated trust and cohesion while working together. U.S. Army Combat Capabilities Development Command's Army Research Laboratory researchers and U.S. Army Military Academy at West Point cadets conducted the study as part of an academic capstone project. It also supports the Army Wingman Joint Capabilities Technology Demonstration and the Army's Next Generation Combat Vehicle mission prioritization. "Subjective, behavioral, performance, communication and physiological data were collected to identify possible team trust and team cohesion metrics." "The cadets used the Wingman simulation testbed, which allows a human crew to interact with the actual robotic vehicle autonomy on a realistic gunnery task. They collected informed consent, briefed the participants, and collected questionnaire data, along with timing the event." "The cadets filled the roles of mobility and lethality operator with me as vehicle commander."


Artificial intelligence, Autonomy, and Human-Machine Teams -- Interdependence, Context, and Explainable AI

Interactive AI Magazine

Because in military situations, as well as for self-driving cars, information must be processed faster than humans can achieve, determination of context computationally, also known as situational assessment, is increasingly important. In this article, we introduce the topic of context, and we discuss what is known about the heretofore intractable research problem on the effects of interdependence, present in the best of human teams; we close by proposing that interdependence must be mastered mathematically to operate human-machine teams efficiently, to advance theory, and to make the machine actions directed by AI explainable to team members and society. The special topic articles in this issue and a subsequent issue of AI Magazine review ongoing mature research and operational programs that address context for human-machine teams. In 1983, William Lawless blew the whistle on Department of Energy (DOE) mismanagement of military radioactive wastes. After his PhD, he joined DOE's citizen advisory board at its Savannah River Site where he coauthored over 100 recommendations on its cleanup.


A principled analysis of Behavior Trees and their generalisations

arXiv.org Artificial Intelligence

As complex autonomous robotic systems become more widespread, the goals of transparent and reusable Artificial Intelligence (AI) become more important. In this paper we analyse how the principles behind Behavior Trees (BTs), an increasingly popular tree-structured control architecture, are applicable to these goals. Using structured programming as a guide, we analyse the BT principles of reactiveness and modularity in a formal framework of action selection. Proceeding from these principles, we review a number of challenging use-cases of BTs in the literature, and show that reasoning via these principles leads to compatible solutions. Extending these arguments, we introduce a new class of control architectures we call generalised BTs or $k$-BTs and show how they can extend the applicability of BTs to some of the aforementioned challenging BT use-cases while preserving the BT principles. We compare BTs to a number of other control architectures within this framework, and show which forms of decision-making can and cannot be equivalently represented by BTs. This allows us to construct a hierarchy of architectures and to show how BTs fit into such a hierarchy.
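
As background for the abstract above, a minimal sketch of the two classic BT control-flow nodes (Sequence and Fallback) may help. It assumes the common three-valued tick convention; the class names, the `state` dictionary, and the door example are illustrative assumptions rather than the paper's formal framework, and the generalised $k$-BTs are not shown.

```python
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Action:
    """Leaf node wrapping a function that maps state to a status."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self, state):
        return self.fn(state)

class Sequence:
    """Ticks children in order; stops at the first child that is not SUCCESS."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != SUCCESS:
                return status
        return SUCCESS

class Fallback:
    """Ticks children in order; stops at the first child that is not FAILURE."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != FAILURE:
                return status
        return FAILURE

# Hypothetical leaves for a "pass through a door" task.
def door_is_open(state):
    return SUCCESS if state["door_open"] else FAILURE

def open_door(state):
    state["door_open"] = True
    return SUCCESS

def walk_through(state):
    state["passed"] = True
    return SUCCESS

tree = Sequence(Fallback(Action(door_is_open), Action(open_door)),
                Action(walk_through))

state = {"door_open": False, "passed": False}
result = tree.tick(state)
```

Because the tree is re-evaluated from the root on every tick, a change in `state` between ticks redirects control without any explicit transition logic, which is one informal reading of the reactiveness principle; each subtree exposes the same `tick` interface, which is one reading of modularity.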


Dynamic Models Applied to Value Learning in Artificial Intelligence

arXiv.org Artificial Intelligence

Experts in Artificial Intelligence (AI) development predict that advances in the development of intelligent systems and agents will reshape vital areas in our society. Nevertheless, if such an advance is not made prudently and with critical reflection, it can result in negative outcomes for humanity. For this reason, several researchers in the area are trying to develop a robust, beneficial, and safe concept of AI for the preservation of humanity and the environment. Currently, several of the open problems in the field of AI research arise from the difficulty of avoiding unwanted behaviors of intelligent agents and systems, and at the same time specifying what we want such systems to do, especially when we look for the possibility of intelligent agents acting in several domains over the long term. It is of utmost importance that artificially intelligent agents have their values aligned with human values, given the fact that we cannot expect an AI to develop human moral values simply because of its intelligence, as discussed in the Orthogonality Thesis. Perhaps this difficulty comes from the way we are addressing the problem of expressing objectives, values, and ends, using representational cognitive methods. A solution to this problem would be the dynamic approach proposed by Dreyfus, whose phenomenological philosophy shows that the human experience of being-in-the-world in several aspects is not well represented by the symbolic or connectionist cognitive method, especially with regard to the question of learning values. A possible approach to this problem would be to use theoretical models such as SED (situated embodied dynamics) to address the value learning problem in AI.


The Advantage Regret-Matching Actor-Critic

arXiv.org Artificial Intelligence

Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC saves a buffer of past policies, replaying through them to reconstruct hindsight assessments of past behavior. These retrospective value estimates are used to predict conditional advantages which, combined with regret matching, produce a new policy. In particular, ARMAC learns from sampled trajectories in a centralized training setting, without requiring the application of importance sampling commonly used in Monte Carlo counterfactual regret (CFR) minimization; hence, it does not suffer from excessive variance in large environments. In the single-agent setting, ARMAC shows an interesting form of exploration by keeping past policies intact. In the multiagent setting, ARMAC in self-play approaches Nash equilibria on some partially-observable zero-sum benchmarks. We provide exploitability estimates in the significantly larger game of betting-abstracted no-limit Texas Hold'em.
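
For readers unfamiliar with regret matching, the generic policy rule that the abstract refers to can be sketched as follows. This is the textbook tabular rule only, not the ARMAC algorithm; the function name and the toy regret values are assumptions for illustration.

```python
def regret_matching_policy(cumulative_regrets):
    """Return action probabilities proportional to positive cumulative regret.

    Falls back to the uniform policy when no action has positive regret.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(cumulative_regrets)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]

# Toy example: actions 0 and 2 have equal positive regret, action 1 has none.
policy = regret_matching_policy([2.0, -1.0, 2.0])
uniform = regret_matching_policy([-1.0, -2.0])
```

In ARMAC, as the abstract describes it, the quantities fed into such a rule are predicted conditional advantages rather than exact tabular regrets.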


Learning to Play No-Press Diplomacy with Best Response Policy Iteration

arXiv.org Artificial Intelligence

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.


A Two-Stage Metaheuristic Algorithm for the Dynamic Vehicle Routing Problem in Industry 4.0 approach

arXiv.org Artificial Intelligence

Industry 4.0 is a concept that assists companies in developing a modern supply chain (MSC) system when they are faced with a dynamic process. Because Industry 4.0 focuses on mobility and real-time integration, it is a good framework for a dynamic vehicle routing problem (DVRP). This research addresses the DVRP: the aim is to minimize transportation cost without exceeding the capacity constraint of each vehicle while serving customer demands from a common depot. Meanwhile, new orders arrive in the system at a specific time while the vehicles are delivering existing orders. This paper presents a two-stage hybrid algorithm for solving the DVRP. In the first stage, construction algorithms are applied to develop the initial route. In the second stage, improvement algorithms are applied. Experiments were designed for problems of different sizes. The results show the effectiveness of the proposed algorithm.
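
The construct-then-improve pattern described above can be sketched for a single static snapshot of the problem. The abstract does not say which construction or improvement algorithms the authors use, so this sketch assumes a capacity-aware nearest-neighbour construction followed by 2-opt improvement; all names and the toy data are hypothetical, and the dynamic arrival of new orders is not modelled.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_length(route, coords):
    """Length of depot -> customers in order -> depot (depot has id 0)."""
    path = [0] + route + [0]
    return sum(dist(coords[path[i]], coords[path[i + 1]])
               for i in range(len(path) - 1))

def construct_routes(coords, demand, capacity):
    """Stage 1: greedy nearest-neighbour routes that respect vehicle capacity."""
    unserved = set(coords) - {0}
    routes = []
    while unserved:
        route, load, current = [], 0, 0
        while True:
            feasible = [c for c in unserved if load + demand[c] <= capacity]
            if not feasible:
                break
            nxt = min(feasible, key=lambda c: dist(coords[current], coords[c]))
            route.append(nxt)
            load += demand[nxt]
            unserved.remove(nxt)
            current = nxt
        if not route:
            raise ValueError("a customer demand exceeds the vehicle capacity")
        routes.append(route)
    return routes

def two_opt(route, coords):
    """Stage 2: reverse segments while doing so shortens the route."""
    best = route[:]
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(cand, coords) < route_length(best, coords) - 1e-9:
                    best, improved = cand, True
    return best

# Toy instance: depot 0 and five customers with unit-free coordinates.
coords = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (0, 2), 4: (-2, 1), 5: (-1, -2)}
demand = {1: 3, 2: 4, 3: 2, 4: 5, 5: 3}
capacity = 10

initial = construct_routes(coords, demand, capacity)
improved = [two_opt(r, coords) for r in initial]
```

Since 2-opt only reorders customers within a route, the capacity feasibility established in stage 1 is preserved in stage 2. One way to extend the sketch toward the dynamic setting would be to re-run both stages on the unserved customers whenever a new order arrives, though that is an assumption and not a description of the paper's method.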


Reputation-driven Decision-making in Networks of Stochastic Agents

arXiv.org Artificial Intelligence

This paper studies multi-agent systems that involve networks of self-interested agents. We propose a Markov Decision Process-derived framework, called RepNet-MDP, tailored to domains in which agent reputation is a key driver of the interactions between agents. Its fundamentals are based on the principles of RepNet-POMDP, a framework developed by Rens et al. [11] in 2018, but the new framework addresses RepNet-POMDP's mathematical inconsistencies and alleviates its intractability by only considering fully observable environments. We furthermore use an online learning algorithm for finding approximate solutions to RepNet-MDPs. In a series of experiments, RepNet agents are shown to be able to adapt their own behavior to the past behavior and reliability of the remaining agents of the network. Finally, our work identifies a limitation of the framework in its current formulation that prevents its agents from learning in circumstances in which they are not a primary actor.