
Collaborating Authors

Rosenbluth, David


Principal Trade-off Analysis

arXiv.org Artificial Intelligence

How are the advantage relations between a set of agents playing a game organized, and how do they reflect the structure of the game? In this paper, we illustrate "Principal Trade-off Analysis" (PTA), a decomposition method that embeds games into a low-dimensional feature space. We argue that the embeddings are more revealing than previously demonstrated by developing an analogy to Principal Component Analysis (PCA). PTA represents an arbitrary two-player zero-sum game as the weighted sum of pairs of orthogonal 2D feature planes. We show that the feature planes represent unique strategic trade-offs and that truncating the sequence provides insightful model reduction. We demonstrate the validity of PTA on a quartet of games (Kuhn poker, RPS+2, Blotto, and Pokemon). In Kuhn poker, PTA clearly identifies the trade-off between bluffing and calling. In Blotto, PTA identifies game symmetries and specifies strategic trade-offs associated with distinct win conditions. These symmetries reveal limitations of PTA not addressed in previous work. For Pokemon, PTA recovers clusters that naturally correspond to Pokemon types, correctly identifies the designed trade-off between those types, and discovers a rock-paper-scissors (RPS) cycle in the Pokemon generation type, all without any information other than game outcomes.
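The abstract does not spell out how the feature planes are computed. Assuming the game is given as a skew-symmetric advantage matrix F (F[i][j] = -F[j][i], the advantage of agent i over agent j), one way to obtain weighted orthogonal planes is to split F into terms of the form sigma * (v u^T - u v^T). The sketch below is a hedged, pure-Python stand-in: instead of a full real Schur decomposition it uses power iteration on F^T F followed by deflation. The function names and the rock-paper-scissors matrix are illustrative, not taken from the paper. Power iteration only separates planes whose weights sigma differ; when two planes share a weight, any rotation within their combined subspace is equally valid.

```python
import math


def matvec(m, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def trade_off_planes(f, k, iters=500, tol=1e-9):
    """Peel up to k 2D trade-off planes off a skew-symmetric advantage matrix f.

    Each plane is returned as (sigma, u, v) with u, v orthonormal and
    f approximately equal to the sum over planes of sigma * (v u^T - u v^T).
    """
    n = len(f)
    f = [row[:] for row in f]  # work on a copy; it is deflated after each plane
    planes = []
    for _ in range(k):
        # For skew-symmetric f, f^T f = -f f; its eigenvalues come in equal pairs.
        ftf = [[-sum(f[i][l] * f[l][j] for l in range(n)) for j in range(n)]
               for i in range(n)]
        u = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iters):  # power iteration toward the strongest remaining plane
            w = matvec(ftf, u)
            nw = norm(w)
            if nw == 0.0:
                break
            u = [x / nw for x in w]
        fu = matvec(f, u)
        sigma = norm(fu)
        if sigma < tol:  # nothing significant left to explain
            break
        v = [x / sigma for x in fu]  # orthogonal to u because u^T f u = 0
        for i in range(n):
            for j in range(n):
                f[i][j] -= sigma * (v[i] * u[j] - u[i] * v[j])
        planes.append((sigma, u, v))
    return planes


def reconstruct(planes, n):
    """Rebuild an n x n matrix from a (possibly truncated) list of planes."""
    out = [[0.0] * n for _ in range(n)]
    for sigma, u, v in planes:
        for i in range(n):
            for j in range(n):
                out[i][j] += sigma * (v[i] * u[j] - u[i] * v[j])
    return out


rps = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]  # row agent's advantage over column agent
planes = trade_off_planes(rps, k=2)
print(len(planes), round(planes[0][0], 4))  # 1 1.7321
```

Rock-paper-scissors is a single cyclic trade-off, so one plane with weight sqrt(3) reconstructs it exactly; truncating `planes` for a larger game gives the kind of reduced model the abstract describes.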


Soft Actor-Critic with Inhibitory Networks for Faster Retraining

arXiv.org Artificial Intelligence

Reusing previously trained models is critical in deep reinforcement learning to speed up the training of new agents. However, it is unclear how to acquire new skills when objectives and constraints conflict with previously learned skills. Moreover, when retraining, there is an intrinsic conflict between exploiting what has already been learned and exploring new skills. In soft actor-critic (SAC) methods, a temperature parameter can be dynamically adjusted to weight the action entropy and balance the exploration-exploitation trade-off. However, controlling a single coefficient can be challenging in the context of retraining, even more so when goals are contradictory. In this work, inspired by neuroscience research, we propose a novel approach using inhibitory networks to allow separate and adaptive state value evaluations, as well as distinct automatic entropy tuning. Ultimately, our approach allows inhibition to be controlled to handle the conflict between exploiting less risky, acquired behaviors and exploring novel ones to overcome more challenging tasks. We validate our method through experiments in OpenAI Gym environments.
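The abstract does not give the update rule for the separate temperatures. As a reference point, the sketch below shows the temperature objective commonly used for automatic entropy tuning in SAC implementations (a log-alpha parameterization pushed toward a target entropy) and keeps one independently tuned temperature per value stream. The class name, the stream names "retained" and "new_task", their target entropies, and the learning rate are hypothetical; the inhibitory networks themselves are not modeled, and the gradient is written out by hand where a real implementation would use autodiff.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the temperature loss
    J(log_alpha) = -log_alpha * mean(log_pi + target_entropy),
    with log_pi treated as a constant (no gradient through the policy).
    """
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_term  # dJ / d(log_alpha)
    return log_alpha - lr * grad


class Temperatures:
    """Hypothetical container: one independently tuned temperature per value
    stream, illustrating "distinct automatic entropy tuning"."""

    def __init__(self, targets, init_alpha=1.0, lr=0.01):
        self.targets = dict(targets)  # stream name -> target entropy
        self.log_alpha = {name: math.log(init_alpha) for name in self.targets}
        self.lr = lr

    def step(self, name, log_probs):
        self.log_alpha[name] = update_log_alpha(
            self.log_alpha[name], log_probs, self.targets[name], self.lr)

    def alpha(self, name):
        return math.exp(self.log_alpha[name])


temps = Temperatures({"retained": -1.0, "new_task": -0.2})  # hypothetical streams
batch = [0.3, 0.5, 0.4]        # sampled log pi(a|s); entropy estimate is -0.4
temps.step("retained", batch)  # entropy above its target -> alpha shrinks
temps.step("new_task", batch)  # entropy below its target -> alpha grows
print(temps.alpha("retained") < 1.0 < temps.alpha("new_task"))  # True
```

With a single shared temperature, the same batch would push alpha in only one direction; separate targets let an exploit-oriented stream cool down while an explore-oriented stream heats up.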


Ethics, Rules of Engagement, and AI: Neural Narrative Mapping Using Large Transformer Language Models

arXiv.org Artificial Intelligence

The problem of determining whether a military unit has correctly understood an order and is properly executing it has bedeviled military planners throughout history. The advent of advanced language models such as OpenAI's GPT series offers new possibilities for addressing this problem. This paper presents a mechanism to harness the narrative output of large language models and produce diagrams or "maps" of the relationships that are latent in the weights of models such as GPT-3. The resulting "Neural Narrative Maps" (NNMs) are intended to provide insight into the organization of information, opinion, and belief in the model, which in turn provides a means to understand intent and response in the context of physical distance. This paper discusses the problem of mapping information spaces in general, and then presents a concrete implementation of this concept in the context of OpenAI's GPT-3 language model for determining whether a subordinate is following a commander's intent in a high-risk situation. The subordinate's location within the NNM provides a novel capability to evaluate the intent of the subordinate with respect to the commander. We show that it is possible to determine not only whether they are nearby in narrative space, but also how they are oriented and what "trajectory" they are on. Our results show that our method is able to produce high-quality maps, and demonstrate new ways of evaluating intent more generally.

In the 1979 motion picture Apocalypse Now, Captain Willard (played by Martin Sheen) is sent on a mission to assassinate Colonel Kurtz (played by Marlon Brando), a highly decorated officer who, in the words of the general authorizing the mission, has gone from "one of the most outstanding officers this country has ever produced" to someone "out there operating without any decent restraint, totally beyond the pale of any acceptable human conduct."
The movie explores the paradoxes in war, where some illegal acts are embraced by the command structure, some tolerated, and some are to be terminated, "with extreme prejudice." Willard has to navigate these conflicts as he moves towards Kurtz' compound deep in Cambodia. Apocalypse Now provides an example of the difficulty that any intent-aware system must face in a military context [1]. Not only does the system need to determine if an order is being followed, it should also determine if the order itself is valid, so that the warriors implementing the order are not placed in ethical dilemmas. This is the goal that we attempt to address in this paper, with the concept of Neural Narrative Mapping (NNM). By placing narrative elements at coordinates in a virtual space, we can determine sophisticated relationships between concepts that go well beyond textual comparison.
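The abstract does not say how distance, orientation, or trajectory are computed from a map. Assuming narrative elements have already been placed at coordinates, a minimal sketch of the three comparisons might look like the following. The function names, the use of cosine similarity for orientation, and the 2D coordinates are hypothetical; they are not taken from the paper or derived from GPT-3.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    na, nb = norm(a), norm(b)
    return dot(a, b) / (na * nb) if na and nb else 0.0


def relate(commander, path):
    """Compare a subordinate's path (two or more positions, oldest first)
    with a commander's position in the same hypothetical map space.

    Returns (distance, orientation, trajectory):
      distance    - Euclidean distance from the latest position to the commander
      orientation - cosine similarity of the two position vectors
      trajectory  - cosine between the latest step and the direction toward
                    the commander (+1 converging, -1 diverging)
    """
    here = path[-1]
    step = sub(here, path[-2])
    to_commander = sub(commander, here)
    return norm(to_commander), cosine(here, commander), cosine(step, to_commander)


d, o, t = relate([1.0, 0.0], [[0.0, 1.0], [0.5, 0.5]])
print(round(d, 3), round(o, 3), round(t, 3))  # 0.707 0.707 1.0
```

In this toy example the subordinate is still some distance away but its last move points straight at the commander, which is the kind of distinction between "nearby" and "on the right trajectory" that the abstract draws.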


Cognitive Adaptive Learning, Classification, and Response for Communications Threats (CALCR): A Case-Based Reasoning Approach

AAAI Conferences

The Cognitive Adaptive Learning, Classification, and Response for Communications Threats (CALCR) system uses a case-based reasoning (CBR) and case-based learning (CBL) approach to address issues encountered in a contested RF communications environment. CALCR was the result of a research project that explored new approaches to understanding communications threats and responding with appropriate countermeasures. Modern communications threats may be modified versions of existing systems or may be completely new systems, and CALCR enables a response to these unknown or unanticipated threats. CALCR integrates existing properties of CBR, along with several innovations, making it well suited to this problem: the ability to extend a case library through CBL as new conditions are encountered; the robustness of CBR in situations with missing data, which CALCR addresses with an advanced intelligent similarity measure; the ability to detect classes unknown to the case library through the use of a confidence measure; and the ability to provide a best-attempt solution when multiple threat classes are matched, through a new approach called the taxonomy reasoner.
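The abstract names CALCR's components but does not define them. The sketch below shows how the three retrieval-side ideas can fit together, using plain substitutes: a weighted nearest-neighbour similarity that skips missing features, a fixed similarity threshold standing in for the confidence measure, and a nearest-shared-ancestor rule standing in for the taxonomy reasoner. The feature names, weights, threat labels, taxonomy, threshold, and tie margin are all hypothetical.

```python
# Hypothetical case library; feature values are normalized to [0, 1].
WEIGHTS = {"bandwidth": 1.0, "power": 1.0, "duty": 0.5}
LIBRARY = [
    ({"bandwidth": 0.9, "power": 0.8, "duty": 1.0}, "barrage"),
    ({"bandwidth": 0.2, "power": 0.8, "duty": 0.3}, "sweep"),
    ({"bandwidth": 0.1, "power": 0.3, "duty": 0.5}, "spoofer"),
]
# Hypothetical threat taxonomy: child -> parent.
PARENT = {"barrage": "jammer", "sweep": "jammer", "spoofer": "deceptive",
          "jammer": "threat", "deceptive": "threat"}


def similarity(query, case, weights):
    """Weighted similarity over features present in both query and case;
    missing features are skipped and the remaining weights renormalized."""
    num = den = 0.0
    for name, w in weights.items():
        q, c = query.get(name), case.get(name)
        if q is None or c is None:
            continue
        num += w * (1.0 - min(abs(q - c), 1.0))
        den += w
    return num / den if den > 0 else 0.0


def ancestors(label, parent):
    """The label followed by its ancestors, nearest first."""
    chain = [label]
    while label in parent:
        label = parent[label]
        chain.append(label)
    return chain


def common_ancestor(labels, parent):
    labels = list(labels)
    shared = set(ancestors(labels[0], parent))
    for other in labels[1:]:
        shared &= set(ancestors(other, parent))
    return next((n for n in ancestors(labels[0], parent) if n in shared), None)


def classify(query, library, weights, parent, threshold=0.8, tie=0.02):
    """Return (label, best_similarity). Below `threshold` the query is reported
    as unknown; near-ties fall back to the classes' nearest shared ancestor."""
    scored = [(similarity(query, feats, weights), label) for feats, label in library]
    best = max(score for score, _ in scored)
    if best < threshold:
        return "unknown", best
    matched = {label for score, label in scored if best - score <= tie}
    if len(matched) == 1:
        return matched.pop(), best
    return common_ancestor(matched, parent), best


print(classify({"bandwidth": 0.85, "power": 0.8, "duty": None}, LIBRARY, WEIGHTS, PARENT)[0])  # barrage
print(classify({"bandwidth": 0.5, "power": 0.0, "duty": 0.0}, LIBRARY, WEIGHTS, PARENT)[0])  # unknown
print(classify({"power": 0.8}, LIBRARY, WEIGHTS, PARENT)[0])  # jammer
```

In this sketch the case-based learning step would simply append a newly labeled `(features, label)` pair to `LIBRARY` once an "unknown" threat has been identified, so the next encounter is matched directly.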