Aha, David W.


Learning Planning Operators from Episodic Traces

AAAI Conferences

Learning is an important aspect of human intelligence. People learn from various aspects of their experience over time. We present an episodic infrastructure for learning in the context of a cognitive architecture, ICARUS. After a review of this architecture, we formally define the architectural extensions for episodic capabilities. We then demonstrate the extended system's capability to learn planning operators using the episodic traces from two Minecraft-like scenarios.


Human-Agent Teaming as a Common Problem for Goal Reasoning

AAAI Conferences

Human-agent teaming is a difficult yet relevant problem domain to which many goal reasoning systems are well suited, due to their ability to accept outside direction and (relatively) human-understandable internal state. We propose a formal model, and multiple variations on a multi-agent problem, to clarify and unify research in goal reasoning. We describe examples of these concepts, and propose standard evaluation methods for goal reasoning agents that act as a member of a team or on behalf of a supervisor.


Comparing Reward Shaping, Visual Hints, and Curriculum Learning

AAAI Conferences

Common approaches to learning complex tasks in reinforcement learning include reward shaping, environmental hints, and curricula. Yet few studies examine how they compare to each other, when one might prefer one approach, or how they may complement each other. As a first step in this direction, we compare reward shaping, hints, and curricula for a Deep RL agent in the game of Minecraft. We seek to answer whether reward shaping, visual hints, or curricula have the most impact on performance, which we measure as the time to reach the target, the distance from the target, the cumulative reward, or the number of actions taken. Our analyses show that performance is most impacted by the curriculum used and by visual hints; shaping had less impact. For similar navigation tasks, the results suggest that designing an effective curriculum and providing appropriate hints most improve performance.


A New Approach to Temporal Planning with Rich Metric Temporal Properties

AAAI Conferences

Temporal logics have been used in autonomous planning to represent and reason about temporal planning problems. However, such techniques have typically been restricted to either (1) representing actions, events, and goals with temporal properties or (2) planning for temporally-extended goals under restrictive assumptions. We introduce Mixed Propositional Metric Temporal Logic (MPMTL), in which formulae are built over mixed binary and continuous real variables. We introduce a planner, MTP, that solves MPMTL problems and includes a SAT solver, a model checker for a polynomial fragment of MPMTL, and a forward search algorithm. We extend PDDL 2.1 with MPMTL syntax to create MPDDL and an associated parser. The empirical study shows that MTP outperforms the state-of-the-art PDDL+ planner SMTPlan+ on several domains on which SMTPlan+ performed best, and that MTP performs well and scales well with problem size on challenging domains with rich temporal properties that we created.


Using Deep Learning to Automate Feature Modeling in Learning by Observation

AAAI Conferences

Learning by observation allows non-technical experts to transfer their skills to an agent by shifting the knowledge-transfer task to the agent. However, for the agent to learn regardless of expert, domain, or observed behavior, it must learn in a general-purpose manner. Existing learning by observation agents allow for domain-independent learning and reasoning but require human intervention to model the agent’s inputs and outputs. We describe Domain-Independent Deep Feature Learning by Observation (DIDFLO), an agent that uses convolutional neural networks to learn without explicitly defining input features. DIDFLO uses the raw visual inputs at two levels of granularity to automatically learn input features using limited training data. We evaluate DIDFLO in scenarios drawn from a simulated soccer domain and provide a comparison to other learning by observation agents in this domain.


Incorporating Domain-Independent Planning Heuristics in Hierarchical Planning

AAAI Conferences

Heuristics serve as a powerful tool in modern domain-independent planning (DIP) systems by providing critical guidance during the search for high-quality solutions. However, they have not been broadly used with hierarchical planning techniques, which are more expressive and tend to scale better in complex domains by exploiting additional domain-specific knowledge. Complicating matters, we show that for Hierarchical Goal Network (HGN) planning, a goal-based hierarchical planning formalism that we focus on in this paper, any poly-time heuristic that is derived from a delete-relaxation DIP heuristic has to make some relaxation of the hierarchical semantics. To address this, we present a principled framework for incorporating DIP heuristics into HGN planning using a simple relaxation of the HGN semantics we call Hierarchy-Relaxation. This framework allows for computing heuristic estimates of HGN problems using any DIP heuristic in an admissibility-preserving manner. We demonstrate the feasibility of this approach by using the LMCut heuristic to guide an optimal HGN planner. Our empirical results with three benchmark domains demonstrate that simultaneously leveraging hierarchical knowledge and heuristic guidance substantially improves planning performance.


The AI Rebellion: Changing the Narrative

AAAI Conferences

Sci-fi narratives permeating the collective consciousness endow AI Rebellion with ample negative connotations. However, for AI agents, as for humans, attitudes of protest, objection, and rejection have many potential benefits in support of ethics, safety, self-actualization, solidarity, and social justice, and are necessary in a wide variety of contexts. We launch a conversation on constructive AI rebellion and describe a framework meant to support discussion, implementation, and deployment of AI Rebel Agents as protagonists of positive narratives.


Social Attitudes of AI Rebellion: A Framework

AAAI Conferences

Human attitudes of objection, protest, and rebellion have undeniable potential to bring about social benefits, from social justice to healthy balance in relationships. At times, they can even be argued to be ethically obligatory. Conversely, AI rebellion is largely seen as a dangerous, destructive prospect. With the increase of interest in collaborative human/AI environments in which synthetic agents play social roles or, at least, exhibit behavior with social and ethical implications, we believe that AI rebellion could have benefits similar to those of its counterpart in humans. We introduce a framework meant to help categorize and design Rebel Agents, discuss their social and ethical implications, and assess their potential benefits and the risks they may pose. We also present AI rebellion scenarios in two considerably different contexts (military unmanned vehicles and computational social creativity) that exemplify components of the framework.


Dynamic Goal Recognition Using Windowed Action Sequences

AAAI Conferences

For robots to work with humans as a team, they need to be aware of what their teammates are doing. Since it is unrealistic to expect humans to constantly communicate their goals and intentions, it is crucial for the robots to accurately and autonomously recognize their teammates’ goals. Furthermore, as these human–robot teams may perform a variety of missions in dynamically changing contexts, the teammates’ goals may change suddenly without warning. Historically, goal recognition systems have not directly addressed this issue, but have predominantly focused on situations with static goals. This paper presents a windowing strategy that enables goal recognition systems to detect goal changes in a fast and accurate manner. We describe this novel approach, and show its benefits through an empirical study spanning three different domains.
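
The general idea of scoring goals over a window of recent actions can be illustrated with a minimal sketch. This is not the paper's algorithm: the scoring rule (a sum of per-action log-likelihoods), the fixed window size, the function names, and the likelihood table below are all assumptions made up for illustration.

```python
import math
from collections import deque


def recognize_goal(window, likelihoods, floor=1e-6):
    """Return the goal whose (hypothetical) action model best explains the window."""
    scores = {}
    for goal, table in likelihoods.items():
        # Sum of log-likelihoods; unseen actions get a small floor probability.
        scores[goal] = sum(math.log(table.get(action, floor)) for action in window)
    return max(scores, key=scores.get)


def windowed_recognition(actions, likelihoods, window_size=3):
    """Predict a goal after each action, using only the last `window_size` actions."""
    window = deque(maxlen=window_size)  # old actions fall out automatically
    predictions = []
    for action in actions:
        window.append(action)
        predictions.append(recognize_goal(window, likelihoods))
    return predictions


# Hypothetical per-goal action likelihoods, invented for this example.
LIKELIHOODS = {
    "make_coffee": {"grab_mug": 0.5, "pour_water": 0.4, "open_door": 0.1},
    "leave_room": {"grab_mug": 0.1, "pour_water": 0.1, "open_door": 0.8},
}

# The observed agent switches goals partway through the sequence.
ACTIONS = ["grab_mug", "pour_water", "pour_water",
           "open_door", "open_door", "open_door"]

predictions = windowed_recognition(ACTIONS, LIKELIHOODS, window_size=3)
print(predictions)
```

Because the window discards older actions, evidence for an abandoned goal stops counting against the new one after a few steps. The window size is a tuning choice in this sketch: a smaller window reacts faster to a goal change but is more sensitive to individual ambiguous actions.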


ActorSim, A Toolkit for Studying Cross-Disciplinary Challenges in Autonomy

AAAI Conferences

We introduce ActorSim, the Actor Simulator, a toolkit for studying situated autonomy. As background, we review three goal-reasoning projects implemented in ActorSim: one project that uses information metrics in foreign disaster relief and two projects that learn subgoal selection for sequential decision making in Minecraft. We then discuss how ActorSim can be used to address cross-disciplinary gaps in several ongoing projects. To varying degrees, the projects integrate concerns within distinct specializations of AI and between AI and other more human-focused disciplines. These areas include automated planning, learning, cognitive architectures, robotics, cognitive modeling, sociology, and psychology.