Lifelong Credit Assignment with the Success-Story Algorithm

AAAI Conferences

Consider an embedded agent with a self-modifying, Turing-equivalent policy that can change only through active self-modifications. How can we make sure that it learns to continually accelerate reward intake? Throughout its life the agent remains ready to undo any self-modification generated during any earlier point of its life, provided the reward per time since then has not increased, thus enforcing a lifelong success-story of self-modifications, each followed by long-term reward acceleration up to the present time. The stack-based method for enforcing this is called the success-story algorithm. It fully takes into account that early self-modifications set the stage for later ones (learning a learning algorithm), and automatically learns to extend self-evaluations until the collected reward statistics are reliable ... a very simple but general method waiting to be re-discovered! Time permitting, I will also briefly discuss more recent mathematically optimal universal maximizers of lifelong reward, in particular, the fully self-referential Goedel machine.
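The stack-based undo rule described above can be illustrated with a small sketch. This is a simplified toy, not the author's implementation: the class `SuccessStoryAgent`, the dict-based policy, and the choice to treat a modification with no elapsed time as "not yet improved" are all assumptions made for illustration. Each stack entry records the time and cumulative reward at the moment of a self-modification; on evaluation, the top entry survives only if the reward per time since it was made exceeds the reward per time since the previous surviving entry (or since birth).

```python
_MISSING = object()  # marks "key was absent before the modification"


class SuccessStoryAgent:
    """Toy agent whose policy is a dict changed only through modify()."""

    def __init__(self, policy):
        self.policy = dict(policy)
        self.time = 0.0      # lifetime so far
        self.reward = 0.0    # cumulative reward so far
        self.stack = []      # (time, cumulative reward, key, previous value)

    def step(self, reward, dt=1.0):
        """Advance the agent's life by dt and collect reward."""
        self.time += dt
        self.reward += reward

    def modify(self, key, value):
        """Apply a self-modification and push what is needed to undo it."""
        old = self.policy.get(key, _MISSING)
        self.stack.append((self.time, self.reward, key, old))
        self.policy[key] = value

    def _rate_since(self, t, r):
        """Reward per time from checkpoint (t, r) up to now."""
        elapsed = self.time - t
        if elapsed <= 0:
            return float("-inf")  # assumption: no evidence counts as no improvement
        return (self.reward - r) / elapsed

    def evaluate(self):
        """Pop and undo modifications that were not followed by acceleration."""
        while self.stack:
            t, r, key, old = self.stack[-1]
            # Baseline: the previous surviving modification, or birth (0, 0).
            if len(self.stack) > 1:
                prev_t, prev_r = self.stack[-2][:2]
            else:
                prev_t, prev_r = 0.0, 0.0
            if self._rate_since(t, r) > self._rate_since(prev_t, prev_r):
                break  # the top entry still shows reward acceleration
            self.stack.pop()
            if old is _MISSING:
                del self.policy[key]
            else:
                self.policy[key] = old
```

In use, one would call `step` as rewards arrive, `modify` for each self-modification, and `evaluate` at checkpoints; a modification followed by a lower reward rate than its predecessor is rolled back, while earlier, still-successful ones remain on the stack.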


A Formal Systems Approach to Machine Capture, Representation and Use of Activity Context

AAAI Conferences

Britain's trains are not noted for their punctuality and they are deemed on-time within a window of ten or so minutes, so just using the train timetable to predict bad spots is not feasible. Over a number of journeys, the user attempts to find journey landmarks that precede the bad spots by a few minutes ("a few" being less than the predicted time for file transfer). Some landmarks might be easy to identify, e.g. ...

... AAAI Activity Context Representation Workshop. The first paper, 'Defining and Representing Activity Context for Systems Analysis', summarizes the author's formal Simplified Set Theory (SST) approach and the use of his PentaVenn diagram. This second paper uses these in a modest, partially worked example to explore the contexts of an activity and how a formal approach can aid systems ...


Helping Intelligence Analysts Make Connections

AAAI Conferences

Discovering latent connections between seemingly unconnected documents and constructing "stories" from scattered pieces of evidence are staple tasks in intelligence analysis. We have worked with government intelligence analysts to understand the strategies they use to make connections. Beyond techniques like clustering that aim to provide an initial broad summary of large document collections, an important goal of analysts in this domain is to assimilate and synthesize fine-grained information from a smaller set of foraged documents. Further, analysts' domain expertise is crucial because it provides rich contextual background for making connections; thus the goal of KDD is to augment human discovery capabilities, not supplant them. We describe a visual analytics system we have built, Analyst's Workspace (AW), that integrates browsing tools with a storytelling algorithm in a large-screen display environment. AW helps analysts systematically construct stories of desired fidelity from document collections and helps marshal evidence as longer stories are constructed.


Modeling Bounded Rationality of Agents During Interactions

AAAI Conferences

Frequently, it is advantageous for an agent to model other agents in order to predict their behavior during an interaction. Modeling others as rational has a long tradition in AI and game theory, but modeling other agents’ departures from rationality is difficult and controversial. This paper proposes that bounded rationality be modeled as errors that the agent being modeled makes while deciding on its action. We are motivated by the work on quantal response equilibria in behavioral game theory, which uses Nash equilibria as the solution concept. In contrast, we use decision-theoretic maximization of expected utility. Quantal response assumes that a decision maker is rational, i.e., is maximizing his expected utility, but only approximately so, with an error rate characterized by a single error parameter. Another agent’s error rate may be unknown and needs to be estimated during an interaction. We show that the error rate of the quantal response can be estimated using Bayesian update of a suitable conjugate prior, and that it has a finite-dimensional sufficient statistic under strong simplifying assumptions. However, if the simplifying assumptions are relaxed, the quantal response does not admit a finite sufficient statistic and a more complex update is needed. This confirms the difficulty of using simple models of bounded rationality in general settings.
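To make the idea of estimating an error parameter concrete, here is a minimal sketch. It assumes the common logit form of quantal response, in which an action with utility u is chosen with probability proportional to exp(lam * u); the abstract does not state the exact form used. It also swaps in a plain grid approximation of the posterior over `lam` rather than the conjugate-prior update the paper describes. The function names and the grid are hypothetical.

```python
import math


def quantal_response(utilities, lam):
    """Logit choice probabilities; lam = 0 is uniform, large lam is near-rational."""
    scaled = [lam * u for u in utilities]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    return [w / z for w in weights]


def update_posterior(grid, prior, utilities, observed_action):
    """One Bayesian update of a discrete prior over lam (grid approximation)."""
    unnormalized = [
        p * quantal_response(utilities, lam)[observed_action]
        for lam, p in zip(grid, prior)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]
```

Each observed action reweights the candidate values of `lam` by how likely that action would be under each one, so repeatedly observing the higher-utility action shifts belief toward larger `lam`, that is, toward a lower error rate.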


Strategy Purification

AAAI Conferences

There has been significant recent interest in computing effective practical strategies for playing large games. Most prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game. In this paper, we present a modification of this approach that works by constructing a deterministic strategy in the full game from the solution to the abstract game; we refer to this procedure as purification. We show that purification, and its generalization which we call thresholding, lead to significantly stronger play than the standard approach in a wide variety of experimental domains. First, we show that purification improves performance in random 4x4 matrix games using random 3x3 abstractions. We observe that whether or not purification helps in this setting depends crucially on the support of the equilibrium in the full game, and we precisely specify the supports for which purification helps. Next, we consider a simplified version of poker called Leduc Hold'em; again we show that purification leads to a significant performance improvement over the standard approach, and furthermore that whenever thresholding improves a strategy, the biggest improvement is often achieved using full purification. Finally, we consider actual strategies that used our algorithms in the 2010 AAAI Computer Poker Competition. One of our programs, which uses purification, won the two-player no-limit Texas Hold'em bankroll division. Furthermore, experiments in two-player limit Texas Hold'em show that these performance gains do not necessarily come at the expense of worst-case exploitability and that our algorithms can actually produce strategies with lower exploitabilities than the standard approach.
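As a rough illustration of the two operations named above, the following sketch applies them to a single mixed strategy given as a list of action probabilities. It is a sketch under assumptions, not the authors' code: the function names, the tie-breaking rule (lowest index wins), and the fallback when every action falls below the cutoff are choices made here for illustration.

```python
def purify(strategy):
    """Put probability 1 on the most likely action.

    Assumption: ties are broken in favor of the lowest index.
    """
    best = max(range(len(strategy)), key=lambda i: strategy[i])
    return [1.0 if i == best else 0.0 for i in range(len(strategy))]


def threshold(strategy, eps):
    """Drop actions played with probability below eps, then renormalize."""
    kept = [p if p >= eps else 0.0 for p in strategy]
    total = sum(kept)
    if total == 0.0:
        # Assumption: if nothing survives the cutoff, fall back to purification.
        return purify(strategy)
    return [p / total for p in kept]
```

Seen this way, purification is the limiting case of thresholding: as `eps` grows, fewer actions survive until only the most probable one remains.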


Speech Acts of Argumentation: Inference Anchors and Peripheral Cues in Dialogue

AAAI Conferences

It is well known that argumentation can usefully be analysed as a distinct, if complex, type of speech act. Speech acts that form a part of argumentative discourse, and in particular, of argumentative dialogue, can be seen as anchors for the establishment of inferences between propositions in the domain of discourse. Most often, the speech acts that directly give rise to inference are implicit, but can be drawn out in analysis by consideration of the type of dialogue game being played. AI approaches to argumentation often focus solely on such inferences as the means by which persuasion can be effected – but this is in contrast with psychological and rhetorical models which have long recognised the role played by extra-logical features of the dialogical context. These ‘peripheral’ cues can not only affect the persuasive effect of the logical, ‘central’ argumentation, but can override and dominate it. This paper presents a theory which allows both central and peripheral aspects of argumentation to be represented in a coherent analytical account based on the sequences of speech acts which constitute dialogues.


Ethical Implications of Using the Paro Robot, with a Focus on Dementia Patient Care

AAAI Conferences

This paper examines the ability of the Paro robot to improve the lives of elderly dementia patients by applying modern technology to medicine. Paro is not intended to be a replacement for social interaction with people or animals. Some patients who know Paro is a robot still enjoy using the robotic seal, and it can calm patients who are otherwise unreachable. Robots like Paro, which mimic the behaviors of pets, offer excellent opportunities to connect with challenging patients; however, they raise concerns regarding patient rights and autonomy. While such concerns, which we discuss in this paper, are worthy of consideration, we nonetheless conclude that the benefits of using such a treatment tool outweigh its potential risks.


FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes

AAAI Conferences

This article studies Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that simulates simultaneous value function updates. The main contributions are empirical and theoretical support for the convergence of FAQ-learning to attractors near Nash equilibria in two-agent two-action matrix games. The games can be divided into three types: Matching Pennies, Prisoners' Dilemma and Battle of Sexes. This article shows that Matching Pennies and Prisoners' Dilemma each yield one attractor of the learning dynamics, while the Battle of Sexes exhibits a supercritical pitchfork bifurcation at a critical temperature, where one attractor splits into two attractors and one repellent fixed point. Experiments illustrate that the distance between fixed points of the FAQ-learning dynamics and Nash equilibria tends to zero as the exploration parameter of FAQ-learning approaches zero.
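The abstract does not spell out the update rule, so the following is only a sketch of how a frequency-adjusted update is commonly written: the learning step for the chosen action is scaled by min(beta / x, 1), where x is the probability of that action under a Boltzmann policy with temperature `tau`. Rarely chosen actions thus receive proportionally larger updates, which approximates updating all actions simultaneously. The function names, parameter names, and the stateless single-agent form are assumptions made for illustration.

```python
import math


def boltzmann(q, tau):
    """Boltzmann (softmax) action probabilities at temperature tau."""
    scaled = [v / tau for v in q]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    return [w / z for w in weights]


def faq_update(q, action, reward, alpha, beta, tau, gamma=0.0):
    """Return new Q-values after one frequency-adjusted update.

    The step size alpha is scaled by min(beta / x[action], 1), so an action
    that is played with low probability is updated more strongly each time.
    """
    x = boltzmann(q, tau)
    step = min(beta / x[action], 1.0) * alpha
    target = reward + gamma * max(q)
    new_q = list(q)
    new_q[action] += step * (target - q[action])
    return new_q
```

Here `tau` plays the role of the exploration parameter mentioned above: lowering it makes the Boltzmann policy greedier.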


Task Behavior and Interaction Planning for a Mobile Service Robot that Occasionally Requires Help

AAAI Conferences

In our work, a robot can proactively ask for help when necessary, based on its awareness of its sensing and actuation limitations. Approaches in which humans provide help to robots do not necessarily reason about human availability and accuracy. Instead, we model the availability of humans in the robot's environment and present a planning approach that uses such a model to generate the robot's navigational plans. In particular, we contribute two separate planners that allow a robot to distinguish actions that it cannot complete autonomously from ones that it can. In the first planner, the robot plans autonomous actions when possible and requests help to complete actions that it could not otherwise complete. Then, for actions that it can perform autonomously, we use a POMDP policy that incorporates the human availability model to plan actions that reduce uncertainty or that increase the likelihood of the robot finding an available human to help it reduce its uncertainty. We have shown in prior work that asking people in the environment for help during tasks can reduce task completion time and increase the robot's ability to perform tasks.


CARe: An Ontology for Representing Context of Activity-Aware Healthcare Environments

AAAI Conferences

Representing computational activities is still an open problem in the field of Activity-Aware Computing. In this paper, drawing on our experience developing activity-aware applications in support of two populations, nurses working in hospitals and elders living independently, we define the Context Aware Representational (CARe) model. CARe is an ontology that enables the representation and management of computational activities. We illustrate, through application scenarios, that the CARe ontology is flexible enough to enable developers to c