Accelerating Reinforcement Learning through Implicit Imitation

arXiv.org Artificial Intelligence · Journal of Artificial Intelligence Research · AAAI Conferences

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
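
The core idea lends itself to a compact sketch: the observer keeps its own estimated transition model, folds the mentor's observed state transitions into an augmented Bellman backup, and uses prioritized sweeping to propagate the resulting value changes. The Python below is a minimal illustration of that combination under simplifying assumptions (a tabular MDP with a known state-based reward and a mentor whose actions the learner could duplicate); the class and attribute names (ImplicitImitationAgent, mentor_counts, and so on) are ours, not the paper's.

# Minimal sketch of an augmented backup in the spirit of implicit imitation,
# combined with prioritized sweeping. Assumes a small tabular MDP with a known
# state-based reward; names are illustrative, not taken from the paper.
import heapq
from collections import defaultdict

class ImplicitImitationAgent:
    def __init__(self, n_states, actions, reward, gamma=0.95, theta=1e-3):
        self.actions = actions                  # the learner's own action set
        self.reward = reward                    # reward[s] -> float (assumed known)
        self.gamma, self.theta = gamma, theta
        self.V = [0.0] * n_states
        # Transition counts from the agent's own experience: (s, a) -> {s': n}.
        self.own_counts = defaultdict(lambda: defaultdict(int))
        # Transition counts from merely watching the mentor: s -> {s': n}.
        # Only the mentor's state transitions are observed, never its actions.
        self.mentor_counts = defaultdict(lambda: defaultdict(int))
        self.predecessors = defaultdict(set)    # s' -> states seen to reach s'
        self.pq = []                            # max-priority queue (negated)

    def observe_own(self, s, a, s2):
        self.own_counts[(s, a)][s2] += 1
        self.predecessors[s2].add(s)
        self._push(s)

    def observe_mentor(self, s, s2):
        # Implicit imitation: fold the mentor's observed transition into the
        # learner's value estimates without ever copying actions explicitly.
        self.mentor_counts[s][s2] += 1
        self.predecessors[s2].add(s)
        self._push(s)

    def _expected(self, dist):
        n = sum(dist.values())
        if n == 0:
            return None
        return sum(c / n * (self.reward[s2] + self.gamma * self.V[s2])
                   for s2, c in dist.items())

    def _backup(self, s):
        # Standard Bellman backup over the learner's own estimated model,
        # augmented by the value implied by the mentor's observed behaviour.
        vals = [self._expected(self.own_counts[(s, a)]) for a in self.actions]
        if s in self.mentor_counts:
            vals.append(self._expected(self.mentor_counts[s]))
        vals = [v for v in vals if v is not None]
        return max(vals) if vals else self.V[s]

    def _push(self, s):
        priority = abs(self._backup(s) - self.V[s])
        if priority > self.theta:
            heapq.heappush(self.pq, (-priority, s))

    def sweep(self, max_backups=50):
        # Prioritized sweeping: back up the most "surprising" states first and
        # propagate large value changes to their predecessors.
        for _ in range(max_backups):
            if not self.pq:
                break
            _, s = heapq.heappop(self.pq)
            old, self.V[s] = self.V[s], self._backup(s)
            if abs(self.V[s] - old) > self.theta:
                for p in self.predecessors[s]:
                    self._push(p)

Note that the mentor term only ever adds candidate backup values, so in the worst case this sketch degenerates gracefully to ordinary model-based reinforcement learning with prioritized sweeping. Handling mentors with different action sets, as the paper's second instantiation does, is not sketched here.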


Decentralized Supply Chain Formation: A Market Protocol and Competitive Equilibrium Analysis

Journal of Artificial Intelligence Research · arXiv.org Artificial Intelligence

Supply chain formation is the process of determining the structure and terms of exchange relationships to enable a multilevel, multiagent production activity. We present a simple model of supply chains, highlighting two characteristic features: hierarchical subtask decomposition, and resource contention. To decentralize the formation process, we introduce a market price system over the resources produced along the chain. In a competitive equilibrium for this system, agents choose locally optimal allocations with respect to prices, and outcomes are optimal overall. To determine prices, we define a market protocol based on distributed, progressive auctions, and myopic, non-strategic agent bidding policies. In the presence of resource contention, this protocol produces better solutions than the greedy protocols common in the artificial intelligence and multiagent systems literature. The protocol often converges to high-value supply chains, and when competitive equilibria exist, typically to approximate competitive equilibria. However, complementarities in agent production technologies can cause the protocol to wastefully allocate inputs to agents that do not produce their outputs. A subsequent decommitment phase recovers a significant fraction of the lost surplus.
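
To make the flavor of the protocol concrete, here is a small Python sketch in its spirit, not the paper's actual auction rules: one progressive auction per good, a myopic consumer, and a myopic producer that posts a non-strategic ask equal to its current input cost and bids for inputs only once a buyer stands at that ask. All goods, values, and reservation prices below are illustrative assumptions.

# Minimal sketch: progressive per-good auctions with myopic, non-strategic
# bidding. Goods, values, and reservation prices are illustrative only.

class Auction:
    """One auction per good. Buyers raise the standing bid; the seller's ask
    is the minimum acceptable price. The good trades once bid >= ask."""
    def __init__(self, good, ask=0.0):
        self.good, self.ask = good, ask
        self.bid, self.buyer = 0.0, None

    def place_bid(self, agent, price):
        changed = agent is not self.buyer or price > self.bid
        if price >= self.bid:                 # bids may only rise (progressive)
            self.bid, self.buyer = price, agent
        return changed

    def cleared(self):
        return self.buyer is not None and self.bid >= self.ask


class Consumer:
    """Wants one unit of `good`, worth `value`; bids the current ask as long
    as that ask does not exceed its value (myopic, non-strategic)."""
    def __init__(self, good, value):
        self.good, self.value = good, value

    def act(self, auctions):
        a = auctions[self.good]
        if a.ask <= self.value and not (a.buyer is self and a.cleared()):
            return a.place_bid(self, a.ask)
        return False


class Producer:
    """Turns a fixed set of input goods into one output good. Posts an ask on
    the output equal to its current input cost; bids on inputs only once a
    buyer is standing at that ask."""
    def __init__(self, output, inputs):
        self.output, self.inputs = output, inputs

    def act(self, auctions):
        out = auctions[self.output]
        cost = sum(auctions[g].ask for g in self.inputs)
        changed = False
        if out.ask != cost:                   # non-strategic: ask = input cost
            out.ask, changed = cost, True
        if out.cleared():                     # someone will pay for the output
            for g in self.inputs:
                a = auctions[g]
                if not (a.buyer is self and a.cleared()):
                    changed = a.place_bid(self, a.ask) or changed
        return changed


def run(agents, auctions, max_rounds=100):
    # Rounds of myopic bidding continue until quiescence: no agent changes
    # any ask or bid.
    for _ in range(max_rounds):
        if not any([agent.act(auctions) for agent in agents]):
            break
    return {g: (type(a.buyer).__name__ if a.cleared() else None, a.bid)
            for g, a in auctions.items()}


# Toy two-level chain: consumer <- widget <- producer <- {steel, labor}.
auctions = {"widget": Auction("widget"),
            "steel": Auction("steel", ask=2.0),
            "labor": Auction("labor", ask=3.0)}
agents = [Producer("widget", ["steel", "labor"]), Consumer("widget", value=20.0)]
print(run(agents, auctions))   # expect the full chain to form at prices 5/2/3

In this toy chain the process quiesces with the consumer buying the widget at 5 and the producer acquiring steel and labor at their reservation prices. With competing producers the standing bids on contested inputs would instead be driven upward, which is the contention setting where the abstract reports an advantage over greedy protocols, and also where partially allocated inputs can be stranded and a decommitment phase becomes useful.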

