Skill Transfer in Deep Reinforcement Learning under Morphological Heterogeneity

arXiv.org Machine Learning

Transfer learning methods for reinforcement learning (RL) domains facilitate the acquisition of new skills using previously acquired knowledge. The vast majority of existing approaches assume that the agents have the same design, e.g., the same shape and action spaces. In this paper we address the problem of transferring previously acquired skills among morphologically different agents (MDAs). For instance, assuming that a bipedal agent has been trained to move forward, could this skill be transferred to a one-legged hopper so as to make its training process for the same task more sample efficient? We frame this problem as one of subspace learning whereby we aim to infer latent factors representing the control mechanism that is common between MDAs. We propose a novel paired variational encoder-decoder model, PVED, that disentangles the control of MDAs into shared and agent-specific factors. The shared factors are then leveraged for skill transfer using RL. Theoretically, we derive a theorem indicating how the performance of PVED depends on the shared factors and agent morphologies. Experimentally, PVED has been extensively validated on four MuJoCo environments. We demonstrate its performance compared to a state-of-the-art approach and several ablation cases, visualize and interpret the hidden factors, and identify avenues for future improvements.


APNewsBreak: Undercover Agents Target Cybersecurity Watchdog

U.S. News

The Associated Press has found that researchers who reported the role of Israeli spyware in the targeting of Washington Post journalist Jamal Khashoggi's inner circle are in turn being targeted by international undercover operatives.


Awesome: Hitman's next elusive assassination target is Gary "Wildcard" Busey

PCWorld

We haven't been covering Hitman's Elusive Targets on a week-by-week basis. I figure if you're playing Hitman you already know the deal: Every week or so, Square tasks you with tracking down and killing a random NPC on a random map. You have one chance to succeed, and if you screw it up then the mission disappears. The schedule's been somewhat unpredictable, so other outlets have taken to writing "The next Elusive Target is live!" stories. What is our thing, though, is "Really Stupid Gimmick Ideas," so it is with some small amount of pleasure that I write this story about how the next Elusive Target is Gary Busey.


Vadim Bulitko and Nathan Sturtevant

AAAI Conferences

The pursuit of moving targets in real-time environments such as computer games and robotics presents several challenges to situated agents. A priori unknown state spaces and the need to interleave acting and planning limit the applicability of traditional search, learning, and adversarial game-tree search methods. In this paper we build on the previous idea of hierarchical state abstraction, showing how it can be effectively applied to each of the problems in moving target pursuit. First, we develop new algorithms for both chasing agents and target agents that automatically build and refine a hierarchical state abstraction. We then demonstrate the improvements in performance that the state abstraction affords when applied to incremental A* search, learning real-time heuristic search, and minimax adversarial search. Finally, we propose and evaluate a systematic and efficient method of using single-agent techniques in adversarial domains. This leads to a diverse family of new moving target pursuit algorithms which draw on recent advances in single-agent search. In a preliminary empirical evaluation, we demonstrate the effects of state abstraction on search and learning in the agents.
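
To make the idea of planning over a state abstraction while interleaving acting and planning more concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a known 4-connected grid, a single abstraction level built from fixed k x k blocks, plain breadth-first search in place of incremental A*, and a scripted target that moves on every second tick. All names (build_abstraction, chaser_step, pursue, GRID) are hypothetical and chosen for illustration only.

```python
from collections import deque

def neighbors(cell, grid):
    """Yield the free 4-connected neighbors of a grid cell (0 = free, 1 = wall)."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)

def build_abstraction(grid, k=2):
    """Assumed abstraction: group free cells into k x k blocks and link adjacent blocks."""
    block_of = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == 0:
                block_of[(r, c)] = (r // k, c // k)
    edges = {block: set() for block in block_of.values()}
    for cell, block in block_of.items():
        for n in neighbors(cell, grid):
            if block_of[n] != block:
                edges[block].add(block_of[n])
    return block_of, edges

def bfs(start, goal, successors):
    """Breadth-first search; returns the path as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def chaser_step(grid, block_of, edges, chaser, target):
    """Plan on the abstract graph, refine inside that corridor, take one step."""
    if chaser == target:
        return chaser
    abstract_path = bfs(block_of[chaser], block_of[target], lambda b: edges[b])
    path = None
    if abstract_path is not None:
        corridor = set(abstract_path)
        path = bfs(chaser, target,
                   lambda cell: (n for n in neighbors(cell, grid)
                                 if block_of[n] in corridor))
    if path is None:
        # Fall back to an unrestricted search if the corridor cannot be refined.
        path = bfs(chaser, target, lambda cell: neighbors(cell, grid))
    return path[1] if path else chaser

def pursue(grid, chaser, target, target_moves, max_steps=50):
    """Interleave planning and acting; return the capture tick, or None."""
    block_of, edges = build_abstraction(grid)
    moves = list(target_moves)
    for step in range(max_steps):
        if chaser == target:
            return step
        chaser = chaser_step(grid, block_of, edges, chaser, target)
        if chaser == target:
            return step + 1
        # The scripted target moves on every second tick, if the move is legal.
        if step % 2 == 1 and moves:
            nxt = moves.pop(0)
            if nxt in set(neighbors(target, grid)):
                target = nxt
    return None

GRID = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
]

if __name__ == "__main__":
    ticks = pursue(GRID, (0, 0), (3, 5), [(2, 5), (1, 5), (0, 5)])
    print("captured after", ticks, "ticks")
```

The chaser replans from scratch on every tick, which keeps the sketch short. The abstract search only narrows the set of cells that the ground-level search may expand. The hierarchy of abstraction levels and the incremental and learning search variants named in the abstract are beyond what this toy example covers.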


Florida man accused in plot to bomb Target stores to buy cheap stock

Los Angeles Times

An Ocala man made at least 10 explosive devices in hopes of blowing up Targets along the East Coast in an elaborate and deadly scheme to buy cheap stocks of the company, according to the U.S. Department of Justice. Mark Charles Barnett, 48, was charged with possession of a destructive device affecting commerce by a previously convicted felon. He faces a maximum of 10 years in federal prison. Barnett concocted a plan to place explosives disguised as food items in Target stores along the East Coast from Florida to New York, thinking the plot would cause stock prices for the retail-store giant to plummet and he could buy cheap shares of the company before they rebounded, a federal complaint alleges. He paid a man $10,000 to place the bombs on shelves, the complaint said.