precup
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Methods for sequential decision making are often built upon a foundational assumption that the underlying decision process is stationary [Sutton and Barto, 2018]. While this assumption was a cornerstone when laying the theoretical foundations of the field, and while it is often reasonable, it is seldom true in practice and can be unreasonable [Dulac-Arnold et al., 2019].
0f3d014eead934bbdbacb62a01dc4831-Supplemental.pdf
In reinforcement learning, option models (Sutton, Precup & Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions.
La veille de la cybersécurité
At RE•WORK, we are strong advocates for supporting women working towards advancing technology, so ahead of the upcoming Toronto AI Summit, on November 9-10, we set out to highlight inspirational women who are working at the forefront of AI developments, and who deserve recognition for their achievements. While we set out to create a list of just 20 – we couldn't narrow it down, as there are so many inspiring and prominent females in this space! Hear from many of them at our Toronto AI Summit, and more at our Women in AI Reception, both being held in Toronto next month. Help us to continue highlighting leading women in AI by nominating your influential woman for our next edition. RE•WORK holds Women in AI events, podcasts, and blogs.
Attention Option-Critic
Chunduru, Raviteja, Precup, Doina
Temporal abstraction in reinforcement learning is the ability of an agent to learn and use high-level behaviors, called options. The option-critic architecture provides a gradient-based end-to-end learning method to construct options. We propose an attention-based extension to this framework, which enables the agent to learn to focus different options on different aspects of the observation space. We show that this leads to behaviorally diverse options which are also capable of state abstraction, and prevents the degeneracy problems of option domination and frequent option switching that occur in option-critic, while achieving a similar sample complexity. We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic, through different transfer learning tasks. Experimental results in a relatively simple four-rooms environment and the more complex ALE (Arcade Learning Environment) showcase the efficacy of our approach.
Metrics and continuity in reinforcement learning
Lan, Charline Le, Bellemare, Marc G., Castro, Pablo Samuel
In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies it induces, are thus of crucial importance, as they will directly affect the performance of the algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of "well-behaved" neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
Interpretable Reinforcement Learning Inspired by Piaget's Theory of Cognitive Development
Hakimzadeh, Aref, Xue, Yanbo, Setoodeh, Peyman
Endeavors to design robots with human-level cognitive abilities have led to different categories of learning machines. According to Skinner's theory, reinforcement learning (RL) plays a key role in human intuition and cognition. The majority of state-of-the-art methods, including deep RL algorithms, are strongly influenced by the connectionist viewpoint. Such algorithms can significantly benefit from theories of mind and learning in other disciplines. This paper entertains the idea that theories such as the language of thought hypothesis (LOTH), script theory, and Piaget's cognitive development theory provide complementary approaches, which will enrich the RL field. Following this line of thinking, a general computational building block is proposed for Piaget's schema theory that supports the notions of productivity, systematicity, and inferential coherence as described by Fodor, in contrast with connectionism. Abstraction in the proposed method is left entirely to the system itself and is not externally constrained by any predefined architecture. The whole process matches Neisser's perceptual cycle model. Experiments on three typical control problems, followed by behavioral analysis, confirm the interpretability of the proposed method and its competitiveness with state-of-the-art algorithms. Hence, the proposed framework can be viewed as a step towards achieving human-like cognition in artificial intelligent systems.