AITopics | Harutyunyan, Anna

Collaborating Authors

Harutyunyan, Anna

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Agency Is Frame-Dependent

Abel, David, Barreto, André, Bowling, Michael, Dabney, Will, Dong, Shi, Hansen, Steven, Harutyunyan, Anna, Khetarpal, Khimya, Lyle, Clare, Pascanu, Razvan, Piliouras, Georgios, Precup, Doina, Richens, Jonathan, Rowland, Mark, Schaul, Tom, Singh, Satinder

arXiv.org Artificial IntelligenceFeb-6-2025

Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2502.04403

Country:

North America > United States (0.14)
North America > Canada > Alberta (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Add feedback

Three Dogmas of Reinforcement Learning

Abel, David, Ho, Mark K., Harutyunyan, Anna

arXiv.org Artificial IntelligenceJul-15-2024

Modern reinforcement learning has been conditioned by at least three dogmas. The first is the environment spotlight, which refers to our tendency to focus on modeling environments rather than agents. The second is our treatment of learning as finding the solution to a task, rather than adaptation. The third is the reward hypothesis, which states that all goals and purposes can be well thought of as maximization of a reward signal. These three dogmas shape much of what we think of as the science of reinforcement learning. While each of the dogmas have played an important role in developing the field, it is time we bring them to the surface and reflect on whether they belong as basic ingredients of our scientific paradigm. In order to realize the potential of reinforcement learning as a canonical frame for researching intelligent agents, we suggest that it is time we shed dogmas one and two entirely, and embrace a nuanced approach to the third.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2407.10583

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Bootstrapped Representations in Reinforcement Learning

Lan, Charline Le, Tu, Stephen, Rowland, Mark, Harutyunyan, Anna, Agarwal, Rishabh, Bellemare, Marc G., Dabney, Will

arXiv.org Artificial IntelligenceJun-16-2023

In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2306.10171

Country:

North America > United States > New Jersey (0.14)
North America > United States > Hawaii (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Tang, Yunhao, Kozuno, Tadashi, Rowland, Mark, Harutyunyan, Anna, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal

arXiv.org Artificial IntelligenceMay-29-2023

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the impact of multi-step learning has been relatively limited despite a number of prior efforts. Fundamentally, this might be because multi-step policy improvements require operations that cannot be approximated by stochastic samples, hence hindering the widespread adoption of such methods in practice. To address such limitations, we introduce doubly multi-step off-policy VI (DoMo-VI), a novel oracle algorithm that combines multi-step policy improvements and policy evaluations. DoMo-VI enjoys guaranteed convergence speed-up to the optimal policy and is applicable in general off-policy learning settings. We then propose doubly multi-step off-policy actor-critic (DoMo-AC), a practical instantiation of the DoMo-VI algorithm. DoMo-AC introduces a bias-variance trade-off that ensures improved policy gradient estimates. When combined with the IMPALA architecture, DoMo-AC has showed improvements over the baseline algorithm on Atari-57 game benchmarks.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2305.18501

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

An Analysis of Quantile Temporal-Difference Learning

Rowland, Mark, Munos, Rémi, Azar, Mohammad Gheshlaghi, Tang, Yunhao, Ostrovski, Georg, Harutyunyan, Anna, Tuyls, Karl, Bellemare, Marc G., Dabney, Will

arXiv.org Artificial IntelligenceMay-25-2023

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis.

differential inclusion, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2301.04462

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

On the Expressivity of Markov Reward

Abel, David, Dabney, Will, Harutyunyan, Anna, Ho, Mark K., Littman, Michael L., Precup, Doina, Singh, Satinder

arXiv.org Artificial IntelligenceNov-1-2021

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2111.00876

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

Useful Policy Invariant Shaping from Arbitrary Advice

Behboudian, Paniz, Satsangi, Yash, Taylor, Matthew E., Harutyunyan, Anna, Bowling, Michael

arXiv.org Artificial IntelligenceNov-2-2020

Reinforcement learning is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary RL research is to discover how to learn with less data. Previous work has shown that domain information can be successfully used to shape the reward; by adding additional reward information, the agent can learn with much less data. Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered. While such potential-based reward shaping (PBRS) holds promise, it is limited by the need for a well-defined potential function. Ideally, we would like to be able to take arbitrary advice from a human or other agent and improve performance without affecting the optimal policy. The recently introduced dynamic potential based advice (DPBA) method tackles this challenge by admitting arbitrary advice from a human or other agent and improves performance without affecting the optimal policy. The main contribution of this paper is to expose, theoretically and empirically, a flaw in DPBA. Alternatively, to achieve the ideal goals, we present a simple method called policy invariant explicit shaping (PIES) and show theoretically and empirically that PIES succeeds where DPBA fails.

agent, artificial intelligence, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2011.01297

Country: North America > Canada > Alberta (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Conditional Importance Sampling for Off-Policy Learning

Rowland, Mark, Harutyunyan, Anna, van Hasselt, Hado, Borsa, Diana, Schaul, Tom, Munos, Rémi, Dabney, Will

arXiv.org Machine LearningOct-16-2019

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.

artificial intelligence, estimator, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1910.07479

Country: North America (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

The Termination Critic

Harutyunyan, Anna, Dabney, Will, Borsa, Diana, Heess, Nicolas, Munos, Remi, Precup, Doina

arXiv.org Artificial IntelligenceFeb-26-2019

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition, as opposed to -- as is common -- the policy. The termination condition is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option's encoding -- arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination condition. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning and planning.

artificial intelligence, neural network, objective, (19 more...)

arXiv.org Artificial Intelligence

1902.09996

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Reinforcement Learning in POMDPs With Memoryless Options and Option-Observation Initiation Sets

Steckelmacher, Denis (Vrije Universiteit Brussels) | Roijers, Diederik M. (Vrije Universiteit Brussels) | Harutyunyan, Anna (Vrije Universiteit Brussels) | Vrancx, Peter (PROWLER.io) | Plisnier, Hélène (Vrije Universiteit Brussels) | Nowé, Ann (Vrije Universiteit Brussels)

AAAI ConferencesFeb-8-2018

Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.

deep learning, neural network, oois, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Europe (0.68)
North America > United States (0.49)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback