Castro, Pablo Samuel
A functional mirror ascent view of policy gradient methods with function approximation
Vaswani, Sharan, Bachem, Olivier, Totaro, Simone, Mueller, Robert, Geist, Matthieu, Machado, Marlos C., Castro, Pablo Samuel, Roux, Nicolas Le
We use functional mirror ascent to propose a general framework (referred to as FMA-PG) for designing policy gradient methods. The functional perspective distinguishes between a policy's functional representation (what are its sufficient statistics) and its parameterization (how are these statistics represented), and naturally results in computationally efficient off-policy updates. For simple policy parameterizations, the FMA-PG framework ensures that the optimal policy is a fixed point of the updates. It also allows us to handle complex policy parameterizations (e.g., neural networks) while guaranteeing policy improvement. Our framework unifies several PG methods and opens the way for designing sample-efficient variants of existing methods. Moreover, it recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) in a principled way. With a softmax functional representation, FMA-PG results in a variant of TRPO with additional desirable properties. It also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on MuJoCo. We also evaluate the algorithms instantiated by FMA-PG via experiments on simple reinforcement learning problems.
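As a rough illustration (a generic mirror-ascent template rather than the paper's exact derivation), the update acts on the policy's functional representation by maximizing a regularized linearization of the expected return J:

$$
\pi_{t+1} \;=\; \arg\max_{\pi}\; \big\langle \nabla_{\pi} J(\pi_t),\, \pi - \pi_t \big\rangle \;-\; \frac{1}{\eta}\, D_{\Phi}(\pi, \pi_t),
$$

where D_Phi is a Bregman divergence (e.g., a KL divergence under the softmax representation) and eta a step size; the policy's parameters are then updated by approximately solving this surrogate with several gradient steps, which is what yields updates that remain valid off-policy. The specific surrogates instantiated by FMA-PG differ in their details.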
MICo: Learning improved representations via sampling-based state similarity for Markov decision processes
Castro, Pablo Samuel, Kastner, Tyler, Panangaden, Prakash, Rowland, Mark
The success of reinforcement learning (RL) algorithms in large-scale, complex tasks depends on forming useful representations of the environment with which the algorithms interact. Feature selection and feature learning have long been an important subdomain of RL, and with the advent of deep reinforcement learning there has been much recent interest in understanding and improving the representations learnt by RL agents. Much of the work in representation learning has taken place from the perspective of auxiliary tasks [Jaderberg et al., 2017, Bellemare et al., 2017, Fedus et al., 2019]; in addition to the primary reinforcement learning task, the agent may attempt to predict and control additional aspects of the environment. Auxiliary tasks shape the agent's representation of the environment implicitly, typically via gradient descent on the additional learning objectives. As such, while auxiliary tasks continue to play an important role in improving the performance of deep RL algorithms, our understanding of the effects of auxiliary tasks on representations in RL is still in its infancy.
Metrics and continuity in reinforcement learning
Lan, Charline Le, Bellemare, Marc G., Castro, Pablo Samuel
In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies it induces, is thus of crucial importance, as it directly affects the performance of the algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of "well-behaved" neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
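As an illustration of why the choice of metric matters (a standard continuity argument rather than a result specific to this paper), a metric d over states supports generalization when value functions are Lipschitz with respect to it:

$$
|V^{\pi}(x) - V^{\pi}(y)| \;\le\; K_d \, d(x, y),
$$

where the constant K_d, and whether such a bound holds at all, depends on the metric chosen: bisimulation metrics, for instance, are constructed so that the optimal value function satisfies a bound of this form, whereas coarser notions of similarity need not control value differences at all.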
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Agarwal, Rishabh, Machado, Marlos C., Castro, Pablo Samuel, Bellemare, Marc G.
Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.
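Schematically (and glossing over the exact distances used on action and next-state distributions), the policy similarity metric is defined by a recursion that mixes a local comparison of optimal behaviour with a discounted comparison of future behaviour:

$$
d^{*}(x, y) \;=\; \mathrm{DIST}\big(\pi^{*}(\cdot \mid x),\, \pi^{*}(\cdot \mid y)\big) \;+\; \gamma\, \mathcal{W}_{1}(d^{*})\big(P^{\pi^{*}}(\cdot \mid x),\, P^{\pi^{*}}(\cdot \mid y)\big),
$$

so two states are close when the optimal policy acts similarly in them and leads them to similarly behaving future states; the contrastive procedure then trains an embedding whose similarities reproduce this metric.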
Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research
Obando-Ceron, Johan S., Castro, Pablo Samuel
Since the introduction of DQN, the vast majority of reinforcement learning research has focused on the use of deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect of widening the gap between those with ample access to computational resources and those without. In this work we argue that, despite the community's emphasis on large-scale environments, the traditional small-scale environments can still yield valuable scientific insights and can help reduce the barriers to entry for underprivileged communities. To substantiate our claims, we empirically revisit the paper which introduced the Rainbow algorithm [Hessel et al., 2018] and present some new insights into the algorithms used by Rainbow.
GANterpretations
Castro, Pablo Samuel
Since the introduction of Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] there has been a regular stream of both technical advances (e.g., Arjovsky et al. [2017]) and creative uses of these generative models (e.g., [Karras et al., 2019, Zhu et al., 2017, Jin et al., 2017]). In this work we propose an approach for using the power of GANs to automatically generate videos to accompany audio recordings by aligning to spectral properties of the recording. This allows musicians to explore new forms of multi-modal creative expression, where musical performance can induce an AI-generated music video that is guided by said performance, as well as a medium for creating a visual narrative to follow a storyline (similar to what was proposed by Frosst and Kereliuk [2019]). When trained properly, GANs learn a structured latent space, where nearby points generate similar images. For our work we make use of the BigGAN family of models [Brock et al., 2019], which are class-conditional generative models.
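A minimal sketch of this kind of audio-reactive latent walk (using librosa for the spectral analysis; the generate(z, y) callable stands in for a pre-trained class-conditional generator such as a BigGAN-style model and is an assumption for illustration, not the paper's code):

import numpy as np
import librosa

def audio_reactive_frames(generate, audio_path, z_dim=128, fps=30, step_scale=0.5, seed=0):
    """Walk a class-conditional GAN's latent space in step with an audio track.

    `generate(z, y)` is a hypothetical callable returning one image for a
    latent vector z and class embedding y (e.g., a BigGAN-style model).
    """
    rng = np.random.default_rng(seed)
    audio, sr = librosa.load(audio_path, sr=None)
    hop = max(1, int(sr / fps))                     # one spectral frame per video frame
    mag = np.abs(librosa.stft(audio, hop_length=hop)).mean(axis=0)
    mag = mag / (mag.max() + 1e-8)                  # normalized per-frame spectral energy

    z = rng.standard_normal(z_dim)
    y = rng.standard_normal(z_dim)                  # fixed class embedding: one visual "theme"
    frames = []
    for m in mag:
        # Louder passages take larger steps through the latent space, so the
        # structured latent geometry yields similar images during quiet moments
        # and faster visual change when the music is intense.
        z = z + step_scale * m * rng.standard_normal(z_dim)
        frames.append(generate(z, y))
    return frames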
Inverse Reinforcement Learning with Multiple Ranked Experts
Castro, Pablo Samuel, Li, Shijian, Zhang, Daqing
We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance. We assume the demonstrators are classified into one of k ranks, and use ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks. This approach is based on the idea that agents should not only learn how to behave from experts, but also how not to behave from non-experts. We show there are MDPs where important differences in the reward function would be hidden from existing algorithms by the behaviour of the expert. Our method is particularly useful for problems where we have access to a large set of agent behaviours with varying degrees of expertise (such as through GPS or cellphones). We highlight the differences between our approach and existing methods using a simple grid domain and demonstrate its efficacy on determining passenger-finding strategies for taxi drivers, using a large dataset of GPS trajectories.
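Schematically (a sketch in the spirit of max-margin ordinal regression, not necessarily the paper's exact program), if mu_i denotes the expected discounted feature counts of demonstrator i under a linear reward r(s) = w^T phi(s), one can seek the weight vector that best separates the ranks:

$$
\max_{\|w\| \le 1,\ \xi}\ \ \xi
\quad \text{subject to} \quad
w^{\top} \mu_i \;\ge\; w^{\top} \mu_j + \xi
\quad \text{whenever } \mathrm{rank}(i) > \mathrm{rank}(j),
$$

so higher-ranked demonstrators must look strictly better under the learned reward than lower-ranked ones, by the largest achievable margin xi; the non-experts thereby constrain the reward just as the experts do.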
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Lyle, Clare, Castro, Pablo Samuel, Bellemare, Marc G.
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach, which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL; in the cases where the two methods do behave differently, distributional RL can in fact hurt performance. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from.
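The coincidence in the tabular setting can be seen from the two Bellman operators (standard definitions, not the paper's notation): the expected operator acts on value functions while the distributional operator acts on return distributions,

$$
(T^{\pi} V)(x) \;=\; \mathbb{E}\big[\,R(x, A) + \gamma\, V(X')\,\big],
\qquad
(T^{\pi} Z)(x) \;\overset{D}{=}\; R(x, A) + \gamma\, Z(X'),
$$

with A drawn from the policy and X' the next state. Taking expectations of the distributional update recovers the expected one, E[(T^pi Z)(x)] = (T^pi E[Z])(x), so whenever the distribution parameterization preserves the mean, the two methods induce the same value estimates and hence the same behaviour.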
Distributional reinforcement learning with linear function approximation
Bellemare, Marc G., Roux, Nicolas Le, Castro, Pablo Samuel, Moitra, Subhodeep
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramér-based distributional methods may perform worse than directly approximating the value function.
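For reference, a minimal implementation of the standard Cramér (l2) distance between two discrete distributions on a shared, evenly spaced support (the paper's extension to arbitrary, not-necessarily-normalized vectors is not reproduced here):

import numpy as np

def cramer_distance(p, q, delta_z=1.0):
    """Cramer (l2) distance between probability vectors p and q defined on
    the same evenly spaced atoms with spacing delta_z: the l2 norm of the
    difference of their cumulative distribution functions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_gap = np.cumsum(p) - np.cumsum(q)   # CDF difference at each atom
    return np.sqrt(delta_z * np.sum(cdf_gap ** 2))

# Example: two categorical value distributions on the same 3-atom support.
print(cramer_distance([0.1, 0.4, 0.5], [0.3, 0.3, 0.4], delta_z=0.5))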
A Geometric Perspective on Optimal Representations for Reinforcement Learning
Bellemare, Marc G., Dabney, Will, Dadashi, Robert, Taiga, Adrien Ali, Castro, Pablo Samuel, Roux, Nicolas Le, Schuurmans, Dale, Lattimore, Tor, Lyle, Clare
This paper proposes a new approach to representation learning based on geometric properties of the space of value functions. We study a two-part approximation of the value function: a nonlinear map from states to vectors, or representation, followed by a linear map from vectors to values. Our formulation considers adapting the representation to minimize the (linear) approximation error of the value functions of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We argue that these AVFs make excellent auxiliary tasks, and use them to construct a loss which can be efficiently minimized to find a near-optimal representation for reinforcement learning. We highlight characteristics of the method in a series of experiments on the four-room domain.
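Schematically, using the paper's terminology but not its exact objective: the two-part approximation writes values as a linear function of a learned representation, and an adversarial value function is the value of a policy that is optimal for some direction delta over states,

$$
V(s) \;\approx\; \theta^{\top} \phi(s),
\qquad
\pi_{\delta} \in \arg\max_{\pi}\ \delta^{\top} V^{\pi},
\qquad \delta \in \mathbb{R}^{|\mathcal{S}|},
$$

so the AVFs are the functions V^{pi_delta}; the representation phi is then trained so that a sampled set of such AVFs can be fit well by the linear map, which serves as the auxiliary-task loss.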