 Reinforcement Learning


An Exploration of Embodied Visual Exploration

arXiv.org Artificial Intelligence

Embodied computer vision considers perception for robots in general, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment? Despite the progress thus far, many basic questions pertinent to this problem remain unanswered: (i) What does it mean for an agent to explore its environment well? (ii) Which methods work well, and under which assumptions and environmental settings? (iii) Where do current approaches fall short, and where might future work seek to improve? Seeking answers to these questions, we perform a thorough empirical study of four state-of-the-art paradigms on two photorealistic simulated 3D environments. We present a taxonomy of key exploration methods and a standard framework for benchmarking visual exploration algorithms. Our experimental results offer insights, and suggest new performance metrics and baselines for future work in visual exploration.


Experimental Analysis of Reinforcement Learning Techniques for Spectrum Sharing Radar

arXiv.org Machine Learning

In this work, we first describe a framework for the application of Reinforcement Learning (RL) control to a radar system that operates in a congested spectral setting. We then compare the utility of several RL algorithms through a discussion of experiments performed on commercial off-the-shelf (COTS) hardware. Each RL technique is evaluated in terms of convergence, radar detection performance achieved in a congested spectral environment, and the ability to share 100 MHz of spectrum with an uncooperative communications system. We examine policy iteration, which solves an environment posed as a Markov Decision Process (MDP) by directly solving for a stochastic mapping between environmental states and radar waveforms, as well as Deep RL techniques, which utilize a form of Q-Learning to approximate a parameterized function that is used by the radar to select optimal actions. We show that RL techniques are beneficial over a Sense-and-Avoid (SAA) scheme and discuss the conditions under which each approach is most effective. The Third Generation Partnership Project (3GPP) has recently received FCC approval to support 5G New Radio (NR) operation in sub-6 GHz frequency bands that are heavily utilized by radar systems [1], [2]. Thus, there is a significant need for radar systems capable of dynamic spectrum sharing.
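For readers unfamiliar with policy iteration, the following is a minimal tabular sketch on a toy two-state MDP. It is not the radar environment or the implementation from the paper: the transition table `P`, the state and action meanings, and the discount factor are all hypothetical, and this textbook variant returns a deterministic policy rather than the stochastic mapping the abstract describes.

```python
# Minimal tabular policy iteration on a hypothetical two-state MDP.
# P[s][a] is a list of (probability, next_state, reward) tuples.
# All numbers below are illustrative assumptions, not values from the paper.
P = [
    [[(1.0, 0, 1.0)], [(1.0, 1, 0.0)]],  # state 0: action 0 stays (reward 1), action 1 moves to state 1
    [[(1.0, 1, 0.0)], [(1.0, 0, 0.0)]],  # state 1: action 0 stays (reward 0), action 1 moves to state 0
]


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively compute the value of a fixed deterministic policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in range(len(P)):
            q = [q_value(P, V, s, a, gamma) for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Only switch actions on a strict improvement to avoid cycling on ties.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P)
```

On this toy problem the loop settles on staying in state 0 and leaving state 1, since state 0 is the only source of reward.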


Artificial Intelligence for Social Good: A Survey

arXiv.org Artificial Intelligence

Its impact is drastic and real: YouTube's AI-driven recommendation system would present sports videos for days if one happens to watch a live baseball game on the platform [1]; email writing becomes much faster with machine learning (ML) based auto-completion [2]; many businesses have adopted natural language processing based chatbots as part of their customer services [3]. AI has also greatly advanced human capabilities in complex decision-making processes ranging from determining how to allocate security resources to protect airports [4] to games such as poker [5] and Go [6]. All such tangible and stunning progress suggests that an "AI summer" is happening. As some put it, "AI is the new electricity" [7]. Meanwhile, in the past decade, an emerging theme in the AI research community is the so-called "AI for social good" (AI4SG): researchers aim at developing AI methods and tools to address problems at the societal level and improve the wellbeing of the society.


Incentivizing the Emergence of Grounded Discrete Communication Between General Agents

arXiv.org Artificial Intelligence

We converted the recently developed BabyAI grid world platform to a sender/receiver setup in order to test the hypothesis that established deep reinforcement learning techniques are sufficient to incentivize the emergence of a grounded discrete communication protocol between general agents. This is in contrast to previous experiments that employed straight-through estimation or tailored inductive biases. Our results show that these can indeed be avoided, by instead providing proper environmental incentives. Moreover, they show that a longer interval between communications incentivized more abstract semantics. In some cases, the communicating agents adapted to new environments more quickly than monolithic agents, showcasing the potential of emergent discrete communication for transfer learning.


Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints

arXiv.org Artificial Intelligence

However, learning to solve even simple tasks can require millions of interactions. A promising approach to improving the learning speed relies on the options framework [6]. An option is a 'chunk of behaviour' that is formally defined as an initiation set, establishing in which states the option is available; a policy, indicating which actions to perform in each state; and a termination condition, establishing when the option execution is terminated. RL systems can benefit from the use of options to support faster exploration and learning, especially when rewards are sparse or when the solution to a problem involves recurring behaviours. An important open problem is how an agent can autonomously learn options that are useful for solving tasks drawn from a given task distribution. Recent approaches have searched for options for specific optimisation problems, but they have not studied how optimal options are affected by different task features such as limited learning time budgets, task rewards, initial states, and the learning algorithm used.
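The three-part definition of an option given above can be sketched as a small data structure. This is an illustrative sketch only, not code from the paper: the `Option` class, the `run_option` helper, and the corridor environment are hypothetical names, and the termination condition is simplified to a deterministic predicate, whereas the options framework generally allows it to be a per-state probability.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

State = int
Action = int


@dataclass(frozen=True)
class Option:
    """An option: where it can start, how it acts, and when it stops."""
    initiation_set: FrozenSet[State]      # states in which the option is available
    policy: Callable[[State], Action]     # action to perform in each state
    termination: Callable[[State], bool]  # True when execution should end (simplified)

    def is_available(self, state: State) -> bool:
        return state in self.initiation_set


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 100) -> List[State]:
    """Execute an option until it terminates; return the visited states."""
    if not option.is_available(state):
        raise ValueError(f"option is not available in state {state}")
    trajectory = [state]
    for _ in range(max_steps):
        if option.termination(state):
            break
        state = step(state, option.policy(state))
        trajectory.append(state)
    return trajectory


# Hypothetical 1-D corridor with states 0..5; an action is a signed step.
def corridor_step(state: State, action: Action) -> State:
    return max(0, min(5, state + action))


# "Walk to the right end": available everywhere except the goal itself.
go_right = Option(
    initiation_set=frozenset(range(5)),
    policy=lambda s: 1,
    termination=lambda s: s == 5,
)

trajectory = run_option(go_right, 2, corridor_step)
```

An agent that treats `go_right` as a single temporally extended action reaches the end of the corridor in one decision instead of three, which is the kind of exploration speed-up the paragraph above refers to.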


Learning Reusable Options for Multi-Task Reinforcement Learning

arXiv.org Artificial Intelligence

One of the main reasons why RL has worked so well in these applications is that we are able to simulate millions of interactions with the environment in a relatively short period of time, allowing the agent to experience a large number of different situations in the environment and learn the consequences of its actions. In many real-world applications, however, where the agent interacts with the physical world, it might not be easy to generate such a large number of interactions. The time and cost associated with training such systems could render RL an infeasible approach for training at large scale. As a concrete example, consider training a large number of humanoid robots (agents) to move quickly, as in the Robocup competition [Farchy et al., 2013]. Although the agents have similar dynamics, subtle variations mean that a single policy shared across all agents would not be an effective solution.


google/trax

#artificialintelligence

Trax helps you understand deep learning. We start with basic maths and go through layers, models, supervised and reinforcement learning. We get to advanced deep learning results, including recent papers and state-of-the-art models. Trax is a successor to the Tensor2Tensor library and is actively used and maintained by researchers and engineers within the Google Brain team and a community of users. We're eager to collaborate with you too, so feel free to open an issue on GitHub or send along a pull request (see our contribution doc).


How To Build Your Own MuZero AI Using Python (Part 1/3)

#artificialintelligence

If you want to learn how one of the most sophisticated AI systems ever built works, you've come to the right place. In this three-part series, we'll explore the inner workings of the DeepMind MuZero model, the younger (and even more impressive) brother of AlphaZero. We'll be walking through the pseudocode that accompanies the MuZero paper, so grab yourself a cup of tea and a comfy chair and let's begin. On 19th November 2019 DeepMind released their latest model-based reinforcement learning algorithm to the world: MuZero. This is the fourth in a line of DeepMind reinforcement learning papers that have continually smashed through the barriers of possibility, starting with AlphaGo in 2016.


A Boolean Task Algebra for Reinforcement Learning

arXiv.org Machine Learning

We propose a framework for defining a Boolean algebra over the space of tasks. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks. We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains, including a high-dimensional video game environment requiring function approximation, where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks.


Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

arXiv.org Artificial Intelligence

How do people decide how long to continue in a task, when to switch, and to which other task? Understanding the mechanisms that underpin task interleaving is a long-standing goal in the cognitive sciences. Prior work suggests greedy heuristics and a policy maximizing the marginal rate of return. However, it is unclear how such a strategy would allow for adaptation to everyday environments that offer multiple tasks with complex switch costs and delayed rewards. Here we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The supervisory level learns to switch using task-specific approximate utility estimates, which are computed on the lower level. A hierarchically optimal value function decomposition can be learned from experience, even in conditions with multiple tasks and arbitrary and uncertain reward and cost structures. The model reproduces known empirical effects of task interleaving. It yields better predictions of individual-level data than a myopic baseline in a six-task problem (N=211). The results support hierarchical RL as a plausible model of task interleaving.