Undirected Networks
Graph Convolutional Memory for Deep Reinforcement Learning
Morad, Steven D., Liwicki, Stephan, Prorok, Amanda
Solving partially-observable Markov decision processes (POMDPs) is critical when applying deep reinforcement learning (DRL) to real-world robotics problems, where agents have an incomplete view of the world. We present graph convolutional memory (GCM) for solving POMDPs using deep reinforcement learning. Unlike recurrent neural networks (RNNs) or transformers, GCM embeds domain-specific priors into the memory recall process via a knowledge graph. By encapsulating priors in the graph, GCM adapts to specific tasks but remains applicable to any DRL task. Using graph convolutions, GCM extracts hierarchical graph features, analogous to image features in a convolutional neural network (CNN). We show GCM outperforms long short-term memory (LSTM), gated transformers for reinforcement learning (GTrXL), and differentiable neural computers (DNCs) on control, long-term non-sequential recall, and 3D navigation tasks while using significantly fewer parameters.
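To make the idea of memory recall through graph convolutions concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names, the mean-aggregation layer, and the temporal-edge prior are all assumptions chosen for illustration. Past observations are stored as graph nodes, a prior decides which nodes are linked, and one graph-convolution layer mixes each node with its neighbours.

```python
import numpy as np

def gcn_layer(node_feats, adjacency, weight):
    """One graph-convolution step (illustrative): average each node with
    its neighbours, then apply a linear projection and a tanh."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)           # row degrees
    return np.tanh((a_hat / deg) @ node_feats @ weight)

def temporal_edges(num_nodes):
    """Hypothetical prior: link each observation to the one just before it."""
    adj = np.zeros((num_nodes, num_nodes))
    idx = np.arange(1, num_nodes)
    adj[idx, idx - 1] = 1.0
    adj[idx - 1, idx] = 1.0
    return adj

rng = np.random.default_rng(0)
observations = rng.normal(size=(5, 4))   # 5 stored observations, 4 features each
weight = rng.normal(size=(4, 8))         # untrained projection, for shape only
memory = gcn_layer(observations, temporal_edges(5), weight)
belief = memory[-1]                      # feature at the current observation's node
```

Swapping `temporal_edges` for a different edge rule (for example, spatial proximity) is how a task-specific prior would enter in this sketch; stacking several layers would widen the neighbourhood each node draws on.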
Building Intelligent Autonomous Navigation Agents
Breakthroughs in machine learning in the last decade have led to 'digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of 'physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.
Fundamental limits for learning hidden Markov model parameters
Abraham, Kweku, Naulet, Zacharie, Gassiat, Elisabeth
We study the frontier between learnable and unlearnable hidden Markov models (HMMs). HMMs are flexible tools for clustering dependent data coming from unknown populations. The model parameters are known to be identifiable as soon as the clusters are distinct and the hidden chain is ergodic with a full rank transition matrix. In the limit as any one of these conditions fails, it becomes impossible to identify parameters. For a chain with two hidden states we prove nonasymptotic minimax upper and lower bounds, matching up to constants, which exhibit thresholds at which the parameters become learnable.
Most Important Skills Required For IT Professionals in AI and Machine Learning - IMC Grupo
What is the next digital frontier in the IT world? It is the one that plays your opponent in PUBG (or other interactive games), lets you ask Google to make calls for you, reminds you to pay your insurance, suggests what to purchase from your favorite eCommerce site, and recommends movies on Netflix. We are surrounded by Artificial Intelligence and Machine Learning applications so extensively that we don't even realize their presence. When Facebook recommends friends or groups to you, it is AI working behind the scenes. When Google listens to you and acts on your command, it's ML and AI working.
Provably Efficient Representation Learning in Low-rank Markov Decision Processes
Zhang, Weitong, He, Jiafan, Zhou, Dongruo, Zhang, Amy, Gu, Quanquan
The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task. However, existing provable reinforcement learning algorithms with linear function approximation often assume the feature representation is known and fixed. In order to understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose a provably efficient algorithm called ReLEX that can simultaneously learn the representation and perform exploration. We show that ReLEX always performs no worse than a state-of-the-art algorithm without representation learning, and will be strictly better in terms of sample efficiency if the function class of representations enjoys a certain mild "coverage" property over the whole state-action space.
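As an illustration of what a transition kernel in bilinear form can look like, the sketch below builds a small tabular example in Python with NumPy. The abstract does not give the exact parameterization, so the factorization P(s' | s, a) = phi(s, a)^T M psi(s'), the dimensions, and the use of row-stochastic factors (a convenient way to guarantee valid probabilities) are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d, d_prime = 6, 2, 3, 3   # illustrative sizes

def random_stochastic(rows, cols):
    """Random non-negative matrix whose rows each sum to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

phi = random_stochastic(n_states * n_actions, d)   # row (s, a) holds phi(s, a)
M = random_stochastic(d, d_prime)                  # core matrix
psi = random_stochastic(d_prime, n_states)         # column s' holds psi(s')

# P[(s, a), s'] = phi(s, a)^T M psi(s'); the rank is at most min(d, d_prime).
P = phi @ M @ psi
rank = np.linalg.matrix_rank(P)
```

Although `P` has 12 rows and 6 columns, its rank cannot exceed 3 here, which is the structure a representation-learning algorithm can exploit when `phi` is unknown.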
Reinforcement learning for PHY layer communications
Mary, Philippe, Koivunen, Visa, Moy, Christophe
In this chapter, we give comprehensive examples of applying RL to optimize the physical layer of wireless communications by defining different classes of problems and the possible solutions to handle them. In Section 9.2, we present the basic theory needed to address an RL problem, i.e. the Markov decision process (MDP) and the partially observable Markov decision process (POMDP), as well as two very important and widely used RL algorithms, i.e. Q-learning and SARSA. We also introduce the deep reinforcement learning (DRL) paradigm, and the section ends with an introduction to the multi-armed bandits (MAB) framework. Section 9.3 focuses on some toy examples to illustrate how the basic concepts of RL are employed in communication systems. We present applications extracted from the literature with simplified system models, using notation similar to that of Section 9.2. In Section 9.3, we also focus on modeling RL problems, i.e. how action and state spaces and rewards are chosen. The chapter concludes in Section 9.4 with a prospective thought on RL trends and ends with a review of a broader state of the art in Section 9.5.
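For readers unfamiliar with the two algorithms named above, the following Python sketch shows the standard tabular update rules side by side. It is not taken from the chapter: the function names, the learning rate `alpha`, the discount `gamma`, and the toy Q-table are assumptions for illustration. The only difference is the bootstrap target: Q-learning uses the best next action (off-policy), while SARSA uses the action actually taken next (on-policy).

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action a_next actually chosen."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy Q-table: 2 states, 2 actions; state 1 has values [1.0, 3.0].
Q0 = np.zeros((2, 2))
Q0[1] = [1.0, 3.0]

q_off = q_learning_update(Q0.copy(), s=0, a=0, r=1.0, s_next=1)
q_on = sarsa_update(Q0.copy(), s=0, a=0, r=1.0, s_next=1, a_next=0)
```

With the same transition, the Q-learning estimate for (s=0, a=0) moves toward 1 + 0.9 * 3.0, while the SARSA estimate moves toward 1 + 0.9 * 1.0 because the next action taken was the lower-valued one.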
Dive into Deep Learning
Zhang, Aston, Lipton, Zachary C., Li, Mu, Smola, Alexander J.
Just a few years ago, there were no legions of deep learning scientists developing intelligent products and services at major companies and startups. When the youngest among us (the authors) entered the field, machine learning did not command headlines in daily newspapers. Our parents had no idea what machine learning was, let alone why we might prefer it to a career in medicine or law. Machine learning was a forward-looking academic discipline with a narrow set of real-world applications. And those applications, e.g., speech recognition and computer vision, required so much domain knowledge that they were often regarded as separate areas entirely for which machine learning was one small component. Neural networks then, the antecedents of the deep learning models that we focus on in this book, were regarded as outmoded tools. In just the past five years, deep learning has taken the world by surprise, driving rapid progress in fields as diverse as computer vision, natural language processing, automatic speech recognition, reinforcement learning, and statistical modeling. With these advances in hand, we can now build cars that drive themselves with more autonomy than ever before (and less autonomy than some companies might have you believe), smart reply systems that automatically draft the most mundane emails, helping people dig out from oppressively large inboxes, and software agents that dominate the world's best humans at board games like Go, a feat once thought to be decades away. Already, these tools exert ever-wider impacts on industry and society, changing the way movies are made, diseases are diagnosed, and playing a growing role in basic sciences--from astrophysics to biology.
Nested Variational Inference
Zimmermann, Heiko, Wu, Hao, Esmaeili, Babak, van de Meent, Jan-Willem
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
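The two sample-quality measures mentioned above can be computed from a sampler's log importance weights. The Python sketch below uses the usual textbook definitions, which are assumed here rather than taken from the paper, and the function names are hypothetical. Both functions subtract the maximum log weight before exponentiating to avoid overflow.

```python
import numpy as np

def log_average_weight(log_w):
    """log of the mean importance weight, computed stably (log-mean-exp)."""
    log_w = np.asarray(log_w, dtype=float)
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))

def effective_sample_size(log_w):
    """ESS = (sum w)^2 / sum w^2; invariant to rescaling the weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - np.max(log_w))
    return w.sum() ** 2 / (w ** 2).sum()

uniform = np.zeros(4)                       # four equal weights
degenerate = np.array([0.0, -1e3, -1e3])    # one weight dominates

ess_uniform = effective_sample_size(uniform)
ess_degenerate = effective_sample_size(degenerate)
law_uniform = log_average_weight(uniform)
```

Equal weights give an ESS equal to the number of samples, while a single dominant weight drives the ESS toward 1, which is why a higher ESS indicates a better-matched proposal.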
Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Dann, Christoph, Mansour, Yishay, Mohri, Mehryar, Sekhari, Ayush, Sridharan, Karthik
Reinforcement Learning (RL) has achieved several remarkable empirical successes in the last decade, which include playing Atari 2600 video games at superhuman levels (Mnih et al., 2015), AlphaGo or AlphaGo Zero surpassing champions in Go (Silver et al., 2018), AlphaStar's victory over top-ranked professional players in StarCraft (Vinyals et al., 2019), or practical self-driving cars. These applications all correspond to the setting of rich observations, where the state space is very large and where observations may be images, text or audio data. In contrast, most provably efficient RL algorithms are still limited to the classical tabular setting where the state space is small (Kearns and Singh, 2002; Brafman and Tennenholtz, 2002; Azar et al., 2017; Dann et al., 2019) and do not scale to the rich observation setting. To derive guarantees for large state spaces, much of the existing work in RL theory relies on a realizability and a low-rank assumption (Krishnamurthy et al., 2016; Jiang et al., 2017; Dann et al., 2018; Du et al., 2019a; Misra et al., 2020; Agarwal et al., 2020b). Different notions of rank have been adopted in the literature, including that of a low-rank transition matrix (Jin et al., 2020a), a low Bellman rank (Jiang et al., 2017), Witness rank (Sun et al., 2019), Eluder dimension (Osband and Van Roy, 2014), Bellman-Eluder dimension (Jin et al., 2021), or bilinear classes (Du et al., 2021).
Vehicle Trajectory Prediction in City-scale Road Networks using a Direction-based Sequence-to-Sequence Model with Spatiotemporal Attention Mechanisms
Trajectory prediction of vehicles at the city scale is of great importance to various location-based applications such as vehicle navigation, traffic management, and location-based recommendations. Existing methods typically represent a trajectory as a sequence of grid cells, road segments or intention sets. None of them is ideal, as the cell-based representation ignores the road network structures and the other two are less efficient in analyzing city-scale road networks. In addition, most models focus on predicting the immediate next position, and are difficult to generalize to longer sequences. To address these problems, we propose a novel sequence-to-sequence model named D-LSTM (Direction-based Long Short-Term Memory), which represents each trajectory as a sequence of intersections and associated movement directions, and then feeds them into an LSTM encoder-decoder network for future trajectory generation. Furthermore, we introduce a spatial attention mechanism to capture dynamic spatial dependencies in road networks, and a temporal attention mechanism with a sliding context window to capture both short- and long-term temporal dependencies in trajectory data. Extensive experiments based on two real-world large-scale taxi trajectory datasets show that D-LSTM outperforms the existing state-of-the-art methods for vehicle trajectory prediction, validating the effectiveness of the proposed trajectory representation method and spatiotemporal attention mechanisms.