Undirected Networks
Destination Prediction Based on Partial Trajectory Data
Ebel, Patrick, Göl, Ibrahim Emre, Lingenfelder, Christoph, Vogelsang, Andreas
Two-thirds of the people who buy a new car prefer to use a substitute instead of the built-in navigation system. However, for many applications, knowledge about a user's intended destination and route is crucial. For example, suggestions for available parking spots close to the destination can be made or ride-sharing opportunities along the route are facilitated. Our approach predicts probable destinations and routes of a vehicle, based on the most recent partial trajectory and additional contextual data. The approach follows a three-step procedure: First, a $k$-d tree-based space discretization is performed, mapping GPS locations to discrete regions. Secondly, a recurrent neural network is trained to predict the destination based on partial sequences of trajectories. The neural network produces destination scores, signifying the probability of each region being the destination. Finally, the routes to the most probable destinations are calculated. To evaluate the method, we compare multiple neural architectures and present the experimental results of the destination prediction. The experiments are based on two public datasets of non-personalized, timestamped GPS locations of taxi trips. The best performing models were able to predict the destination of a vehicle with a mean error of 1.3 km and 1.43 km respectively.
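The first step of the procedure above, mapping GPS locations to discrete regions with a k-d tree, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `KDNode`, `build_kd_regions`, and `locate`, the median split rule, and the `max_leaf_size` stopping criterion are all hypothetical choices made here. Dense areas end up with many small regions and sparse areas with few large ones.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # any 2-D coordinate, e.g. (longitude, latitude)


class KDNode:
    """Inner nodes hold a split (axis, threshold); leaves hold a region id."""

    def __init__(self, axis: Optional[int] = None, threshold: Optional[float] = None,
                 left: "Optional[KDNode]" = None, right: "Optional[KDNode]" = None,
                 region_id: Optional[int] = None) -> None:
        self.axis = axis
        self.threshold = threshold
        self.left = left
        self.right = right
        self.region_id = region_id


def build_kd_regions(points: List[Point], max_leaf_size: int) -> Tuple[KDNode, int]:
    """Split the plane at the median, alternating axes, until each cell
    contains at most max_leaf_size points. Returns (root, number of regions)."""
    counter = [0]

    def make_leaf() -> KDNode:
        node = KDNode(region_id=counter[0])
        counter[0] += 1
        return node

    def build(pts: List[Point], depth: int) -> KDNode:
        if len(pts) <= max_leaf_size:
            return make_leaf()
        axis = depth % 2
        ordered = sorted(pts, key=lambda p: p[axis])
        threshold = ordered[len(ordered) // 2][axis]
        left = [p for p in ordered if p[axis] < threshold]
        right = [p for p in ordered if p[axis] >= threshold]
        if not left:
            # No point lies strictly below the median on this axis: stop here.
            return make_leaf()
        left_child = build(left, depth + 1)
        right_child = build(right, depth + 1)
        return KDNode(axis, threshold, left_child, right_child)

    root = build(points, 0)
    return root, counter[0]


def locate(root: KDNode, point: Point) -> int:
    """Return the id of the region (leaf cell) that contains the point."""
    node = root
    while node.region_id is None:
        node = node.left if point[node.axis] < node.threshold else node.right
    return node.region_id


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    tree, n_regions = build_kd_regions(sample, max_leaf_size=2)
    print(n_regions, [locate(tree, p) for p in sample])
```

A trajectory then becomes a sequence of region ids, which is the kind of discrete input a sequence model can consume.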
Reinforcement Learning in a Physics-Inspired Semi-Markov Environment
Bellinger, Colin, Coles, Rory, Crowley, Mark, Tamblyn, Isaac
Reinforcement learning (RL) has been demonstrated to have great potential in many applications of scientific discovery and design. Recent work includes, for example, the design of new structures and compositions of molecules for therapeutic drugs. Much of the existing work related to the application of RL to scientific domains, however, assumes that the available state representation obeys the Markov property. For reasons associated with time, cost, sensor accuracy, and gaps in scientific knowledge, many scientific design and discovery problems do not satisfy the Markov property. Thus, something other than a Markov decision process (MDP) should be used to plan / find the optimal policy. In this paper, we present a physics-inspired semi-Markov RL environment, namely the phase change environment. In addition, we evaluate the performance of value-based RL algorithms for both MDPs and partially observable MDPs (POMDPs) on the proposed environment. Our results demonstrate deep recurrent Q-networks (DRQN) significantly outperform deep Q-networks (DQN), and that DRQNs benefit from training with hindsight experience replay. Implications for the use of semi-Markovian RL and POMDPs for scientific laboratories are also discussed.
K-spin Hamiltonian for quantum-resolvable Markov decision processes
Jones, Eric B., Graf, Peter, Kapit, Eliot, Jones, Wesley
The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In proving that the variational minimization of our Hamiltonian is equivalent to the Bellman optimality condition we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations to corroborate our formulation by simulated and quantum annealing against classical Q-Learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.
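For reference, the Bellman optimality condition mentioned above can be illustrated with classical value iteration on a tiny discrete, finite, discounted MDP. This sketch does not reproduce the K-spin Hamiltonian construction or any quantum algorithm; the two-state MDP, the function name `value_iteration`, and the transition encoding are assumptions made purely for illustration.

```python
from typing import List, Tuple

# transitions[s][a] is a list of (probability, next_state, reward) triples.
Transitions = List[List[List[Tuple[float, int, float]]]]


def q_values(transitions: Transitions, values: List[float], s: int, gamma: float) -> List[float]:
    """Expected return of each action in state s under the current value estimate."""
    return [
        sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
        for outcomes in transitions[s]
    ]


def value_iteration(transitions: Transitions, gamma: float, tol: float = 1e-10):
    """Iterate V(s) <- max_a sum_s' P(s'|s,a) [r + gamma V(s')] to a fixed point."""
    values = [0.0] * len(transitions)
    while True:
        new_values = [max(q_values(transitions, values, s, gamma))
                      for s in range(len(transitions))]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(len(transitions)):
        q = q_values(transitions, values, s, gamma)
        policy.append(q.index(max(q)))
    return values, policy


# Hypothetical two-state MDP: action 0 = stay, action 1 = move to the other state.
TOY_MDP: Transitions = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1
]

if __name__ == "__main__":
    print(value_iteration(TOY_MDP, gamma=0.5))
```

With a discount factor of 0.5, staying in state 1 is worth 2 / (1 - 0.5) = 4 and moving from state 0 is worth 1 + 0.5 * 4 = 3, so the fixed point can be checked by hand.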
A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding
Boukas, Ioannis, Ernst, Damien, Théate, Thibaut, Bolland, Adrien, Huynen, Alexandre, Buchwald, Martin, Wynants, Christelle, Cornélusse, Bertrand
The large-scale integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In this context, the short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component for the successful integration of renewable energy sources is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
Learning to Explore using Active Neural SLAM
Chaplot, Devendra Singh, Gandhi, Dhiraj, Gupta, Saurabh, Gupta, Abhinav, Salakhutdinov, Ruslan
This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called 'Active Neural SLAM'. Our approach leverages the strengths of both classical and learning-based methods, by using analytical path planners with a learned SLAM module and with global and local policies. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Such use of learning within each module retains its benefits, while at the same time, hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.
Risk-Aware High-level Decisions for Automated Driving at Occluded Intersections with Reinforcement Learning
Kamran, Danial, Lopez, Carlos Fernandez, Lauer, Martin, Stiller, Christoph
Reinforcement learning is nowadays a popular framework for solving different decision-making problems in automated driving. However, some crucial challenges still need to be addressed to provide more reliable policies. In this paper, we propose a generic risk-aware DQN approach in order to learn high-level actions for driving through unsignalized occluded intersections. The proposed state representation provides lane-based information, which allows it to be used for multi-lane scenarios. Moreover, we propose a risk-based reward function which punishes risky situations instead of only collision failures. Such a rewarding approach helps to incorporate risk prediction into our deep Q-network and to learn more reliable policies which are safer in challenging situations. The efficiency of the proposed approach is compared with a DQN learned with a conventional collision-based rewarding scheme and also with a rule-based intersection navigation policy. Evaluation results show that the proposed approach outperforms both of these methods. It provides safer actions than the collision-aware DQN approach and is less overcautious than the rule-based policy.
Deep Reinforcement Learning (DRL): Another Perspective for Unsupervised Wireless Localization
Li, You, Hu, Xin, Zhuang, Yuan, Gao, Zhouzheng, Zhang, Peng, El-Sheimy, Naser
Location is key to spatialize internet-of-things (IoT) data. However, it is challenging to use low-cost IoT devices for robust unsupervised localization (i.e., localization without training data that have known location labels). Thus, this paper proposes a deep reinforcement learning (DRL) based unsupervised wireless-localization method. The main contributions are as follows. (1) This paper proposes an approach to model a continuous wireless-localization process as a Markov decision process (MDP) and process it within a DRL framework. (2) To alleviate the challenge of obtaining rewards when using unlabeled data (e.g., daily-life crowdsourced data), this paper presents a reward-setting mechanism, which extracts robust landmark data from unlabeled wireless received signal strengths (RSS). (3) To ease requirements for model re-training when using DRL for localization, this paper uses RSS measurements together with agent location to construct DRL inputs. The proposed method was tested by using field testing data from multiple Bluetooth 5 smart ear tags in a pasture. Meanwhile, the experimental verification process reflected the advantages and challenges for using DRL in wireless localization.
Inference in the Stochastic Block Model with a Markovian assignment of the communities
Large random graphs have been very popular in the last decade since they are powerful tools to model complex phenomena like interactions on social networks or the spread of a disease. In practical cases, detecting communities of well-connected nodes in a graph is a major issue, motivating the study of the Stochastic Block Model (SBM). In this model, each node belongs to a particular community and edges are sampled independently according to a probability depending on the communities of the nodes. Aiming at progressively bridging the gap between models and reality, time-evolving random graphs have been recently introduced. In [20], a Stochastic Block Temporal Model is considered where the temporal evolution is modeled through a discrete hidden Markov chain on the node memberships and where the connection probabilities also evolve through time.
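The generative rule of the plain SBM described above (fixed community labels, edges drawn independently with a probability determined by the two endpoints' communities) can be sketched as follows. This covers only the static model, not the Markovian or temporal variants; the function name `sample_sbm` and its arguments are hypothetical, introduced here for illustration.

```python
import random
from typing import Sequence, Set, Tuple


def sample_sbm(communities: Sequence[int],
               connection_probs: Sequence[Sequence[float]],
               rng: random.Random) -> Set[Tuple[int, int]]:
    """Sample an undirected graph without self-loops.

    communities[i] is the community of node i, and connection_probs[a][b] is
    the probability of an edge between a node of community a and one of b
    (assumed symmetric). Each pair of nodes is drawn independently.
    """
    edges: Set[Tuple[int, int]] = set()
    n = len(communities)
    for i in range(n):
        for j in range(i + 1, n):
            p = connection_probs[communities[i]][communities[j]]
            if rng.random() < p:
                edges.add((i, j))  # stored once, with i < j
    return edges


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    probs = [[0.9, 0.1],
             [0.1, 0.9]]  # dense within communities, sparse between them
    print(sorted(sample_sbm(labels, probs, random.Random(0))))
```

Community detection is the inverse problem: recovering `communities` from the sampled edges alone.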
Detecting Dynamic Community Structure in Functional Brain Networks Across Individuals: A Multilayer Approach
Ting, Chee-Ming, Samdin, S. Balqis, Tang, Meini, Ombao, Hernando
We present a unified statistical framework for characterizing community structure of brain functional networks that captures variation across individuals and evolution over time. Existing methods for community detection focus only on single-subject analysis of dynamic networks; while recent extensions to multiple-subjects analysis are limited to static networks. To overcome these limitations, we propose a multi-subject, Markov-switching stochastic block model (MSS-SBM) to identify state-related changes in brain community organization over a group of individuals. We first formulate a multilayer extension of SBM to describe the time-dependent, multi-subject brain networks. We develop a novel procedure for fitting the multilayer SBM that builds on multislice modularity maximization which can uncover a common community partition of all layers (subjects) simultaneously. By augmenting with a dynamic Markov switching process, our proposed method is able to capture a set of distinct, recurring temporal states with respect to inter-community interactions over subjects and the change points between them. Simulation shows accurate community recovery and tracking of dynamic community regimes over multilayer networks by the MSS-SBM. Application to task fMRI reveals meaningful non-assortative brain community motifs, e.g., core-periphery structure at the group level, that are associated with language comprehension and motor functions suggesting their putative role in complex information integration. Our approach detected dynamic reconfiguration of modular connectivity elicited by varying task demands and identified unique profiles of intra and inter-community connectivity across different task conditions. The proposed multilayer network representation provides a principled way of detecting synchronous, dynamic modularity in brain networks across subjects.
Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information
We propose learning discrete structured representations from unlabeled data by maximizing the mutual information between a structured latent variable and a target variable. Calculating mutual information is intractable in this setting. Our key technical contribution is an adversarial objective that can be used to tractably estimate mutual information assuming only the feasibility of cross entropy calculation. We develop a concrete realization of this general formulation with Markov distributions over binary encodings. We report critical and unexpected findings on practical aspects of the objective such as the choice of variational priors. We apply our model on document hashing and show that it outperforms current best baselines based on discrete and vector quantized variational autoencoders. It also yields highly compressed interpretable representations.