Goto

Collaborating Authors

 Markov Models


Optimal and Fast Real-time Resources Slicing with Deep Dueling Neural Networks

arXiv.org Artificial Intelligence

Effective network slicing requires an infrastructure/network provider to deal with the uncertain demand and real-time dynamics of network resource requests. Another challenge is the combinatorial optimization of numerous resources, e.g., radio, computing, and storage. This article develops an optimal and fast real-time resource slicing framework that maximizes the long-term return of the network provider while taking into account the uncertainty of resource demand from tenants. Specifically, we first propose a novel system model which enables the network provider to effectively slice various types of resources to different classes of users under separate virtual slices. We then capture the real-time arrival of slice requests by a semi-Markov decision process. To obtain the optimal resource allocation policy under the dynamics of slicing requests, e.g., uncertain service time and resource demands, a Q-learning algorithm is often adopted in the literature. However, such an algorithm is notorious for its slow convergence, especially for problems with large state/action spaces. This makes Q-learning practically inapplicable to our case in which multiple resources are simultaneously optimized. To tackle it, we propose a novel network slicing approach with an advanced deep learning architecture, called deep dueling that attains the optimal average reward much faster than the conventional Q-learning algorithm. This property is especially desirable to cope with real-time resource requests and the dynamic demands of users. Extensive simulations show that the proposed framework yields up to 40% higher long-term average return while being few thousand times faster, compared with state of the art network slicing approaches.


Long-Range Indoor Navigation with PRM-RL

arXiv.org Artificial Intelligence

Long-range indoor navigation requires guiding robots with noisy sensors and controls through cluttered environments along paths that span a variety of buildings. We achieve this with PRM-RL, a hierarchical robot navigation method in which reinforcement learning agents that map noisy sensors to robot controls learn to solve short-range obstacle avoidance tasks, and then sampling-based planners map where these agents can reliably navigate in simulation; these roadmaps and agents are then deployed on-robot, guiding the robot along the shortest path where the agents are likely to succeed. Here we use Probabilistic Roadmaps (PRMs) as the sampling-based planner and AutoRL as the reinforcement learning method in the indoor navigation context. We evaluate the method in simulation for kinematic differential drive and kinodynamic car-like robots in several environments, and on-robot for differential-drive robots at two physical sites. Our results show PRM-RL with AutoRL is more successful than several baselines, is robust to noise, and can guide robots over hundreds of meters in the face of noise and obstacles in both simulation and on-robot, including over 3.3 kilometers of physical robot navigation.


Iterative Channel Estimation for Discrete Denoising under Channel Uncertainty

arXiv.org Artificial Intelligence

We propose a novel iterative channel estimation (ICE) algorithm that essentially removes the critical known noisy channel assumption for universal discrete denoising problem. Our algorithm is based on Neural DUDE (N-DUDE), a recently proposed neural network-based discrete denoiser, and it estimates the channel transition matrix as well as the neural network parameters in an alternating manner until convergence. While we do not make any probabilistic assumption on the underlying clean data, our ICE resembles Expectation-Maximization (EM) with variational approximation, and it takes advantage of the property of N-DUDE being locally robust around the true channel. With extensive experiments on several radically different types of data, we show that the ICE equipped N-DUDE (dubbed as ICE-N-DUDE) can perform \emph{universally} well regardless of the uncertainties in both the channel and the clean source. Moreover, we show ICE-N-DUDE becomes extremely robust to its hyperparameters and significantly outperforms the strong baseline that can deal with the channel uncertainties for denoising, the widely used Baum-Welch (BW) algorithm for hidden Markov models (HMM).


A Driving Intention Prediction Method Based on Hidden Markov Model for Autonomous Driving

arXiv.org Machine Learning

In a mixed-traffic scenario where both autonomous vehicles and human-driving vehicles exist, a timely prediction of driving intentions of nearby human-driving vehicles is essential for the safe and efficient driving of an autonomous vehicle. In this paper, a driving intention prediction method based on Hidden Markov Model (HMM) is proposed for autonomous vehicles. HMMs representing different driving intentions are trained and tested with field collected data from a flyover. When training the models, either discrete or continuous characterization of the mobility features of vehicles is applied. Experimental results show that the HMMs trained with the continuous characterization of mobility features can give a higher prediction accuracy when they are used for predicting driving intentions. Moreover, when the surrounding traffic of the vehicle is taken into account, the performances of the proposed prediction method are further improved.


Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

arXiv.org Artificial Intelligence

Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge. Common practice has been to use handcrafted dialog acts, or the output vocabulary, e.g. in neural encoder decoders, as the action spaces. Both have their own limitations. This paper proposes a novel latent action framework that treats the action spaces of an end-to-end dialog agent as latent variables and develops unsupervised methods in order to induce its own action space from the data. Comprehensive experiments are conducted examining both continuous and discrete action types and two different optimization methods based on stochastic variational inference. Results show that the proposed latent actions achieve superior empirical performance improvement over previous word-level policy gradient methods on both DealOrNoDeal and MultiWoz dialogs. Our detailed analysis also provides insights about various latent variable approaches for policy learning and can serve as a foundation for developing better latent actions in future research.


Transfer Learning for Non-Intrusive Load Monitoring

arXiv.org Machine Learning

Non-intrusive load monitoring (NILM) is a technique to recover source appliances from only the recorded mains in a household. NILM is unidentifiable and thus a challenge problem because the inferred power value of an appliance given only the mains could not be unique. To mitigate the unidentifiable problem, various methods incorporating domain knowledge into NILM have been proposed and shown effective experimentally. Recently, among these methods, deep neural networks are shown performing best. Arguably, the recently proposed sequence-to-point (seq2point) learning is promising for NILM. However, the results were only carried out on the same data domain. It is not clear if the method could be generalised or transferred to different domains, e.g., the test data were drawn from a different country comparing to the training data. We address this issue in the paper, and two transfer learning schemes are proposed, i.e., appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, our results show that the latent features learnt by a `complex' appliance, e.g., washing machine, can be transferred to a `simple' appliance, e.g., kettle. For CTL, our conclusion is that the seq2point learning is transferable. Precisely, when the training and test data are in a similar domain, seq2point learning can be directly applied to the test data without fine tuning; when the training and test data are in different domains, seq2point learning needs fine tuning before applying to the test data. Interestingly, we show that only the fully connected layers need fine tuning for transfer learning.


Probabilistic Inference of Binary Markov Random Fields in Spiking Neural Networks through Mean-field Approximation

arXiv.org Machine Learning

Recent studies have suggested that the cognitive process of the human brain is realized as probabilistic inference and can be further modeled by probabilistic graphical models like Markov random fields. Nevertheless, it remains unclear how probabilistic inference can be implemented by a network of spiking neurons in the brain. Previous studies tried to relate the inference equation of binary Markov random fields to the dynamic equation of spiking neural networks through belief propagation algorithm and reparameterization, but they are valid only for Markov random fields with limited network structure. In this paper, we propose a spiking neural network model that can implement inference of arbitrary binary Markov random fields. Specifically, we design a spiking recurrent neural network and prove that its neuronal dynamics are mathematically equivalent to the inference process of Markov random fields by adopting mean-field theory.


Bayesian optimisation under uncertain inputs

arXiv.org Machine Learning

Bayesian optimisation (BO) has been a successful approach to optimise functions which are expensive to evaluate and whose observations are noisy. Classical BO algorithms, however, do not account for errors about the location where observations are taken, which is a common issue in problems with physical components. In these cases, the estimation of the actual query location is also subject to uncertainty. In this context, we propose an upper confidence bound (UCB) algorithm for BO problems where both the outcome of a query and the true query location are uncertain. The algorithm employs a Gaussian process model that takes probability distributions as inputs. Theoretical results are provided for both the proposed algorithm and a conventional UCB approach within the uncertain-inputs setting. Finally, we evaluate each method's performance experimentally, comparing them to other input noise aware BO approaches on simulated scenarios involving synthetic and real data.


World Discovery Models

arXiv.org Machine Learning

As humans we are driven by a strong desire for seeking novelty in our world. Also upon observing a novel pattern we are capable of refining our understanding of the world based on the new information---humans can discover their world. The outstanding ability of the human mind for discovery has led to many breakthroughs in science, art and technology. Here we investigate the possibility of building an agent capable of discovering its world using the modern AI technology. In particular we introduce NDIGO, Neural Differential Information Gain Optimisation, a self-supervised discovery model that aims at seeking new information to construct a global view of its world from partial and noisy observations. Our experiments on some controlled 2-D navigation tasks show that NDIGO outperforms state-of-the-art information-seeking methods in terms of the quality of the learned representation. The improvement in performance is particularly significant in the presence of white or structured noise where other information-seeking methods follow the noise instead of discovering their world.


Hyperbolic Discounting and Learning over Multiple Horizons

arXiv.org Machine Learning

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting. We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques in RL. Additionally, and independent of hyperbolic discounting, we make a surprising discovery that simultaneously learning value functions over multiple time-horizons is an effective auxiliary task which often improves over a strong value-based RL agent, Rainbow.