# Markov Models

### Markov Networks: Undirected Graphical Models

This article briefs you about Markov Networks which falls under the family of Undirected Graphical Models (UGM). This article is a follow-up to Bayesian Network, which is a type of Directed Graphical Models. Key Motivation behind these networks is to parameterize the Joint Probability Distribution based on Local Independencies between Random Variables. Generally, Bayesian Network requires to pre-define a directionality to assert an influence of random variable. But there might be cases where interaction between nodes ( or random variables) are symmetric in nature, and we would like to have a model which can represent this symmetricity without directional influence.

### Hyperbolic Discounting and Learning over Multiple Horizons

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting. We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques in RL. Additionally, and independent of hyperbolic discounting, we make a surprising discovery that simultaneously learning value functions over multiple time-horizons is an effective auxiliary task which often improves over a strong value-based RL agent, Rainbow.

### Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

In real-world scenarios, the observation data for reinforcement learning with continuous control is commonly noisy and part of it may be dynamically missing over time, which violates the assumption of many current methods developed for this. We addressed the issue within the framework of partially observable Markov Decision Process (POMDP) using a model-based method, in which the transition model is estimated from the incomplete and noisy observations using a newly proposed surrogate loss function with local approximation, while the policy and value function is learned with the help of belief imputation. For the latter purpose, a generative model is constructed and is seamlessly incorporated into the belief updating procedure of POMDP, which enables robust execution even under a significant incompleteness and noise. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several compared methods under various challenging scenarios.

Sign languages aren't easy to learn and are even harder to teach. They use not just hand gestures but also mouthings, facial expressions and body posture to communicate meaning. This complexity means professional teaching programs are still rare and often expensive. But this could all change soon, with a little help from artificial intelligence (AI). My colleagues and I are working on software for teaching yourself sign languages in an automated, intuitive way.

### Divergence-Based Motivation for Online EM and Combining Hidden Variable Models

Expectation-Maximization (EM) is the fallback method for parameter estimation of hidden (aka latent) variable models. Given the full batch of data, EM forms an upper-bound of the negative log-likelihood of the model at each iteration and then updates to the minimizer of this upper-bound. We introduce a versatile online variant of EM where the data arrives in as a stream. Our motivation is based on the relative entropy divergences between two joint distributions over the hidden and visible variables. We view the EM upper-bound as a Monte Carlo approximation of an expectation and show that the joint relative entropy divergence induces a similar expectation form. As a result, we employ the divergence to the old model as the inertia term to motivate our online EM algorithm. Our motivation is more widely applicable than previous ones and leads to simple online updates for mixture of exponential distributions, hidden Markov models, and the first known online update for Kalman filters. Additionally, the finite sample form of the inertia term lets us derive online updates when there is no closed form solution. Experimentally, sweeping the data with an online update converges much faster than the batch update. Our divergence based methods also lead to a simple way to combine hidden variable models and this immediately gives efficient algorithms for distributed setting.

### WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving

Machine learning can provide efficient solutions to the complex problems encountered in autonomous driving, but ensuring their safety remains a challenge. A number of authors have attempted to address this issue, but there are few publicly-available tools to adequately explore the trade-offs between functionality, scalability, and safety. We thus present WiseMove, a software framework to investigate safe deep reinforcement learning in the context of motion planning for autonomous driving. WiseMove adopts a modular learning architecture that suits our current research questions and can be adapted to new technologies and new questions. We present the details of WiseMove, demonstrate its use on a common traffic scenario, and describe how we use it in our ongoing safe learning research.

### Model-Based Detector for SSDs in the Presence of Inter-cell Interference

In this paper, we consider the problem of reducing the bit error rate of flash-based solid state drives (SSDs) when cells are subject to inter-cell interference (ICI). By observing that the outputs of adjacent victim cells can be correlated due to common aggressors, we propose a novel channel model to accurately represent the true flash channel. This model, equivalent to a finite-state Markov channel model, allows the use of the sum-product algorithm to calculate more accurate posterior distributions of individual cell inputs given the joint outputs of victim cells. These posteriors can be easily mapped to the log-likelihood ratios that are passed as inputs to the soft LDPC decoder. When the output is available with high precision, our simulation showed that a significant reduction in the bit-error rate can be obtained, reaching $99.99\%$ reduction compared to current methods, when the diagonal coupling is very strong. In the realistic case of low-precision output, our scheme provides less impressive improvements due to information loss in the process of quantization. To improve the performance of the new detector in the quantized case, we propose a new iterative scheme that alternates multiple times between the detector and the decoder. Our simulations showed that the iterative scheme can significantly improve the bit error rate even in the quantized case.

### Testing Markov Chains without Hitting

We study the problem of identity testing of markov chains. In this setting, we are given access to a single trajectory from a markov chain with unknown transition matrix $Q$ and the goal is to determine whether $Q = P$ for some known matrix $P$ or $\text{Dist}(P, Q) \geq \epsilon$ where $\text{Dist}$ is suitably defined. In recent work by Daskalakis, Dikkala and Gravin, 2018, it was shown that it is possible to distinguish between the two cases provided the length of the observed trajectory is at least super-linear in the hitting time of $P$ which may be arbitrarily large. In this paper, we propose an algorithm that avoids this dependence on hitting time thus enabling efficient testing of markov chains even in cases where it is infeasible to observe every state in the chain. Our algorithm is based on combining classical ideas from approximation algorithms with techniques for the spectral analysis of markov chains.

### Unbiased Smoothing using Particle Independent Metropolis-Hastings

We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators. Unbiased estimators are appealing in the context of parallel computing, and facilitate the construction of confidence intervals. The proposed scheme only requires access to off-the-shelf Particle Filters (PF) and is thus easier to implement than recently proposed unbiased smoothers. The approach is demonstrated on a L\'evy-driven stochastic volatility model and a stochastic kinetic model.

### Exploiting locality in high-dimensional factorial hidden Markov models

We propose algorithms for approximate filtering and smoothing in high-dimensional factorial hidden Markov models. The approximation involves discarding, in a principled way, likelihood factors according a notion of locality in a factor graph associated with the emission distribution. This allows the exponential-in-dimension cost of exact filtering and smoothing to be avoided. We prove that the approximation accuracy, measured in a local total variation norm, is `dimension-free' in the sense that as the overall dimension of the model increases the error bounds we derive do not necessarily degrade. A key step in the analysis is to quantify the error introduced by localizing the likelihood function in a Bayes' rule update. The factorial structure of the likelihood function which we exploit arises naturally when data have known spatial or network structure. We demonstrate the new algorithms on synthetic examples and a London Underground passenger flow problem, where the factor graph is effectively given by the train network.