 Markov Models


A Deep Learning Framework for Assessing Physical Rehabilitation Exercises

arXiv.org Machine Learning

The article proposes a new framework for assessment of physical rehabilitation exercises based on a deep learning approach. The objective of the framework is automated quantification of patient performance in completing prescribed rehabilitation exercises, based on captured whole-body joint trajectories. The main components of the framework are metrics for measuring movement performance, scoring functions for mapping the performance metrics into numerical scores of movement quality, and deep neural network models for regressing quality scores of input movements via supervised learning. Furthermore, an overview of the existing methods for modeling and evaluation of rehabilitation movements is presented, encompassing various distance functions, dimensionality-reduction techniques, and movement models employed for this problem in prior studies. To the best of our knowledge, this is the first work that implements deep neural networks for assessment of rehabilitation performance. Multiple deep network architectures are repurposed for the task at hand and are validated on a dataset of rehabilitation exercises.


Geometric Matrix Completion with Deep Conditional Random Fields

arXiv.org Machine Learning

The problem of completing high-dimensional matrices from a limited set of observations arises in many big data applications, especially recommender systems. Existing matrix completion models generally follow either a memory-based or a model-based approach, whereas geometric matrix completion models combine the best of both approaches. Existing deep-learning-based geometric models yield good performance, but in order to operate they require a fixed structure graph capturing the relationships among the users and items. This graph is typically constructed by evaluating a pre-defined similarity metric on the available observations or by using side information, e.g., user profiles. In contrast, Markov-random-field-based models do not require a fixed structure graph but rely on handcrafted features to make predictions. When no side information is available and the number of available observations becomes very low, existing solutions are pushed to their limits. In this paper, we propose a geometric matrix completion approach that addresses these challenges. We consider matrix completion as a structured prediction problem in a conditional random field (CRF), which is characterized by maximum a posteriori (MAP) inference, and we propose a deep model that predicts the missing entries by solving the MAP inference problem. The proposed model simultaneously learns the similarities among matrix entries, computes the CRF potentials, and solves the inference problem. Its training is performed in an end-to-end manner, with a method to supervise the learning of entry similarities. Comprehensive experiments demonstrate the superior performance of the proposed model compared to various state-of-the-art models on popular benchmark datasets and underline its superior capacity to deal with highly incomplete matrices.


Kernel embedded nonlinear observational mappings in the variational mapping particle filter

arXiv.org Machine Learning

Recently, some works have suggested methods to combine variational probabilistic inference with Monte Carlo sampling. One promising approach is via local optimal transport. In this approach, a steepest gradient descent method based on local optimal transport principles is formulated to deterministically transform point samples from an intermediate density to a posterior density. The local mappings that transform the intermediate densities are embedded in a reproducing kernel Hilbert space (RKHS). This variational mapping method requires the evaluation of the log-posterior density gradient and therefore the adjoint of the observational operator. In this work, we evaluate nonlinear observational mappings in the variational mapping method using two approximations that avoid the adjoint: an ensemble-based approximation, in which the gradient is approximated by the particle covariances in the state and observational spaces (the so-called ensemble space), and an RKHS approximation, in which the observational mapping is embedded in an RKHS and the gradient is derived there. The approximations are evaluated for highly nonlinear observational operators and in a low-dimensional chaotic dynamical system. The RKHS approximation is shown to be highly successful and superior to the ensemble approximation.


Differentially Private Markov Chain Monte Carlo

arXiv.org Machine Learning

Recent developments in differentially private (DP) machine learning and DP Bayesian learning have enabled learning under strong privacy guarantees for the training data subjects. In this paper, we further extend the applicability of DP Bayesian learning by presenting the first general DP Markov chain Monte Carlo (MCMC) algorithm whose privacy guarantees are not subject to unrealistic assumptions on Markov chain convergence and that is applicable to posterior inference in arbitrary models. Our algorithm is based on a decomposition of the Barker acceptance test that allows evaluating the Rényi DP privacy cost of the accept-reject choice. We further show how to improve the DP guarantee through data subsampling and approximate acceptance tests.
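For readers unfamiliar with the Barker acceptance test mentioned in this abstract, the following is a minimal Python sketch of the standard, non-private test for a symmetric proposal. It does not reproduce the paper's DP decomposition, subsampling, or privacy accounting; the function names `barker_accept_prob` and `barker_accept` are illustrative assumptions, not taken from the paper. The second function uses the well-known equivalent form in which a proposal is accepted when the log density ratio plus standard logistic noise is positive.

```python
import math
import random


def barker_accept_prob(log_ratio: float) -> float:
    """Barker acceptance probability 1 / (1 + exp(-log_ratio)).

    log_ratio is log p(proposed) - log p(current) for a symmetric proposal.
    """
    # Branch on the sign so that math.exp never receives a large positive argument.
    if log_ratio >= 0:
        return 1.0 / (1.0 + math.exp(-log_ratio))
    z = math.exp(log_ratio)
    return z / (1.0 + z)


def barker_accept(log_ratio: float, rng: random.Random) -> bool:
    """Equivalent test: accept when log_ratio + logistic noise is positive."""
    # Clamp u away from 0 and 1 so both logarithms are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    # Inverse-CDF sample from the standard logistic distribution.
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return log_ratio + logistic_noise > 0
```

Over many draws, the frequency with which `barker_accept(d, rng)` returns `True` approaches `barker_accept_prob(d)`, which is what makes the two forms interchangeable.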


Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift

arXiv.org Machine Learning

In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pioneered by Hallak et al. (2017). Under this method, online updates to the value function are reweighted to avoid divergence issues typical of off-policy learning. While Hallak et al.'s solution is appealing, it cannot easily be transferred to nonlinear function approximation. First, it requires a projection step onto the probability simplex; second, even though the operator describing the expected behavior of the off-policy learning algorithm is convergent, it is not known to be a contraction mapping, and hence, may be more unstable in practice. We address these two issues by introducing a discount factor into COP-TD. We analyze the behavior of discounted COP-TD and find it better behaved from a theoretical perspective. We also propose an alternative soft normalization penalty that can be minimized online and obviates the need for an explicit projection step. We complement our analysis with an empirical evaluation of the two techniques in an off-policy setting on the game Pong from the Atari domain where we find discounted COP-TD to be better behaved in practice than the soft normalization penalty. Finally, we perform a more extensive evaluation of discounted COP-TD in 5 games of the Atari domain, where we find performance gains for our approach.


Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

arXiv.org Machine Learning

The goal of reinforcement learning (RL) is to construct algorithms that learn and plan in sequential decision-making systems when the underlying system dynamics are unknown. A typical model in RL is the Markov Decision Process (MDP). At each time step, the environment is in a state s. The agent may take an action a and obtain a reward, after which the environment may transition to another state. In reinforcement learning, the transition probability distribution is unknown. The algorithm needs to learn the transition dynamics of the MDP while aiming to maximize the cumulative reward. This causes an exploration-exploitation dilemma: whether to act to gain new information (explore) or to act consistently with past experience to maximize reward (exploit). Theoretical analysis of reinforcement learning falls into two broad categories: those assuming a simulator (a.k.a.


Provably efficient RL with Rich Observations via Latent State Decoding

arXiv.org Machine Learning

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps (where previously decoded latent states provide labels for later regression problems) and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over Q-learning with naïve exploration, even when Q-learning has cheating access to latent states.


Unsupervised speech representation learning using WaveNet autoencoders

arXiv.org Machine Learning

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. The behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
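As a rough illustration of what a discrete VQ-VAE bottleneck does at inference time, the sketch below maps each continuous encoder output to the index of its nearest codebook vector. It is a pure-Python toy under stated assumptions: the names `quantize` and `quantize_sequence`, the squared Euclidean distance, and the tiny codebook in the usage note are all illustrative, and nothing here reflects the paper's actual encoder, codebook size, or training procedure (codebook learning and gradient handling are omitted entirely).

```python
from typing import List, Sequence, Tuple


def quantize(frame: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (index, code vector) of the codebook entry nearest to frame."""

    def sq_dist(k: int) -> float:
        # Squared Euclidean distance between the frame and codebook entry k.
        return sum((f - c) ** 2 for f, c in zip(frame, codebook[k]))

    best = min(range(len(codebook)), key=sq_dist)
    return best, list(codebook[best])


def quantize_sequence(frames: Sequence[Sequence[float]],
                      codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map a sequence of latent frames to a sequence of discrete token indices."""
    return [quantize(frame, codebook)[0] for frame in frames]
```

For example, with the hypothetical codebook `[[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]`, the frames `[[0.1, -0.2], [0.9, 1.2], [3.5, 4.4]]` are assigned to entries 0, 1, and 2 respectively. Index sequences of this kind are what one would then try to map to phonemes.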


Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies

arXiv.org Artificial Intelligence

Decision making in multi-agent systems (MAS) is a great challenge due to enormous state and joint action spaces as well as uncertainty, making centralized control generally infeasible. Decentralized control offers better scalability and robustness but requires mechanisms to coordinate on joint tasks and to avoid conflicts. Common approaches to learning decentralized policies for cooperative MAS suffer from non-stationarity and a lack of credit assignment, which can lead to unstable and uncoordinated behavior in complex environments. In this paper, we propose Strong Emergent Policy approximation (STEP), a scalable approach to learning strong decentralized policies for cooperative MAS with a distributed variant of policy iteration. For that, we use function approximation to learn from action recommendations of a decentralized multi-agent planning algorithm. STEP combines decentralized multi-agent planning with centralized learning, only requiring a generative model for distributed black box optimization. We experimentally evaluate STEP in two challenging and stochastic domains with large state and joint action spaces and show that STEP is able to learn stronger policies than standard multi-agent reinforcement learning algorithms when combining multi-agent open-loop planning with centralized function approximation. The learned policies can be reintegrated into the multi-agent planning process to further improve performance.


Fairness with Dynamics

arXiv.org Machine Learning

It has recently been shown that if feedback effects of decisions are ignored, then imposing fairness constraints such as demographic parity or equality of opportunity can actually exacerbate unfairness. We propose to address this challenge by modeling feedback effects as the dynamics of a Markov decision process (MDP). First, we define analogs of fairness properties that have been proposed for supervised learning. Second, we propose algorithms for learning fair decision-making policies for MDPs. We also explore extensions to reinforcement learning, where parts of the dynamical system are unknown and must be learned without violating fairness. Finally, we demonstrate the need to account for dynamical effects using simulations on a loan applicant MDP.
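For reference, the two static supervised-learning fairness notions named in this abstract can be measured as gaps between two groups. The sketch below is a minimal Python illustration of those common static definitions only; it is not the paper's MDP analogs or its algorithms. The function names, the binary group encoding (0 and 1), and the toy loan data in the usage note are assumptions made for the example.

```python
from typing import Sequence


def demographic_parity_gap(decisions: Sequence[int],
                           groups: Sequence[int]) -> float:
    """Absolute difference in positive-decision rates between groups 0 and 1."""
    rates = []
    for g in (0, 1):
        members = [d for d, grp in zip(decisions, groups) if grp == g]
        if not members:
            raise ValueError(f"group {g} has no members")
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


def equal_opportunity_gap(decisions: Sequence[int],
                          groups: Sequence[int],
                          labels: Sequence[int]) -> float:
    """Absolute difference in true-positive rates between groups 0 and 1."""
    # Restrict to individuals whose true label is positive (e.g. would repay).
    qualified = [(d, g) for d, g, y in zip(decisions, groups, labels) if y == 1]
    return demographic_parity_gap([d for d, _ in qualified],
                                  [g for _, g in qualified])
```

With hypothetical loan decisions `[1, 1, 0, 1, 1, 0, 0, 0]`, groups `[0, 0, 0, 0, 1, 1, 1, 1]`, and repayment labels `[1, 1, 0, 0, 1, 1, 0, 0]`, both gaps come out to 0.5. Both quantities describe a single round of decisions, which is exactly the limitation the abstract raises: they say nothing about how decisions change the population over time.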