 Markov Models


Approximating Posterior Predictive Distributions by Averaging Output From Many Particle Filters

arXiv.org Machine Learning

This paper introduces the particle swarm algorithm, a recursive and embarrassingly parallel algorithm that targets an approximation to the sequence of posterior predictive distributions by averaging expectation approximations from many particle filters. A law of large numbers and a central limit theorem are provided, as well as a numerical study of simulated data from a stochastic volatility model.
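The averaging idea can be illustrated with a minimal sketch. It assumes a standard stochastic volatility model (log-volatility x_t = phi * x_{t-1} + sigma * noise, observation y_t ~ N(0, beta^2 * exp(x_t))) with fixed, made-up parameter values, and it runs plain bootstrap particle filters. The function names, the parameter values, and the choice to give every filter the same parameters are assumptions for illustration; the abstract does not say how each filter is configured.

```python
import math
import random


def predictive_means(ys, n_particles, rng, phi=0.95, sigma=0.3, beta=0.7):
    """Run one bootstrap particle filter.

    Returns, for each time t, an estimate of E[x_{t+1} | y_{1:t}].
    The model parameters are illustrative assumptions.
    """
    # Start from the stationary distribution of the AR(1) log-volatility.
    sd0 = sigma / math.sqrt(1.0 - phi * phi)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    estimates = []
    for y in ys:
        # Weight each particle by the observation density N(y; 0, beta^2 exp(x)).
        weights = []
        for x in particles:
            var = beta * beta * math.exp(x)
            weights.append(math.exp(-0.5 * y * y / var) / math.sqrt(var))
        # Multinomial resampling, then propagation through the state equation.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [phi * x + sigma * rng.gauss(0.0, 1.0) for x in particles]
        estimates.append(sum(particles) / n_particles)
    return estimates


def averaged_predictive_means(ys, n_filters=20, n_particles=200, seed=0):
    """Average the per-time-step estimates of many independent filters."""
    runs = [
        predictive_means(ys, n_particles, random.Random(seed + k))
        for k in range(n_filters)
    ]
    return [sum(column) / n_filters for column in zip(*runs)]
```

Because the filters never communicate, each call to `predictive_means` could run on a separate worker, which is the sense in which such a scheme is embarrassingly parallel.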


Policy learning with partial observation and mechanical constraints for multi-person modeling

arXiv.org Machine Learning

Extracting the rules of real-world biological multi-agent behaviors is a current challenge in various scientific and engineering fields. Biological agents generally have limited observation and mechanical constraints; however, most conventional data-driven models ignore such assumptions, resulting in a lack of biological plausibility and model interpretability for behavioral analyses in biological and cognitive science. Here we propose sequential generative models with partial observation and mechanical constraints, which can visualize whose information the agents utilize and can generate biologically plausible actions. We formulate this as a decentralized multi-agent imitation learning problem, leveraging binary partial observation models with a Gumbel-Softmax reparameterization and policy models based on hierarchical variational recurrent neural networks with physical and biomechanical constraints. We investigate the empirical performance using real-world multi-person motion datasets from basketball and soccer games.
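For readers unfamiliar with the Gumbel-Softmax reparameterization mentioned above, the following is a minimal sketch of the sampling step only. The function name and the two-logit "observe / ignore" example are assumptions for illustration and are not taken from the paper's model.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed (soft) categorical sample.

    Adds Gumbel(0, 1) noise to each logit and applies a
    temperature-scaled softmax.
    """
    noisy = []
    for logit in logits:
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical binary choice: [observe another agent, ignore that agent].
sample = gumbel_softmax([1.5, -0.5], temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the sample towards a one-hot vector, while the noise-plus-softmax form keeps the sample differentiable with respect to the logits when written in an automatic differentiation framework.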


Remote Sensing Scientist at Leidos in Arlington, VA

#artificialintelligence

Want to be a part of an elite team where our innovative technical solutions are delivered to customers that advance the state of the art while addressing long-term problems of importance to national security? At Leidos' Multi-Spectrum Warfare Research and Analytics Systems (MSWRAS) Division, an organization in the Leidos Innovation Center (LInC), we are looking for you, our next Scientist who specializes in remote sensing data analytics. Join our team of Ph.D. level peers in designing and developing advanced technology-based solutions for contract research and development projects working in our Arlington, VA office.

Fun roles you will have in this job:
Describe instances of successful, proven, and demonstrable experience contributing to the technical work as part of cross-discipline teams in the development and integration of software-based solutions for competitive, contract-based applied research programs.
Work with teams composed of members from industry, small businesses, and academic-based researchers; candidates should have experience working on projects focused on multiple technical fields such as machine learning, artificial intelligence, engineering, and software development and integration.
Describe how the work products to which they contributed solved customers' problems in such domains as energy, health, and national security or in the commercial sector.
Work within the MSWRAS Division and across the LInC, performing basic and applied contract research and development projects, both leading and working under the guidance of senior scientists and engineers.
Process, interpret, and analyze large volumes of data collected by remote sensing platforms; the work may also include other types of phenomenological data such as field measurements or weather data.
Independently design and undertake new research as well as partner in a team environment across organizations.
Contribute to the development of creative and innovative R&D approaches to solving major remote sensing analytics challenges and work with potential sponsors (customers or internal champions) to secure funding for new research efforts based on those topics.
Contribute to the productivity of teams composed of fellow researchers, data scientists, data engineers, and software engineers to execute complex R&D programs.
Under the guidance of a senior scientist or engineer, design and develop or integrate secure and scalable applications that are part of broader solutions and that are applicable across multiple domains.


Boltzmann machine learning with a variational quantum algorithm

#artificialintelligence

The Boltzmann machine is a powerful tool for modeling the probability distributions that govern training data. A thermal equilibrium state is typically used in Boltzmann machine learning to obtain a suitable probability distribution. Boltzmann machine learning consists of calculating the gradient of the loss function, which is given in terms of the thermal average; this calculation is the most time-consuming step. Here, we propose a method to implement Boltzmann machine learning using Noisy Intermediate-Scale Quantum (NISQ) devices. We prepare an initial pure state that contains all possible computational basis states with the same amplitude, and apply a variational imaginary time simulation. Readout of the state after the evolution in the computational basis approximates the probability distribution of the thermal equilibrium state that is used for Boltzmann machine learning. We perform numerical simulations of our scheme and confirm that Boltzmann machine learning works well with it.
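To make the role of the thermal average concrete, here is a purely classical sketch that computes it by brute-force enumeration for a tiny, fully visible Boltzmann machine. It assumes spins in {-1, +1}, energy E(s) = -sum_{i<j} W_ij s_i s_j - sum_i b_i s_i, and unit temperature, and it uses the textbook log-likelihood gradient (data average minus thermal average). None of these choices is taken from the paper, and the quantum procedure itself is not reproduced here.

```python
import itertools
import math


def thermal_pair_average(weights, biases):
    """Exact thermal average <s_i s_j> by enumerating all 2^n states."""
    n = len(biases)
    states = list(itertools.product([-1, 1], repeat=n))
    boltzmann = []
    for s in states:
        energy = -sum(biases[i] * s[i] for i in range(n))
        energy -= sum(
            weights[i][j] * s[i] * s[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        boltzmann.append(math.exp(-energy))
    partition = sum(boltzmann)
    average = [[0.0] * n for _ in range(n)]
    for s, factor in zip(states, boltzmann):
        for i in range(n):
            for j in range(n):
                average[i][j] += s[i] * s[j] * factor / partition
    return average


def weight_gradient(data, weights, biases):
    """Log-likelihood gradient for W_ij: <s_i s_j>_data - <s_i s_j>_model."""
    n = len(biases)
    model = thermal_pair_average(weights, biases)
    gradient = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            data_average = sum(s[i] * s[j] for s in data) / len(data)
            gradient[i][j] = data_average - model[i][j]
    return gradient
```

The enumeration in `thermal_pair_average` grows as 2^n, which is why the thermal average dominates the cost and why approximating the equilibrium distribution by other means is attractive.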


Discount Factor as a Regularizer in Reinforcement Learning

arXiv.org Artificial Intelligence

Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime. Yet the exact nature of this regularizer has not been investigated. In this work, we fill in this gap. For several Temporal-Difference (TD) learning methods, we show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss. Motivated by the equivalence, we empirically study this technique compared to standard $L_2$ regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate.
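As a reminder of where the discount factor enters a TD method, the following minimal sketch runs tabular TD(0) over a fixed batch of transitions. It does not reproduce the paper's equivalence or its regularization term; the function name, the transition tuple layout, and the toy chain are assumptions for illustration.

```python
def td0(transitions, n_states, gamma, alpha=0.1, sweeps=200):
    """Tabular TD(0) over a fixed batch of transitions.

    Each transition is (state, reward, next_state, done).
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        for state, reward, next_state, done in transitions:
            # The discount factor scales the bootstrapped next-state value.
            target = reward if done else reward + gamma * values[next_state]
            values[state] += alpha * (target - values[state])
    return values


# Toy chain: state 0 -> state 1 -> terminal, reward 1 on the last step.
batch = [(0, 0.0, 1, False), (1, 1.0, 1, True)]
values_long_horizon = td0(batch, n_states=2, gamma=0.9)
values_short_horizon = td0(batch, n_states=2, gamma=0.5)
```

With the smaller discount factor, the estimate for state 0 relies less on the bootstrapped value of state 1, which is the mechanism that the abstract relates to regularization.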


Collapsing Bandits and Their Application to Public Health Interventions

arXiv.org Artificial Intelligence

We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques while achieving similar performance.
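The "collapsing" structure can be sketched for a single arm. The sketch assumes a two-state Markov chain with hypothetical passive transition probabilities `p_bad_to_good` and `p_good_to_good`, and it tracks the belief that the arm is in the good state. Whether playing an arm also changes its transition probabilities is not stated in the abstract and is not modeled here.

```python
def passive_step(belief_good, p_bad_to_good, p_good_to_good):
    """Propagate the belief one round while the arm is not observed."""
    return (
        belief_good * p_good_to_good
        + (1.0 - belief_good) * p_bad_to_good
    )


def belief_after_passive_rounds(observed_good, rounds,
                                p_bad_to_good, p_good_to_good):
    """Belief after an observation followed by `rounds` passive rounds.

    Playing the arm reveals the state, so the belief collapses to 0 or 1;
    afterwards the uncertainty evolves under the Markov chain.
    """
    belief = 1.0 if observed_good else 0.0
    for _ in range(rounds):
        belief = passive_step(belief, p_bad_to_good, p_good_to_good)
    return belief
```

Since the belief of a passive arm depends only on the last observed state and the number of rounds since then, each arm's belief can take only a small, structured set of values.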


Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps

arXiv.org Artificial Intelligence

Human learning and intelligence work differently from the supervised pattern recognition approach adopted in most deep learning architectures. Humans seem to learn rich representations by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We suggest a simple but effective unsupervised model which develops such characteristics. The agent learns to represent the dynamical physical properties of its environment by intrinsically motivated exploration, and performs inference on this representation to reach goals. For this, a set of self-organizing maps which represent state-action pairs is combined with a causal model for sequence prediction. The proposed system is evaluated in the cartpole environment. After an initial phase of playful exploration, the agent can execute kinematic simulations of the environment's future, and use those for action planning. We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.


Online learning in MDPs with linear function approximation and bandit feedback

arXiv.org Machine Learning

We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be represented as linear functions in terms of a known low-dimensional feature map, and that the learner has access to a simulator of the environment that allows generating trajectories from the true MDP dynamics. Our main contribution is a computationally efficient algorithm, which we call MDP-LinExp3, together with a proof that its regret is bounded by $\widetilde{\mathcal{O}}\big(H^2 T^{2/3} (dK)^{1/3}\big)$, where $T$ is the number of episodes, $H$ is the number of steps in each episode, $K$ is the number of actions, and $d$ is the dimension of the feature map. We also show that the regret can be improved to $\widetilde{\mathcal{O}}\big(H^2 \sqrt{TdK}\big)$ under much stronger assumptions on the MDP dynamics. To our knowledge, MDP-LinExp3 is the first provably efficient algorithm for this problem setting.
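The two regret rates can be compared numerically with a small sketch. It evaluates only the leading terms inside the $\widetilde{\mathcal{O}}$ notation, so constants and logarithmic factors are ignored and the outputs are growth rates rather than actual regret values. The function names are hypothetical.

```python
import math


def general_rate(T, H, d, K):
    """Leading term H^2 * T^(2/3) * (d*K)^(1/3); constants and logs ignored."""
    return H ** 2 * T ** (2.0 / 3.0) * (d * K) ** (1.0 / 3.0)


def improved_rate(T, H, d, K):
    """Leading term H^2 * sqrt(T*d*K); constants and logs ignored."""
    return H ** 2 * math.sqrt(T * d * K)


def rate_ratio(T, H, d, K):
    """Ratio of the two leading terms, which simplifies to (T / (d*K))^(1/6)."""
    return general_rate(T, H, d, K) / improved_rate(T, H, d, K)
```

The ratio grows only as the sixth root of T / (dK), so the improvement matters mainly when the number of episodes is large relative to dK.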


Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints

arXiv.org Artificial Intelligence

Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner.
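The Bayesian inference step that a POMDP relies on can be sketched as a discrete belief update. This is the standard textbook update, not the paper's implementation; the two-state "intact / damaged" component, the matrices, and the function name are assumptions for illustration.

```python
def belief_update(belief, transition, obs_likelihood):
    """One POMDP belief update for a fixed action and received observation.

    belief[s]          : prior probability of state s
    transition[s][s2]  : P(s2 | s, action)
    obs_likelihood[s2] : P(observation | s2, action)
    """
    n = len(belief)
    # Prediction: push the belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction: reweight by the observation likelihood and normalize.
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical component with states [intact, damaged]; the inspection
# reports "no damage detected".
posterior = belief_update(
    belief=[1.0, 0.0],
    transition=[[0.8, 0.2], [0.0, 1.0]],
    obs_likelihood=[0.9, 0.3],
)
```

The posterior belief then serves as the state on which an inspection and maintenance policy acts, which is how the uncertainty from imperfect inspections enters the decision problem.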


Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

arXiv.org Machine Learning

We study the inverse reinforcement learning (IRL) problem under the transition dynamics mismatch between the expert and the learner. In particular, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide an upper bound on the learner's performance degradation based on the $\ell_1$-distance between the two transition dynamics of the expert and the learner. Then, by leveraging insights from the Robust RL literature, we propose a robust MCE IRL algorithm, which is a principled approach to help with this mismatch issue. Finally, we empirically demonstrate the stable performance of our algorithm compared to the standard MCE IRL algorithm under transition mismatches in finite MDP problems.
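One way to quantify a mismatch between two tabular transition models is sketched below. It assumes each model is stored as a list of next-state distributions, one row per (state, action) pair, and it takes the largest row-wise $\ell_1$ distance. Taking the maximum over state-action pairs is an assumption made here; the abstract does not specify how the distance is aggregated.

```python
def max_l1_distance(expert_rows, learner_rows):
    """Largest l1 distance between matching next-state distributions.

    Each argument is a list of rows; row k is the next-state distribution
    for the k-th (state, action) pair under that model.
    """
    return max(
        sum(abs(p - q) for p, q in zip(expert_row, learner_row))
        for expert_row, learner_row in zip(expert_rows, learner_rows)
    )


# Two hypothetical models that disagree only on the first (state, action) pair.
expert = [[0.9, 0.1], [0.5, 0.5]]
learner = [[0.7, 0.3], [0.5, 0.5]]
mismatch = max_l1_distance(expert, learner)
```

A distance of 0 means the two models agree on every pair, and the value can never exceed 2 because each row is a probability distribution.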