 Markov Models


Horizon-Free Regret for Linear Markov Decision Processes

arXiv.org Artificial Intelligence

A recent line of work showed that regret bounds in reinforcement learning (RL) can be (nearly) independent of the planning horizon, a.k.a. horizon-free bounds. However, these regret bounds only apply to settings where a polynomial dependency on the size of the transition model is allowed, such as tabular Markov Decision Processes (MDPs) and linear mixture MDPs. We give the first horizon-free bound for the popular linear MDP setting, where the size of the transition model can be exponentially large or even uncountable. In contrast to prior works, which explicitly estimate the transition model and compute the inhomogeneous value functions at different time steps, we directly estimate the value functions and confidence sets. We obtain the horizon-free bound by: (1) maintaining multiple weighted least-squares estimators for the value functions; and (2) a structural lemma which shows that the maximal total variation of the inhomogeneous value functions is bounded by a polynomial factor of the feature dimension.
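As a rough illustration of the weighted least-squares building block mentioned in step (1), the sketch below fits a single weighted ridge regression, theta = (lam*I + sum_i w_i x_i x_i^T)^{-1} sum_i w_i x_i y_i. It is not the paper's estimator: the function names, the regularizer `lam`, and the choice of weights are all assumptions made for illustration, and the abstract does not say how the weights or the multiple estimators are constructed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_ridge(features, targets, weights, lam=1.0):
    """Weighted ridge regression (hypothetical helper, not from the paper).

    features: list of d-dimensional feature vectors x_i
    targets:  list of regression targets y_i (e.g. value estimates)
    weights:  list of non-negative per-sample weights w_i (assumed given)
    """
    d = len(features[0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for x, y, w in zip(features, targets, weights):
        for i in range(d):
            rhs[i] += w * x[i] * y
            for j in range(d):
                gram[i][j] += w * x[i] * x[j]
    return solve(gram, rhs)


# Toy data generated exactly by theta = [2, -1]; with lam=0 the fit recovers it.
theta = weighted_ridge([[1, 0], [0, 1], [1, 1]], [2, -1, 1], [1.0, 0.5, 2.0], lam=0.0)
```

With noisy targets, larger weights pull the fit toward the corresponding samples, which is the usual reason for weighting samples by an estimate of their reliability.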


Diffusion-Reinforcement Learning Hierarchical Motion Planning in Adversarial Multi-agent Games

arXiv.org Artificial Intelligence

Reinforcement Learning- (RL-)based motion planning has recently shown the potential to outperform traditional approaches in tasks ranging from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). These pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture themselves. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data while a low-level RL algorithm reasons about evasive versus global path-following behavior. Our approach outperforms baselines by 51.2% by leveraging the diffusion model to guide the RL algorithm for more efficient exploration, and it improves explainability and predictability.


An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Diabetes profoundly affects human life and health, regardless of country, age, or gender, and is one of the leading causes of death and disability worldwide [1]. From 1990 to 2021, the age-standardized prevalence of diabetes increased by 90.5% globally, with increases of more than 100% in several regions, and it is projected that by 2050 there will be 1.31 billion people with diabetes worldwide [1]. Furthermore, people with diabetes have more than twice the normal risk of early death, resulting in an estimated 150-500 million deaths around the world each year, while generating approximately 12% of health expenditure ($966 billion) [2, 3]. The rising prevalence and serious health and economic hazards have attracted the attention of scientists around the globe, and as a result, the number of studies on diabetes is increasing. The pancreas of a person with diabetes produces little or no insulin, or the insulin produced is not used efficiently, leading to high blood glucose (BG) and a variety of life-threatening complications such as cardiovascular disease, nerve damage, kidney damage, lower limb amputations, and eye disease leading to decreased vision and even blindness [3]. BG control is the basic treatment for these patients, as well as the basis for preventing and treating diabetic complications. Patients mainly keep their BG stable by injecting insulin. However, this traditional self-management is usually cumbersome and challenging, as it requires patients to measure their BG levels several times a day while they suffer from many of the aforementioned complications [2].


Single- and Multi-Agent Private Active Sensing: A Deep Neuroevolution Approach

arXiv.org Artificial Intelligence

Active Hypothesis Testing (AHT) refers to the family of problems where one legitimate agent or decision maker, or a group of collaborating agents or decision makers, adaptively select(s) sensing actions and collect(s) observations in order to infer the underlying true hypothesis in a fast and reliable manner [1], [2]. AHT and related problems, such as active parameter estimation [3] and active change point detection [4], [5], find numerous applications in wireless communications, including anomaly detection over sensor networks [6], strong or weak radar models for target detection [7], cyber-intrusion detection, target search, and adaptive beamforming [8], as well as, very recently, RIS-enabled localization [9] and channel estimation [10]. In addition, AHT is closely related to the feedback channel coding problem [11]. The problem of single-agent Evasive AHT (EAHT), where a passive Eavesdropper (Eve) collects noisy estimates of the legitimate observations and tries to infer the underlying hypothesis, was studied in [24], focusing however explicitly on the asymptotic case. In that work, the authors formulated single-agent EAHT as a constrained optimization problem including the legitimate agent's and the Eavesdropper's (Eve) error exponent. However, near-optimal or optimal action selection policies were not presented. In this paper, motivated by the lack of explicit policies for EAHT, we present novel single- and multi-agent EAHT approaches for wireless sensor networks that are based on a deep NeuroEvolution (NE) framework. Our contributions are summarized as follows: 1) We formulate the single-agent EAHT problem studied


Generative Modelling of Stochastic Rotating Shallow Water Noise

arXiv.org Machine Learning

In recent work, the authors have developed a generic methodology for calibrating the noise in fluid dynamics stochastic partial differential equations where the stochasticity was introduced to parametrize subgrid-scale processes. The stochastic parameterization of sub-grid scale processes is required in the estimation of uncertainty in weather and climate predictions, to represent systematic model errors arising from subgrid-scale fluctuations. The previous methodology used a principal component analysis (PCA) technique based on the ansatz that the increments of the stochastic parametrization are normally distributed. In this paper, the PCA technique is replaced by a generative model technique. This enables us to avoid imposing additional constraints on the increments. The methodology is tested on a stochastic rotating shallow water model with the elevation variable of the model used as input data. The numerical simulations show that the noise is indeed non-Gaussian. The generative modelling technology gives good RMSE, CRPS score and forecast rank histogram results.


A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food

arXiv.org Machine Learning

We developed a common algorithmic solution addressing the problem of resource-constrained outreach encountered by social change organizations with different missions and operations: Breaking Ground -- an organization that helps individuals experiencing homelessness in New York transition to permanent housing and Leket -- the national food bank of Israel that rescues food from farms and elsewhere to feed the hungry. Specifically, we developed an estimation and optimization approach for partially-observed episodic restless bandits under $k$-step transitions. The results show that our Thompson sampling with Markov chain recovery (via Stein variational gradient descent) algorithm significantly outperforms baselines for the problems of both organizations. We carried out this work in a prospective manner with the express goal of devising a flexible-enough but also useful-enough solution that can help overcome a lack of sustainable impact in data science for social good.
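The abstract does not spell out the $k$-step transition model, so the following is only a sketch under an assumption: if an arm's hidden state follows a Markov chain with one-step transition matrix P and the arm is observed only every k steps, then the transition between consecutive observations is governed by the matrix power P^k. The function names and the two-state example matrix are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def k_step_transition(p, k):
    """Return P^k, the k-step transition matrix of a Markov chain with one-step matrix p."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        result = mat_mul(result, p)
    return result


# Hypothetical two-state chain; row i holds the probabilities of moving from state i.
P = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = k_step_transition(P, 2)  # approximately [[0.86, 0.14], [0.70, 0.30]]
```

Each row of P^k remains a probability distribution, which gives a quick sanity check on any estimated k-step model.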


Spectral Methods for Learning Multivariate Latent Tree Structure

Neural Information Processing Systems

This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.


Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning

Neural Information Processing Systems

Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty, and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinite-horizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than with regular FSCs.
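To make the layered structure concrete, here is a minimal sketch of executing a deterministic periodic FSC, assuming the controller is stored as per-layer action and transition tables and that the last layer wraps around to the first. The table layout, the function name `run_periodic_fsc`, and the toy observation function are illustrative assumptions, not the paper's representation or planning algorithm.

```python
def run_periodic_fsc(actions, transitions, observe, start_node=0, steps=4):
    """Execute a deterministic periodic finite state controller.

    actions[layer][node]          -> action emitted at that controller node
    transitions[layer][node][obs] -> node index in the NEXT layer
    observe(t, action)            -> observation (hypothetical environment stub)
    """
    num_layers = len(actions)
    node, trace = start_node, []
    for t in range(steps):
        layer = t % num_layers  # layers repeat periodically
        action = actions[layer][node]
        obs = observe(t, action)
        trace.append((layer, node, action, obs))
        node = transitions[layer][node][obs]  # edges only lead to the next layer
    return trace


# Toy controller with two layers of two nodes each (all values are made up).
actions = [["listen", "move"], ["move", "listen"]]
transitions = [
    [{"a": 0, "b": 1}, {"a": 1, "b": 0}],  # layer 0 -> layer 1
    [{"a": 1, "b": 0}, {"a": 0, "b": 1}],  # layer 1 -> layer 0
]
trace = run_periodic_fsc(
    actions, transitions,
    observe=lambda t, action: "b" if action == "listen" else "a",
)
```

Because each node only stores edges into the following layer, the number of transition parameters grows with the number of layers times the layer width rather than with the square of the total node count.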


On the Analysis of Multi-Channel Neural Spike Data

Neural Information Processing Systems

Nonparametric Bayesian methods are developed for analysis of multi-channel spike-train data, with the feature learning and spike sorting performed jointly. The feature learning and sorting are performed simultaneously across all channels. Dictionary learning is implemented via the beta-Bernoulli process, with spike sorting performed via the dynamic hierarchical Dirichlet process (dHDP), with these two models coupled. The dHDP is augmented to eliminate refractory-period violations; it allows the "appearance" and "disappearance" of neurons over time, and it models smooth variation in the spike statistics.


Priors over Recurrent Continuous Time Processes

Neural Information Processing Systems

We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried out efficiently using Particle MCMC methods [1]. This yields an MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing the slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach.