Markov Models
Multi-View Reinforcement Learning
This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models. We define the MVRL framework by extending partially observable Markov decision processes (POMDPs) to support more than one observation model and propose two solution methods through observation augmentation and cross-view policy transfer. We empirically evaluate our method and demonstrate its effectiveness in a variety of environments. Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments.
Gradient-based Adaptive Markov Chain Monte Carlo
We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimum. We apply the framework for learning multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes.
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
We consider the problem of offline reinforcement learning (RL) --- a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understandings of offline RL, such as its optimal sample complexity, remain largely open even in basic settings such as \emph{tabular} Markov Decision Processes (MDPs). In this paper, we propose \emph{Off-Policy Double Variance Reduction} (OPDVR), a new variance reduction-based algorithm for offline RL. Our main result shows that OPDVR provably identifies an \epsilon -optimal policy with \widetilde{O}(H 2/d_m\epsilon 2) episodes of offline data in the finite-horizon \emph{stationary transition} setting, where H is the horizon length and d_m is the minimal marginal state-action distribution induced by the behavior policy. This improves over the best-known upper bound by a factor of H .
Large Scale Markov Decision Processes with Changing Rewards
We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves a regret bound of O( \sqrt{\tau (\ln S \ln A)T}\ln(T)), where S is the state space, A is the action space, \tau is the mixing time of the MDP, and T is the number of periods. The algorithm's computational complexity is polynomial in S and A . We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension d\ll S, we propose a modified algorithm with a computational complexity polynomial in d and independent of S .
From Boltzmann Machines to Neural Networks and Back Again
Graphical models are powerful tools for modeling high-dimensional data, but learning graphical models in the presence of latent variables is well-known to be difficult. In this work we give new results for learning Restricted Boltzmann Machines, probably the most well-studied class of latent variable models. Our results are based on new connections to learning two-layer neural networks under \ell_{\infty} bounded input; for both problems, we give nearly optimal results under the conjectured hardness of sparse parity with noise. Using the connection between RBMs and feedforward networks, we also initiate the theoretical study of {\em supervised RBMs} \citep{hinton2012practical}, a version of neural-network learning that couples distributional assumptions induced from the underlying graphical model with the architecture of the unknown function class. We then give an algorithm for learning a natural class of supervised RBMs with better runtime than what is possible for its related class of networks without distributional assumptions.
Planning in entropy-regularized Markov decision processes and games
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the SmoothCruiser. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order \tilde{\mathcal{O}}(1/\epsilon 4) for a desired accuracy \epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
Scalable Inference of Sparsely-changing Gaussian Markov Random Fields
We study the problem of inferring time-varying Gaussian Markov random fields, where the underlying graphical model is both sparse and changes {sparsely} over time. Most of the existing methods for the inference of time-varying Markov random fields (MRFs) rely on the \textit{regularized maximum likelihood estimation} (MLE), that typically suffer from weak statistical guarantees and high computational time. Instead, we introduce a new class of constrained optimization problems for the inference of sparsely-changing Gaussian MRFs (GMRFs). The proposed optimization problem is formulated based on the exact \ell_0 regularization, and can be solved in near-linear time and memory. Moreover, we show that the proposed estimator enjoys a provably small estimation error.
Parameter elimination in particle Gibbs sampling
Bayesian inference in state-space models is challenging due to high-dimensional state trajectories. A viable approach is particle Markov chain Monte Carlo (PMCMC), combining MCMC and sequential Monte Carlo to form exact approximations'' to otherwise-intractable MCMC methods. The performance of the approximation is limited to that of the exact method. We focus on particle Gibbs (PG) and particle Gibbs with ancestor sampling (PGAS), improving their performance beyond that of the ideal Gibbs sampler (which they approximate) by marginalizing out one or more parameters. This is possible when the parameter(s) has a conjugate prior relationship with the complete data likelihood.
KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data
Zhou, Andy, Xu, Xiaojun, Raghunathan, Ramesh, Lal, Alok, Guan, Xinze, Yu, Bin, Li, Bo
Graph-based anomaly detection is pivotal in diverse security applications, such as fraud detection in transaction networks and intrusion detection for network traffic. Standard approaches, including Graph Neural Networks (GNNs), often struggle to generalize across shifting data distributions. Meanwhile, real-world domain knowledge is more stable and a common existing component of real-world detection strategies. To explicitly integrate such knowledge into data-driven models such as GCNs, we propose KnowGraph, which integrates domain knowledge with data-driven learning for enhanced graph-based anomaly detection. KnowGraph comprises two principal components: (1) a statistical learning component that utilizes a main model for the overarching detection task, augmented by multiple specialized knowledge models that predict domain-specific semantic entities; (2) a reasoning component that employs probabilistic graphical models to execute logical inferences based on model outputs, encoding domain knowledge through weighted first-order logic formulas. Extensive experiments on these large-scale real-world datasets show that KnowGraph consistently outperforms state-of-the-art baselines in both transductive and inductive settings, achieving substantial gains in average precision when generalizing to completely unseen test graphs. Further ablation studies demonstrate the effectiveness of the proposed reasoning component in improving detection performance, especially under extreme class imbalance. These results highlight the potential of integrating domain knowledge into data-driven models for high-stakes, graph-based security applications.
A Gentle Introduction and Tutorial on Deep Generative Models in Transportation Research
Choi, Seongjin, Jin, Zhixiong, Ham, Seung Woo, Kim, Jiwon, Sun, Lijun
Deep Generative Models (DGMs) have rapidly advanced in recent years, becoming essential tools in various fields due to their ability to learn complex data distributions and generate synthetic data. Their importance in transportation research is increasingly recognized, particularly for applications like traffic data generation, prediction, and feature extraction. This paper offers a comprehensive introduction and tutorial on DGMs, with a focus on their applications in transportation. It begins with an overview of generative models, followed by detailed explanations of fundamental models, a systematic review of the literature, and practical tutorial code to aid implementation. The paper also discusses current challenges and opportunities, highlighting how these models can be effectively utilized and further developed in transportation research. This paper serves as a valuable reference, guiding researchers and practitioners from foundational knowledge to advanced applications of DGMs in transportation research.