Markov Models
Reviews: Deep Homogeneous Mixture Models: Representation, Separation, and Approximation
The paper discusses connections between multiple density models within the unifying framework of homogeneous mixture models: tensorial mixtures models [1], hidden Markov models, latent tree models and sum-product networks [2] are discussed. The authors argue that there is a hierarchy among these models by showing that a model lower in the hierarchy can be cast into a model higher in the hierarchy using linear size transformations. Furthermore, the paper gives new theoretical insights in depth efficiency in these models, by establishing a connection between properties of the represented mixture coefficient tensor (e.g. Finally, the paper gives positive and somewhat surprising approximation results using [3]. Strengths: connections between various models, which so far were somewhat folk wisdom, are illustrated a unifying tensor mixture framework.
Reviews: Lifted Weighted Mini-Bucket
The paper proposes a lifted version of the weighted mini-bucket inference algorithm. Weighted mini-bucket is a variant of Variable Elimination that can trade off computational cost (e.g., achieving runtime sub-exponential in the treewidth) for accuracy. It essentially uses Holder inequality and variational approximations to represent "messages" that do not fit in the available memory budget. The main idea is to extend the approach to relational models (e.g., Markov Logic Networks) to take advantage of the additional structure, specifically, the fact that ground factors are produced from first-order templates and therefore share significant structure. The work appears to be solid, but it is difficult to parse.
Reviews: Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
In this paper the authors present PAC-RL results for finite-spaces discounted MDP with a generative sampling model (which can generate in O(1) time a state transition for each given state-action pair). They introduce a value-iteration type algorithm for finding an \epsilon -optimal policy, and prove that its running time and sample complexity are nearly the best possible and match, up to logarithmic factors, the lower bounds established previously in [AMK13]. These results are significant: They improve over a recent result in [SWWY18] by a factor of (1 - \gamma) {-1} ( \gamma is the discount factor), and they also fill a known gap in the earlier results of [AMK13] regarding the sample complexity of finding a near-optimal policy. Both the algorithm and its analysis are very interesting. They build upon the prior work [AMK13] and [SWWY18], and combine the algorithmic ideas and proof techniques from both papers in a highly non-trivial and original way.
OPTIMA: Optimized Policy for Intelligent Multi-Agent Systems Enables Coordination-Aware Autonomous Vehicles
Du, Rui, Zhao, Kai, Hou, Jinlong, Zhang, Qiang, Zhang, Peter
Coordination among connected and autonomous vehicles (CAVs) is advancing due to developments in control and communication technologies. However, much of the current work is based on oversimplified and unrealistic task-specific assumptions, which may introduce vulnerabilities. This is critical because CAVs not only interact with their environment but are also integral parts of it. Insufficient exploration can result in policies that carry latent risks, highlighting the need for methods that explore the environment both extensively and efficiently. This work introduces OPTIMA, a novel distributed reinforcement learning framework for cooperative autonomous vehicle tasks. OPTIMA alternates between thorough data sampling from environmental interactions and multi-agent reinforcement learning algorithms to optimize CAV cooperation, emphasizing both safety and efficiency. Our goal is to improve the generality and performance of CAVs in highly complex and crowded scenarios. Furthermore, the industrial-scale distributed training system easily adapts to different algorithms, reward functions, and strategies.
Neural Collaborative Filtering to Detect Anomalies in Human Semantic Trajectories
Liu, Yueyang, Kennedy, Lance, Amiri, Hossein, Zรผfle, Andreas
Human trajectory anomaly detection has become increasingly important across a wide range of applications, including security surveillance and public health. However, existing trajectory anomaly detection methods are primarily focused on vehicle-level traffic, while human-level trajectory anomaly detection remains under-explored. Since human trajectory data is often very sparse, machine learning methods have become the preferred approach for identifying complex patterns. However, concerns regarding potential biases and the robustness of these models have intensified the demand for more transparent and explainable alternatives. In response to these challenges, our research focuses on developing a lightweight anomaly detection model specifically designed to detect anomalies in human trajectories. We propose a Neural Collaborative Filtering approach to model and predict normal mobility. Our method is designed to model users' daily patterns of life without requiring prior knowledge, thereby enhancing performance in scenarios where data is sparse or incomplete, such as in cold start situations. Our algorithm consists of two main modules. The first is the collaborative filtering module, which applies collaborative filtering to model normal mobility of individual humans to places of interest. The second is the neural module, responsible for interpreting the complex spatio-temporal relationships inherent in human trajectory data. To validate our approach, we conducted extensive experiments using simulated and real-world datasets comparing to numerous state-of-the-art trajectory anomaly detection approaches.
Flipping-based Policy for Chance-Constrained Markov Decision Processes
Shen, Xun, Jiang, Shuo, Wachi, Akifumi, Hashimoto, Kaumune, Gros, Sebastien
Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is a critical necessity. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choices, chance constraints are often more pragmatic for incorporating safety under uncertainties. This paper proposes a \textit{flipping-based policy} for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. The probability of the flip and the two action candidates vary depending on the state. We establish a Bellman equation for CCMDPs and further prove the existence of a flipping-based policy within the optimal solution sets. Since solving the problem with joint chance constraints is challenging in practice, we then prove that joint chance constraints can be approximated into Expected Cumulative Safety Constraints (ECSCs) and that there exists a flipping-based policy in the optimal solution sets for constrained MDPs with ECSCs. As a specific instance of practical implementations, we present a framework for adapting constrained policy optimization to train a flipping-based policy. This framework can be applied to other safe RL algorithms. We demonstrate that the flipping-based policy can improve the performance of the existing safe RL algorithms under the same limits of safety constraints on Safety Gym benchmarks.
Harnessing the Power of Noise: A Survey of Techniques and Applications
Abdolazimi, Reyhaneh, Jin, Shengmin, Varshney, Pramod K., Zafarani, Reza
In Computer science and across various engineering fields, noise is often considered a nuisance and annoyance. It distorts details and makes data less accurate. In the past, the goal has often been to eliminate noise with the goal to make systems more reliable and accurate. But views on noise are changing. New findings suggest that noise can actually enhance and advance technologies in many areas, making us see it not just as a disruption but as a way to improve system performance. Thus, once unwanted and hard to control, noise now appears to be a key player in improving the performance of complex information processing systems [22]. This phenomena is often known as Stochastic Resonance, which helps clear up signals, improve image quality, and strengthen models in machine learning [7, 22, 101]. This duality of noise -- both a problem and a benefit -- highlights the tricky role of noise while optimizing advanced neural networks and machine learning models.
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers
Chen, Haozhe, Li, Ang, Che, Ethan, Peng, Tianyi, Dong, Jing, Namkoong, Hongseok
Queuing network control determines the allocation of scarce resources to manage congestion, a fundamental problem in manufacturing, communications, and healthcare. Compared to standard RL problems, queueing problems are distinguished by unique challenges: i) a system operating in continuous time, ii) high stochasticity, and iii) long horizons over which the system can become unstable (exploding delays). To spur methodological progress tackling these challenges, we present an open-sourced queueing simulation framework, QGym, that benchmark queueing policies across realistic problem instances. Our modular framework allows the researchers to build on our initial instances, which provide a wide range of environments including parallel servers, criss-cross, tandem, and re-entrant networks, as well as a realistically calibrated hospital queuing system. QGym makes it easy to compare multiple policies, including both model-free RL methods and classical queuing policies. Our testbed complements the traditional focus on evaluating algorithms based on mathematical guarantees in idealized settings, and significantly expands the scope of empirical benchmarking in prior work.
Jet Expansions of Residual Computation
Chen, Yihong, Xu, Xiangxiang, Lu, Yao, Stenetorp, Pontus, Franceschi, Luca
We introduce a framework for expanding residual computational graphs using jets, operators that generalize truncated Taylor series. Our method provides a systematic approach to disentangle contributions of different computational paths to model predictions. In contrast to existing techniques such as distillation, probing, or early decoding, our expansions rely solely on the model itself and requires no data, training, or sampling from the model. We demonstrate how our framework grounds and subsumes logit lens, reveals a (super-)exponential path structure in the recursive residual depth and opens up several applications. These include sketching a transformer large language model with n-gram statistics extracted from its computations, and indexing the models' levels of toxicity knowledge. Our approach enables data-free analysis of residual computation for model interpretability, development, and evaluation. The project website can be found here. Machine learning models, particularly large-scale foundation models, have become increasingly prevalent and impactful across a wide range of domains (Wei et al., 2021; Bommasani et al., 2023; Touvron et al., 2023b). While delivering strong results, their black-box nature has led to the development of techniques to assess their behavior and gain insights into their internal mechanisms. In this space, mechanistic interpretability (MI) (see e.g. Bereska & Gavves, 2024; Ferrando et al., 2024, for recent surverys) has emerged as an alternative to more classic local attribution methods such as SHAP (Lundberg, 2017) or integrated gradient (Sundararajan et al., 2017).
Heuristics for Partially Observable Stochastic Contingent Planning
Acting to complete tasks in stochastic partially observable domains is an important problem in artificial intelligence, and is often formulated as a goal-based POMDP. Goal-based POMDPs can be solved using the RTDP-BEL algorithm, that operates by running forward trajectories from the initial belief to the goal. These trajectories can be guided by a heuristic, and more accurate heuristics can result in significantly faster convergence. In this paper, we develop a heuristic function that leverages the structured representation of domain models. We compute, in a relaxed space, a plan to achieve the goal, while taking into account the value of information, as well as the stochastic effects. We provide experiments showing that while our heuristic is slower to compute, it requires an order of magnitude less trajectories before convergence. Overall, it thus speeds up RTDP-BEL, particularly in problems where significant information gathering is needed.