Goto

Collaborating Authors

 Markov Models


Linear chain conditional random fields, hidden Markov models, and related classifiers

arXiv.org Artificial Intelligence

Practitioners use Hidden Markov Models (HMMs) in different problems for about sixty years. Besides, Conditional Random Fields (CRFs) are an alternative to HMMs and appear in the literature as different and somewhat concurrent models. We propose two contributions. First, we show that basic Linear-Chain CRFs (LC-CRFs), considered as different from the HMMs, are in fact equivalent to them in the sense that for each LC-CRF there exists a HMM - that we specify - whom posterior distribution is identical to the given LC-CRF. Second, we show that it is possible to reformulate the generative Bayesian classifiers Maximum Posterior Mode (MPM) and Maximum a Posteriori (MAP) used in HMMs, as discriminative ones. The last point is of importance in many fields, especially in Natural Language Processing (NLP), as it shows that in some situations dropping HMMs in favor of CRFs was not necessary.


Understanding The Robustness of Self-supervised Learning Through Topic Modeling

arXiv.org Artificial Intelligence

Self-supervised learning has significantly improved the performance of many NLP tasks. However, how can self-supervised learning discover useful representations, and why is it better than traditional approaches such as probabilistic models are still largely unknown. In this paper, we focus on the context of topic modeling and highlight a key advantage of self-supervised learning - when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model, and hence is less susceptible to model misspecification. In particular, we prove that commonly used self-supervised objectives based on reconstruction or contrastive samples can both recover useful posterior information for general topic models. Empirically, we show that the same objectives can perform on par with posterior inference using the correct model, while outperforming posterior inference using misspecified models. Recently researchers have successfully trained large-scale models like BERT (Devlin et al., 2018) and GPT (Radford et al., 2018), which offers extremely powerful representations for many NLP tasks (see e.g., Liu et al. (2021); Jaiswal et al. (2021) and references therein). To train these models, often one starts with sentences in a large text corpus, mark random words as "unknown" and ask the neural network to predict the unknown words. This approach is known as self-supervised learning (SSL). Why can self-supervised approaches learn useful representations? To understand this we first need to define what are "useful representations". A recent line of work (Tosh et al., 2021a; Wei et al., 2021) studied self-supervised learning in the context of probabilistic models: assuming the data is generated by a probabilistic model (such as a topic model or Hidden Markov Model), one can define representation of observed data as the corresponding hidden variables in the model (such as topic proportions in topic models or hidden states in Hidden Markov Model). These works show that self-supervised learning approach is as good as explicitly doing inference using such models. This approach naturally leads to the next question - why can self-supervised learning perform better than traditional inferencing based on probabilistic models? In this paper we study this question in the context of topic modeling, and highlight one key advantage for self-supervised learning: robustness to model misspecification. Many different models (such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), Correlated Topic Model (CTM) (Blei & Lafferty, 2007), Pachinko Allocation Model (PAM) (Li & McCallum, 2006)) have been applied in practice. Traditional approaches would require different ways of doing inference depending on which model is used to generate the data.


Tree-structured Policy Planning with Learned Behavior Models

arXiv.org Artificial Intelligence

Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus leading to overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction as it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either not compatible with modern deep learning prediction models, not interpretable, or not able to generate high quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of ego agent on the environment behavior. The key idea of TPP is to reduce the continuous optimization problem into a tractable discrete Markov Decision Process (MDP) through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multi-modal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on real-world nuScenes dataset and results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.


The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations. However, the question of how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains unclear. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Further, we propose a novel, Provable Data Sharing algorithm (PDS) to utilize such reward-free data for offline RL. PDS uses additional penalties on the reward function learned from labeled data to prevent overestimation, ensuring a conservative algorithm. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging the benefits of unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods. Offline reinforcement learning (RL) is a promising framework for learning sequential policies with pre-collected datasets.


Simulation of robot swarms for learning communication-aware coordination

arXiv.org Artificial Intelligence

Robotics research has been focusing on cooperative multi-agent problems, where agents must work together and communicate to achieve a shared objective. To tackle this challenge, we explore imitation learning algorithms. These methods learn a controller by observing demonstrations of an expert, such as the behaviour of a centralised omniscient controller, which can perceive the entire environment, including the state and observations of all agents. Performing tasks with complete knowledge of the state of a system is relatively easy, but centralised solutions might not be feasible in real scenarios since agents do not have direct access to the state but only to their observations. To overcome this issue, we train end-to-end Neural Networks that take as input local observations obtained from an omniscient centralised controller, i.e., the agents' sensor readings and the communications received, producing as output the action to be performed and the communication to be transmitted. This study concentrates on two cooperative tasks using a distributed controller: distributing the robots evenly in space and colouring them based on their position relative to others. While an explicit exchange of messages between the agents is required to solve the second task, in the first one, a communication protocol is unnecessary, although it may increase performance. The experiments are run in Enki, a high-performance open-source simulator for planar robots, which provides collision detection and limited physics support for robots evolving on a flat surface. Moreover, it can simulate groups of robots hundreds of times faster than real-time. The results show how applying a communication strategy improves the performance of the distributed model, letting it decide which actions to take almost as precisely and quickly as the expert controller.


The Ordered Matrix Dirichlet for State-Space Models

arXiv.org Artificial Intelligence

Many dynamical systems in the real world are naturally described by latent states with intrinsic orderings, such as "ally", "neutral", and "enemy" relationships in international relations. These latent states manifest through countries' cooperative versus conflictual interactions over time. State-space models (SSMs) explicitly relate the dynamics of observed measurements to transitions in latent states. For discrete data, SSMs commonly do so through a state-to-action emission matrix and a state-to-state transition matrix. This paper introduces the Ordered Matrix Dirichlet (OMD) as a prior distribution over ordered stochastic matrices wherein the discrete distribution in the kth row stochastically dominates the (k+1)th, such that probability mass is shifted to the right when moving down rows. We illustrate the OMD prior within two SSMs: a hidden Markov model, and a novel dynamic Poisson Tucker decomposition model tailored to international relations data. We find that models built on the OMD recover interpretable ordered latent structure without forfeiting predictive performance. We suggest future applications to other domains where models with stochastic matrices are popular (e.g., topic modeling), and publish user-friendly code.


Perfect Sampling from Pairwise Comparisons

arXiv.org Artificial Intelligence

In this work, we study how to efficiently obtain perfect samples from a discrete distribution $\mathcal{D}$ given access only to pairwise comparisons of elements of its support. Specifically, we assume access to samples $(x, S)$, where $S$ is drawn from a distribution over sets $\mathcal{Q}$ (indicating the elements being compared), and $x$ is drawn from the conditional distribution $\mathcal{D}_S$ (indicating the winner of the comparison) and aim to output a clean sample $y$ distributed according to $\mathcal{D}$. We mainly focus on the case of pairwise comparisons where all sets $S$ have size 2. We design a Markov chain whose stationary distribution coincides with $\mathcal{D}$ and give an algorithm to obtain exact samples using the technique of Coupling from the Past. However, the sample complexity of this algorithm depends on the structure of the distribution $\mathcal{D}$ and can be even exponential in the support of $\mathcal{D}$ in many natural scenarios. Our main contribution is to provide an efficient exact sampling algorithm whose complexity does not depend on the structure of $\mathcal{D}$. To this end, we give a parametric Markov chain that mixes significantly faster given a good approximation to the stationary distribution. We can obtain such an approximation using an efficient learning from pairwise comparisons algorithm (Shah et al., JMLR 17, 2016). Our technique for speeding up sampling from a Markov chain whose stationary distribution is approximately known is simple, general and possibly of independent interest.


Machine Learning based prediction of Glucose Levels in Type 1 Diabetes Patients with the use of Continuous Glucose Monitoring Data

arXiv.org Artificial Intelligence

A task of vital clinical importance, within Diabetes management, is the prevention of hypo/hyperglycemic events. Increasingly adopted Continuous Glucose Monitoring (CGM) devices offer detailed, non-intrusive and real time insights into a patient's blood glucose concentrations. Leveraging advanced Machine Learning (ML) Models as methods of prediction of future glucose levels, gives rise to substantial quality of life improvements, as well as providing a vital tool for monitoring diabetes. A regression based prediction approach is implemented recursively, with a series of Machine Learning Models: Linear Regression, Hidden Markov Model, Long-Short Term Memory Network. By exploiting a patient's past 11 hours of blood glucose (BG) concentration measurements, a prediction of the 60 minutes is made. Results will be assessed using performance metrics including: Root Mean Squared Error (RMSE), normalised energy of the second-order differences (ESOD) and F1 score. Research of past and current approaches, as well as available dataset, led to the establishment of an optimal training methodology for the CITY dataset, which may be leveraged by future model development. Performance was aligned with similar state-of-art ML models, with LSTM having RMSE of 28.55, however no significant advantage was observed over classical Auto-regressive AR models. Compelling insights into LSTM prediction behaviour could increase public and legislative trust and understanding, progressing the certification of ML models in Artificial Pancreas Systems (APS).


Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits

arXiv.org Artificial Intelligence

In autonomous robotic decision-making under uncertainty, the tradeoff between exploitation and exploration of available options must be considered. If secondary information associated with options can be utilized, such decision-making problems can often be formulated as contextual multi-armed bandits (CMABs). In this study, we apply active inference, which has been actively studied in the field of neuroscience in recent years, as an alternative action selection strategy for CMABs. Unlike conventional action selection strategies, it is possible to rigorously evaluate the uncertainty of each option when calculating the expected free energy (EFE) associated with the decision agent's probabilistic model, as derived from the free-energy principle. We specifically address the case where a categorical observation likelihood function is used, such that EFE values are analytically intractable. We introduce new approximation methods for computing the EFE based on variational and Laplace approximations. Extensive simulation study results demonstrate that, compared to other strategies, active inference generally requires far fewer iterations to identify optimal options and generally achieves superior cumulative regret, for relatively low extra computational cost.


Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning

arXiv.org Artificial Intelligence

We study a heterogeneous agent macroeconomic model with an infinite number of households and firms competing in a labor market. Each household earns income and engages in consumption at each time step while aiming to maximize a concave utility subject to the underlying market conditions. The households aim to find the optimal saving strategy that maximizes their discounted cumulative utility given the market condition, while the firms determine the market conditions through maximizing corporate profit based on the household population behavior. The model captures a wide range of applications in macroeconomic studies, and we propose a data-driven reinforcement learning framework that finds the regularized competitive equilibrium of the model. The proposed algorithm enjoys theoretical guarantees in converging to the equilibrium of the market at a sub-linear rate.