Goto

Collaborating Authors

 Markov Models


EnHMM: On the Use of Ensemble HMMs and Stack Traces to Predict the Reassignment of Bug Report Fields

arXiv.org Artificial Intelligence

Bug reports (BR) contain vital information that can help triaging teams prioritize and assign bugs to developers who will provide the fixes. However, studies have shown that BR fields often contain incorrect information that need to be reassigned, which delays the bug fixing process. There exist approaches for predicting whether a BR field should be reassigned or not. These studies use mainly BR descriptions and traditional machine learning algorithms (SVM, KNN, etc.). As such, they do not fully benefit from the sequential order of information in BR data, such as function call sequences in BR stack traces, which may be valuable for improving the prediction accuracy. In this paper, we propose a novel approach, called EnHMM, for predicting the reassignment of BR fields using ensemble Hidden Markov Models (HMMs), trained on stack traces. EnHMM leverages the natural ability of HMMs to represent sequential data to model the temporal order of function calls in BR stack traces. When applied to Eclipse and Gnome BR repositories, EnHMM achieves an average precision, recall, and F-measure of 54%, 76%, and 60% on Eclipse dataset and 41%, 69%, and 51% on Gnome dataset. We also found that EnHMM improves over the best single HMM by 36% for Eclipse and 76% for Gnome. Finally, when comparing EnHMM to Im.ML.KNN, a recent approach in the field, we found that the average F-measure score of EnHMM improves the average F-measure of Im.ML.KNN by 6.80% and improves the average recall of Im.ML.KNN by 36.09%. However, the average precision of EnHMM is lower than that of Im.ML.KNN (53.93% as opposed to 56.71%).


Quasi-Equivalence Discovery for Zero-Shot Emergent Communication

arXiv.org Artificial Intelligence

Effective communication is an important skill for enabling information exchange in multi-agent settings and emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. Since, by definition, these settings involve arbitrary encoding of information, typically they do not allow for the learned protocols to generalize beyond training partners. In contrast, in this work, we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents. Real world problem settings often contain costly communication channels, e.g., robots have to physically move their limbs, and a non-uniform distribution over intents. We show that these two factors lead to unique optimal ZSC policies in referential games, where agents use the energy cost of the messages to communicate intent. Other-Play was recently introduced for learning optimal ZSC policies, but requires prior access to the symmetries of the problem. Instead, QED can iteratively discovers the symmetries in this setting and converges to the optimal ZSC policy.


RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

arXiv.org Artificial Intelligence

The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem have drawn increasing attention in recent years. Developing and training models and algorithms for such recommenders can be especially difficult using static datasets, which often fail to offer the types of counterfactual predictions needed to evaluate policies over extended horizons. To address this, we develop RecSim NG, a probabilistic platform for the simulation of multi-agent recommender systems. RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification; tools for probabilistic inference and latent-variable model learning, backed by automatic differentiation and tracing; and a TensorFlow-based runtime for running simulations on accelerated hardware. We describe RecSim NG and illustrate how it can be used to create transparent, configurable, end-to-end models of a recommender ecosystem, complemented by a small set of simple use cases that demonstrate how RecSim NG can help both researchers and practitioners easily develop and train novel algorithms for recommender systems.


Learning One Representation to Optimize All Rewards

arXiv.org Artificial Intelligence

We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state). The optimal policy for that reward is directly obtained from these representations, with no planning. The unsupervised FB loss is well-principled: if training is perfect, the policies obtained are provably optimal for any reward function. With imperfect training, the sub-optimality is proportional to the unsupervised approximation error. The FB representation learns long-range relationships between states and actions, via a predictive occupancy map, without having to synthesize states as in model-based approaches. This is a step towards learning controllable agents in arbitrary black-box stochastic environments. This approach compares well to goal-oriented RL algorithms on discrete and continuous mazes, pixel-based MsPacman, and the FetchReach virtual robot arm. We also illustrate how the agent can immediately adapt to new tasks beyond goal-oriented RL.


Simulation Studies on Deep Reinforcement Learning for Building Control with Human Interaction

arXiv.org Artificial Intelligence

The building sector consumes the largest energy in the world, and there have been considerable research interests in energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper aims at assessing the potential of RL in building climate control problems with occupant interaction. We apply a recent RL approach, called DDPG (deep deterministic policy gradient), for the continuous building control tasks and assess its performance with simulation studies in terms of its ability to handle (a) the partial state observability due to sensor limitations; (b) complex stochastic system with high-dimensional state-spaces, which are jointly continuous and discrete; (c) uncertainties due to ambient weather conditions, occupant's behavior, and comfort feelings. Especially, the partial observability and uncertainty due to the occupant interaction significantly complicate the control problem. Through simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.


Build a Deep Learning Text Generator Project with Markov Chains

#artificialintelligence

Natural language processing (NLP) and deep learning are growing in popularity for their use in ML technologies like self-driving cars and speech recognition software. As more companies begin to implement deep learning components and other machine learning practices, the demand for software developers and data scientists with proficiency in deep learning is skyrocketing. Today, we will introduce you to a popular deep learning project, the Text Generator, to familiarize you with important, industry-standard NLP concepts, including Markov chains. By the end of this article, you'll understand how to build a Text Generator component for search engine systems and know how to implement Markov chains for faster predictive models.


Image Segmentation Methods for Non-destructive testing Applications

arXiv.org Artificial Intelligence

In this paper, we present new image segmentation methods based on hidden Markov random fields (HMRFs) and cuckoo search (CS) variants. HMRFs model the segmentation problem as a minimization of an energy function. CS algorithm is one of the recent powerful optimization techniques. Therefore, five variants of the CS algorithm are used to compute a solution. Through tests, we conduct a study to choose the CS variant with parameters that give good results (execution time and quality of segmentation). CS variants are evaluated and compared with non-destructive testing (NDT) images using a misclassification error (ME) criterion.


Optimal sequential decision making with probabilistic digital twins

arXiv.org Machine Learning

Digital twins are emerging in many industries, typically consisting of simulation models and data associated with a specific physical system. One of the main reasons for developing a digital twin, is to enable the simulation of possible consequences of a given action, without the need to interfere with the physical system itself. Physical systems of interest, and the environments they operate in, do not always behave deterministically. Moreover, information about the system and its environment is typically incomplete or imperfect. Probabilistic representations of systems and environments may therefore be called for, especially to support decisions in application areas where actions may have severe consequences. In this paper we introduce the probabilistic digital twin (PDT). We will start by discussing how epistemic uncertainty can be treated using measure theory, by modelling epistemic information via $\sigma$-algebras. Based on this, we give a formal definition of how epistemic uncertainty can be updated in a PDT. We then study the problem of optimal sequential decision making. That is, we consider the case where the outcome of each decision may inform the next. Within the PDT framework, we formulate this optimization problem. We discuss how this problem may be solved (at least in theory) via the maximum principle method or the dynamic programming principle. However, due to the curse of dimensionality, these methods are often not tractable in practice. To mend this, we propose a generic approximate solution using deep reinforcement learning together with neural networks defined on sets. We illustrate the method on a practical problem, considering optimal information gathering for the estimation of a failure probability.


Hippocampal formation-inspired probabilistic generative model

arXiv.org Artificial Intelligence

We constructed a hippocampal formation (HPF)-inspired probabilistic generative model (HPF-PGM) using the structure-constrained interface decomposition method. By modeling brain regions with PGMs, this model is positioned as a module that can be integrated as a whole-brain PGM. We discuss the relationship between simultaneous localization and mapping (SLAM) in robotics and the findings of HPF in neuroscience. Furthermore, we survey the modeling for HPF and various computational models, including brain-inspired SLAM, spatial concept formation, and deep generative models. The HPF-PGM is a computational model that is highly consistent with the anatomical structure and functions of the HPF, in contrast to typical conventional SLAM models. By referencing the brain, we suggest the importance of the integration of egocentric/allocentric information from the entorhinal cortex to the hippocampus and the use of discrete-event queues.


Symbolic Reinforcement Learning for Safe RAN Control

arXiv.org Artificial Intelligence

In order to express desired (SRL) architecture for safe control in Radio Access Network (RAN) specifications to the network into consideration, LTL is used applications. In our automated tool, a user can select a high-level (see [2, 10, 12, 13]), due to the fact that it provides a powerful mathematical safety specifications expressed in Linear Temporal Logic (LTL) to formalism for such purpose. Our proposed demonstration shield an RL agent running in a given cellular network with aim exhibits the following attributes: of optimizing network performance, as measured through certain (1) a general automatic framework from LTL specification user Key Performance Indicators (KPIs). In the proposed architecture, input to the derivation of the policy that fulfills it; at the same network safety shielding is ensured through model-checking techniques time, blocking the control actions that violate the specification; over combined discrete system models (automata) that are (2) novel system dynamics abstraction to companions Markov Decision abstracted through reinforcement learning. We demonstrate the Processes (MDP) which is computationally efficient; user interface (UI) helping the user set intent specifications to the (3) UI development that allows the user to graphically access all architecture and inspect the difference in allowed and blocked actions.