Undirected Networks
Rejoinder for the discussion of the paper "A novel algorithmic approach to Bayesian Logic Regression"
Hubin, Aliaksandr, Storvik, Geir, Frommlet, Florian
We would like to begin this rejoinder with expressing our sincere gratitude to all of the discussants for their interesting and thought-provoking comments and remarks. We also feel heartily thankful to the editorial board of Bayesian Analysis for giving us the opportunity to publish our paper entitled "A novel algorithmic approach to Bayesian logic regression" (Hubin et al., 2020a) as a discussion article. Logic regression is a tool to model nonlinear relationships between binary covariates and some response variable by constructing predictors as Boolean combinations. The number of possible logic expressions grows exponentially with the number of binary variables involved, making the model search significantly harder with the increasing complexity of Boolean combinations. Due to Boolean equivalence, it is in fact almost impossible to specify the full model space a priori even for a relatively small number of covariates.
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Guo, Daniel, Pires, Bernardo Avila, Piot, Bilal, Grill, Jean-bastien, Altché, Florent, Munos, Rémi, Azar, Mohammad Gheshlaghi
Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where building a representation of the unknown environment is crucial to solve the tasks. Here we introduce Prediction of Bootstrap Latents (PBL), a simple and flexible self-supervised representation learning algorithm for multitask deep RL. PBL builds on multistep predictive representations of future observations, and focuses on capturing structured information about environment dynamics. Specifically, PBL trains its representation by predicting latent embeddings of future observations. These latent embeddings are themselves trained to be predictive of the aforementioned representations. These predictions form a bootstrapping effect, allowing the agent to learn more about the key aspects of the environment dynamics. In addition, by defining prediction tasks completely in latent space, PBL provides the flexibility of using multimodal observations involving pixel images, language instructions, rewards and more. We show in our experiments that PBL delivers across-the-board improved performance over state of the art deep RL agents in the DMLab-30 and Atari-57 multitask setting.
Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships
Lv, Yunlian, Xie, Ning, Shi, Yimin, Wang, Zijiao, Shen, Heng Tao
Embodied artificial intelligence (AI) tasks shift from tasks focusing on internet images to active settings involving embodied agents that perceive and act within 3D environments. In this paper, we investigate the target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes, whose navigation task aims to train an agent that can intelligently make a series of decisions to arrive at a pre-specified target location from any possible starting positions only based on egocentric views. However, most navigation methods currently struggle against several challenging problems, such as data efficiency, automatic obstacle avoidance, and generalization. Generalization problem means that agent does not have the ability to transfer navigation skills learned from previous experience to unseen targets and scenes. To address these issues, we incorporate two designs into classic DRL framework: attention on 3D knowledge graph (KG) and target skill extension (TSE) module. On the one hand, our proposed method combines visual features and 3D spatial representations to learn navigation policy. On the other hand, TSE module is used to generate sub-targets which allow agent to learn from failures. Specifically, our 3D spatial relationships are encoded through recently popular graph convolutional network (GCN). Considering the real world settings, our work also considers open action and adds actionable targets into conventional navigation situations. Those more difficult settings are applied to test whether DRL agent really understand its task, navigating environment, and can carry out reasoning. Our experiments, performed in the AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics, and improves generalization ability across targets and scenes.
Learning DAGs with continuous optimization
As datasets continually increase in size and complexity, our ability to uncover meaningful insights from unstructured and unlabeled data is crucial. At the same time, a premium has been placed on delivering simple, human-interpretable, and trustworthy inferential models of data. One promising class of such models are graphical models, which have been used to extract relational information from massive datasets arising from a wide variety of domains including biology, medicine, business, and finance, just to name a few. Graphical models are families of multivariate distributions with compact representations expressed as graphs. In both undirected (Markov networks) and directed (Bayesian networks) graphical models, the graph structure guides the factorization of the joint distribution into smaller local specifications such as clique potentials or local conditionals of a variable given its "parent" variables.
Activity Detection from Wearable Electromyogram Sensors using Hidden Markov Model
Surface electromyography (sEMG) has gained significant importance during recent advancements in consumer electronics for healthcare systems, gesture analysis and recognition and sign language communication. For such a system, it is imperative to determine the regions of activity in a continuously recorded sEMG signal. The proposed work provides a novel activity detection approach based on Hidden Markov Models (HMM) using sEMG signals recorded when various hand gestures are performed. Detection procedure is designed based on a probabilistic outlook by making use of mathematical models. The requirement of a threshold for activity detection is obviated making it subject and activity independent. Correctness of the predicted outputs is asserted by classifying the signal segments around the detected transition regions as activity or rest. Classified outputs are compared with the transition regions in a stimulus given to the subject to perform the activity. The activity onsets are detected with an average of 96.25% accuracy whereas the activity termination regions with an average of 87.5% accuracy with the considered set of six activities and four subjects.
Verification of Markov Decision Processes with Risk-Sensitive Measures
We develop a method for computing policies in Markov decision processes with risk-sensitive measures subject to temporal logic constraints. Specifically, we use a particular risk-sensitive measure from cumulative prospect theory, which has been previously adopted in psychology and economics. The nonlinear transformation of the probabilities and utility functions yields a nonlinear programming problem, which makes computation of optimal policies typically challenging. We show that this nonlinear weighting function can be accurately approximated by the difference of two convex functions. This observation enables efficient policy computation using convex-concave programming. We demonstrate the effectiveness of the approach on several scenarios.
Macro-Action-Based Deep Multi-Agent Reinforcement Learning
Xiao, Yuchen, Hoffman, Joshua, Amato, Christopher
In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.
Deep Reinforcement Learning for Adaptive Learning Systems
Li, Xiao, Xu, Hanchen, Zhang, Jinming, Chang, Hua-hua
In this paper, we formulate the adaptive learning problem---the problem of how to find an individualized learning plan (called policy) that chooses the most appropriate learning materials based on learner's latent traits---faced in adaptive learning systems as a Markov decision process (MDP). We assume latent traits to be continuous with an unknown transition model. We apply a model-free deep reinforcement learning algorithm---the deep Q-learning algorithm---that can effectively find the optimal learning policy from data on learners' learning process without knowing the actual transition model of the learners' continuous latent traits. To efficiently utilize available data, we also develop a transition model estimator that emulates the learner's learning process using neural networks. The transition model estimator can be used in the deep Q-learning algorithm so that it can more efficiently discover the optimal learning policy for a learner. Numerical simulation studies verify that the proposed algorithm is very efficient in finding a good learning policy, especially with the aid of a transition model estimator, it can find the optimal learning policy after training using a small number of learners.
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning
Li, Wenhao, Jin, Bo, Wang, Xiangfeng, Yan, Junchi, Zha, Hongyuan
Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes unpractical in complicated applications, due to non-interactivity between agents, curse of dimensionality and computation complexity. Hence, several decentralized MARL algorithms are motivated. However, existing decentralized methods only handle the fully cooperative setting where massive information needs to be transmitted in training. The block coordinate gradient descent scheme they used for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can combine most of actor-critic methods, and handle large-scale general cooperative multi-agent setting. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability for large-scale environment and reduce information transmission, by the parameter sharing mechanism and a novel modeling-other-agents methods based on theory-of-mind and online supervised learning. Sufficient experiments in cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.