Undirected Networks
Early Stratification of Patients at Risk for Postoperative Complications after Elective Colectomy
Wang, Wen, Padman, Rema, Shah, Nirav
Stratifying patients at risk for postoperative complications may facilitate timely and accurate workups and reduce the burden of adverse events on patients and the health system. Currently, a widely-used surgical risk calculator created by the American College of Surgeons, NSQIP, uses 21 preoperative covariates to assess risk of postoperative complications, but lacks dynamic, real-time capabilities to accommodate postoperative information. We propose a new Hidden Markov Model sequence classifier for analyzing patients' postoperative temperature sequences that incorporates their time-invariant characteristics in both transition probability and initial state probability in order to develop a postoperative "real-time" complication detector. Data from elective Colectomy surgery indicate that our method has improved classification performance compared to 8 other machine learning classifiers when using the full temperature sequence associated with the patients' length of stay. Additionally, within 44 hours after surgery, the performance of the model is close to that of full-length temperature sequence.
Smoothed Analysis in Unsupervised Learning via Decoupling
Bhaskara, Aditya, Chen, Aidao, Perreault, Aidan, Vijayaraghavan, Aravindan
Smoothed analysis is a powerful paradigm in overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries, that are given by low-degree polynomials of a few base underlying random variables. In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to obtain polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning: 1. Robust subspace recovery, when the fraction $\alpha$ of inliers in the d-dimensional subspace $T \subset \mathbb{R}^n$ is at least $\alpha > (d/n)^\ell$ for any constant integer $\ell>0$. This contrasts with the known worst-case intractability when $\alpha< d/n$, and the previous smoothed analysis result which needed $\alpha > d/n$ (Hardt and Moitra, 2013). 2. Higher order tensor decompositions, where we generalize the so-called FOOBI algorithm of Cardoso to find order-$\ell$ rank-one tensors in a subspace. This allows us to obtain polynomially robust decomposition algorithms for $2\ell$'th order tensors with rank $O(n^{\ell})$. 3. Learning overcomplete hidden markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in a smoothed analysis model.
Restricted Boltzmann Machine with Multivalued Hidden Variables: a model suppressing over-fitting
Yokoyama, Yuuki, Katsumata, Tomu, Yasuda, Muneki
Generalization is one of the most important issues in machine learning problems. In this paper, we consider the generalization in restricted Boltzmann machines. We propose a restricted Boltzmann machine with multivalued hidden variables, which is a simple extension of conventional restricted Boltzmann machines. We demonstrate that our model is better than the conventional one via numerical experiments: experiments for a contrastive divergence learning with artificial data and for a classification problem with MNIST.
Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk
In reinforcement learning, the reward function on current state and action is widely used. When the objective is about the expectation of the (discounted) total reward only, it works perfectly. However, if the objective involves the total reward distribution, the result will be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions, which share the same expectations. Firstly we show that with VaR objective, when the real reward function is transition-based (with respect to action and both current and next states), the simplified (state-based, with respect to action and current state only) reward function will change the VaR. Secondly, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Thirdly, since the estimation method is for a Markov reward process with the reward function on current state only, we present a transformation algorithm for the Markov reward process with the reward function on current and next states, in order to estimate the VaR function with an intact total reward distribution.
Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language
Hoffman, Matthew D., Johnson, Matthew J., Tran, Dustin
Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, we propose a strategy for automating such derivations. Unlike previous systems which focus on relationships between pairs of random variables, our system (which we call Autoconj) operates directly on Python functions that compute log-joint distribution functions. Autoconj provides support for conjugacy-exploiting algorithms in any Python-embedded PPL. This paves the way for accelerating development of novel inference algorithms and structure-exploiting modeling strategies.
A Structure-aware Online Learning Algorithm for Markov Decision Processes
Roy, Arghyadip, Borkar, Vivek, Karandikar, Abhay, Chaporkar, Prasanna
To overcome the curse of dimensionality and curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider an infinite-horizon average reward MDP problem and prove the optimality of the threshold policy under certain conditions. Traditional RL techniques do not exploit the threshold nature of optimal policy while learning. In this paper, we propose a new RL algorithm which utilizes the known threshold structure of the optimal policy while learning by reducing the feasible policy space. We establish that the proposed algorithm converges to the optimal policy. It provides a significant improvement in convergence speed and computational and storage complexity over traditional RL algorithms. The proposed technique can be applied to a wide variety of optimization problems that include energy efficient data transmission and management of queues. We exhibit the improvement in convergence speed of the proposed algorithm over other RL algorithms through simulations.
Few-Shot Generalization Across Dialogue Tasks
Vlasov, Vladimir, Drissner-Schmid, Akela, Nichol, Alan
Machine-learning based dialogue managers are able to learn complex behaviors in order to complete a task, but it is not straightforward to extend their capabilities to new domains. We investigate different policies' ability to handle uncooperative user behavior, and how well expertise in completing one task (such as restaurant reservations) can be reapplied when learning a new one (e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy (REDP), which embeds system actions and dialogue states in the same vector space. REDP contains a memory component and attention mechanism based on a modified Neural Turing Machine, and significantly outperforms a baseline LSTM classifier on this task. We also show that both our architecture and baseline solve the bAbI dialogue task, achieving 100% test accuracy.
Particle Probability Hypothesis Density Filter based on Pairwise Markov Chains
Liu, Jiangyi, Wang, Chunping, Wang, Wei
Most multi-target tracking filters assume that one target and its observation follow a Hidden Markov Chain (HMC) model, but the implicit independence assumption of HMC model is invalid in many practical applications, and a Pairwise Markov Chain (PMC) model is more universally suitable than traditional HMC model. A particle probability hypothesis density filter based on PMC model (PF-PMC-PHD) is proposed for the nonlinear multi-target tracking system. Simulation results show the effectiveness of PF-PMC-PHD filter, and that the tracking performance of PF-PMC-PHD filter is superior to the particle PHD filter based on HMC model in a scenario where we kept the local physical properties of nonlinear and Gaussian HMC models while relaxing their independence assumption.
Optimization of Information-Seeking Dialogue Strategy for Argumentation-Based Dialogue System
Katsumi, Hisao, Hiraoka, Takuya, Yoshino, Koichiro, Yamamoto, Kazeto, Motoura, Shota, Sadamasa, Kunihiko, Nakamura, Satoshi
Argumentation-based dialogue systems, which can handle and exchange arguments through dialogue, have been widely researched. It is required that these systems have sufficient supporting information to argue their claims rationally; however, the systems often do not have enough of such information in realistic situations. One way to fill in the gap is acquiring such missing information from dialogue partners (information-seeking dialogue). Existing information-seeking dialogue systems are based on handcrafted dialogue strategies that exhaustively examine missing information. However, the proposed strategies are not specialized in collecting information for constructing rational arguments. Moreover, the number of system's inquiry candidates grows in accordance with the size of the argument set that the system deal with. In this paper, we formalize the process of information-seeking dialogue as Markov decision processes (MDPs) and apply deep reinforcement learning (DRL) for automatically optimizing a dialogue strategy. By utilizing DRL, our dialogue strategy can successfully minimize objective functions, the number of turns it takes for our system to collect necessary information in a dialogue. We conducted dialogue experiments using two datasets from different domains of argumentative dialogue. Experimental results show that the proposed formalization based on MDP works well, and the policy optimized by DRL outperformed existing heuristic dialogue strategies.
What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning
Li, Irene, Fabbri, Alexander R., Tung, Robert R., Radev, Dragomir R.
Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first," we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a dataset containing 1,352 English lecture files collected from university courses which are each classified according to an existing taxonomy as well as 208 manually-labeled prerequisite relation topics, which is publicly available. The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.