Learning Graphical Models
Nearest Labelset Using Double Distances for Multi-label Classification
Gweon, Hyukjun, Schonlau, Matthias, Steiner, Stefan
Noname manuscript No. (will be inserted by the editor) Abstract Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this paper we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Keywords Multi-label classification, Machine learning, Label correlations 1 Introduction In multi-label classification, an instance can belong to multiple labels at the same time. This is different from multi-class or binary classification, where an instance can only be associated with a single label.
Shape-aware Surface Reconstruction from Sparse 3D Point-Clouds
Bernard, Florian, Salamanca, Luis, Thunberg, Johan, Tack, Alexander, Jentsch, Dennis, Lamecker, Hans, Zachow, Stefan, Hertel, Frank, Goncalves, Jorge, Gemmar, Peter
The reconstruction of an object's shape or surface from a set of 3D points plays an important role in medical image analysis, e.g. in anatomy reconstruction from tomographic measurements or in the process of aligning intra-operative navigation and preoperative planning data. In such scenarios, one usually has to deal with sparse data, which significantly aggravates the problem of reconstruction. However, medical applications often provide contextual information about the 3D point data that allow to incorporate prior knowledge about the shape that is to be reconstructed. To this end, we propose the use of a statistical shape model (SSM) as a prior for surface reconstruction. The SSM is represented by a point distribution model (PDM), which is associated with a surface mesh. Using the shape distribution that is modelled by the PDM, we formulate the problem of surface reconstruction from a probabilistic perspective based on a Gaussian Mixture Model (GMM). In order to do so, the given points are interpreted as samples of the GMM. By using mixture components with anisotropic covariances that are "oriented" according to the surface normals at the PDM points, a surface-based fitting is accomplished. Estimating the parameters of the GMM in a maximum a posteriori manner yields the reconstruction of the surface from the given data points. We compare our method to the extensively used Iterative Closest Points method on several different anatomical datasets/SSMs (brain, femur, tibia, hip, liver) and demonstrate superior accuracy and robustness on sparse data.
13 Free Self-Study Books on Mathematics, Machine Learning & Deep Learning
Getting learners to read textbooks and use other teaching aids effectively can be tricky. Especially, when the books are just too dreary. In this post, we've compiled great e-resources for you digital natives looking to explore the exciting world of Machine Learning and Neural Networks. But before you dive into the deep end, you need to make sure you've got the fundamentals down pat. It doesn't matter what catches your fancy, machine learning, artificial intelligence, or deep learning; you need to know the basics of math and stats--linear algebra, calculus, optimization, probability--to get ahead.
Nonlinear Dynamic Boltzmann Machines for Time-Series Prediction
Dasgupta, Sakyasingha (IBM Research - Tokyo) | Osogami, Takayuki (IBM Research - Tokyo)
The dynamic Boltzmann machine (DyBM) has been proposed as a stochastic generative model of multi-dimensional time series, with an exact, learning rule that maximizes the log-likelihood of a given time series. The DyBM, however, is defined only for binary valued data, without any nonlinear hidden units. Here, in our first contribution, we extend the DyBM to deal with real valued data. We present a formulation called Gaussian DyBM, that can be seen as an extension of a vector autoregressive (VAR) model. This uses, in addition to standard (explanatory) variables, components that captures long term dependencies in the time series. In our second contribution, we extend the Gaussian DyBM model with a recurrent neural network (RNN) that controls the bias input to the DyBM units. We derive a stochastic gradient update rule such that, the output weights from the RNN can also be trained online along with other DyBM parameters. Furthermore, this acts as nonlinear hidden layer extending the capacity of DyBM and allows it to model nonlinear components in a given time-series. Numerical experiments with synthetic datasets show that the RNN-Gaussian DyBM improves predictive accuracy upon standard VAR by up to 35%. On real multi-dimensional time-series prediction, consisting of high nonlinearity and non-stationarity, we demonstrate that this nonlinear DyBM model achieves significant improvement upon state of the art baseline methods like VAR and long short-term memory (LSTM) networks at a reduced computational cost.
Efficient Dependency-Guided Named Entity Recognition
Jie, Zhanming (Singapore University of Technology and Design) | Muis, Aldrian Obaja (Singapore University of Technology and Design) | Lu, Wei (Singapore University of Technology and Design)
Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several down-stream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER. In this work, we investigate on how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guided NER model performs competitively with models based on conventional semi-Markov conditional random fields, while requiring significantly less running time.
The Linearization of Belief Propagation on Pairwise Markov Random Fields
Gatterbauer, Wolfgang (Carnegie Mellon University)
Belief Propagation (BP) is an iterative message-passing algorithm for performing inference in graphical models Convergent message-passing algorithms. There has (GMs), such as Markov Random Fields (MRFs). BP calculates been much research on finding variations to the update equations the marginal distribution for each unobserved node, of BP that guarantee convergence. These algorithms conditional on any observed nodes (Pearl 1988). It achieves are often similar in structure to the nonconvergent algorithms, this by propagating the information from a few observed yet it can be proven that the value of the variational nodes throughout the network by iteratively passing information problem (or its dual) improves at each iteration (Hazan and between neighboring nodes. It is known that when Shashua 2008; Heskes 2006; Meltzer, Globerson, and Weiss the graphical model has a tree structure, then BP converges 2009). Another body of recent papers have suggested to to the true marginals (according to exact probabilistic inference) solve the convergence problems of MMinference by linearizing after a finite number of iterations.
Hindsight Optimization for Hybrid State and Action MDPs
Raghavan, Aswin (Oregon State University) | Sanner, Scott (University of Toronto) | Khardon, Roni (Tufts University) | Tadepalli, Prasad (Oregon State University) | Fern, Alan (Oregon State University)
Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective, when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower-bound generated by straight line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs.
Multi-Agent Path Finding with Delay Probabilities
Ma, Hang (University of Southern California) | Kumar, T. K. Satish (University of Southern California) | Koenig, Sven (University of Southern California)
Several recently developed Multi-Agent Path Finding (MAPF) solvers scale to large MAPF instances by searching for MAPF plans on 2 levels: The high-level search resolves collisions between agents, and the low-level search plans paths for single agents under the constraints imposed by the high-level search. We make the following contributions to solve the MAPF problem with imperfect plan execution with small average makespans: First, we formalize the MAPF Problem with Delay Probabilities (MAPF-DP), define valid MAPF-DP plans and propose the use of robust plan-execution policies for valid MAPF-DP plans to control how each agent proceeds along its path. Second, we discuss 2 classes of decentralized robust plan-execution policies (called Fully Synchronized Policies and Minimal Communication Policies) that prevent collisions during plan execution for valid MAPF-DP plans. Third, we present a 2-level MAPF-DP solver (called Approximate Minimization in Expectation) that generates valid MAPF-DP plans.
Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation
Xiao, Cao (IBM T.J.Watson Research Center) | Zhang, Ping (IBM T.J.Watson Research Center) | Chaovalitwongse, W. Art (University of Arkansas) | Hu, Jianying (IBM T.J.Watson Research Center) | Wang, Fei (Cornell University)
Adverse drug reaction (ADR) is a major burden for patients and healthcare industry. It usually causes preventable hospitalizations and deaths, while associated with a huge amount of cost. Traditional preclinical in vitro safety profiling and clinical safety trials are restricted in terms of small scale, long duration, huge financial costs and limited statistical signifi- cance. The availability of large amounts of drug and ADR data potentially allows ADR predictions during the drugs’ early preclinical stage with data analytics methods to inform more targeted clinical safety tests. Despite their initial success, existing methods have trade-offs among interpretability, predictive power and efficiency. This urges us to explore methods that could have all these strengths and provide practical solutions for real world ADR predictions. We cast the ADR-drug relation structure into a three-layer hierarchical Bayesian model. We interpret each ADR as a symbolic word and apply latent Dirichlet allocation (LDA) to learn topics that may represent certain biochemical mechanism that relates ADRs with drug structures. Based on LDA, we designed an equivalent regularization term to incorporate the hierarchical ADR domain knowledge. Finally, we developed a mixed input model leveraging a fast collapsed Gibbs sampling method that the complexity of each iteration of Gibbs sampling proportional only to the number of positive ADRs. Experiments on real world data show our models achieved higher prediction accuracy and shorter running time than the state-of-the-art alternatives.
Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games
Horák, Karel (Czech Technical University in Prague) | Bošanský, Branislav (Czech Technical University in Prague) | Pěchouček, Michal (Czech Technical University in Prague)
Security problems can be modeled as two-player partially observable stochastic games with one-sided partial observability and infinite horizon (one-sided POSGs). We seek for optimal strategies of player 1 that correspond to robust strategies against the worst-case opponent (player 2) that is assumed to have a perfect information about the game. We present a novel algorithm for approximately solving one-sided POSGs based on the heuristic search value iteration (HSVI) for POMDPs. Our results include (1) theoretical properties of one-sided POSGs and their value functions, (2) guarantees showing the convergence of our algorithm to optimal strategies, and (3) practical demonstration of applicability and scalability of our algorithm on three different domains: pursuit-evasion, patrolling, and search games.