Supervised Learning
Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation
Jernite, Yacine, Choromanska, Anna, Sontag, David
We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of the tree, and although past work showed how to learn the tree structure, it expected that the feature vectors remained static. We provide a novel algorithm to simultaneously perform representation learning for the input data and learning of the hierarchical predictor. Our approach optimizes an objective function which favors balanced and easily-separable multi-way node partitions. We theoretically analyze this objective, showing that it gives rise to a boosting-style property and a bound on classification error. We next show how to extend the algorithm to conditional density estimation. We empirically validate both variants of the algorithm on text classification and language modeling, respectively, and show that they compare favorably to common baselines in terms of accuracy and running time.
Learning from networked examples in a k-partite graph
Wang, Yuyi, Ramon, Jan, Guo, Zheng-Chu
Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption no longer holds when learning from a networked sample, where two or more training examples may share common features. We propose an efficient weighting method for learning from networked examples and show a sample error bound that is better than previous work.
Resource Constrained Structured Prediction
Bolukbasi, Tolga (Boston University) | Chang, Kai-Wei (University of Virginia) | Wang, Joseph (Boston University) | Saligrama, Venkatesh (Boston University)
We study the problem of structured prediction under test-time budget constraints. We propose a novel approach based on selectively acquiring computationally costly features during test-time in order to reduce the computational cost of prediction with minimal performance degradation. We formulate a novel empirical risk minimization (ERM) for policy learning. We show that policy learning can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition and dependency parsing, and show significant reduction in the feature costs without degrading accuracy.
Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests
Daniels, Zachary Alan (Rutgers University) | Metaxas, Dimitris N. (Rutgers University)
The multi-label classification problem involves finding a model that maps a set of input features to more than one output label. Class imbalance is a serious issue in multi-label classification. We introduce an extension of structured forests, a type of random forest used for structured prediction, called Sparse Oblique Structured Hellinger Forests (SOSHF). We explore using structured forests in the general multi-label setting and propose a new imbalance-aware formulation by altering how the splitting functions are learned in two ways. First, we account for cost-sensitivity when converting the multi-label problem to a single-label problem at each node in the tree. Second, we introduce a new objective function for determining oblique splits based on the Hellinger distance, a splitting criterion that has been shown to be robust to class imbalance. We empirically validate our method on a number of benchmarks against standard and state-of-the-art multi-label classification algorithms with improved results.
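The Hellinger distance between two discrete class distributions, which the abstract above adopts as an imbalance-robust splitting criterion, can be sketched as follows. This is a minimal illustration of the criterion itself, not the paper's SOSHF implementation; the function name and the two-class split-scoring usage are our own assumptions:

```python
from math import sqrt

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q.

    Ranges from 0 (identical) to 1 (disjoint support). Unlike
    entropy/Gini-based criteria it does not depend on the class priors,
    which is why it is considered robust to class imbalance.
    """
    return sqrt(sum((sqrt(pi) - sqrt(qi)) ** 2 for pi, qi in zip(p, q))) / sqrt(2)

# A candidate split would be scored by the distance between the
# per-class distributions it induces over the left/right branches:
left_fraction_per_class = [0.9, 0.1]   # share of each class sent left
right_fraction_per_class = [0.1, 0.9]  # share of each class sent right
score = hellinger(left_fraction_per_class, right_fraction_per_class)
```

A split that separates the classes more cleanly yields distributions that are farther apart, so the tree greedily picks the split with the largest distance.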
Structured Prediction in Time Series Data
Li, Jia (University of Illinois at Chicago)
Time series data is common in a wide range of disciplines including finance, biology, sociology, and computer science. Analyzing and modeling time series data is fundamental for studying various problems in those fields. For instance, studying time series physiological data can be used to discriminate between patients' abnormal and normal recovery trajectories (Hripcsak, Albers, and Perotte 2015). GPS data are useful for studying collective decision making of group-living animals (Strandburg-Peshkin et al. 2015). There are different methods for studying time series data such as clustering, regression, and anomaly detection. In this proposal, we are interested in structured prediction problems in time series data. Structured prediction focuses on prediction tasks where the outputs are structured and interdependent, contrary to non-structured prediction, which assumes that each output is independent of the other predicted outputs. Structured prediction is an important problem as there are structures inherently existing in time series data. One difficulty for structured prediction is that the number of possible outputs can be exponential, which makes modeling all the potential outputs intractable.
Confidence-Rated Discriminative Partial Label Learning
Tang, Cai-Zhi (Southeast University) | Zhang, Min-Ling (Southeast University)
Partial label learning aims to induce a multi-class classifier from training examples where each of them is associated with a set of candidate labels, among which only one label is valid. The common discriminative solution to learn from partial label examples assumes one parametric model for each class label, whose predictions are aggregated to optimize specific objectives such as likelihood or margin over the training examples. Nonetheless, existing discriminative approaches treat the predictions from all parametric models in an equal manner, where the confidence of each candidate label being the ground-truth label is not differentiated. In this paper, a boosting-style partial label learning approach is proposed to enable confidence-rated discriminative modeling. Specifically, the ground-truth confidence of each candidate label is maintained in each boosting round and utilized to train the base classifier. Extensive experiments on artificial as well as real-world partial label data sets validate the effectiveness of the confidence-rated discriminative modeling.
Embedding Tarskian Semantics in Vector Spaces
Sato, Taisuke (National Institute of Advanced Industrial Science and Technology (AIST))
We propose a new linear algebraic approach to the computation of Tarskian semantics in logic. First, we embed a finite model M in first-order logic with N entities in N-dimensional Euclidean space R^N by mapping entities of M to N-dimensional one-hot vectors and k-ary relations to order-k adjacency tensors (multi-way arrays). Second, given a logical formula F in prenex normal form, we compile F into a set Sigma_F of algebraic formulas in multi-linear algebra with a nonlinear operation. In this compilation, existential quantifiers are compiled into a specific type of tensors, e.g., identity matrices in the case of quantifying two occurrences of a variable. It is shown that a systematic evaluation of Sigma_F in R^N gives the truth value, 1 (true) or 0 (false), of F in M. Based on this framework, we also propose an unprecedented way of computing the least models defined by Datalog programs in linear spaces via matrix equations and empirically show its effectiveness compared to state-of-the-art approaches.
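The entity and relation embedding described above can be illustrated with a toy model. This is only a schematic sketch under assumed names (the paper's compilation of Sigma_F handles arbitrary prenex formulas and order-k tensors):

```python
# Toy Tarskian model: 3 entities; one binary relation r encoded as an
# N x N adjacency matrix R; entities as indices into it (i.e. one-hot
# vectors, so e_a^T R e_b reduces to R[a][b]).
N = 3
R = [[0] * N for _ in range(N)]
R[0][1] = 1  # the fact r(e0, e1) holds in the model

def truth_atom(R, a, b):
    """Truth value of the ground atom r(a, b): e_a^T R e_b."""
    return R[a][b]

def truth_exists(R, a):
    """Truth value of (exists y) r(a, y): sum over y, clipped back to
    {0, 1} by min(x, 1) as the nonlinear operation."""
    return min(1, sum(R[a]))
```

Evaluating a compiled formula then amounts to chains of such tensor contractions followed by the clipping nonlinearity.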
Fast and Accurate Time Series Classification with WEASEL
Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both scalable and accurate. Like other state-of-the-art TSC methods, WEASEL transforms time series into feature vectors, using a sliding-window approach, which are then analyzed through a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it achieves, out of the box, almost the same accuracy as highly tuned, domain-specific methods.
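The sliding-window, word-based feature extraction described above can be sketched as below. This is only a schematic illustration with a crude mean-threshold symbolization; WEASEL itself derives its words differently (via symbolic Fourier approximation and aggressive feature selection), so the function name and discretization here are assumptions:

```python
from collections import Counter

def bag_of_patterns(ts, w=4):
    """Slide a window of width w over the series, discretize each window
    into a two-symbol word (below/above the window mean), and count the
    resulting words as a sparse feature vector."""
    words = []
    for i in range(len(ts) - w + 1):
        win = ts[i:i + w]
        mean = sum(win) / w
        words.append(''.join('a' if v < mean else 'b' for v in win))
    return Counter(words)

features = bag_of_patterns([0, 1, 0, 1, 0, 1], w=4)
# The resulting word counts can then be fed to a linear classifier.
```

The per-series transformation makes TS of different lengths comparable, since the classifier only ever sees a fixed vocabulary of window words.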
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Bahmani, Zeinab, Bertossi, Leopoldo, Vasiloglou, Nikolaos
Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL" (an extended form of Datalog supported by the "LogicBlox" platform) for all activities related to data processing, and the specification and enforcement of MDs.
Discriminative Gaifman Models
We present discriminative Gaifman models, a novel family of relational machine learning models. Gaifman models learn feature representations bottom up from representations of locally connected and bounded-size regions of knowledge bases (KBs). Considering local and bounded-size neighborhoods of knowledge bases renders logical inference and learning tractable, mitigates the problem of overfitting, and facilitates weight sharing. Gaifman models sample neighborhoods of knowledge bases so as to make the learned relational models more robust to missing objects and relations, which is a common situation in open-world KBs. We present the core ideas of Gaifman models and apply them to large-scale relational learning problems. We also discuss the ways in which Gaifman models relate to some existing relational machine learning approaches.