Regression
Conditional Linear Regression
Calderon, Diego (University of Arkansas) | Juba, Brendan (Washington University in St. Louis) | Li, Zongyi (Washington University in St. Louis) | Ruan, Lisa (M.I.T.)
[...] used in biological and social sciences to predict events and to describe possible relationships between variables. When addressing the task of prediction, machine learning and statistics commonly focus on capturing the vast majority of data, occasionally ignoring a segment of the population as "outliers" or "noise," which could be helpful to better understand the data. Previous work by Juba (2016) gave an algorithm [...] In this case, we would be interested in identifying a segment of the population for which a linear rule is highly predictive of the price of certain cars, whereas this linear rule may not provide a good prediction overall in the larger population. Let us imagine that for this data set, and for a target fraction of the population, we found a simple rule that describes the subpopulation, along with its linear fit.
RelNN: A Deep Neural Model for Relational Learning
Kazemi, Seyed Mehran (University of British Columbia) | Poole, David (University of British Columbia)
Statistical relational AI (StarAI) aims at reasoning and learning in noisy domains described in terms of objects and relationships by combining probability with first-order logic. With huge advances in deep learning in recent years, combining deep networks with first-order logic has been the focus of several recent studies. Many of the existing attempts, however, only focus on relations and ignore object properties. The attempts that do consider object properties are limited in terms of modelling power or scalability. In this paper, we develop relational neural networks (RelNNs) by adding hidden layers to relational logistic regression (the relational counterpart of logistic regression). We learn latent properties for objects both directly and through general rules. Back-propagation is used for training these models. A modular, layer-wise architecture makes it easier to apply techniques developed within the deep learning community to our architecture. Initial experiments on eight tasks over three real-world datasets show that RelNNs are promising models for relational learning.
Non-Parametric Outliers Detection in Multiple Time Series A Case Study: Power Grid Data Analysis
Zhou, Yuxun (University of California, Berkeley) | Zou, Han (University of California, Berkeley) | Arghandeh, Reza (Florida State University) | Gu, Weixi (Tsinghua University) | Spanos, Costas J. (University of California, Berkeley)
Data sets collected from a wide variety of research disciplines, including computer science, economics, biology and social science, are in the form of multiple co-evolving time series. In this work, we consider the task of outlier (or novelty) detection given the aforementioned data type. The core difficulty, however, is to integrate both the temporal dependence and the interactions among correlated time series for overall modeling and learning. [...] Signal processing based filtering methods. Those approaches implicitly assume that the "normal" component of the time series has a sparse representation in the frequency or wavelet domain. Hence the outlier detection problem is reduced to a spectral analysis using low pass or band pass filters, or is solved by denoising/signal reconstruction using spectral or wavelet techniques (Mallat 2008). It is worth pointing out that the signal-processing-based methods have close ties with [...]
Orthant-Wise Passive Descent Algorithms for Training L1-Regularized Models
Wangni, Jianqiao (Tencent AI Lab)
L1-regularized models are widely used for sparse regression or classification tasks. In this paper, we propose the orthant-wise passive descent algorithm (OPDA) for solving L1-regularized models, as an improved substitute for proximal algorithms, which are currently the standard tools for optimizing such models. OPDA uses a stochastic variance-reduced gradient (SVRG) to initialize the descent direction, then applies a novel alignment operator to encourage each element to keep the same sign after one iteration of update, so that the parameter remains in the same orthant as before. It also explicitly suppresses the magnitude of each element to impose sparsity. A quasi-Newton update can be used to incorporate curvature information and accelerate convergence. We prove a linear convergence rate for OPDA on general smooth and strongly-convex loss functions. By conducting experiments on L1-regularized logistic regression and convolutional neural networks, we show that OPDA outperforms state-of-the-art stochastic proximal algorithms, implying a wide range of applications in training sparse models.
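As a rough illustration of the two ideas the abstract contrasts, the sketch below shows the soft-thresholding step that proximal methods apply for an L1 penalty, and a simplified orthant projection that zeroes any coordinate whose sign would flip. This is not the paper's OPDA or its alignment operator; the function names `soft_threshold` and `orthant_project` are hypothetical, and treating a zero reference coordinate as "stay at zero" is a simplifying assumption.

```python
def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each entry toward zero by t."""
    out = []
    for x in w:
        if x > t:
            out.append(x - t)
        elif x < -t:
            out.append(x + t)
        else:
            out.append(0.0)  # entries within [-t, t] become exactly zero
    return out


def orthant_project(w_new, w_ref):
    """Keep only coordinates of w_new that share a strict sign with w_ref.

    Simplification (assumption): a coordinate that is zero in w_ref
    is kept at zero rather than assigned an orthant.
    """
    return [x if x * r > 0 else 0.0 for x, r in zip(w_new, w_ref)]


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
    print(orthant_project([0.5, -0.2, 1.0], [1.0, 1.0, -1.0]))
```

Both operations produce exact zeros, which is how sparsity arises in either family of methods.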
Discriminative Semi-Supervised Feature Selection via Rescaled Least Squares Regression-Supplement
Yuan, Guowen (Shenzhen University) | Chen, Xiaojun (Shenzhen University) | Wang, Chen (Shenzhen University) | Nie, Feiping (Northwestern Polytechnical University) | Jing, Liping (Beijing Jiaotong University)
In this paper, we propose a Discriminative Semi-Supervised Feature Selection (DSSFS) method. In this method, an ε-dragging technique is introduced to the Rescaled Linear Square Regression in order to enlarge the distances between different classes. An iterative method is proposed to simultaneously learn the regression coefficients and the ε-dragging matrix and to predict the unknown class labels. Experimental results show the superiority of DSSFS.
Investigating Active Learning for Concept Prerequisite Learning
Liang, Chen (Pennsylvania State University) | Ye, Jianbo (Pennsylvania State University) | Wang, Shuting (Pennsylvania State University) | Pursel, Bart (Pennsylvania State University) | Giles, C. Lee (Pennsylvania State University)
Concept prerequisite learning focuses on machine learning methods for measuring the prerequisite relation among concepts. Given the importance of prerequisites for education, it has recently become a promising research direction. A major obstacle to extracting prerequisites at scale is the lack of large-scale labels that would enable effective data-driven solutions. We investigate the applicability of active learning to concept prerequisite learning. We propose a novel set of features tailored for prerequisite classification and compare the effectiveness of four widely used query strategies. Experimental results for domains including data mining, geometry, physics, and precalculus show that active learning can be used to reduce the amount of training data required. Given the proposed features, the query-by-committee strategy outperforms the other query strategies compared.
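The query-by-committee strategy mentioned above can be sketched with vote entropy as the disagreement measure: the unlabeled candidate on which the committee's predicted labels are most evenly split is queried next. This is a generic illustration under assumptions, not the paper's setup; the names `vote_entropy` and `select_query`, the candidate ids, and the votes are all hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label distribution voted by the committee members."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_query(committee_votes):
    """Return the candidate id whose committee votes disagree the most.

    committee_votes maps a candidate id to the list of labels predicted
    for it by each committee member (hypothetical data layout).
    """
    return max(committee_votes, key=lambda k: vote_entropy(committee_votes[k]))


if __name__ == "__main__":
    votes = {
        "pair_a": [1, 1, 1, 1],  # full agreement
        "pair_b": [1, 0, 1, 0],  # evenly split
        "pair_c": [1, 1, 1, 0],  # mild disagreement
    }
    print(select_query(votes))
```

In a full active-learning loop, the selected candidate would be labeled, added to the training set, and the committee retrained before the next query.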
Argument Mining for Improving the Automated Scoring of Persuasive Essays
Nguyen, Huy V. (University of Pittsburgh) | Litman, Diane J. (University of Pittsburgh)
End-to-end argument mining has enabled the development of new automated essay scoring (AES) systems that use argumentative features (e.g., number of claims, number of support relations) in addition to traditional legacy features (e.g., grammar, discourse structure) when scoring persuasive essays. While prior research has proposed different argumentative features as well as empirically demonstrated their utility for AES, these studies have all had important limitations. In this paper we identify a set of desiderata for evaluating the use of argument mining for AES, introduce an end-to-end argument mining system and associated argumentative feature sets, and present the results of several studies that both satisfy the desiderata and demonstrate the value-added of argument mining for scoring persuasive essays.
Few Shot Transfer Learning Between Word Relatedness and Similarity Tasks Using A Gated Recurrent Siamese Network
O'Neill, James (Insight Centre for Data Analytics, National University of Ireland, Galway) | Buitelaar, Paul (Insight Centre for Data Analytics, National University of Ireland, Galway)
Word similarity and word relatedness are fundamental to natural language processing and, more generally, to understanding how humans relate concepts in semantic memory. A growing number of datasets are being proposed as evaluation benchmarks; however, the heterogeneity and focus of each respective dataset make it difficult to draw plausible conclusions as to how a unified semantic model would perform. Additionally, we want to identify the transferability of knowledge obtained from one task to another, within the same domain and across domains. Hence, this paper first presents an evaluation and comparison of eight chosen datasets tested using the best performing regression models. As a baseline, we present regression models that incorporate both lexical features and word embeddings to produce consistent and competitive results compared to the state of the art. We present our main contribution, the best performing model across seven of the eight datasets: a Gated Recurrent Siamese Network that learns relationships between lexical word definitions. A parameter transfer learning strategy is employed for the Siamese Network. Subsequently, we present a secondary contribution, which is the best performing non-sequential model: an Inductive and Transductive Transfer Learning strategy for transferring decision trees within a Random Forest to a target task that is learned from only a few instances. The method involves measuring semantic distance between hidden factored matrix representations of decision tree traversal matrices.
Feature-Induced Labeling Information Enrichment for Multi-Label Learning
Zhang, Qian-Wen (Tencent Smart Platform &) | Zhong, Yun (Products Department) | Zhang, Min-Ling (Southeast University)
In multi-label learning, each training example is represented by a single instance (feature vector) while associated with multiple class labels simultaneously. The task is to learn a predictive model from the training examples which can assign a set of proper labels for the unseen instance. Most existing approaches make use of multi-label training examples by exploiting their labeling information in a crisp manner, i.e. one class label is either fully relevant or irrelevant to the instance. In this paper, a novel multi-label learning approach is proposed which aims to enrich the labeling information by leveraging the structural information in feature space. Firstly, the underlying structure of feature space is characterized by conducting sparse reconstruction among the training examples. Secondly, the reconstruction information is conveyed from feature space to label space so as to enrich the original categorical labels into numerical ones. Thirdly, the multi-label predictive model is induced by learning from training examples with enriched labeling information. Extensive experiments on fifteen benchmark data sets clearly validate the effectiveness of the proposed feature-induced strategy for enhancing labeling information of multi-label examples.
Fourier Feature Approximations for Periodic Kernels in Time-Series Modelling
Tompkins, Anthony (The University of Sydney) | Ramos, Fabio (The University of Sydney)
Gaussian Processes (GPs) provide an extremely powerful mechanism to model a variety of problems but incur an O(N^3) complexity in the number of data samples. Common approximation methods rely on what are often termed inducing points but still typically incur an O(NM^2) complexity in the data and corresponding inducing points. Using Random Fourier Feature (RFF) maps, we overcome this by transforming the problem into a Bayesian Linear Regression formulation upon which we apply a Bayesian Variational treatment that also allows learning the corresponding kernel hyperparameters, likelihood and noise parameters. In this paper we introduce an alternative method using Fourier series to obtain spectral representations of common kernels, in particular for periodic warpings, which surprisingly have a convergent, non-random form using special functions, requiring fewer spectral features to approximate their corresponding kernel to high accuracy. Using this, we can fuse the Random Fourier Feature spectral representations of common kernels with their periodic counterparts to show how they can more effectively and expressively learn patterns in time-series for both interpolation and extrapolation. This method combines robustness, scalability and, equally importantly, interpretability through a symbolic declarative grammar that is both functionally and humanly intuitive, a property that is crucial for explainable decision making. Using probabilistic programming and Variational Inference we are able to efficiently optimise over these rich functional representations. We show significantly improved Gram matrix approximation errors, and also demonstrate the method in several time-series problems, comparing against other commonly used approaches such as recurrent neural networks.
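To make the Random Fourier Feature idea referenced above concrete, here is a minimal sketch of the standard random construction for a squared-exponential (RBF) kernel: frequencies are drawn from the kernel's Gaussian spectral density, and the inner product of the resulting feature vectors approximates the kernel value. This is the generic random variant, not the paper's non-random Fourier series features for periodic kernels; `make_rff`, `rbf`, the lengthscale, and the feature count are illustrative assumptions.

```python
import math
import random


def make_rff(dim, num_features, lengthscale, seed=0):
    """Build a random feature map phi with phi(x) . phi(y) ~= rbf(x, y)."""
    rng = random.Random(seed)
    # The RBF kernel's spectral density is Gaussian with std 1 / lengthscale.
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return phi


def rbf(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    phi = make_rff(dim=1, num_features=2000, lengthscale=1.0)
    x, y = [0.3], [0.8]
    print("approx:", dot(phi(x), phi(y)), "exact:", rbf(x, y, 1.0))
```

Because the features are finite-dimensional, regression on `phi(x)` becomes an ordinary (Bayesian) linear regression whose cost scales with the number of features rather than cubically with the number of data points.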