Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised

arXiv.org Artificial Intelligence

A number of techniques have been proposed for aspect discovery using part-of-speech tagging (Hu and Liu, 2004), syntactic parsing (Lu et al., 2009), clustering (Mei et al., 2007; Titov and McDonald, 2008b), data mining (Ku et al., 2006), and information extraction (Popescu and Etzioni, 2005). Various lexicon and rule-based methods (Hu and Liu, 2004; Ku et al., 2006; Blair-Goldensohn et al., 2008) have been adopted for sentiment prediction together with a few learning approaches (Lu et al., 2009; Pappas and Popescu-Belis, 2017; Angelidis and Lapata, 2018). As for the summaries, a common format involves a list of aspects and the number of positive and negative opinions for each (Hu and Liu, 2004). While this format gives an overall idea of people's opinion, reading the actual text might be necessary to gain a better understanding of specific details. Textual summaries are created mostly with extractive methods (but see Ganesan et al., 2010, for an abstractive approach), in formats ranging from lists of words (Popescu and Etzioni, 2005) to phrases (Lu et al., 2009) and sentences (Mei et al., 2007; Blair-Goldensohn et al., 2008; Lerman et al., 2009; Wang and Ling, 2016). In this paper, we present a neural framework for opinion extraction from product reviews. We follow the standard architecture for aspect-based summarization, while taking advantage of the success of neural network models in learning continuous features without recourse to preprocessing tools or linguistic annotations.


Active Learning for Regression Using Greedy Sampling

arXiv.org Machine Learning

Regression problems are pervasive in real-world applications. Generally, a substantial number of labeled samples is needed to build a regression model with good generalization ability. However, it is often relatively easy to collect a large number of unlabeled samples, but time-consuming or expensive to label them. Active learning for regression (ALR) is a methodology to reduce the number of labeled samples by selecting the most beneficial ones to label, instead of selecting them at random. This paper proposes two new ALR approaches based on greedy sampling (GS). The first approach (GSy) selects new samples to increase the diversity in the output space, and the second (iGS) selects new samples to increase the diversity in both input and output spaces. Extensive experiments on 12 UCI and CMU StatLib datasets from various domains, and on 15 subjects on EEG-based driver drowsiness estimation, verified their effectiveness and robustness.
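The two selection rules can be illustrated with a minimal sketch. The abstract only says that GSy increases diversity in the output space and iGS in both input and output spaces; the concrete criterion used here (pick the pool sample that maximizes the minimum distance to the labeled set, with iGS multiplying input and output distances per pair) is an assumption, as are the function names `select_gsy` and `select_igs` and the toy data. Outputs for unlabeled samples are taken from predictions of a current regression model, which this sketch receives as `y_pred_pool`.

```python
import numpy as np


def select_gsy(y_pred_pool, y_labeled):
    """Output-space greedy sampling (assumed criterion): return the index of
    the pool sample whose predicted output is farthest from its nearest
    labeled output."""
    d_y = np.abs(y_pred_pool[:, None] - y_labeled[None, :])
    return int(np.argmax(d_y.min(axis=1)))


def select_igs(x_pool, y_pred_pool, x_labeled, y_labeled):
    """Input-and-output greedy sampling (assumed criterion): multiply the
    input and output distances for each (pool, labeled) pair, take the
    minimum over labeled samples, and return the pool index that maximizes it."""
    d_x = np.linalg.norm(x_pool[:, None, :] - x_labeled[None, :, :], axis=2)
    d_y = np.abs(y_pred_pool[:, None] - y_labeled[None, :])
    return int(np.argmax((d_x * d_y).min(axis=1)))


# Toy data: two labeled samples and three unlabeled candidates.
x_labeled = np.array([[0.0], [1.0]])
y_labeled = np.array([0.0, 1.0])
x_pool = np.array([[0.1], [0.5], [3.0]])
y_pred_pool = np.array([0.05, 2.0, 1.5])  # predictions of the current model

print(select_gsy(y_pred_pool, y_labeled))                     # 1
print(select_igs(x_pool, y_pred_pool, x_labeled, y_labeled))  # 2
```

In this toy case the two rules disagree: the second candidate has the most novel predicted output, while the third is novel in both input and output, so the combined criterion prefers it.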


Affect Estimation in 3D Space Using Multi-Task Active Learning for Regression

arXiv.org Machine Learning

Acquisition of labeled training samples for affective computing is usually costly and time-consuming, as affects are intrinsically subjective, subtle and uncertain, and hence multiple human assessors are needed to evaluate each affective sample. Particularly, for affect estimation in the 3D space of valence, arousal and dominance, each assessor has to perform the evaluations in three dimensions, which makes the labeling problem even more challenging. Many sophisticated machine learning approaches have been proposed to reduce the data labeling requirement in various other domains, but so far few have considered affective computing. This paper proposes two multi-task active learning for regression approaches, which select the most beneficial samples to label, by considering the three affect primitives simultaneously. Experimental results on the VAM corpus demonstrated that our optimal sample selection approaches can result in better estimation performance than random selection and several traditional single-task active learning approaches. Thus, they can help alleviate the data labeling problem in affective computing, i.e., better estimation performance can be obtained from fewer labeling queries.


Transfer Learning for Brain-Computer Interfaces: An Euclidean Space Data Alignment Approach

arXiv.org Machine Learning

Almost all EEG-based brain-computer interfaces (BCIs) need some labeled subject-specific data to calibrate a new subject, as neural responses to even the same stimulus differ across subjects. So, a major challenge in developing high-performance and user-friendly BCIs is to cope with such individual differences so that the calibration can be reduced or even completely eliminated. This paper focuses on the latter. More specifically, we consider an offline application scenario, in which we have unlabeled EEG trials from a new subject, and would like to accurately label them by leveraging auxiliary labeled EEG trials from other subjects in the same task. To accommodate the individual differences, we propose a novel unsupervised approach to align the EEG trials from different subjects in the Euclidean space to make them more consistent. It has three desirable properties: 1) the aligned trials lie in the Euclidean space, and can therefore be used by any Euclidean space signal processing and machine learning approach; 2) it can be computed very efficiently; and 3) it does not need any labeled trials from the new subject. Experiments on motor imagery and event-related potentials demonstrated the effectiveness and efficiency of our approach.
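A minimal sketch of one way such an unsupervised alignment can be computed is given below. The abstract does not spell out the transform; the version here assumes that each subject's trials are whitened by the inverse square root of that subject's mean spatial covariance matrix, so that the mean covariance of the aligned trials becomes the identity for every subject. The function name `euclidean_align` and the array layout are illustrative choices.

```python
import numpy as np


def euclidean_align(trials):
    """Align one subject's EEG trials (assumed transform).

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns an array of the same shape. No labels are used.
    """
    # Reference matrix: arithmetic mean of the per-trial spatial covariances.
    ref = np.mean(trials @ trials.transpose(0, 2, 1), axis=0)
    # Inverse matrix square root via eigendecomposition; ref is symmetric and
    # assumed positive definite (more samples than channels).
    eigvals, eigvecs = np.linalg.eigh(ref)
    ref_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    # Apply the same linear map to every trial (broadcast over trials).
    return ref_inv_sqrt @ trials


# Synthetic stand-in for one subject's data: 20 trials, 4 channels, 50 samples.
rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 4, 50))
aligned = euclidean_align(trials)
mean_cov = np.mean(aligned @ aligned.transpose(0, 2, 1), axis=0)
```

Under this assumption `mean_cov` is the identity matrix up to numerical error, and the aligned trials remain ordinary channel-by-time arrays, which is what lets any Euclidean-space method consume them directly.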


Variational Information Bottleneck on Vector Quantized Autoencoders

arXiv.org Machine Learning

In this paper, we provide an information-theoretic interpretation of the Vector Quantized-Variational Autoencoder (VQ-VAE). We show that the loss function of the original VQ-VAE [1] can be derived from the variational deterministic information bottleneck (VDIB) principle [2]. On the other hand, the VQ-VAE trained by the Expectation Maximization (EM) algorithm [3] can be viewed as an approximation to the variational information bottleneck (VIB) principle [4].

I Introduction

Recent advances in variational autoencoders (VAEs) provide new unsupervised approaches to learning the hidden structure of data [5]. The variational autoencoder is a powerful generative model which allows inference of the learned latent representation. However, classic VAEs are prone to the "posterior collapse" phenomenon, in which the latent representations are ignored because the decoder is too powerful. The vector quantized variational autoencoder (VQ-VAE) learns discrete representations by incorporating the idea of vector quantization into the bottleneck stage, so that "posterior collapse" can be avoided [1].
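The vector quantization step at the bottleneck can be sketched as a nearest-neighbour lookup in a codebook. This is a minimal NumPy illustration, not the training procedure discussed in the paper: the function name `quantize`, the toy codebook, and the encoder outputs are assumptions, and the loss terms and gradient handling are omitted.

```python
import numpy as np


def quantize(z_e, codebook):
    """Map each encoder output to its nearest codebook vector.

    z_e: array of shape (n, d), continuous encoder outputs.
    codebook: array of shape (k, d), the discrete latent embeddings.
    Returns (indices, quantized vectors).
    """
    # Squared Euclidean distance between every encoder output and every code.
    d2 = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]


codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z_e = np.array([[0.1, -0.2], [0.9, 1.2]])
idx, z_q = quantize(z_e, codebook)
print(idx)  # [0 1]
```

The decoder then sees only `z_q`, so the latent code is restricted to one of `k` discrete choices per position.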


Jointly learning relevant subgraph patterns and nonlinear models of their indicators

arXiv.org Machine Learning

Classification and regression in which the inputs are graphs of arbitrary size and shape have attracted attention in various fields such as computational chemistry and bioinformatics. Subgraph indicators are often used as the most fundamental features, but the number of possible subgraph patterns is intractably large due to combinatorial explosion. We propose a novel efficient algorithm to jointly learn relevant subgraph patterns and nonlinear models of their indicators. Previous methods for such joint learning of subgraph features and models are based on a search for the single best subgraph feature, with specific pruning and boosting procedures that add indicators one by one, which results in linear models of subgraph indicators. In contrast, the proposed approach is based on directly learning regression trees for graph inputs using a newly derived bound of the total sum of squares for data partitions by a given subgraph feature, and thus can learn nonlinear models through standard gradient boosting. We also present an illustrative example of nonlinearity that we call the Graph-XOR problem, numerical experiments with real datasets, and scalability comparisons to naive approaches using explicit pattern enumeration.


Making Machine Learning Robust Against Adversarial Inputs

Communications of the ACM

Machine learning has advanced radically over the past 10 years, and machine learning algorithms now achieve human-level performance or better on a number of tasks, including face recognition [31], optical character recognition [8], object recognition [29], and playing the game Go [26]. Yet machine learning algorithms that exceed human performance in naturally occurring scenarios are often seen to fail dramatically when an adversary is able to modify their input data even subtly. Machine learning is already used for many highly important applications and will be used in even more of even greater importance in the near future. Search algorithms, automated financial trading algorithms, data analytics, autonomous vehicles, and malware detection are all critically dependent on the underlying machine learning algorithms that interpret their respective domain inputs to provide intelligent outputs that facilitate the decision-making process of users or automated systems. As machine learning is used in more contexts where malicious adversaries have an incentive to interfere with the operation of a given machine learning system, it is increasingly important to provide protections, or "robustness guarantees," against adversarial manipulation. The modern generation of machine learning services is a result of nearly 50 years of research and development in artificial intelligence, the study of computational algorithms and systems that reason about their environment to make predictions [25]. A subfield of artificial intelligence, most modern machine learning, as used in production, can essentially be understood as applied function approximation; when there is some mapping from an input x to an output y that is difficult for a programmer to describe through explicit code, a machine learning algorithm can learn an approximation of the mapping by analyzing a dataset containing several examples of inputs and their corresponding outputs.
Google's image-classification system, Inception, has been trained with millions of labeled images [28]. It can classify images as cats, dogs, airplanes, boats, or more complex concepts with accuracy on par with or better than that of humans. Increases in the size of machine learning models and in their accuracy are the result of recent advancements in machine learning algorithms [17], particularly advances in deep learning [7]. One focus of the machine learning research community has been on developing models that make accurate predictions, as progress was in part measured by results on benchmark datasets. In this context, accuracy denotes the fraction of test inputs that a model processes correctly: the proportion of images that an object-recognition algorithm recognizes as belonging to the correct class, and the proportion of executables that a malware detector correctly designates as benign or malicious. The estimate of a model's accuracy varies greatly with the choice of the dataset used to compute the estimate.
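As a small illustration of the accuracy definition above, the fraction of correctly processed test inputs can be computed as follows. The function name `accuracy` and the toy labels are hypothetical and not taken from the article.

```python
def accuracy(predictions, labels):
    """Return the fraction of test inputs whose prediction matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one test input is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Three of the four toy predictions match their labels.
print(accuracy(["cat", "dog", "cat", "boat"], ["cat", "dog", "dog", "boat"]))  # 0.75
```

Because the value depends entirely on which test inputs are supplied, the same model can score very differently on two datasets.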


Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction

arXiv.org Machine Learning

Stochastic zeroth-order (SZO), or gradient-free, optimization makes it possible to optimize arbitrary functions by relying only on function evaluations under parameter perturbations. However, the iteration complexity of SZO methods suffers a factor proportional to the dimensionality of the perturbed function. We show that in scenarios with natural sparsity patterns, as in structured prediction applications, this factor can be reduced to the expected number of active features over input-output pairs. We give a general proof that applies sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic objectives, and present an experimental evaluation on linear bandit structured prediction tasks with sparse word-based feature representations that confirms our theoretical results.
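The idea of perturbing only the active coordinates can be sketched with a generic two-point zeroth-order update. This is an illustrative assumption rather than the algorithm analysed in the paper: the function name `sparse_szo_step`, the Gaussian perturbation, the finite-difference estimator, and the toy quadratic objective are all choices made for this sketch.

```python
import numpy as np


def sparse_szo_step(f, w, active, mu, lr, rng):
    """One gradient-free update that perturbs only the `active` coordinates.

    f: objective, evaluated twice per step (no gradients are used).
    w: current parameter vector.
    active: indices of the coordinates that are perturbed in this step.
    mu: perturbation scale; lr: learning rate.
    """
    u = np.zeros_like(w)
    u[active] = rng.standard_normal(len(active))
    # Finite-difference estimate of the directional derivative along u,
    # used to scale a step along u.
    g = (f(w + mu * u) - f(w)) / mu * u
    return w - lr * g


def f(w):
    # Toy objective: squared Euclidean norm.
    return float(np.sum(w ** 2))


rng = np.random.default_rng(0)
w = np.ones(10)
active = [0, 3, 7]  # hypothetical set of active features
for _ in range(200):
    w = sparse_szo_step(f, w, active, mu=1e-3, lr=0.05, rng=rng)
```

Inactive coordinates receive a zero perturbation and therefore a zero update, so the randomness of each step scales with the number of active features rather than with the full dimensionality.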


GritNet: Student Performance Prediction with Deep Learning

arXiv.org Machine Learning

Student performance prediction - where a machine forecasts the future performance of students as they interact with online coursework - is a challenging problem. Reliable early-stage predictions of a student's future performance could be critical to facilitate timely educational interventions during a course. However, very few prior studies have explored this problem from a deep learning perspective. In this paper, we recast the student performance prediction problem as a sequential event prediction problem and propose a new deep learning based algorithm, termed GritNet, which builds upon the bidirectional long short-term memory (BLSTM). Our results, from real Udacity students' graduation predictions, show that the GritNet not only consistently outperforms the standard logistic-regression based method, but that the improvements are especially pronounced in the first few weeks, when accurate predictions are most challenging.


Rademacher Complexity Bounds for a Penalized Multi-class Semi-supervised Algorithm

Journal of Artificial Intelligence Research

We propose Rademacher complexity bounds for multi-class classifiers trained with a two-step semi-supervised model. In the first step, the algorithm partitions the partially labeled data and then uses the labeled training examples to identify dense clusters containing κ predominant classes, such that the proportion of their non-predominant classes is below a fixed threshold, which stands for clustering consistency. In the second step, a classifier is trained by minimizing a margin empirical loss over the labeled training set and a penalization term measuring the inability of the learner to predict the κ predominant classes of the identified clusters. The resulting data-dependent generalization error bound involves the margin distribution of the classifier, the stability of the clustering technique used in the first step, and Rademacher complexity terms corresponding to partially labeled training data. Our theoretical results exhibit convergence rates extending those proposed in the literature for the binary case, and experimental results on different multi-class classification problems show empirical evidence that supports the theory.