Statistical Learning
Unsupervised Alignment of Natural Language Instructions with Video Segments
Naim, Iftekhar (University of Rochester) | Song, Young Chol (University of Rochester) | Liu, Qiguang (University of Rochester) | Kautz, Henry (University of Rochester) | Luo, Jiebo (University of Rochester) | Gildea, Daniel (University of Rochester)
We propose an unsupervised learning algorithm for automatically inferring the mappings between English nouns and corresponding video objects. Given a sequence of natural language instructions and an unaligned video recording, we simultaneously align each instruction to its corresponding video segment and align the nouns in each instruction to their corresponding objects in the video. While existing grounded language acquisition algorithms rely on pre-aligned supervised data (each sentence paired with its corresponding image frame or video segment), our algorithm aims to automatically infer the alignment from the temporal structure of the video and the parallel text instructions. We propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. We evaluate our algorithm on videos of biological experiments performed in wet labs, and demonstrate its capability of aligning video segments to text instructions and matching video objects to nouns in the absence of any direct supervision.
Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis
Dong, Li (Beihang University) | Wei, Furu (Microsoft Research) | Zhou, Ming (Microsoft Research) | Xu, Ke (Beihang University)
Recursive neural models have achieved promising results in many natural language processing tasks. The main difference among these models lies in the composition function, i.e., how to obtain the vector representation for a phrase or sentence from the representations of the words it contains. This paper introduces a novel Adaptive Multi-Compositionality (AdaMC) layer for recursive neural models. The basic idea is to use more than one composition function and to select among them adaptively depending on the input vectors. We present a general framework that models each semantic composition as a distribution over these composition functions. The composition functions and the parameters used for adaptive selection are learned jointly from data. We integrate AdaMC into existing recursive neural models and conduct extensive experiments on the Stanford Sentiment Treebank. The results show that AdaMC significantly outperforms state-of-the-art sentiment classification methods, pushing the best accuracy of sentence-level negative/positive classification from 85.4% up to 88.5%.
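The idea of modeling a composition as a distribution over several composition functions can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the names `adaptive_compose`, `W`, `S`, and `beta`, the use of one linear map per composition function, the softmax selection, and the `tanh` nonlinearity are all hypothetical choices.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def adaptive_compose(left, right, W, S, beta=1.0):
    """Mix several composition functions with input-dependent weights.

    left, right : child vectors of shape (d,)
    W           : (C, d, 2d) array, one linear composition per function (hypothetical)
    S           : (C, 2d) array of selection parameters (hypothetical)
    beta        : sharpness of the selection distribution (hypothetical)
    """
    x = np.concatenate([left, right])      # stacked children, shape (2d,)
    probs = softmax(beta * (S @ x))        # distribution over the C functions
    candidates = W @ x                     # each function's output, shape (C, d)
    parent = np.tanh(probs @ candidates)   # probability-weighted mixture
    return parent, probs

# Usage: compose two random 4-dimensional child vectors with 3 functions.
rng = np.random.default_rng(0)
d, C = 4, 3
parent, probs = adaptive_compose(
    rng.normal(size=d), rng.normal(size=d),
    rng.normal(size=(C, d, 2 * d)), rng.normal(size=(C, 2 * d)),
)
```

In a trained model, `W` and `S` would be learned jointly by backpropagation through the tree; here they are random placeholders.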
SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis
Cambria, Erik (Nanyang Technological University) | Olsher, Daniel (Carnegie Mellon University) | Rajagopal, Dheeraj (National University of Singapore)
SenticNet is a publicly available semantic and affective resource for concept-level sentiment analysis. Rather than using graph-mining and dimensionality-reduction techniques, SenticNet 3 makes use of "energy flows" to connect various parts of extended common and common-sense knowledge representations to one another. SenticNet 3 models nuanced semantics and sentics (that is, the conceptual and affective information associated with multi-word natural language expressions), representing information with a symbolic opacity of an intermediate nature between that of neural networks and typical symbolic systems.
Feature Selection at the Discrete Limit
Zhang, Miao (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington) | Zhang, Ya (Shanghai Jiao Tong University) | Nie, Feiping (University of Texas at Arlington)
Feature selection plays an important role in many machine learning and data mining applications. In this paper, we propose to use the L2,p norm for feature selection, with emphasis on small p. As p approaches 0, feature selection becomes a discrete feature selection problem. We provide two algorithms: a proximal gradient algorithm and a rank-one update algorithm, the latter being more efficient at large regularization. We provide closed-form solutions of the proximal operator at p = 0 and p = 1/2. Experiments on real-life datasets show that features selected at small p consistently outperform features selected at p = 1 (the standard L2,1 approach) and by other popular feature selection methods.
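To make the discrete limit concrete, the sketch below computes a row-wise L2,p penalty and the proximal step at p = 0. It assumes the common convention that the penalty is the sum over rows of the row norm raised to the power p, which at p = 0 counts the nonzero rows; the paper's exact scaling may differ. The function names and the parameter `lam` are hypothetical, and the p = 1/2 closed form is not reproduced here.

```python
import numpy as np

def l2p_penalty(W, p):
    """Sum over rows of ||w_i||_2 ** p; at p = 0, the number of nonzero rows."""
    row_norms = np.linalg.norm(W, axis=1)
    if p == 0:
        return float(np.count_nonzero(row_norms))
    return float(np.sum(row_norms ** p))

def prox_rows_p0(A, lam):
    """Solve min_W 0.5 * ||W - A||_F^2 + lam * (number of nonzero rows of W).

    Each row is independent: keeping row a_i costs lam, zeroing it costs
    0.5 * ||a_i||^2, so a row survives only if ||a_i|| > sqrt(2 * lam).
    """
    row_norms = np.linalg.norm(A, axis=1)
    keep = row_norms > np.sqrt(2.0 * lam)
    return A * keep[:, None]

# Usage: rows of A correspond to features; small rows are discarded outright.
A = np.array([[3.0, 4.0], [0.1, 0.0], [0.0, 0.0]])
selected = prox_rows_p0(A, lam=1.0)
```

Unlike the soft shrinkage obtained at p = 1, this step either keeps a feature's row unchanged or removes it entirely, which is why the p → 0 limit behaves like discrete selection.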
Decomposing Activities of Daily Living to Discover Routine Clusters
Yürüten, Onur (École polytechnique fédérale de Lausanne) | Zhang, Jiyong (École polytechnique fédérale de Lausanne) | Pu, Pearl (École polytechnique fédérale de Lausanne)
Modern sensor technology helps us collect time series data for activities of daily living (ADLs), which in turn can be used to infer broad patterns, such as common daily routines. Most existing approaches either rely on a model trained on a preselected and manually labeled set of activities, or perform micro-pattern analysis with a manually selected length and number of micro-patterns. Since real-life ADL datasets are massive, such approaches would be too costly to apply. Thus, there is a need to formulate unsupervised methods that can be applied to different time scales. We propose a novel approach to discover clusters of daily activity routines. We use a matrix decomposition method to isolate routines and deviations to obtain two different sets of clusters. We obtain the final memberships via the cross product of these sets. We validate our approach using two real-life ADL datasets and a well-known artificial dataset. Based on average silhouette width scores, our approach can capture strong structures in the underlying data. Furthermore, results show that our approach improves on the accuracy of the baseline algorithms by 12%, with statistical significance (p < 0.05) under the Wilcoxon signed-rank test.
Privacy and Regression Model Preserved Learning
Yi, Jinfeng (IBM Thomas J. Watson Research Center) | Wang, Jun (IBM Thomas J. Watson Research Center) | Jin, Rong (Michigan State University)
Sensitive data such as medical records and business reports usually contain valuable information that can be used to build prediction models. However, designing learning models by directly using sensitive data might result in severe privacy and copyright issues. In this paper, we propose a novel matrix completion based framework that aims to tackle two challenging issues simultaneously: i) handling missing and noisy sensitive data, and ii) preserving the privacy of the sensitive data during the learning process. In particular, the proposed framework is able to mask the sensitive data while ensuring that the transformed data are still usable for training regression models. We show that two key properties, namely model preserving and privacy preserving, are satisfied by the transformed data obtained from the proposed framework. For model preserving, we guarantee that the linear regression model built from the masked data perfectly approximates the regression model learned from the original data. For privacy preserving, we ensure that the original sensitive data cannot be recovered, since the transformation procedure is irreversible. Given these two characteristics, the transformed data can be safely released to any learners for designing prediction models without revealing any private content. Our empirical studies with a synthesized dataset and multiple sensitive benchmark datasets verify our theoretical claim as well as the effectiveness of the proposed framework.
Adaptive Knowledge Transfer for Multiple Instance Learning in Image Classification
Wang, Qifan (Purdue University) | Ruan, Lingyun (Purdue University) | Si, Luo (Purdue University)
Multiple Instance Learning (MIL) is a popular learning technique in various vision tasks, including image classification. However, most existing MIL methods do not consider the problem of insufficient examples in the given target category. In this case, it is difficult for traditional MIL methods to build an accurate classifier due to the lack of training examples. Motivated by the empirical success of transfer learning, this paper proposes a novel approach of Adaptive Knowledge Transfer for Multiple Instance Learning (AKT-MIL) in image classification. The new method transfers cross-category knowledge from source categories under the multiple instance setting to boost the learning process. A unified learning framework with a data-dependent mixture model is designed to adaptively combine the transferred knowledge from sources with a weak classifier built in the target domain. Based on this framework, an iterative coordinate descent method with Constraint Concave-Convex Programming (CCCP) is proposed as the optimization procedure. An extensive set of experimental results demonstrates that the proposed AKT-MIL approach substantially outperforms several state-of-the-art algorithms on two benchmark datasets, especially when very few training examples are available in the target domain.
Globally and Locally Consistent Unsupervised Projection
Wang, Hua (Colorado School of Mines) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)
In this paper, we propose an unsupervised projection method for feature extraction that preserves both the global and local consistencies of the input data in the projected space. Traditional unsupervised feature extraction methods, such as principal component analysis (PCA) and locality preserving projections (LPP), can explore either the global or the local geometric structure of the input data, but not both at the same time. In our new method, we introduce a new measurement that uses the neighborhood data variances to assess data locality, and with it we propose to learn an optimal projection by rewarding both the global and local structures of the input data. The formulated optimization problem is challenging to solve because it turns out to be a trace ratio minimization problem. As an important theoretical contribution, we propose a simple yet efficient optimization algorithm to solve the trace ratio problem, with theoretically proved convergence. Extensive experiments have been performed on six benchmark data sets, where the promising results validate the proposed method.
Robust Distance Metric Learning in the Presence of Label Noise
Wang, Dong (Nanjing University of Aeronautics and Astronautics) | Tan, Xiaoyang (Nanjing University of Aeronautics and Astronautics)
Many distance learning algorithms have been developed in recent years. However, few of them consider the case in which the class labels of the training data are noisy, which may lead to serious performance deterioration. In this paper, we present a robust distance learning method in the presence of label noise, by extending a previous non-parametric discriminative distance learning algorithm, namely Neighbourhood Components Analysis (NCA). In particular, we analyze the effect of label noise on the derivative of the likelihood with respect to the transformation matrix, and propose to model the conditional probability of the true label of each point so as to reduce that effect. The model is then optimized within the EM framework, with additional regularization used to avoid overfitting. Our experiments on several UCI datasets and a real dataset with unknown noise patterns show that the proposed robust NCA (RNCA) is more tolerant of class label noise than the original NCA method.
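For orientation, the quantities that standard NCA optimizes can be sketched as follows. This shows only the original NCA neighbor probabilities and objective that the abstract builds on, not the robust label-noise model or its EM procedure; the function name `nca_objective` and its argument layout are hypothetical.

```python
import numpy as np

def nca_objective(A, X, y):
    """Standard NCA: expected number of points whose stochastic neighbor shares their label.

    A : (k, d) linear transformation matrix
    X : (n, d) data matrix
    y : (n,) integer class labels, taken at face value (no noise model here)
    """
    Z = X @ A.T                                           # projected points, (n, k)
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)                          # a point never picks itself
    logits = -sq
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    P = np.exp(logits)
    P = P / P.sum(axis=1, keepdims=True)                  # P[i, j]: i picks j as neighbor
    same = y[:, None] == y[None, :]
    p_correct = (P * same).sum(axis=1)                    # mass on same-label neighbors
    return float(p_correct.sum()), P

# Usage: two tight, well-separated classes under the identity transformation.
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 1, 1])
score, P = nca_objective(np.eye(2), X, y)
```

Because `same` is computed directly from the observed labels, a flipped label changes which neighbor mass counts as correct, which is the sensitivity the abstract's noise model is meant to reduce.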
Identifying Differences in Physician Communication Styles with a Log-Linear Transition Component Model
Wallace, Byron C (Brown University) | Dahabreh, Issa J (Brown University) | Trikalinos, Thomas A (Brown University) | Laws, Michael Barton (Brown University) | Wilson, Ira (Brown University) | Charniak, Eugene (Brown University)
We consider the task of grouping doctors with respect to the communication patterns exhibited in outpatient visits. We propose a novel approach toward this end in which we model speech act transitions in conversations via a log-linear model incorporating physician-specific components. We train this model over transcripts of outpatient visits annotated with speech act codes and then cluster physicians in (a transformation of) this parameter space. We find significant correlations between the induced groupings and patient survey response data comprising ratings of physician communication. Furthermore, the novel sequential component model we leverage to induce this clustering allows us to explore differences across these groups. This work demonstrates how statistical AI might be used to better understand (and ultimately improve) physician communication.