Northeastern University
Reports on the 2018 AAAI Spring Symposium Series
Amato, Christopher (Northeastern University) | Ammar, Haitham Bou (PROWLER.io) | Churchill, Elizabeth (Google) | Karpas, Erez (Technion - Israel Institute of Technology) | Kido, Takashi (Stanford University) | Kuniavsky, Mike (Parc) | Lawless, W. F. (Paine College) | Rossi, Francesca (IBM T. J. Watson Research Center and University of Padova) | Oliehoek, Frans A. (TU Delft) | Russell, Stephen (US Army Research Laboratory) | Takadama, Keiki (University of Electro-Communications) | Srivastava, Siddharth (Arizona State University) | Tuyls, Karl (Google DeepMind) | Allen, Philip Van (Art Center College of Design) | Venable, K. Brent (Tulane University and IHMC) | Vrancx, Peter (PROWLER.io) | Zhang, Shiqi (Cleveland State University)
The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University's Department of Computer Science, presented the 2018 Spring Symposium Series, held Monday through Wednesday, March 26–28, 2018, on the campus of Stanford University. The seven symposia held were AI and Society: Ethics, Safety and Trustworthiness in Intelligent Agents; Artificial Intelligence for the Internet of Everything; Beyond Machine Intelligence: Understanding Cognitive Bias and Humanity for Well-Being AI; Data Efficient Reinforcement Learning; The Design of the User Experience for Artificial Intelligence (the UX of AI); Integrated Representation, Reasoning, and Learning in Robotics; and Learning, Inference, and Control of Multi-Agent Systems. This report, compiled from the organizers of the symposia, summarizes the research presented at five of the seven symposia.
Action Prediction From Videos via Memorizing Hard-to-Predict Samples
Kong, Yu (Northeastern University ) | Gao, Shangqian (Northeastern University ) | Sun, Bin (Northeastern University ) | Fu, Yun (Northeastern University)
Action prediction from video is an important problem in computer vision with many applications, such as preventing accidents and criminal activities. Predicting actions at an early stage is challenging because of the large variations between partially observed videos and complete ones; intra-class variations further confuse predictors. In this paper, we propose a mem-LSTM model to predict actions at an early stage, in which a memory module is introduced to record several "hard-to-predict" samples and a variety of early observations. Our method uses a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to model partially observed video input, and augments the LSTM with a memory module to remember challenging video instances. With the memory module, our mem-LSTM model not only achieves impressive performance at the early stage but also makes predictions without prior knowledge of the observation ratio. Information in future frames is also utilized through a bidirectional LSTM layer. Experiments on the UCF-101 and Sports-1M datasets show that our method outperforms state-of-the-art methods.
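The architecture described above can be pictured as a CNN feature extractor feeding a bidirectional LSTM, with an attention-style read over a bank of memorized hard examples. The following PyTorch sketch is a minimal, hypothetical rendering of that idea; module names, sizes, and the fused scoring are assumptions, not the authors' mem-LSTM implementation.

# Hypothetical sketch of a CNN + bidirectional LSTM classifier with an external
# memory of hard-to-predict clips, loosely following the mem-LSTM idea above.
import torch
import torch.nn as nn

class PartialVideoPredictor(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=101, mem_slots=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, feat_dim), nn.ReLU())
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # memory of hard samples: keys are clip encodings, values are soft labels
        self.mem_keys = nn.Parameter(torch.randn(mem_slots, 2 * hidden))
        self.mem_vals = nn.Parameter(torch.randn(mem_slots, num_classes))
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, frames):                         # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        enc, _ = self.lstm(feats)                      # (B, T, 2*hidden)
        query = enc.mean(dim=1)                        # clip-level encoding
        attn = torch.softmax(query @ self.mem_keys.t(), dim=-1)
        mem_logits = attn @ self.mem_vals              # read from the memory bank
        return self.cls(query) + mem_logits            # fuse direct and memory paths

pred = PartialVideoPredictor()
logits = pred(torch.randn(2, 8, 3, 32, 32))            # 2 clips, 8 observed frames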
VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling
Guo, Guibing (Northeastern University) | Zhai, Songlin (Northeastern University) | Yuan, Fajie (University of Glasgow) | Liu, Yuan (Northeastern University) | Wang, Xingwei (Northeastern University)
Joint visual-semantic embeddings (VSE) have become a research hotspot for image annotation, a task that suffers from the semantic gap, i.e., the gap between images' low-level visual features and labels' high-level semantic features. The issue is even more challenging when visual features cannot be retrieved from images at all, that is, when images are denoted only by numerical IDs, as in some real datasets. Existing VSE methods typically perform uniform sampling of negative examples that violate the ranking order against positive examples, which requires a time-consuming search over the whole label space. In this paper, we propose a fast adaptive negative sampler that works well even when no image pixels are available. Our sampling strategy is to choose the negative examples that are most likely to violate the ranking according to the latent factors of images. In this way, our approach scales linearly to large datasets. Experiments demonstrate that our approach converges 5.02x faster than state-of-the-art approaches on OpenImages, 2.5x faster on IAPR TC-12, and 2.06x faster on NUS-WIDE, while also achieving better ranking accuracy across all datasets.
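As a concrete illustration of choosing negatives from latent factors rather than scanning the whole label space, here is a small numpy sketch in the spirit of rank-aware adaptive samplers for pairwise ranking. The per-dimension pre-sorting, the geometric rank draw, and all names are assumptions for illustration, not the paper's exact sampler.

# Illustrative rank-aware negative sampler for an ID-only embedding model.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_labels, dim = 1000, 500, 16
img_factors = rng.normal(size=(n_images, dim))       # latent factors of images
lbl_factors = rng.normal(size=(n_labels, dim))       # latent factors of labels
# labels pre-sorted along every latent dimension (refreshed occasionally)
dim_rank = np.argsort(-lbl_factors, axis=0)          # shape (n_labels, dim)

def sample_negative(image_id, lam=0.05):
    """Draw a likely-violating negative label without scoring every label."""
    f = img_factors[image_id]
    # pick a latent dimension proportional to how much it drives this image's scores
    probs = np.abs(f) * lbl_factors.std(axis=0)
    d = rng.choice(dim, p=probs / probs.sum())
    # draw a rank position biased toward the top of that dimension's ordering
    r = min(int(rng.geometric(lam)) - 1, n_labels - 1)
    col = dim_rank[:, d] if f[d] > 0 else dim_rank[::-1, d]
    return col[r]                                    # in practice, reject known positives

neg_label = sample_negative(image_id=7)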
Learning Transferable Subspace for Human Motion Segmentation
Wang, Lichen (Northeastern University) | Ding, Zhengming (Northeastern University) | Fu, Yun (Northeastern University)
Temporal data clustering is a challenging task. Existing methods usually rely on a data self-representation strategy, which can hinder clustering performance when data are insufficient or corrupted. In real-world applications, however, a large amount of related labeled data is often readily available. To this end, we propose a novel transferable subspace clustering approach that exploits useful information from relevant source data to enhance clustering performance on target temporal data. We transform the original data into a shared, low-dimensional, and discriminative feature space by jointly seeking an effective domain-invariant projection. In this way, the well-labeled source knowledge helps obtain a more discriminative target representation. Moreover, a graph regularizer is designed to incorporate temporal information so that more sequential knowledge is preserved in the learned representation. Extensive experiments on three human motion datasets illustrate that our approach outperforms state-of-the-art temporal data clustering methods.
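A much-simplified stand-in for this pipeline (learn a discriminative projection from labeled source data, project the target sequence into the same space, smooth it along time, then cluster) can be sketched as follows. The LDA projection, convolution-based temporal smoothing, and k-means step are rough substitutes for the paper's joint optimization and graph regularizer, not its actual algorithm.

# Illustrative transfer-then-cluster sketch for temporal segmentation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
Xs = rng.normal(size=(300, 50))                  # labeled source features
ys = rng.integers(0, 5, size=300)                # source action labels
Xt = rng.normal(size=(200, 50))                  # unlabeled target motion sequence

proj = LinearDiscriminantAnalysis(n_components=4).fit(Xs, ys)   # shared projection
Zt = proj.transform(Xt)                          # target frames in the shared subspace

# temporal regularization, approximated here by a moving average over neighbors
kernel = np.array([0.25, 0.5, 0.25])
Zt_smooth = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, Zt)

segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Zt_smooth)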
Discriminative Semi-Coupled Projective Dictionary Learning for Low-Resolution Person Re-Identification
Li, Kai (Northeastern University) | Ding, Zhengming (Northeastern University) | Li, Sheng (Adobe Research, USA) | Fu, Yun (Northeastern University)
Person re-identification (re-ID) is a fundamental task in automated video surveillance. In real-world surveillance systems, a person is often captured at quite low resolution, so we frequently need to perform low-resolution person re-ID, where images captured by different cameras have large resolution divergences. Existing methods cope with this problem via complicated and time-consuming strategies, which makes them less favorable in practice, and their performance is far from satisfactory. In this paper, we design a novel Discriminative Semi-coupled Projective Dictionary Learning (DSPDL) model to solve this problem effectively and efficiently. Specifically, we jointly learn a pair of dictionaries and a mapping to bridge the gap between lower- and higher-resolution person images. In addition, we develop a novel graph regularizer that incorporates positive and negative image-pair information in a parameterless fashion. Meanwhile, we adopt the efficient and powerful projective dictionary learning technique to boost efficiency. Experiments on three public datasets show the superiority of the proposed method over state-of-the-art ones.
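To make the "pair of dictionaries plus a code mapping" idea concrete, here is a small alternating-least-squares sketch of semi-coupled projective coding in numpy. It keeps only ridge-style regularization, omits the paper's discriminative graph regularizer, and every variable name and update rule is an illustrative assumption rather than the DSPDL algorithm itself.

# Toy semi-coupled projective coding: synthesis dictionaries Dl/Dh, analysis
# (projective) encoders Pl/Ph, and a mapping W between the two code spaces.
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_code, n = 60, 15, 300
Xl = rng.normal(size=(d_feat, n))        # low-resolution person features
Xh = rng.normal(size=(d_feat, n))        # paired high-resolution features
tau, lam = 1.0, 1.0

def ridge(A, B, eps=1e-2):
    """Closed-form solution of min_X ||X A - B||_F^2 + eps ||X||_F^2."""
    return B @ A.T @ np.linalg.inv(A @ A.T + eps * np.eye(A.shape[0]))

Al, Ah = rng.normal(size=(d_code, n)), rng.normal(size=(d_code, n))
Dl = Dh = rng.normal(size=(d_feat, d_code))
Pl = Ph = rng.normal(size=(d_code, d_feat))
W, I = np.eye(d_code), np.eye(d_code)

for _ in range(30):
    # code updates: each block has a closed form given the others
    Al = np.linalg.solve(Dl.T @ Dl + tau * I + lam * W.T @ W,
                         Dl.T @ Xl + tau * Pl @ Xl + lam * W.T @ Ah)
    Ah = np.linalg.solve(Dh.T @ Dh + (tau + lam) * I,
                         Dh.T @ Xh + tau * Ph @ Xh + lam * W @ Al)
    # dictionaries, projective encoders, and the semi-coupled code mapping
    Dl, Dh = ridge(Al, Xl), ridge(Ah, Xh)
    Pl, Ph = ridge(Xl, Al), ridge(Xh, Ah)
    W = ridge(Al, Ah)

# matching: compare a mapped low-resolution code against high-resolution codes
query_code = W @ Pl @ Xl[:, :1]
scores = -np.linalg.norm(Ph @ Xh - query_code, axis=0)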
Predictive Coding Machine for Compressed Sensing and Image Denoising
Li, Jun (Northeastern University) | Liu, Hongfu (Northeastern University) | Fu, Yun (Northeastern University)
Sparse and low-rank coding has received much attention in machine learning, multimedia, and computer vision. Unfortunately, expensive inference restricts the power of coding models in real-world applications, e.g., compressed sensing and image deblurring. To avoid this expensive inference, we propose a predictive coding machine (PCM) that trains a deep neural network (DNN) encoder to approximate the codes, so that a test sample can be quickly encoded by the well-trained DNN. However, the DNN makes PCM a non-convex and non-smooth optimization problem, which is extremely hard to solve. To address this challenge, we extend accelerated proximal gradient (APG) for PCM by steering the gradient descent of the DNN. To the best of our knowledge, we are the first to propose a gradient descent algorithm guided by accelerated proximal gradient for solving the PCM problem. In addition, a sufficient condition is provided to ensure convergence to a critical point. Moreover, when the coding models in PCM are convex, a convergence rate of O(1/(m^2 t)) holds, where m is the number of APG iterations and t is the number of DNN training epochs. Numerical results verify the promising advantages of PCM in terms of effectiveness, efficiency, and robustness.
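The core training idea, refine code estimates with a proximal (soft-thresholding) step and fit an encoder network to predict them, can be sketched as follows in PyTorch. The plain ISTA-style step and the simple alternation are assumptions for illustration; the paper uses an accelerated proximal gradient scheme with a convergence guarantee that this toy loop does not reproduce.

# Hedged sketch: a small MLP learns to predict (approximately) sparse codes.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k, n = 64, 128, 512
D = torch.randn(d, k) / d ** 0.5                 # fixed synthesis dictionary
X = torch.randn(n, d)                            # training signals
encoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, k))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
lam, step = 0.1, 0.1

def soft_threshold(z, t):
    return torch.sign(z) * torch.clamp(z.abs() - t, min=0.0)

A = torch.zeros(n, k)                            # current code estimates
for epoch in range(50):
    # proximal gradient step on the codes for the lasso objective
    with torch.no_grad():
        grad = (A @ D.t() - X) @ D
        A = soft_threshold(A - step * grad, step * lam)
    # train the encoder to predict the refined codes (fast inference at test time)
    pred = encoder(X)
    loss = ((pred - A) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

codes_fast = encoder(torch.randn(4, d))          # test-time codes in one forward pass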
High Rank Matrix Completion With Side Information
Wang, Yugang (University of Electronic Science and Technology of China) | Elhamifar, Ehsan (Northeastern University)
We address the problem of high-rank matrix completion with side information. In contrast to existing work on side information, which assumes that the data matrix is low-rank, we consider the more general scenario where the columns of the data matrix are drawn from a union of low-dimensional subspaces, which can lead to a high-rank matrix. Our goal is to complete the matrix while taking advantage of the side information. To do so, we use the self-expressive property of the data, searching for a sparse representation of each column of the matrix as a combination of a few other columns. More specifically, we propose a factorization of the data matrix as the product of side information matrices with an unknown interaction matrix, under which each column of the data matrix can be reconstructed using a sparse combination of other columns. As our proposed optimization, which searches for missing entries and sparse coefficients, is non-convex and NP-hard, we propose a lifting framework in which we couple sparse coefficients and missing values and define an equivalent optimization that is amenable to convex relaxation. We also propose a fast implementation of our convex framework using a Linearized Alternating Direction Method. Through extensive experiments on both synthetic and real data, and in particular by studying the problem of multi-label learning, we demonstrate that our method outperforms existing techniques in both low-rank and high-rank data regimes.
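The self-expressive intuition, that each column should be reconstructable from a few other columns even when entries are missing, can be illustrated with the toy alternating scheme below (sparse regression per column, then refilling missing entries from the reconstruction). This naive alternation is only a stand-in for the paper's lifted convex program and does not use side information matrices.

# Toy self-expressive completion on data from a union of subspaces.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
U1, U2 = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))
X_true = np.hstack([U1 @ rng.normal(size=(3, 40)), U2 @ rng.normal(size=(3, 40))])
mask = rng.random(X_true.shape) > 0.3            # True where entries are observed
X = np.where(mask, X_true, 0.0)                  # zero-filled initialization

for _ in range(5):
    C = np.zeros((X.shape[1], X.shape[1]))
    for j in range(X.shape[1]):                  # sparse self-expression per column
        others = np.delete(np.arange(X.shape[1]), j)
        reg = Lasso(alpha=0.05, fit_intercept=False, max_iter=2000)
        reg.fit(X[:, others], X[:, j])
        C[others, j] = reg.coef_
    X_hat = X @ C                                # reconstruct every column
    X = np.where(mask, X_true, X_hat)            # keep observed, refill missing

rel_err = np.linalg.norm((X - X_true)[~mask]) / np.linalg.norm(X_true[~mask])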
Contrastive Training for Models of Information Cascades
Xu, Shaobin (Northeastern University) | Smith, David A. (Northeastern University)
This paper proposes a model of information cascades as directed spanning trees (DSTs) over observed documents. In addition, we propose a contrastive training procedure that exploits partial temporal ordering of node infections in lieu of labeled training links. This combination of model and unsupervised training makes it possible to improve on models that use infection times alone and to exploit arbitrary features of the nodes and of the text content of messages in information cascades. With only basic node and time lag features similar to previous models, the DST model achieves performance with unsupervised training comparable to strong baselines on a blog network inference task. Unsupervised training with additional content features achieves significantly better results, reaching half the accuracy of a fully supervised model.
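The contrastive idea, training an edge scorer so that candidate parents consistent with the observed infection order outscore parents that violate it, can be sketched with a margin loss as follows. The features, the sampling of violating parents, and the margin are illustrative assumptions, not the paper's training procedure over full spanning trees.

# Schematic contrastive training of an edge scorer for cascade inference.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_nodes, feat_dim = 30, 8
node_feats = torch.randn(n_nodes, feat_dim)       # node (and, in principle, text) features
infect_time = torch.rand(n_nodes)                 # observed partial temporal ordering
scorer = nn.Sequential(nn.Linear(2 * feat_dim + 1, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)

def edge_score(parent, child):
    lag = (infect_time[child] - infect_time[parent]).view(1)
    return scorer(torch.cat([node_feats[parent], node_feats[child], lag]))

for step in range(200):
    child = int(torch.randint(0, n_nodes, (1,)))
    earlier = (infect_time < infect_time[child]).nonzero().flatten()
    later = (infect_time > infect_time[child]).nonzero().flatten()
    if len(earlier) == 0 or len(later) == 0:
        continue
    pos = int(earlier[torch.randint(0, len(earlier), (1,))])   # respects the ordering
    neg = int(later[torch.randint(0, len(later), (1,))])       # violates the ordering
    loss = torch.relu(1.0 - edge_score(pos, child) + edge_score(neg, child)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()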
Latent Discriminant Subspace Representations for Multi-View Outlier Detection
Li, Kai (Northeastern University) | Li, Sheng (Adobe Research, USA) | Ding, Zhengming (Northeastern University) | Zhang, Weidong (JD.COM) | Fu, Yun (American Technologies Corporation)
Identifying multi-view outliers is challenging because of the complex data distributions across different views. Existing methods cope with this problem by exploiting pairwise constraints across views to obtain new feature representations, on which certain outlier score measures are then defined. Because of this reliance on pairwise constraints, detecting outliers from three or more views becomes complicated and time-consuming for existing methods. In this paper, we propose a novel method capable of detecting outliers from any number of data views. Our method first learns latent discriminant representations for all view data and defines a novel outlier score function based on these representations. Specifically, we represent multi-view data by a global low-rank representation shared by all views plus residual representations specific to each view. By analyzing the view-specific residual representations of all views, we obtain an outlier score for every sample. Moreover, we raise the problem of detecting a third type of multi-view outlier that is neglected by existing methods. Experiments on six datasets show that our method outperforms existing ones in identifying all types of multi-view outliers, often by large margins.
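A simplified numpy sketch of the "shared low-rank part plus view-specific residuals" decomposition is given below, with a truncated SVD of the stacked views standing in for the learned global representation and a residual-norm sum standing in for the paper's outlier score function; both are assumptions made for illustration only.

# Toy multi-view outlier scoring from a shared low-rank fit.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, rank = 200, 20, 30, 5
shared = rng.normal(size=(n, rank))
view1 = shared @ rng.normal(size=(rank, d1)) + 0.05 * rng.normal(size=(n, d1))
view2 = shared @ rng.normal(size=(rank, d2)) + 0.05 * rng.normal(size=(n, d2))
view2[:10] = rng.normal(size=(10, d2))           # first 10 samples corrupted in view 2

X = np.hstack([view1, view2])                    # stack all views
U, s, Vt = np.linalg.svd(X, full_matrices=False)
low_rank = U[:, :rank] * s[:rank] @ Vt[:rank]    # global low-rank representation
residual = X - low_rank

# per-view residual energy; large or inconsistent residuals flag outliers
r1 = np.linalg.norm(residual[:, :d1], axis=1)
r2 = np.linalg.norm(residual[:, d1:], axis=1)
outlier_score = r1 + r2
print(np.argsort(-outlier_score)[:10])           # should mostly surface the corrupted rows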
An Interpretable Joint Graphical Model for Fact-Checking From Crowds
Nguyen, An T. (University of Texas at Austin) | Kharosekar, Aditya (University of Texas at Austin) | Lease, Matthew (University of Texas at Austin) | Wallace, Byron (Northeastern University)
Assessing the veracity of claims made on the Internet is an important, challenging, and timely problem. While automated fact-checking models have the potential to help people better assess what they read, we argue such models must be explainable, accurate, and fast to be useful in practice; while prediction accuracy is clearly important, model transparency is critical in order for users to trust the system and integrate their own knowledge with model predictions. To achieve this, we propose a novel probabilistic graphical model (PGM) which combines machine learning with crowd annotations. Nodes in our model correspond to claim veracity, article stance regarding claims, reputation of news sources, and annotator reliabilities. We introduce a fast variational method for parameter estimation. Evaluation across two real-world datasets and three scenarios shows that: (1) joint modeling of sources, claims, and crowd annotators in a PGM improves the predictive performance and interpretability of claim veracity prediction; and (2) our variational inference method achieves fast, scalable parameter estimation, with only modest degradation in performance compared to Gibbs sampling. Regarding model transparency, we designed and deployed a prototype fact-checker Web tool, including a visual interface for explaining model predictions. Results of a small user study indicate that model explanations improve user satisfaction and trust in model predictions. We share our web demo, model source code, and the 13K crowd labels we collected.
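A pared-down stand-in for the joint model is the classic "latent claim truth plus annotator reliability" setup, estimated with simple mean-field/EM-style updates as sketched below. The paper's PGM additionally couples article stance and source reputation and uses a dedicated variational method; those parts are omitted here for brevity.

# Toy joint estimation of claim veracity and annotator reliability.
import numpy as np

rng = np.random.default_rng(0)
n_claims, n_workers = 100, 15
truth = rng.random(n_claims) < 0.5               # latent claim veracity (for simulation)
reliability_true = rng.uniform(0.6, 0.95, size=n_workers)
# labels[i, j] = worker j's judgment of claim i (1 = "true")
labels = np.array([[t if rng.random() < reliability_true[j] else not t
                    for j in range(n_workers)] for t in truth], dtype=float)

q_true = labels.mean(axis=1)                     # initial belief that each claim is true
for _ in range(20):
    # reliability: expected agreement between each worker and the current beliefs
    rel = (labels * q_true[:, None] + (1 - labels) * (1 - q_true[:, None])).mean(axis=0)
    rel = np.clip(rel, 1e-3, 1 - 1e-3)
    # claim update: log-odds accumulated over all workers' votes
    log_odds = (labels * np.log(rel / (1 - rel)) +
                (1 - labels) * np.log((1 - rel) / rel)).sum(axis=1)
    q_true = 1.0 / (1.0 + np.exp(-log_odds))

accuracy = ((q_true > 0.5) == truth).mean()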