Goto

Collaborating Authors

 Oceania


Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors

Neural Information Processing Systems

Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has shown to be sufficient for traditional low dimensional local features, e.g., SIFT; and typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher dimensional local features, which have become popular recently. In order to improve the modeling capacity for high dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximation, this model can be converted to a sparse coding procedure and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms the traditional GMM based Fisher vector encoding but also achieves the state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.


Robust Bayesian Max-Margin Clustering

Neural Information Processing Systems

We present max-margin Bayesian clustering (BMC), a general and robust framework that incorporates the max-margin criterion into Bayesian clustering models, as well as two concrete models of BMC to demonstrate its flexibility and effectiveness in dealing with different clustering tasks. The Dirichlet process max-margin Gaussian mixture is a nonparametric Bayesian clustering model that relaxes the underlying Gaussian assumption of Dirichlet process Gaussian mixtures by incorporating max-margin posterior constraints, and is able to infer the number of clusters from data. We further extend the ideas to present max-margin clustering topic model, which can learn the latent topic representation of each document while at the same time cluster documents in the max-margin fashion. Extensive experiments are performed on a number of real datasets, and the results indicate superior clustering performance of our methods compared to related baselines.


(Almost) No Label No Cry

Neural Information Processing Systems

In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy preserving data processing. We first show that the mean operator, a statistic which aggregates all labels, is minimally sufficient for the minimization of many proper scoring losses with linear (or kernelized) classifiers without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. Then, we present an iterative learning algorithm that uses this as initialization. We ground this algorithm in Rademacher-style generalization bounds that fit the LLP setting, introducing a generalization of Rademacher complexity and a Label Proportion Complexity measure. This latter algorithm optimizes tractable bounds for the corresponding bag-empirical risk. Experiments are provided on fourteen domains, whose size ranges up to 300K observations. They display that our algorithms are scalable and tend to consistently outperform the state of the art in LLP. Moreover, in many cases, our algorithms compete with or are just percents of AUC away from the Oracle that learns knowing all labels. On the largest domains, half a dozen proportions can suffice, i.e. roughly 40K times less than the total number of labels.


Worst-Case Linear Discriminant Analysis as Scalable Semidefinite Feasibility Problems

arXiv.org Artificial Intelligence

In this paper, we propose an efficient semidefinite programming (SDP) approach to worst-case linear discriminant analysis (WLDA). Compared with the traditional LDA, WLDA considers the dimensionality reduction problem from the worst-case viewpoint, which is in general more robust for classification. However, the original problem of WLDA is non-convex and difficult to optimize. In this paper, we reformulate the optimization problem of WLDA into a sequence of semidefinite feasibility problems. To efficiently solve the semidefinite feasibility problems, we design a new scalable optimization method with quasi-Newton methods and eigen-decomposition being the core components. The proposed method is orders of magnitude faster than standard interior-point based SDP solvers. Experiments on a variety of classification problems demonstrate that our approach achieves better performance than standard LDA. Our method is also much faster and more scalable than standard interior-point SDP solvers based WLDA. The computational complexity for an SDP with $m$ constraints and matrices of size $d$ by $d$ is roughly reduced from $\mathcal{O}(m^3+md^3+m^2d^2)$ to $\mathcal{O}(d^3)$ ($m>d$ in our case).


Approximate evaluation of marginal association probabilities with belief propagation

arXiv.org Artificial Intelligence

Data association, the problem of reasoning over correspondence between targets and measurements, is a fundamental problem in tracking. This paper presents a graphical model formulation of data association and applies an approximate inference method, belief propagation (BP), to obtain estimates of marginal association probabilities. We prove that BP is guaranteed to converge, and bound the number of iterations necessary. Experiments reveal a favourable comparison to prior methods in terms of accuracy and computational complexity.


Capturing Triadic Conversations — A Visual Director System for Dynamic Interactive Narratives

AAAI Conferences

Film cinematography has been developed and applied for more than a century to involve and engage the viewer in visual storytelling. Interactive storytelling games can benefit from these cinematic conventions to enhance visual experience. However, even conversation scenes in games are highly dynamic, and pre-authoring camera parameters using cinematography principles is often insufficient. This paper proposes an automatic Visual Director System focused on dynamic conversation scenes involving three characters and reports on work in progress on a prototype applied to the recreation of a movie scene. Based on principles of cinematography and the study of film scenes, cinematic conventions for triadic conversations are encoded modularly as an artificial intelligence game component that selects suitable shots for dynamic scenes.


An Interactive Narrative System for Narrative-Based Games for Health

AAAI Conferences

This paper presents an interactive narrative framework we have designed for games that promote health behavior change. The framework aims to address two key issues: player engagement with the game, and player adherence to the health behavior change-related homework they receive in the game. In this paper, we describe our narrative system that tackles these issues and a prototype game that promotes physical activity in which our narrative system is integrated.


Discovering and Characterizing Emerging Events in Big Data

AAAI Conferences

We describe a novel system for discovering and characterizing emerging events. We define event emergence to be a developing situation comprised of a series of sub-events. To detect sub-events from a very large, continuous textual input stream, we use two techniques: (1) frequency-based detection of sub-events that are potentially entailed by an emerging event; and (2) anomaly-based detection of other sub-events that are potentially indicative of an emerging event. Identifying emerging events from detected sub-events involves connecting sub-events to each other and to the relevant emerging events within the event models and estimating the likelihood of possible emerging events. Each sub-event can be part of a number of emerging events and supports various event models to varying degrees. We adopt a coherent and compact model that probabilistically identifies emerging events. The innovative aspect of our work is a well-defined framework where statistical Big Data techniques are informed by event semantics and inference techniques (and vice versa). Our work is strongly grounded in semantics and knowledge representation, which enables us to produce more reliable results than would otherwise be possible with a purely statistical approach.


Shared Awareness, Autonomy and Trust in Human-Robot Teamwork

AAAI Conferences

Teamwork requires mutual trust among team members. Establishing and maintaining trust depends upon alignment of mental models, an aspect of shared awareness. We present a theory of how maintenance of model alignment is integral to fluid changes in relative control authority (i.e., adaptive autonomy) in human-robot teamwork.


Reasoning about Topological and Cardinal Direction Relations Between 2-Dimensional Spatial Objects

Journal of Artificial Intelligence Research

Increasing the expressiveness of qualitative spatial calculi is an essential step towards meeting the requirements of applications. This can be achieved by combining existing calculi in a way that we can express spatial information using relations from multiple calculi. The great challenge is to develop reasoning algorithms that are correct and complete when reasoning over the combined information. Previous work has mainly studied cases where the interaction between the combined calculi was small, or where one of the two calculi was very simple. In this paper we tackle the important combination of topological and directional information for extended spatial objects. We combine some of the best known calculi in qualitative spatial reasoning, the RCC8 algebra for representing topological information, and the Rectangle Algebra (RA) and the Cardinal Direction Calculus (CDC) for directional information. We consider two different interpretations of the RCC8 algebra, one uses a weak connectedness relation, the other uses a strong connectedness relation. In both interpretations, we show that reasoning with topological and directional information is decidable and remains in NP. Our computational complexity results unveil the significant differences between RA and CDC, and that between weak and strong RCC8 models. Take the combination of basic RCC8 and basic CDC constraints as an example: we show that the consistency problem is in P only when we use the strong RCC8 algebra and explicitly know the corresponding basic RA constraints.