Goto

Collaborating Authors

 Deep Learning


Sherlock: Scalable Fact Learning in Images

AAAI Conferences

We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously with a capacity to understand an ย unbounded number of facts in a structured way. The training data comes as structured facts in images, including (1) objects (e.g., <boy>), (2) attributes (e.g., <boy, tall>), (3) actions (e.g., <boy, playing>), and (4) interactions (e.g., <boy, riding, a horse >). Each fact has a semantic language view (e.g., < boy, playing>) and a visual view (an image with this fact). We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding. We propose and investigate recent and strong approaches from the multiview learning literature and also introduce two learning representation models as potential baselines. We applied the investigated methods on several datasets that we augmented with structured facts and a large scale dataset of more than 202,000 facts and 814,000 images. Our experiments show the advantage of relating facts by the structure by the proposed models compared to the designed baselines on bidirectional fact retrieval.


Deep Manifold Learning of Symmetric Positive Definite Matrices with Application to Face Recognition

AAAI Conferences

In this paper, we aim to construct a deep neural network which embeds high dimensional symmetric positive definite (SPD) matrices into a more discriminative low dimensional SPD manifold. To this end, we develop two types of basic layers: a 2D fully connected layer which reduces the dimensionality of the SPD matrices, and a symmetrically clean layer which achieves non-linear mapping. Specifically, we extend the classical fully connected layer such that it is suitable for SPD matrices, and we further show that SPD matrices with symmetric pair elements setting zero operations are still symmetric positive definite. Finally, we complete the construction of the deep neural network for SPD manifold learning by stacking the two layers. Experiments on several face datasets demonstrate the effectiveness of the proposed method.


Deep Correlated Metric Learning for Sketch-based 3D Shape Retrieval

AAAI Conferences

The explosive growth of 3D models has led to the pressing demand for an efficient searching system. Traditional model-based search is usually not convenient, since people don't always have 3D model available by side. The sketch-based 3D shape retrieval is a promising candidate due to its simpleness and efficiency. The main challenge for sketch-based 3D shape retrieval is the discrepancy across different domains. In the paper, we propose a novel deep correlated metric learning (DCML) method to mitigate the discrepancy between sketch and 3D shape domains. The proposed DCML trains two distinct deep neural networks (one for each domain) jointly with one loss, which learns two deep nonlinear transformations to map features from both domains into a nonlinear feature space. The proposed loss, including discriminative loss and correlation loss, aims to increase the discrimination of features within each domain as well as the correlation between different domains. In the transfered space, the discriminative loss minimizes the intra-class distance of the deep transformed features and maximizes the inter-class distance of the deep transformed features at least a predefined margin within each domain, while the correlation loss focuses on minimizing the distribution discrepancy across different domains. Our proposed method is evaluated on SHREC 2013 and 2014 benchmarks, and the experimental results demonstrate the superiority of our proposed method over the state-of-the-art methods.


A Multi-Task Deep Network for Person Re-Identification

AAAI Conferences

Person re-identification (ReID) focuses on identifying people across different scenes in video surveillance, which is usually formulated as a binary classification task or a ranking task in current person ReID approaches. In this paper, we take both tasks into account and propose a multi-task deep network (MTDnet) that makes use of their own advantages and jointly optimize the two tasks simultaneously for person ReID. To the best of our knowledge, we are the first to integrate both tasks in one network to solve the person ReID. We show that our proposed architecture significantly boosts the performance. Furthermore, deep architecture in general requires a sufficient dataset for training, which is usually not met in person ReID. To cope with this situation, we further extend the MTDnet and propose a cross-domain architecture that is capable of using an auxiliary set to assist training on small target sets. In the experiments, our approach outperforms most of existing person ReID algorithms on representative datasets including CUHK03, CUHK01, VIPeR, iLIDS and PRID2011, which clearly demonstrates the effectiveness of the proposed approach.


Reference Based LSTM for Image Captioning

AAAI Conferences

Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. There are two main problems in existing methods: in the training phase, it is difficult to find which parts of the captions are more essential to the image; in the caption generation phase, the objects or the scenes are sometimes misrecognized. In this paper, we consider the training images as the references and propose a Reference based Long Short Term Memory (R-LSTM) model, aiming to solve these two problems in one goal. When training the model, we assign different weights to different words, which enables the network to better learn the key information of the captions. When generating a caption, the consensus score is utilized to exploit the reference information of neighbor images, which might fix the misrecognition and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms the state-of-the-art approaches on the benchmark dataset MS COCO and obtains top 2 position on 11 of the 14 metrics on the online test server.


Collective Deep Quantization for Efficient Cross-Modal Retrieval

AAAI Conferences

Cross-modal similarity retrieval is a problem about designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve for texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach which has already shown the superior performance over the hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization in end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefully-crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns the common quantizer codebook for both modalities through which the cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using inner product distance computed based on the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state of the art cross-modal retrieval results on standard benchmarks.


Unit Dependency Graph and Its Application to Arithmetic Word Problem Solving

AAAI Conferences

Math word problems provide a natural abstraction to a range of natural language understanding problems that involve reasoning about quantities, such as interpreting election results, news about casualties, and the financial section of a newspaper. Units associated with the quantities often provide information that is essential to support this reasoning. This paper proposes a principled way to capture and reason about units and shows how it can benefit an arithmetic word problem solver. This paper presents the concept of Unit Dependency Graphs (UDGs), which provides a compact representation of the dependencies between units of numbers mentioned in a given problem. Inducing the UDG alleviates the brittleness of the unit extraction system and allows for a natural way to leverage domain knowledge about unit compatibility, for word problem solving. We introduce a decomposed model for inducing UDGs with minimal additional annotations, and use it to augment the expressions used in the arithmetic word problem solver of (Roy and Roth 2015) via a constrained inference framework. We show that introduction of UDGs reduces the error of the solver by over 10 %, surpassing all existing systems for solving arithmetic word problems. In addition, it also makes the system more robust to adaptation to new vocabulary and equation forms .


Unit Dependency Graph and Its Application to Arithmetic Word Problem Solving

AAAI Conferences

Math word problems provide a natural abstraction to a range of natural language understanding problems that involve reasoning about quantities, such as interpreting election results, news about casualties, and the financial section of a newspaper. Units associated with the quantities often provide information that is essential to support this reasoning. This paper proposes a principled way to capture and reason about units and shows how it can benefit an arithmetic word problem solver. This paper presents the concept of Unit Dependency Graphs (UDGs), which provides a compact representation of the dependencies between units of numbers mentioned in a given problem. Inducing the UDG alleviates the brittleness of the unit extraction system and allows for a natural way to leverage domain knowledge about unit compatibility, for word problem solving. We introduce a decomposed model for inducing UDGs with minimal additional annotations, and use it to augment the expressions used in the arithmetic word problem solver of (Roy and Roth 2015) via a constrained inference framework. We show that introduction of UDGs reduces the error of the solver by over 10 %, surpassing all existing systems for solving arithmetic word problems. In addition, it also makes the system more robust to adaptation to new vocabulary and equation forms .


Unit Dependency Graph and Its Application to Arithmetic Word Problem Solving

AAAI Conferences

Math word problems provide a natural abstraction to a range of natural language understanding problems that involve reasoning about quantities, such as interpreting election results, news about casualties, and the financial section of a newspaper. Units associated with the quantities often provide information that is essential to support this reasoning. This paper proposes a principled way to capture and reason about units and shows how it can benefit an arithmetic word problem solver. This paper presents the concept of Unit Dependency Graphs (UDGs), which provides a compact representation of the dependencies between units of numbers mentioned in a given problem. Inducing the UDG alleviates the brittleness of the unit extraction system and allows for a natural way to leverage domain knowledge about unit compatibility, for word problem solving. We introduce a decomposed model for inducing UDGs with minimal additional annotations, and use it to augment the expressions used in the arithmetic word problem solver of (Roy and Roth 2015) via a constrained inference framework. We show that introduction of UDGs reduces the error of the solver by over 10 %, surpassing all existing systems for solving arithmetic word problems. In addition, it also makes the system more robust to adaptation to new vocabulary and equation forms .


Unsupervised Feature Learning for 3D Scene Reconstruction with Occupancy Maps

AAAI Conferences

This paper addresses the task of unsupervised feature learning for three-dimensional occupancy mapping, as a way to segment higher-level structures based on raw unorganized point cloud data. In particular, we focus on detecting planar surfaces, which are common in most structured or semi-structured environments. This segmentation is then used to minimize the amount of parameters necessary to properly create a 3D occupancy model of the surveyed space, thus increasing computational speed and decreasing memory requirements. As the 3D modeling tool, an extension to Hilbert Maps was selected, since it naturally uses a feature-based representation of the environment to achieve real-time performance. Experiments conducted in simulated and real large-scale datasets show a substantial gain in performance, while decreasing the amount of stored information by orders of magnitude without sacrificing accuracy.