Goto

Collaborating Authors

 Huazhong University of Science and Technology


LSTD: A Low-Shot Transfer Detector for Object Detection

AAAI Conferences

Recent advances in object detection are mainly driven by deep learning with large-scale detection benchmarks. However, the fully-annotated training set is often limited for a target detection task, which may deteriorate the performance of deep detectors. To address this challenge, we propose a novel low-shot transfer detector (LSTD) in this paper, where we leverage rich source-domain knowledge to construct an effective target-domain detector with very few training examples. The main contributions are described as follows. First, we design a flexible deep architecture of LSTD to alleviate transfer difficulties in low-shot detection. This architecture can integrate the advantages of both SSD and Faster RCNN in a unified deep framework. Second, we introduce a novel regularized transfer learning framework for low-shot detection, where the transfer knowledge (TK) and background depression (BD) regularizations are proposed to leverage object knowledge respectively from source and target domains, in order to further enhance fine-tuning with a few target images. Finally, we examine our LSTD on a number of challenging low-shot detection experiments, where LSTD outperforms other state-of-the-art approaches. The results demonstrate that LSTD is a preferable deep detector for low-shot scenarios.


GraphGAN: Graph Representation Learning With Generative Adversarial Nets

AAAI Conferences

The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces "fake" samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of generative model, we propose a novel graph softmax to overcome the limitations of traditional softmax function, which can be proven satisfying desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines.


End-to-End United Video Dehazing and Detection

AAAI Conferences

The recent development of CNN-based image dehazing has revealed the effectiveness of end-to-end modeling. However, extending the idea to end-to-end video dehazing has not been explored yet. In this paper, we propose an End-to-End Video Dehazing Network (EVD-Net), to exploit the temporal consistency between consecutive video frames. A thorough study has been conducted over a number of structure options, to identify the best temporal fusion strategy. Furthermore, we build an End-to-End United Video Dehazing and Detection Network (EVDD-Net), which concatenates and jointly trains EVD-Net with a video object detection model. The resulting augmented end-to-end pipeline has demonstrated much more stable and accurate detection results in hazy video.


A Two-Stage MaxSAT Reasoning Approach for the Maximum Weight Clique Problem

AAAI Conferences

MaxSAT reasoning is an effective technology used in modern branch-and-bound (BnB) algorithms for the Maximum Weight Clique problem (MWC) to reduce the search space. However, the current MaxSAT reasoning approach for MWC is carried out in a blind manner and is not guided by any relevant strategy. In this paper, we describe a new BnB algorithm for MWC that incorporates a novel two-stage MaxSAT reasoning approach. In each stage, the MaxSAT reasoning is specialised and guided for different tasks. Experiments on an extensive set of graphs show that the new algorithm implementing this approach significantly outperforms relevant exact and heuristic MWC algorithms in both small/medium and massive real-world graphs.


An Exact Algorithm for the Maximum Weight Clique Problem in Large Graphs

AAAI Conferences

We describe an exact branch-and-bound algorithm for the maximum weight clique problem (MWC), called WLMC, that is especially suited for large vertex-weighted graphs. WLMC incorporates two original contributions: a preprocessing to derive an initial vertex ordering and to reduce the size of the graph, and incremental vertex-weight splitting to reduce the number of branches in the search space. Experiments on representative large graphs from real-world applications show that WLMC greatly outperforms relevant exact and heuristic MWC algorithms, and refute the prevailing hypothesis that exact MWC algorithms are less adequate for large graphs than heuristic algorithms.


TextBoxes: A Fast Text Detector with a Single Deep Neural Network

AAAI Conferences

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.


Multidimensional Scaling on Multiple Input Distance Matrices

AAAI Conferences

Multidimensional Scaling (MDS) is a classic technique that seeks vectorial representations for data points, given the pairwise distances between them. In recent years, data are usually collected from diverse sources or have multiple heterogeneous representations. However, how to do multidimensional scaling on multiple input distance matrices is still unsolved to our best knowledge. In this paper, we first define this new task formally. Then, we propose a new algorithm called Multi-View Multidimensional Scaling (MVMDS) by considering each input distance matrix as one view. The proposed algorithm can learn the weights of views (i.e., distance matrices) automatically by exploring the consensus information and complementary nature of views. Experimental results on synthetic as well as real datasets demonstrate the effectiveness of MVMDS. We hope that our work encourages a wider consideration in many domains where MDS is needed.


Regularized Diffusion Process for Visual Retrieval

AAAI Conferences

Diffusion process has advanced visual retrieval greatly owing to its capacity in capturing the geometry structure of the underlying manifold. Recent studies (Donoser and Bischof 2013) have experimentally demonstrated that diffusion process on the tensor product graph yields better retrieval performances than that on the original affinity graph. However, the principle behind this kind of diffusion process remains unclear, i.e., what kind of manifold structure is captured and how it is reflected. In this paper, we propose a new variant o diffusion process, which also operates on a tensor product graph. It is defined in three equivalent formulations (regularization framework, iterative framework and limit framework, respectively). Based on our study, three insightful conclusions are drawn which theoretically explain how this kind of diffusion process can better reveal the intrinsic relationship between objects. Besides, extensive experimental results on various retrieval tasks testify the validity of the proposed method.


Query Answering with Inconsistent Existential Rules under Stable Model Semantics

AAAI Conferences

Classical inconsistency-tolerant query answering relies on selecting maximal components of an ABox/database which are consistent with the ontology. However, some rules in ontologies might be unreliable if they are extracted from ontology learning or written by unskillful knowledge engineers. In this paper we present a framework of handling inconsistent existential rules under stable model semantics, which is defined by a notion called rule repairs to select maximal components of the existential rules. Surprisingly, for R-acyclic existential rules with R-stratified or guarded existential rules with stratified negations, both the data complexity and combined complexity of query answering under the rule repair semantics remain the same as that under the conventional query answering semantics. This leads us to propose several approaches to handle the rule repair semantics by calling answer set programming solvers. An experimental evaluation shows that these approaches have good scalability of query answering under rule repairs on realistic cases.


Graph-without-cut: An Ideal Graph Learning for Image Segmentation

AAAI Conferences

Graph-based image segmentation organizes the image elements into graphs and partitions an image based on the graph. It has been widely used and many promising results are obtained. Since the segmentation performance highly depends on the graph, most of existing methods focus on obtaining a precise similarity graph or on designing efficient cutting/merging strategies. However, these two components are often conducted in two separated steps, and thus the obtained graph similarity may not be the optimal one for segmentation and this may lead to suboptimal results. In this paper, we propose a novel framework, Graph-Without-Cut (GWC), for learning the similarity graph and image segmentations simultaneously. GWC learns the similarity graph by assigning adaptive and optimal neighbors to each vertex based on the spatial and visual information. Meanwhile, the new rank constraint is imposed to the Laplacian matrix of the similarity graph, such that the connected components in the resulted similarity graph are exactly equal to the region number. Extensive empirical results on three public data sets (i.e, BSDS300, BSDS500 and MSRC) show that our unsupervised GWC achieves state-of-the-art performance compared with supervised and unsupervised image segmentation approaches.