Goto

Collaborating Authors

 He, Tao


Exploring & Exploiting High-Order Graph Structure for Sparse Knowledge Graph Completion

arXiv.org Artificial Intelligence

Sparse knowledge graph (KG) scenarios pose a challenge for previous Knowledge Graph Completion (KGC) methods, that is, the completion performance decreases rapidly with the increase of graph sparsity. This problem is also exacerbated because of the widespread existence of sparse KGs in practical applications. To alleviate this challenge, we present a novel framework, LR-GCN, that is able to automatically capture valuable long-range dependency among entities to supplement insufficient structure features and distill logical reasoning knowledge for sparse KGC. The proposed approach comprises two main components: a GNN-based predictor and a reasoning path distiller. The reasoning path distiller explores high-order graph structures such as reasoning paths and encodes them as rich-semantic edges, explicitly compositing long-range dependencies into the predictor. This step also plays an essential role in densifying KGs, effectively alleviating the sparse issue. Furthermore, the path distiller further distills logical reasoning knowledge from these mined reasoning paths into the predictor. These two components are jointly optimized using a well-designed variational EM algorithm. Extensive experiments and analyses on four sparse benchmarks demonstrate the effectiveness of our proposed method.


Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization

arXiv.org Artificial Intelligence

Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects and an additional rotation angle parameter is used for rotated objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection, especially for high-precision detection with high IoU (e.g. 0.75). Instead, we propose to model the rotated objects as Gaussian distributions. A direct advantage is that our new regression loss regarding the distance between two Gaussians e.g. Kullback-Leibler Divergence (KLD), can well align the actual detection performance metric, which is not well addressed in existing methods. Moreover, the two bottlenecks i.e. boundary discontinuity and square-like problem also disappear. We also propose an efficient Gaussian metric-based label assignment strategy to further boost the performance. Interestingly, by analyzing the BBox parameters' gradients under our Gaussian-based KLD loss, we show that these parameters are dynamically updated with interpretable physical meaning, which help explain the effectiveness of our approach, especially for high-precision detection. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle the heading estimation, and experimental results on twelve public datasets (2-D/3-D, aerial/text/face images) with various base detectors show its superiority.


The Multi-Modal Video Reasoning and Analyzing Competition

arXiv.org Artificial Intelligence

In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.


A Computer-Aided Diagnosis System for Breast Pathology: A Deep Learning Approach with Model Interpretability from Pathological Perspective

arXiv.org Artificial Intelligence

Objective: We develop a computer-aided diagnosis (CAD) system using deep learning approaches for lesion detection and classification on whole-slide images (WSIs) with breast cancer. The deep features being distinguishing in classification from the convolutional neural networks (CNN) are demonstrated in this study to provide comprehensive interpretability for the proposed CAD system using pathological knowledge. Methods: In the experiment, a total of 186 slides of WSIs were collected and classified into three categories: Non-Carcinoma, Ductal Carcinoma in Situ (DCIS), and Invasive Ductal Carcinoma (IDC). Instead of conducting pixel-wise classification into three classes directly, we designed a hierarchical framework with the multi-view scheme that performs lesion detection for region proposal at higher magnification first and then conducts lesion classification at lower magnification for each detected lesion. Results: The slide-level accuracy rate for three-category classification reaches 90.8% (99/109) through 5-fold cross-validation and achieves 94.8% (73/77) on the testing set. The experimental results show that the morphological characteristics and co-occurrence properties learned by the deep learning models for lesion classification are accordant with the clinical rules in diagnosis. Conclusion: The pathological interpretability of the deep features not only enhances the reliability of the proposed CAD system to gain acceptance from medical specialists, but also facilitates the development of deep learning frameworks for various tasks in pathology. Significance: This paper presents a CAD system for pathological image analysis, which fills the clinical requirements and can be accepted by medical specialists with providing its interpretability from the pathological perspective.


Deep Region Hashing for Generic Instance Search from Images

AAAI Conferences

Instance Search (INS) is a fundamental problem for many applications, while it is more challenging comparing to traditional image search since the relevancy is defined at the instance level. Existing works have demonstrated the success of many complex ensemble systems that are typically conducted by firstly generating object proposals, and then extracting handcrafted and/or CNN features of each proposal for matching. However, object bounding box proposals and feature extraction are often conducted in two separated steps, thus the effectiveness of these methods collapses. Also, due to the large amount of generated proposals, matching speed becomes the bottleneck that limits its application to large-scale datasets. To tackle these issues, in this paper we propose an effective and efficient Deep Region Hashing (DRH) approach for large-scale INS using an image patch as the query. Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation. DRH shares full-image convolutional feature map with the region proposal network, thus enabling nearly cost-free region proposals. Also, each high-dimensional, real-valued region features are mapped onto a low-dimensional, compact binary codes for the efficient object region level matching on large-scale dataset. Experimental results on four datasets show that our DRH can achieve even better performance than the state-of-the-arts in terms of mAP, while the efficiency is improved by nearly 100 times.


Binary Generative Adversarial Networks for Image Retrieval

AAAI Conferences

The most striking successes in image retrieval using deep hashing have mostly involved discriminative models, which require labels. In this paper, we use binary generative adversarial networks (BGAN) to embed images to binary codes in an unsupervised way. By restricting the input noise variable of generative adversarial networks (GAN) to be binary and conditioned on the features of each input image, BGAN can simultaneously learn a binary representation per image, and generate an image plausibly similar to the original one. In the proposed framework, we address two main problems: 1) how to directly generate binary codes without relaxation? 2) how to equip the binary representation with the ability of accurate image retrieval? We resolve these problems by proposing new sign-activation strategy and a loss function steering the learning process, which consists of new models for adversarial loss, a content loss, and a neighborhood structure loss. Experimental results on standard datasets (CIFAR-10, NUSWIDE, and Flickr) demonstrate that our BGAN significantly outperforms existing hashing methods by up to 107% in terms of mAP (See Table 2).


Learning to Appreciate the Aesthetic Effects of Clothing

AAAI Conferences

How do people describe clothing? The words like “formal”or "casual" are usually used. However, recent works often focus on recognizing or extracting visual features (e.g., sleeve length, color distribution and clothing pattern) from clothing images accurately. How can we bridge the gap between the visual features and the aesthetic words? In this paper, we formulate this task to a novel three-level framework: visual features(VF) - image-scale space (ISS) - aesthetic words space(AWS). Leveraging the art-field image-scale space served as an intermediate layer, we first propose a Stacked Denoising Autoencoder Guided by CorrelativeLabels (SDAE-GCL) to map the visual features to the image-scale space; and then according to the semantic distances computed byWordNet::Similarity, we map the most often used aesthetic words in online clothing shops to the image-scale space too. Employing upper body menswear images downloaded from several global online clothing shops as experimental data, the results indicate that the proposed three-level framework can help to capture the subtle relationship between visual features and aesthetic words better compared to several baselines. To demonstrate that our three-level framework and its implementation methods are universally applicable, we finally present some interesting analyses on the fashion trend of menswear in the last 10 years.