Wang, Changhu
Time-Aware Neighbor Sampling for Temporal Graph Networks
Wang, Yiwei, Cai, Yujun, Liang, Yuxuan, Ding, Henghui, Wang, Changhu, Hooi, Bryan
We present a new neighbor sampling method on temporal graphs. In a temporal graph, predicting different nodes' time-varying properties can require the receptive neighborhood of various temporal scales. In this work, we propose the TNS (Time-aware Neighbor Sampling) method: TNS learns from temporal information to provide an adaptive receptive neighborhood for every node at any time. Learning how to sample neighbors is non-trivial, since the neighbor indices in time order are discrete and not differentiable. To address this challenge, we transform neighbor indices from discrete values to continuous ones by interpolating the neighbors' messages. TNS can be flexibly incorporated into popular temporal graph networks to improve their effectiveness without increasing their time complexity. TNS can be trained in an end-to-end manner. It needs no extra supervision and is automatically and implicitly guided to sample the neighbors that are most beneficial for prediction. Empirical results on multiple standard datasets show that TNS yields significant gains on edge prediction and node classification.
Objects in Semantic Topology
Yang, Shuo, Sun, Peize, Jiang, Yi, Xia, Xiaobo, Zhang, Ruiheng, Yuan, Zehuan, Wang, Changhu, Luo, Ping, Xu, Min
A more realistic object detection paradigm, Open-World Object Detection, has arisen increasing research interests in the community recently. A qualified openworld object detector can not only identify objects of known categories, but also discover unknown objects, and incrementally learn to categorize them when their annotations progressively arrive. Previous works rely on independent modules to recognize unknown categories and perform incremental learning, respectively. In this paper, we provide a unified perspective: Semantic Topology. During the life-long learning of an open-world object detector, all object instances from the same category are assigned to their corresponding pre-defined node in the semantic topology, including the'unknown' category. This constraint builds up discriminative feature representations and consistent relationships among objects, thus enabling the detector to distinguish unknown objects out of the known categories, as well as making learned features of known objects undistorted when learning new categories incrementally. Extensive experiments demonstrate that semantic topology, either randomly-generated or derived from a well-trained language model, could outperform the current state-of-the-art open-world object detectors by a large margin, e.g., the absolute open-set error is reduced from 7832 to 2546, exhibiting the inherent superiority of semantic topology on open-world object detection. Object detection, which aims at localizing and classifying objects in a given scene (Felzenszwalb et al., 2010; Everingham et al., 2010; Lin et al., 2014), is one of the most iconic abilities of biological intelligence. It was introduced to the artificial intelligence field to endow an intelligence agent with the ability of scene understanding.
Unifying Nonlocal Blocks for Neural Networks
Zhu, Lei, She, Qi, Li, Duo, Lu, Yanye, Kang, Xuejing, Hu, Jie, Wang, Changhu
The nonlocal-based blocks are designed for capturing long-range spatial-temporal dependencies in computer vision tasks. Although having shown excellent performance, they still lack the mechanism to encode the rich, structured information among elements in an image or video. In this paper, to theoretically analyze the property of these nonlocal-based blocks, we provide a new perspective to interpret them, where we view them as a set of graph filters generated on a fully-connected graph. Specifically, when choosing the Chebyshev graph filter, a unified formulation can be derived for explaining and analyzing the existing nonlocal-based blocks (e.g., nonlocal block, nonlocal stage, double attention block). Furthermore, by concerning the property of spectral, we propose an efficient and robust spectral nonlocal block, which can be more robust and flexible to catch long-range dependencies when inserted into deep neural networks than the existing nonlocal blocks. Experimental results demonstrate the clear-cut improvements and practical applicabilities of our method on image classification, action recognition, semantic segmentation, and person re-identification tasks.
How Incomplete is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition
Zhang, Lin, She, Qi, Shen, Zhengyang, Wang, Changhu
Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip which (i) encode intra-variance through a shuffle-rank pretext task; (ii) encode inter-variance through a temporal coherent contrastive loss. Experiment results show that our method plays an essential role in balancing inter and intra variances and brings consistent performance gains on multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves 82.0% and 51.2% downstream classification accuracy on UCF101 and HMDB51 test sets respectively and 46.1% video retrieval accuracy on UCF101, outperforming both pretext-task based and contrastive learning based counterparts.
Incorporating Vision Bias into Click Models for Image-oriented Search Engine
Xu, Ningxin, Yang, Cheng, Zhu, Yixin, Hu, Xiaowei, Wang, Changhu
Most typical click models assume that the probability of a document to be examined by users only depends on position, such as PBM and UBM. It works well in various kinds of search engines. However, in a search engine where massive candidate documents display images as responses to the query, the examination probability should not only depend on position. The visual appearance of an image-oriented document also plays an important role in its opportunity to be examined. In this paper, we assume that vision bias exists in an image-oriented search engine as another crucial factor affecting the examination probability aside from position. Specifically, we apply this assumption to classical click models and propose an extended model, to better capture the examination probabilities of documents. We use regression-based EM algorithm to predict the vision bias given the visual features extracted from candidate documents. Empirically, we evaluate our model on a dataset developed from a real-world online image-oriented search engine, and demonstrate that our proposed model can achieve significant improvements over its baseline model in data fitness and sparsity handling.
Slimmable Generative Adversarial Networks
Hou, Liang, Yuan, Zehuan, Huang, Lei, Shen, Huawei, Cheng, Xueqi, Wang, Changhu
Generative adversarial networks (GANs) have achieved remarkable progress in recent years, but the continuously growing scale of models make them challenging to deploy widely in practical applications. In particular, for real-time generation tasks, different devices require generators of different sizes due to varying computing power. In this paper, we introduce slimmable GANs (SlimGANs), which can flexibly switch the width of the generator to accommodate various quality-efficiency trade-offs at runtime. Specifically, we leverage multiple discriminators that share partial parameters to train the slimmable generator. To facilitate the consistency between generators of different widths, we present a stepwise inplace distillation technique that encourages narrow generators to learn from wide ones. As for class-conditional generation, we propose a sliceable conditional batch normalization that incorporates the label information into different widths. Our methods are validated, both quantitatively and qualitatively, by extensive experiments and a detailed ablation study.
Building Effective Representations for Sketch Recognition
Guo, Jun (Sun Yat-sen University) | Wang, Changhu (Microsoft Research) | Chao, Hongyang (Sun Yat-sen University)
As the popularity of touch-screen devices, understanding a user's hand-drawn sketch has become an increasingly important research topic in artificial intelligence and computer vision. However, different from natural images, the hand-drawn sketches are often highly abstract, with sparse visual information and large intra-class variance, making the problem more challenging. In this work, we study how to build effective representations for sketch recognition. First, to capture saliency patterns of different scales and spatial arrangements, a Gabor-based low-level representation is proposed. Then, based on this representation, to discovery more complex patterns in a sketch, a Hybrid Multilayer Sparse Coding (HMSC) model is proposed to learn mid-level representations. An improved dictionary learning algorithm is also leveraged in HMSC to reduce overfitting to common but trivial patterns. Extensive experiments show that the proposed representations are highly discriminative and lead to large improvements over the state of the arts.
Sketch Recognition with Natural Correction and Editing
Wu, Jie (Shanghai Jiao Tong University) | Wang, Changhu (Microsoft Research, Beijing) | Zhang, Liqing (Shanghai Jiao Tong University) | Rui, Yong (Microsoft Research, Beijing)
In this paper, we target at the problem of sketch recognition. We systematically study how to incorporate users' correction and editing into isolated and full sketch recognition. This is a natural and necessary interaction in real systems such as Visio where very similar shapes exist. First, a novel algorithm is proposed to mine the prior shape knowledge for three editing modes. Second, to differentiate visually similar shapes, a novel symbol recognition algorithm is introduced by leveraging the learnt shape knowledge. Then, a novel editing detection algorithm is proposed to facilitate symbol recognition. Furthermore, both of the symbol recognizer and the editing detector are systematically incorporated into the full sketch recognition. Finally, based on the proposed algorithms, a real-time sketch recognition system is built to recognize hand-drawn flowcharts and diagrams with flexible interactions. Extensive experiments show the effectiveness of the proposed algorithms.