Institute of Information Engineering, Chinese Academy of Sciences
Audio Visual Attribute Discovery for Fine-Grained Object Recognition
Zhang, Hua (Institute of Information Engineering, Chinese Academy of Sciences) | Cao, Xiaochun (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Rui (Institute of Information Engineering, Chinese Academy of Sciences)
Current progress on fine-grained recognition mainly focuses on learning discriminative feature representations by introducing visual supervision, e.g., part labels. However, obtaining such accurate annotations is time-consuming and requires professional knowledge. Different from these existing methods based on visual supervision, in this paper we introduce a novel feature, named audio visual attributes, by discovering the correlations between visual and audio representations. Specifically, our unified framework is trained with video-level category labels and consists of two important modules, the encoder module and the attribute discovery module, which encode the image and audio into vectors and learn the correlations between audio and images, respectively. In the encoder module, we present two types of feed-forward convolutional neural networks for the image and audio modalities, while in the attribute discovery module an attention-driven framework based on a recurrent neural network is developed to generate the audio visual attribute representation. Thus, our proposed architecture can be run end-to-end at inference. We apply our model to fine-grained bird recognition on the CUB200-2011 benchmark. The experimental results demonstrate that, with the help of audio visual attributes, we achieve performance superior or comparable to that of strongly supervised approaches on bird recognition.
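The sketch below illustrates the two-module design described above: two feed-forward CNN encoders produce image and audio vectors, and an attention-driven recurrent module emits a sequence of audio-visual attribute vectors classified from only a video-level label. All module names, layer sizes, and the fusion scheme are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch, assuming a simple attention-over-modalities fusion.
import torch
import torch.nn as nn

class AudioVisualAttributeNet(nn.Module):
    def __init__(self, num_classes=200, dim=256, num_attrs=8):
        super().__init__()
        # Image encoder: a small feed-forward CNN (a pretrained backbone in practice).
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.AdaptiveAvgPool2d(1))
        # Audio encoder over single-channel log-mel spectrograms.
        self.aud_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.AdaptiveAvgPool2d(1))
        # Attention-driven recurrent module: at each step, attend over the two
        # modality vectors and emit one audio-visual attribute vector.
        self.rnn = nn.GRUCell(dim, dim)
        self.attn = nn.Linear(dim, 2)            # attention weights over {image, audio}
        self.num_attrs = num_attrs
        self.cls = nn.Linear(dim * num_attrs, num_classes)

    def forward(self, image, audio):
        v = self.img_enc(image).flatten(1)        # (B, dim) visual vector
        a = self.aud_enc(audio).flatten(1)        # (B, dim) audio vector
        h = torch.zeros_like(v)
        attrs = []
        for _ in range(self.num_attrs):
            w = torch.softmax(self.attn(h), dim=1)     # (B, 2) modality attention
            x = w[:, :1] * v + w[:, 1:] * a            # attended fusion
            h = self.rnn(x, h)
            attrs.append(h)
        return self.cls(torch.cat(attrs, dim=1))       # video-level class logits

# Toy usage with random inputs standing in for an image and a spectrogram.
logits = AudioVisualAttributeNet()(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 128, 128))
```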
Cross-Domain Human Parsing via Adversarial Feature and Label Adaptation
Liu, Si (Institute of Information Engineering, Chinese Academy of Sciences) | Sun, Yao (Institute of Information Engineering, Chinese Academy of Sciences) | Zhu, Defa (Institute of Information Engineering, Chinese Academy of Sciences) | Ren, Guanghui (Institute of Information Engineering, Chinese Academy of Sciences) | Chen, Yu (JD.com) | Feng, Jiashi (National University of Singapore) | Han, Jizhong (Institute of Information Engineering, Chinese Academy of Sciences)
Human parsing has been extensively studied recently due to its wide applications in many important scenarios. Mainstream fashion parsing models (i.e., parsers) focus on parsing high-resolution and clean images. However, directly applying parsers trained on benchmarks of high-quality samples to a particular application scenario in the wild, e.g., a canteen, airport, or workplace, often gives unsatisfactory performance due to domain shift. In this paper, we explore a new and challenging cross-domain human parsing problem: taking the benchmark dataset with extensive pixel-wise labeling as the source domain, how can we obtain a satisfactory parser on a new target domain without requiring any additional manual labeling? To this end, we propose a novel and efficient cross-domain human parsing model to bridge the cross-domain differences in terms of visual appearance and environment conditions and to fully exploit commonalities across domains. Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences. A discriminative feature adversarial network is introduced to supervise the feature compensation and effectively reduce the discrepancy between the feature distributions of the two domains. Besides, our proposed model introduces a structured label adversarial network to guide the parsing results of the target domain to follow the high-order relationships of the structured labels shared across domains. The proposed framework is end-to-end trainable, practical, and scalable in real applications. Extensive experiments are conducted in which the LIP dataset is the source domain and four different datasets, including surveillance videos, movies, and runway shows without any annotations, are evaluated as target domains. The results consistently confirm the data efficiency and performance advantages of the proposed method for the challenging cross-domain human parsing problem.
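As a rough illustration of the two adversarial components, the sketch below computes a feature-level discriminator loss on compensated target features and a label-level discriminator loss on parsing maps; the layer shapes, discriminator architectures, and loss weighting are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of adversarial feature and label adaptation, assuming 1x1-conv
# discriminators and a residual-style feature compensation network.
import torch
import torch.nn as nn

feat_dim, num_parts = 64, 20
compensate = nn.Conv2d(feat_dim, feat_dim, 1)   # feature compensation network
feat_disc  = nn.Conv2d(feat_dim, 1, 1)          # discriminates source vs. target features
label_disc = nn.Conv2d(num_parts, 1, 1)         # discriminates structured label (parsing) maps
bce = nn.BCEWithLogitsLoss()

def adaptation_losses(src_feat, tgt_feat, src_pred, tgt_pred):
    # Shift target features toward the source feature distribution.
    tgt_feat_c = tgt_feat + compensate(tgt_feat)
    s, t = feat_disc(src_feat), feat_disc(tgt_feat_c)
    # Feature adversary (source = 1, target = 0); the compensation network is
    # updated with the reversed objective so that compensated features fool it.
    d_feat = bce(s, torch.ones_like(s)) + bce(t, torch.zeros_like(t))
    # Label adversary: target parsing maps should follow the structured label
    # statistics shared with the source domain.
    sp, tp = label_disc(src_pred), label_disc(tgt_pred)
    d_label = bce(sp, torch.ones_like(sp)) + bce(tp, torch.zeros_like(tp))
    return d_feat, d_label

# Toy usage with random tensors standing in for backbone features and parser outputs.
f_s, f_t = torch.randn(2, feat_dim, 32, 32), torch.randn(2, feat_dim, 32, 32)
p_s, p_t = torch.randn(2, num_parts, 32, 32), torch.randn(2, num_parts, 32, 32)
print(adaptation_losses(f_s, f_t, p_s, p_t))
```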
Multi-Facet Network Embedding: Beyond the General Solution of Detection and Representation
Yang, Liang (Hebei University of Technology) | Guo, Yuanfang (Institute of Information Engineering, Chinese Academy of Sciences) | Cao, Xiaochun (Institute of Information Engineering, Chinese Academy of Sciences)
In network analysis, community detection and network embedding are two important topics. Community detection tends to obtain the most noticeable partition, while network embedding aims at seeking node representations that contain as many diverse properties as possible. We observe that the current community detection and network embedding problems are being solved by a general solution, i.e., "maximizing the consistency between similar nodes while maximizing the distance between dissimilar nodes." This general solution exploits only the most noticeable structure (facet) of the network, which effectively satisfies the demands of community detection. Unfortunately, most of the specific embedding algorithms, which are developed from the general solution, cannot achieve the goal of network embedding by exploring only one facet of the network. To improve the general solution for better modeling of real networks, we propose a novel network embedding method, Multi-facet Network Embedding (MNE), to capture the multiple facets of the network. MNE learns multiple embeddings simultaneously, with the Hilbert-Schmidt Independence Criterion (HSIC) serving as a diversity constraint. To efficiently solve the optimization problem, we propose a Binary HSIC with linear complexity and solve the MNE objective function by adopting the Augmented Lagrange Multiplier (ALM) method. The overall complexity is linear in the size of the network. Extensive results demonstrate that MNE is efficient and outperforms the state-of-the-art network embedding methods.
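For concreteness, the snippet below computes the standard empirical HSIC with linear kernels between two embedding matrices, the generic independence measure that the diversity constraint builds on; it is not the paper's Binary HSIC or its ALM-based solver.

```python
# Empirical HSIC with linear kernels; a small value indicates that two facet
# embeddings capture (nearly) independent structure.
import numpy as np

def hsic_linear(X, Y):
    """Biased empirical HSIC estimator, tr(K H L H) / (n - 1)^2, with linear kernels."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K, L = X @ X.T, Y @ Y.T                  # linear-kernel Gram matrices
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
emb1, emb2 = rng.standard_normal((100, 16)), rng.standard_normal((100, 16))
print(hsic_linear(emb1, emb2))               # near zero for independent embeddings
```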
Knowledge Graph Embedding With Iterative Guidance From Soft Rules
Guo, Shu (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Quan (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Lihong (National Computer Network Emergency Response Technical Team & Coordination Center of China) | Wang, Bin (Institute of Information Engineering, Chinese Academy of Sciences) | Guo, Li (Institute of Information Engineering, Chinese Academy of Sciences)
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. Moreover, they focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.
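The toy loop below sketches this iterative guidance in a schematic way: grounded soft rules propose soft labels for unlabeled triples, and those labels are fed back to the model. The dictionary-based "model", the rule format, and the max-based aggregation are illustrative placeholders, not RUGE's actual scoring function or its closed-form soft-label solution.

```python
# Schematic sketch of the iterative rule-guidance loop, under the simplifying
# assumption that the "embedding model" is just a table of soft truth values.

def ruge_style_loop(labeled, unlabeled, rules, iters=5):
    # Current soft truth value per triple (observed triples fixed to 1.0).
    model = {t: 1.0 for t in labeled}
    model.update({t: 0.5 for t in unlabeled})
    for _ in range(iters):
        # Query rules: a grounded rule (premise, conclusion, confidence) pushes
        # the conclusion's soft label toward confidence * score(premise).
        soft = {}
        for premise, conclusion, conf in rules:
            if conclusion in unlabeled:
                cand = conf * model.get(premise, 0.0)
                soft[conclusion] = max(soft.get(conclusion, 0.0), cand)
        # "Update the embedding model": here we simply absorb the new soft labels.
        for t, s in soft.items():
            model[t] = max(model[t], s)
    return model

# Toy usage: an observed triple and a soft rule hasPart(x, y) => canUse(x, y).
labeled = {("bird", "hasPart", "wing")}
unlabeled = {("bird", "canUse", "wing")}
rules = [(("bird", "hasPart", "wing"), ("bird", "canUse", "wing"), 0.9)]
print(ruge_style_loop(labeled, unlabeled, rules))
```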
Stochastic Non-Convex Ordinal Embedding With Stabilized Barzilai-Borwein Step Size
Ma, Ke (Institute of Information Engineering, Chinese Academy of Sciences) | Zeng, Jinshan (School of Cyber Security, University of Chinese Academy of Sciences) | Xiong, Jiechao (School of Computer Information Engineering, Jiangxi Normal University) | Xu, Qianqian (Hong Kong University of Science and Technology) | Cao, Xiaochun (Tencent AI Lab) | Liu, Wei (Institute of Information Engineering, Chinese Academy of Sciences) | Yao, Yuan (Institute of Information Engineering, Chinese Academy of Sciences)
Learning representations from relative similarity comparisons, often called ordinal embedding, has gained increasing attention in recent years. Most existing methods are batch methods designed mainly on the basis of convex optimization, e.g., the projected gradient descent method. However, they are generally time-consuming because the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) it is SVD-free by dropping convexity and scales well through the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG); and (b) it chooses the step size adaptively by introducing a new stabilized Barzilai-Borwein (SBB) method, since the original version designed for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate O(1/T) in our setting, where T is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm in comparison with state-of-the-art methods; in particular, it achieves much lower computational cost with good prediction performance.
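The sketch below runs SVRG on a toy least-squares problem with a stabilized Barzilai-Borwein step size. The exact stabilization used in the paper may differ; here an absolute value plus an eps-term in the denominator is assumed, simply to keep the step positive and bounded when the curvature term is small.

```python
# Minimal SVRG-with-SBB-step sketch on a toy problem; the SBB formula and the
# fixed first-epoch step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 10)), rng.standard_normal(200)
n = A.shape[0]

def grad_i(x, i):                      # per-sample gradient of 0.5 * (a_i^T x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

def svrg_sbb(epochs=20, m=200, eps=1e-3):
    x_tilde, prev_x, prev_g = np.zeros(10), None, None
    for _ in range(epochs):
        g_tilde = full_grad(x_tilde)
        if prev_x is None:
            eta = 0.01                 # fixed step for the first epoch
        else:                          # stabilized BB step size (assumed form)
            dx, dg = x_tilde - prev_x, g_tilde - prev_g
            eta = dx @ dx / (m * (abs(dx @ dg) + eps * dx @ dx) + 1e-12)
        prev_x, prev_g = x_tilde.copy(), g_tilde.copy()
        x = x_tilde.copy()
        for _ in range(m):             # SVRG inner loop with variance reduction
            i = rng.integers(n)
            x -= eta * (grad_i(x, i) - grad_i(x_tilde, i) + g_tilde)
        x_tilde = x
    return x_tilde

print(np.linalg.norm(full_grad(svrg_sbb())))   # gradient norm at the returned point
```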
Generalization Analysis for Ranking Using Integral Operator
Liu, Yong (Institute of Information Engineering, Chinese Academy of Sciences) | Liao, Shizhong (Tianjin University) | Lin, Hailun (Institute of Information Engineering, Chinese Academy of Sciences) | Yue, Yinliang (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Weiping (Institute of Information Engineering, Chinese Academy of Sciences)
The study of the generalization performance of ranking algorithms is one of the fundamental issues in ranking learning theory. Although several generalization bounds have been proposed based on different measures, the convergence rates of the existing bounds are usually at most O(√(1/n)), where n is the size of the data set. In this paper, we derive novel generalization bounds for regularized ranking in a reproducing kernel Hilbert space via the integral operator of the kernel function. We prove that the rates of our bounds are much faster than O(√(1/n)). Specifically, we first introduce a notion of local Rademacher complexity for ranking, called local ranking Rademacher complexity, which is used to measure the complexity of the space of ranking loss functions. Then, we use the local ranking Rademacher complexity to obtain a basic generalization bound. Finally, we establish the relationship between the local Rademacher complexity and the eigenvalues of the integral operator, and further derive sharp generalization bounds with faster convergence rates.
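As illustrative background, the display below gives the generic shape of a local-Rademacher-complexity bound and how eigenvalue decay of the integral operator yields faster rates; it is not the paper's exact statement or constants.

```latex
% Generic form only (illustrative, with unspecified constants c_1, c_2, c_3).
% Let r^* be the fixed point of a sub-root upper bound on the local Rademacher
% complexity of the (ranking) loss class. A typical localized bound states that,
% with probability at least 1 - \delta,
\[
  R(f) \;\le\; c_1\,\widehat{R}_n(f) \;+\; c_2\, r^{*} \;+\; c_3\,\frac{\log(1/\delta)}{n}.
\]
% If the eigenvalues of the kernel integral operator decay polynomially,
% \lambda_j = O(j^{-\alpha}) with \alpha > 1, then r^{*} = O\!\left(n^{-\alpha/(\alpha+1)}\right),
% which is faster than the O(\sqrt{1/n}) rate of global complexity bounds.
```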
Infinite Kernel Learning: Generalization Bounds and Algorithms
Liu, Yong (Institute of Information Engineering, Chinese Academy of Sciences) | Liao, Shizhong (Tianjin University) | Lin, Hailun (Institute of Information Engineering, Chinese Academy of Sciences) | Yue, Yinliang (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Weiping (Institute of Information Engineering, Chinese Academy of Sciences)
Kernel learning is a fundamental problem in both the research and application of kernel methods. Existing kernel learning methods commonly use some measure of the generalization error to learn the optimal kernel in a convex (or conic) combination of prescribed basic kernels. However, the generalization bounds derived from these measures usually have slow convergence rates, and the basic kernels are finitely many and must be specified in advance. In this paper, we propose a new kernel learning method based on a novel measure of generalization error, called principal eigenvalue proportion (PEP), which can learn the optimal kernel with sharp generalization bounds over the convex hull of a possibly infinite set of basic kernels. We first derive sharp generalization bounds based on the PEP measure. Then we design two kernel learning algorithms, for finite kernels and infinite kernels respectively, in which the derived sharp generalization bounds are exploited to guarantee faster convergence rates; moreover, in the infinite case the basic kernels can be learned automatically instead of being prescribed in advance. Theoretical analysis and empirical results demonstrate that the proposed kernel learning method outperforms the state-of-the-art kernel learning methods.
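As illustrative background, the display below states the basic form of a principal eigenvalue proportion; the paper's precise definition (e.g., a t-order or operator-level variant) may differ in detail.

```latex
% Illustrative statement only, not necessarily the paper's exact definition.
% For a kernel k whose integral operator has eigenvalues
% \lambda_1 \ge \lambda_2 \ge \cdots \ge 0, the principal eigenvalue proportion is
\[
  \mathrm{PEP}(k) \;=\; \frac{\lambda_1}{\sum_{j \ge 1} \lambda_j},
\]
% so kernels whose spectra are dominated by the leading eigenvalue have PEP close
% to 1, the regime associated with the sharper generalization bounds.
```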
On the Minimum Differentially Resolving Set Problem for Diffusion Source Inference in Networks
Zhou, Chuan (Institute of Information Engineering, Chinese Academy of Sciences) | Lu, Wei-Xue (Academy of Mathematics and Systems Science, Chinese Academy of Sciences) | Zhang, Peng (University of Technology, Sydney) | Wu, Jia (Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney) | Hu, Yue (Institute of Information Engineering, Chinese Academy of Sciences) | Guo, Li (Institute of Information Engineering, Chinese Academy of Sciences)
In this paper we theoretically study the minimum Differentially Resolving Set (DRS) problem, which derives from the classical sensor placement optimization problem in network source localization. A DRS of a graph G = (V, E) is defined as a subset S ⊆ V such that any two elements in V can be distinguished by their differential characteristic sets defined on S. The minimum DRS problem aims to find a DRS S in the graph G with minimum total weight Σ_{v∈S} w(v). We establish a group of Integer Linear Programming (ILP) models as solutions. Using weighted set cover theory, we propose an approximation algorithm with Θ(ln n) approximability for the minimum DRS problem on general graphs, where n is the graph size.
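For illustration, the snippet below runs the classical greedy algorithm for weighted set cover, which attains the ln(n)-type approximation the abstract refers to; the encoding of a DRS instance as a set cover instance via differential characteristic sets follows the paper and is not reproduced here, and the toy instance is hypothetical.

```python
# Greedy weighted set cover: repeatedly pick the set with the lowest weight per
# newly covered element, giving the classical O(ln n) approximation guarantee.

def greedy_weighted_set_cover(universe, sets, weights):
    uncovered, chosen = set(universe), []
    while uncovered:
        # Cheapest coverage ratio among sets that still cover something new.
        best = min((s for s in sets if sets[s] & uncovered),
                   key=lambda s: weights[s] / len(sets[s] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Hypothetical toy instance.
universe = {1, 2, 3, 4, 5}
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1, 5}}
weights = {"a": 3.0, "b": 1.0, "c": 1.5, "d": 1.0}
print(greedy_weighted_set_cover(universe, sets, weights))
```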