
Collaborating Authors

 Wang, Tao


GM-PLL: Graph Matching based Partial Label Learning

arXiv.org Machine Learning

Partial Label Learning (PLL) aims to learn from data where each training example is associated with a set of candidate labels, among which only one is correct. The key to dealing with such a problem is to disambiguate the candidate label sets and obtain the correct assignments between instances and their candidate labels. In this paper, we interpret such assignments as instance-to-label matchings and reformulate the task of PLL as a matching selection problem. To model this problem, we propose a novel Graph Matching based Partial Label Learning (GM-PLL) framework, in which a Graph Matching (GM) scheme is incorporated owing to its excellent capability of exploiting instance and label relationships. Meanwhile, since the conventional one-to-one GM algorithm does not satisfy the constraint of the PLL problem that multiple instances may correspond to the same label, we extend a traditional one-to-one probabilistic matching algorithm to the many-to-one constraint, thereby adapting the proposed framework to the PLL problem. Moreover, we propose a relaxed matching prediction model, which can improve prediction accuracy via the GM strategy. Extensive experiments on both artificial and real-world data sets demonstrate that the proposed method achieves superior or comparable performance against state-of-the-art methods.
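
To make the matching-selection view concrete, here is a minimal Python sketch that treats disambiguation as a many-to-one assignment between instances and their candidate labels, driven by instance-to-instance affinities. It is only an illustration of the idea: the Gaussian-kernel affinity, the neighbour-propagation update, and the helper name `disambiguate` are assumptions made for the toy, not the GM-PLL algorithm from the paper.

```python
import numpy as np

def disambiguate(X, candidate_mask, n_iters=10, k=5):
    """Toy many-to-one matching: each instance is matched to one of its
    candidate labels, preferring labels its nearest neighbours agree on."""
    n, q = candidate_mask.shape
    # Instance-instance affinity from a Gaussian kernel (illustrative choice).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (np.median(d2) + 1e-12))
    np.fill_diagonal(W, 0.0)
    # Sparsify: keep only the k strongest neighbours per instance.
    keep = np.argsort(-W, axis=1)[:, :k]
    S = np.zeros_like(W)
    rows = np.repeat(np.arange(n), k)
    S[rows, keep.ravel()] = W[rows, keep.ravel()]
    # Soft assignments, initialised uniformly over each candidate set.
    P = candidate_mask / candidate_mask.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # Pull each instance's label distribution toward its neighbours',
        # then restrict it back to its own candidate set.  Many-to-one is
        # allowed: several instances may settle on the same label.
        P = (S @ P) * candidate_mask + 1e-6 * candidate_mask
        P = P / P.sum(axis=1, keepdims=True)
    return P.argmax(axis=1)

# Tiny usage example: two Gaussian classes with noisy candidate sets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(4, 1, (20, 3))])
true_y = np.array([0] * 20 + [1] * 20)
cand = np.zeros((40, 2))
cand[np.arange(40), true_y] = 1
cand[rng.choice(40, 15, replace=False), :] = 1   # add a distractor candidate
print((disambiguate(X, cand) == true_y).mean())
```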


Image Classification at Supercomputer Scale

arXiv.org Machine Learning

Deep learning is extremely computationally intensive, and hardware vendors have responded by building faster accelerators in large clusters. Training deep learning models at petaFLOPS scale requires overcoming both algorithmic and systems software challenges. In this paper, we discuss three systems-related optimizations: (1) distributed batch normalization to control per-replica batch sizes, (2) input pipeline optimizations to sustain model throughput, and (3) 2-D torus all-reduce to speed up gradient summation. We combine these optimizations to train ResNet-50 on ImageNet to 76.3% accuracy in 2.2 minutes on a 1024-chip TPU v3 Pod with a training throughput of over 1.05 million images/second and no accuracy drop.
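
The 2-D torus all-reduce can be pictured as two successive reductions, first along the rows and then along the columns of the accelerator mesh. The numpy simulation below only illustrates that decomposition; the function `allreduce_2d`, the mesh shape, and the single-host simulation are assumptions for the sketch, not the TPU Pod implementation described in the paper.

```python
import numpy as np

def allreduce_2d(grads, rows, cols):
    """grads: list of per-replica gradient vectors, length rows * cols."""
    g = np.stack(grads).reshape(rows, cols, -1)
    # Phase 1: reduce along each row of the mesh.
    row_sums = g.sum(axis=1, keepdims=True)        # (rows, 1, d)
    # Phase 2: reduce the row results along each column of the mesh.
    total = row_sums.sum(axis=0, keepdims=True)    # (1, 1, d)
    # Broadcast the fully reduced gradient back to every replica.
    return [total[0, 0].copy() for _ in range(rows * cols)]

grads = [np.full(4, i, dtype=float) for i in range(8)]     # 8 toy replicas
out = allreduce_2d(grads, rows=2, cols=4)
print(out[0])    # sum over all replicas: [28. 28. 28. 28.]
```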


A Self-paced Regularization Framework for Partial-Label Learning

arXiv.org Artificial Intelligence

Partial label learning (PLL) aims to solve the problem where each training instance is associated with a set of candidate labels, one of which is the correct label. Most PLL algorithms try to disambiguate the candidate label set, either by treating each candidate label equally or by iteratively identifying the true label. Nonetheless, existing algorithms usually treat all labels and instances equally, and the complexities of both labels and instances are not taken into consideration during the learning stage. Inspired by the successful application of the self-paced learning strategy in machine learning, we integrate the self-paced regime into the partial label learning framework and propose a novel Self-Paced Partial-Label Learning (SP-PLL) algorithm, which controls the learning process by ranking the priorities of the training examples together with their candidate labels during each learning iteration. Extensive experiments and comparisons with other baseline methods demonstrate the effectiveness and robustness of the proposed method.
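
As a rough illustration of the self-paced regime that SP-PLL builds on, the sketch below trains a plain least-squares model while only admitting examples whose current loss falls below a growing threshold. The model, loss, threshold schedule, and the helper name `self_paced_linear_regression` are illustrative assumptions and do not reproduce the SP-PLL objective over candidate labels.

```python
import numpy as np

def self_paced_linear_regression(X, y, lam=0.5, growth=1.5, rounds=30, lr=0.3):
    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        losses = (X @ w - y) ** 2
        v = (losses < lam).astype(float)        # select only "easy" examples
        if v.sum() == 0:                        # nothing easy enough yet
            lam *= growth
            continue
        grad = (v * (X @ w - y)) @ X / v.sum()  # gradient over selected examples
        w -= lr * grad
        lam *= growth                           # gradually admit harder examples
    return w

# Toy usage: recover known weights from noisy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(self_paced_linear_regression(X, y))       # approximately [1, -2, 0.5]
```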


Path Following with Adaptive Path Estimation for Graph Matching

AAAI Conferences

Graph matching plays an important role in many areas of computer vision. It is a well-known NP-hard problem and has been investigated for decades. Among the many algorithms for graph matching, those utilizing the path following strategy have exhibited state-of-the-art performance. However, the main drawback of this category of algorithms lies in their high computational burden. In this paper, we propose a novel path following strategy for graph matching that aims to improve computational efficiency. We first propose a path estimation method to reduce the computational cost at each iteration, and subsequently a method of adaptive step length to accelerate convergence. The proposed approach can be integrated into all algorithms that utilize the path following strategy. To validate our approach, we compare it with several recently proposed graph matching algorithms on three benchmark image datasets. Experimental results show that our approach significantly improves the computational efficiency of the original algorithms while offering similar or better matching results.
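
A hedged sketch of the path-following strategy itself may help: minimize an interpolation between a convex and a concave relaxation of the matching objective, warm-start each solve from the previous solution, and grow the step in the interpolation parameter when the solution barely moves. The objectives, the Sinkhorn projection, and the step rule below are simplified stand-ins, not the paper's path estimation or adaptive step-length method.

```python
import numpy as np

def sinkhorn(M, iters=50):
    """Project a positive matrix toward the doubly-stochastic set."""
    M = np.maximum(M, 1e-12)
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)
        M = M / M.sum(axis=0, keepdims=True)
    return M

def local_minimize(A1, A2, P, lam, steps=100, lr=0.01):
    for _ in range(steps):
        R = A1 @ P - P @ A2
        grad_vex = 2 * (A1.T @ R - R @ A2.T)   # grad of ||A1 P - P A2||_F^2
        grad_cav = -2 * P                      # grad of the concave term -||P||_F^2
        P = sinkhorn(P - lr * ((1 - lam) * grad_vex + lam * grad_cav))
    return P

def path_following(A1, A2, d_lam=0.05):
    n = A1.shape[0]
    P = np.full((n, n), 1.0 / n)               # barycentre of doubly-stochastic set
    lam, step = 0.0, d_lam
    while lam < 1.0:
        lam_next = min(lam + step, 1.0)
        P_new = local_minimize(A1, A2, P, lam_next)
        # Adaptive step: if the solution barely moved, enlarge the next step.
        step = step * 2 if np.abs(P_new - P).max() < 1e-3 else d_lam
        P, lam = P_new, lam_next
    return P

# Toy usage: match a graph to a permuted copy of itself.
rng = np.random.default_rng(2)
A1 = rng.random((5, 5)); A1 = (A1 + A1.T) / 2
perm = rng.permutation(5)
A2 = A1[np.ix_(perm, perm)]
P = path_following(A1, A2)
print(P.argmax(axis=1), perm)   # recovered map should ideally invert perm
```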


Convolutional Neural Networks over Tree Structures for Programming Language Processing

AAAI Conferences

Programming language processing (similar to natural language processing) is a hot research topic in the field of software engineering; it has also aroused growing interest in the artificial intelligence community. However, different from a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
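
The sketch below illustrates the basic shape of a tree-based convolution: each window covers a node and its direct children, combined through separate parent and child weight matrices, followed by max pooling over all windows to get a fixed-size program representation. The `Node` class, the averaging over children, and the random weights are simplifying assumptions; the paper's continuous-binary-tree weighting is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
D_IN, D_OUT = 8, 16
W_PARENT = rng.normal(scale=0.1, size=(D_OUT, D_IN))
W_CHILD = rng.normal(scale=0.1, size=(D_OUT, D_IN))
BIAS = np.zeros(D_OUT)

class Node:
    def __init__(self, vec, children=()):
        self.vec = np.asarray(vec, dtype=float)   # node-type embedding
        self.children = list(children)

def tree_conv(node, outputs):
    """One convolution window per node: the node plus its direct children."""
    acc = W_PARENT @ node.vec + BIAS
    for child in node.children:
        acc = acc + (W_CHILD @ child.vec) / max(len(node.children), 1)
        tree_conv(child, outputs)
    outputs.append(np.tanh(acc))
    return outputs

def encode_program(root):
    feats = tree_conv(root, [])
    return np.max(np.stack(feats), axis=0)        # dynamic max pooling

# Toy AST: a root with two children, one of which has its own child.
leaf = Node(rng.normal(size=D_IN))
child = Node(rng.normal(size=D_IN), [leaf])
root = Node(rng.normal(size=D_IN), [child, Node(rng.normal(size=D_IN))])
print(encode_program(root).shape)                 # (16,)
```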


Stable Dual Dynamic Programming

Neural Information Processing Systems

Recently, we have introduced a novel approach to dynamic programming and reinforcement learning that is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.
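
To show what "maintaining stationary distributions instead of value functions" means in the simplest tabular case, the toy below evaluates a fixed policy both ways: once through the value function and once through the discounted state-visitation distribution, which yield the same expected return. It only contrasts the two representations; the transition matrix and rewards are made up, and the paper's convergent dual updates and function-approximation schemes are not shown.

```python
import numpy as np

gamma = 0.9
P = np.array([[0.8, 0.2, 0.0],     # state-transition matrix under a fixed policy
              [0.1, 0.6, 0.3],
              [0.0, 0.3, 0.7]])
r = np.array([1.0, 0.0, 2.0])      # expected reward per state
d0 = np.array([1.0, 0.0, 0.0])     # initial state distribution

# Primal view: solve the Bellman equation V = r + gamma * P V for the values.
V = np.linalg.solve(np.eye(3) - gamma * P, r)

# Dual view: solve for the normalised discounted visitation distribution
# d = (1 - gamma) * d0 + gamma * P^T d, then read the return off as <d, r>.
d = (1 - gamma) * np.linalg.solve(np.eye(3) - gamma * P.T, d0)

print(d0 @ V)                      # expected return from the primal view
print(d @ r / (1 - gamma))         # the same quantity from the dual view
```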

