Learning Vector-valued Functions with Local Rademacher Complexity

Sep-11-2019–arXiv.org Machine Learning

Abstract--We consider a general family of problems of which the output space admits vector-valued structure, covering a broad family of important domains, e.g. By using local Rademacher complexity and unlabeled data, we derived novel data-dependent excess risk bounds for vector-valued functions in both linear space and kernel space. The proposed bounds are much sharper than existing bounds and can be applied into specific vector-valued tasks in terms of different hypotheses sets and loss functions. Theoretical analysis motivates us to devise a unified learning framework for vector-valued functions based which is solved by proximal gradient descent on the primal, achieving a much better tradeoff between accuracy and efficiency . Empirical results on several benchmark datasets show that the proposed algorithm outperforms compared methods significantly, which coincides with our theoretical analysis. Index Terms --Statistical Learning Theory, Local Rademacher Complexity, Vector-Valued Functions, Semi-Supervised Learning.null 1 I NTRODUCTION I N the supervised learning, learning vector-valued functions is to learn a predict model from training data with vector-valued labels instead of scalar-valued labels, including a wide range of important tasks, such as multi-task learning [1], [2], [3], multi-label learning [4], [5], multi-class classification [6], [7], ranking [8], [9] and so on. The first unified learning framework for vector-valued functions in reproducing kernel Hilbert space (RKHS) was proposed in [10]. Then, the unified framework was further developed in [11], [12] and extended to semi-supervised learning by manifold regularization [13], [14], [15]. While current research about vector-valued functions mainly focus on the algorithmic front, we study vector-valued functions from both theoretical perspective and algorithmic perspective. In this paper, we integrate our previous works in multi-classification [7], [16] and generalize the idea into vector-valued settings. W e make the paper a significant improvement based on those two conference papers, with clearer and more general theoretical results, additional technical details,and a unified learning framework. Wang are with Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China. As the most common and successful data-dependent tool, Rademacher complexity was firstly used to analysis generalization performance of multi-class tasks in [22] and further studied in [23], [24]. The convergence rate of Rademacher complexity based error bounds are usually O ( K/ n), where K and n are the number of classes and the size of labeled samples, respectively .

artificial intelligence, machine learning, rademacher complexity, (18 more...)

arXiv.org Machine Learning

Sep-11-2019

arXiv.org PDF

Add feedback

Country:
- Asia > China > Beijing > Beijing (0.24)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found