crammer
Transfer Learning via Minimizing the Performance Gap Between Domains
Boyu Wang, Jorge Mendez, Mingbo Cai, Eric Eaton
To address this issue, we present the first analysis for instance weighting transfer learning that considers the presence of labeled target examples. The contribution of our work is two-fold. 1. We address the question of how to measure the divergence between two domains given label information for the target domain.
c1285fcadc52c0d3dc8813fc2c2e2b2a-AuthorFeedback.pdf
Our results certify that there exists an optimal linear pre-conditioner for quadratically convex constraint sets. As such, adaptive gradient methods can be minimax (rate) optimal. In online algorithms, the common practice [4,5,6,7,2] is to measure regret with respect to the "best" post-hoc regularizer (i.e. In this setting, the constraint set corresponds to the set of classifiers of interest, and the geometry of the gradients corresponds to the geometry of the features (or covariates). A generalized online mirror descent with applications to classification and regression.
Learning Sparse Confidence-Weighted Classifier on Very High Dimensional Data
Tan, Mingkui (University of Adelaide) | Yan, Yan (University of Technology Sydney) | Wang, Li (University of Illinois at Chicago) | Hengel, Anton Van Den (University of Adelaide) | Tsang, Ivor W. (University of Technology Sydney) | Shi, Qinfeng (Javen) (University of Adelaide)
Confidence-weighted (CW) learning is a successful online learning paradigm which maintains a Gaussian distribution over classifier weights and adopts a covariance matrix to represent the uncertainties of the weight vectors. However, there are two deficiencies in existing full CW learning paradigms, these being the sensitivity to irrelevant features, and the poor scalability to high dimensional data due to the maintenance of the covariance structure. In this paper, we begin by presenting an online-batch CW learning scheme, and then present a novel paradigm to learn sparse CW classifiers. The proposed paradigm essentially identifies feature groups and naturally builds a block diagonal covariance structure, making it very suitable for CW learning over very high-dimensional data. Extensive experimental results demonstrate the superior performance of the proposed methods over state-of-the-art counterparts on classification and feature selection tasks.
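To make the idea of a Gaussian over classifier weights concrete, the following is a minimal sketch of AROW, a well-known member of the CW family. It is not the sparse or online-batch method of the paper above; the class name `ArowSketch`, the regularizer `r`, and the synthetic data in `train_and_score` are assumptions chosen for illustration only.

```python
import numpy as np


class ArowSketch:
    """Illustrative AROW learner (a CW-family variant), NOT the paper's method.

    Keeps a Gaussian N(mu, sigma) over the weight vector: mu is the current
    classifier, sigma encodes per-direction uncertainty.
    """

    def __init__(self, dim, r=1.0):
        self.mu = np.zeros(dim)      # mean of the weight distribution
        self.sigma = np.eye(dim)     # full covariance: O(dim^2) memory
        self.r = r                   # regularization constant (assumed value)

    def predict(self, x):
        return 1 if self.mu @ x >= 0 else -1

    def update(self, x, y):
        """One online step on example x with label y in {-1, +1}."""
        margin = y * (self.mu @ x)
        loss = max(0.0, 1.0 - margin)        # hinge loss
        if loss == 0.0:
            return                            # confident and correct: no change
        sigma_x = self.sigma @ x
        variance = x @ sigma_x                # uncertainty along direction x
        beta = 1.0 / (variance + self.r)
        alpha = loss * beta
        self.mu += alpha * y * sigma_x        # move the mean toward the label
        self.sigma -= beta * np.outer(sigma_x, sigma_x)  # shrink uncertainty


def train_and_score(seed=0, n=500, dim=5):
    """Single pass over synthetic, linearly separable data (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    X = rng.normal(size=(n, dim))
    y = np.where(X @ w_true >= 0, 1, -1)
    model = ArowSketch(dim)
    for x_i, y_i in zip(X, y):
        model.update(x_i, y_i)
    preds = np.array([model.predict(x_i) for x_i in X])
    return model, float(np.mean(preds == y))
```

The dense `sigma` in this sketch costs memory quadratic in the number of features, which is the scalability issue the abstract refers to; a block diagonal covariance would store only the per-group blocks.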
Finding One's Best Crowd: Online Learning By Exploiting Source Similarity
Liu, Yang (University of Michigan, Ann Arbor) | Liu, Mingyan (University of Michigan, Ann Arbor)
We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, whereby a user over time learns the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that, when (1) the similarity information among data sources is known, and (2) data from different sources can be acquired without cost, then a judicious selection of data from different sources can effectively enlarge the training sample size compared to using a single data source, thereby improving the rate and performance of learning; this is achieved by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. We further relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertisement and network measurement.
Risk Minimization in the Presence of Label Noise
Gao, Wei (Nanjing University and Collaborative Innovation Center of Novel Software Technology and Industrialization) | Wang, Lu (Nanjing University and Collaborative Innovation Center of Novel Software Technology and Industrialization) | Li, Yu-Feng (Nanjing University and Collaborative Innovation Center of Novel Software Technology and Industrialization) | Zhou, Zhi-Hua (Nanjing University and Collaborative Innovation Center of Novel Software Technology and Industrialization)
Matrix concentration inequalities have attracted much attention in diverse applications such as linear algebra, statistical estimation, combinatorial optimization, etc. In this paper, we present new Bernstein concentration inequalities depending only on the first moments of random matrices, whereas previous Bernstein inequalities depend heavily on both the first and second moments. Based on those results, we analyze empirical risk minimization in the presence of label noise. We find that many popular losses used in risk minimization can be decomposed into two parts, where the first part is unaffected by noisy labels and only the second part is affected. We show that the influence of noisy labels on the second part can be reduced by our proposed LICS (Labeled Instance Centroid Smoothing) approach. The effectiveness of the LICS algorithm is justified both theoretically and empirically.
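One standard way to see such a two-part split, offered here as an illustration and not necessarily the exact decomposition used in the paper, is the identity loss(y*z) = [loss(z) + loss(-z)]/2 + y*[loss(z) - loss(-z)]/2 for labels y in {-1, +1}. The sketch below checks it numerically for the logistic loss with a linear model, where the label-dependent part reduces to a term in the centroid mean(y * x). The function names and the synthetic data are assumptions.

```python
import numpy as np


def logistic_loss(z):
    """log(1 + exp(-z)), computed stably."""
    return np.logaddexp(0.0, -z)


def split_loss(loss, scores, labels):
    """Split loss(labels * scores) into a label-free and a label-dependent part.

    Valid for labels in {-1, +1}: the first term ignores the labels entirely,
    so only the second term can be corrupted by label noise.
    """
    label_free = 0.5 * (loss(scores) + loss(-scores))
    label_dependent = labels * 0.5 * (loss(scores) - loss(-scores))
    return label_free, label_dependent


def decomposition_demo(seed=0, n=200, dim=4):
    """Hypothetical data; returns the quantities compared in the check below."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    y = rng.choice([-1, 1], size=n)
    w = rng.normal(size=dim)
    scores = X @ w
    full = logistic_loss(y * scores)
    label_free, label_dependent = split_loss(logistic_loss, scores, y)
    # For the logistic loss, loss(z) - loss(-z) = -z, so the label-dependent
    # risk equals -0.5 * w . centroid, with centroid = mean(y_i * x_i).
    centroid = (y[:, None] * X).mean(axis=0)
    return full, label_free, label_dependent, -0.5 * (w @ centroid)
```

Under this view, noisy labels enter the empirical risk only through the centroid estimate, which is why a method that stabilizes that single vector could limit their influence.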