metric learning algorithm
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Provably Robust Metric Learning
Metric learning is an important family of algorithms for classification and similarity search, but the robustness of learned metrics against small adversarial perturbations is less studied. In this paper, we show that existing metric learning algorithms, which focus on boosting the clean accuracy, can result in metrics that are less robust than the Euclidean distance. To overcome this problem, we propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations, and the robustness of the resulting model is certifiable. Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors (errors under adversarial attacks). Furthermore, unlike neural network defenses which usually encounter a trade-off between clean and robust errors, our method does not sacrifice clean errors compared with previous metric learning methods.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
Sample Complexity of Learning Mahalanobis Distance Metrics
Metric learning seeks a transformation of the feature space that enhances prediction quality for a given task. In this work we provide P AC-style sample complexity rates for supervised metric learning. We give matching lower-and upper-bounds showing that sample complexity scales with the representation dimension when no assumptions are made about the underlying data distribution. In addition, by leveraging the structure of the data distribution, we provide rates fine-tuned to a specific notion of the intrinsic complexity of a given dataset, allowing us to relax the dependence on representation dimension. We show both theoretically and empirically that augmenting the metric learning optimization criterion with a simple norm-based regularization is important and can help adapt to a dataset's intrinsic complexity yielding better generalization, thus partly explaining the empirical success of similar regularizations reported in previous works.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Provably Robust Metric Learning
Metric learning is an important family of algorithms for classification and similarity search, but the robustness of learned metrics against small adversarial perturbations is less studied. In this paper, we show that existing metric learning algorithms, which focus on boosting the clean accuracy, can result in metrics that are less robust than the Euclidean distance. To overcome this problem, we propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations, and the robustness of the resulting model is certifiable. Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors (errors under adversarial attacks). Furthermore, unlike neural network defenses which usually encounter a trade-off between clean and robust errors, our method does not sacrifice clean errors compared with previous metric learning methods.
Large Margin Multi-Task Metric Learning
Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
Sample Complexity of Learning Mahalanobis Distance Metrics
Metric learning seeks a transformation of the feature space that enhances prediction quality for a given task. In this work we provide PAC-style sample complexity rates for supervised metric learning. We give matching lower-and upper-bounds showing that sample complexity scales with the representation dimension when no assumptions are made about the underlying data distribution. In addition, by leveraging the structure of the data distribution, we provide rates fine-tuned to a specific notion of the intrinsic complexity of a given dataset, allowing us to relax the dependence on representation dimension. We show both theoretically and empirically that augmenting the metric learning optimization criterion with a simple norm-based regularization is important and can help adapt to a dataset's intrinsic complexity yielding better generalization, thus partly explaining the empirical success of similar regularizations reported in previous works.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called \textit{semi-crowdsourced clustering} that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects, from the manual annotations of only a small portion of the data to be clustered.
A Riemannian Primal-dual Algorithm Based on Proximal Operator and its Application in Metric Learning
Wang, Shijun, Zhu, Baocheng, Ma, Lintao, Qi, Yuan
In this paper, we consider optimizing a smooth, convex, lower semicontinuous function in Riemannian space with constraints. To solve the problem, we first convert it to a dual problem and then propose a general primal-dual algorithm to optimize the primal and dual variables iteratively. In each optimization iteration, we employ a proximal operator to search optimal solution in the primal space. We prove convergence of the proposed algorithm and show its non-asymptotic convergence rate. By utilizing the proposed primal-dual optimization technique, we propose a novel metric learning algorithm which learns an optimal feature transformation matrix in the Riemannian space of positive definite matrices. Preliminary experimental results on an optimal fund selection problem in fund of funds (FOF) management for quantitative investment showed its efficacy.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
Yi, Jinfeng, Jin, Rong, Jain, Shaili, Yang, Tianbao, Jain, Anil K.
One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called \textit{semi-crowdsourced clustering} that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects, from the manual annotations of only a small portion of the data to be clustered.