Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

Kuan Liu, Aurélien Bellet

arXiv.org Machine Learning 

High-dimensional and sparse data are commonly encountered in many applications of machine learning, such as computer vision, bioinformatics, text mining and behavioral targeting. To classify, cluster or rank data points, it is important to be able to compute semantically meaningful similarities between them. However, defining an appropriate similarity measure for a given task is often difficult as only a small and unknown subset of all features are actually relevant. For instance, in drug discovery studies, chemical compounds are typically represented by a large number of sparse features describing their 2D and 3D properties, and only a few of them play a role in determining whether the compound will bind to a particular target receptor (Leach and Gillet, 2007). In text classification and clustering, a document is often represented as a sparse bag of words, and only a small subset of the dictionary is generally useful to discriminate between documents about different topics. Another example is targeted advertising, where ads are selected based on fine-grained user history (Chen et al., 2009). Similarity and metric learning (Bellet et al., 2015) offers principled approaches to construct a task-specific similarity measure by learning it from weakly supervised data, and has been used in many application domains. The main theme in these methods is to learn the parameters of a similarity (or distance) function such that it agrees with task-specific similarity judgments (e.g., of the form "data point x should
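To make the general theme concrete: a common choice is a bilinear similarity S_M(x, x') = x^T M x', whose parameter matrix M is adjusted so that pairs judged similar score higher than pairs judged dissimilar. The sketch below is a minimal illustration of this idea on sparse vectors represented as Python dicts; it uses a simple perceptron-style update, not the Frank-Wolfe algorithm of this paper, and all function names are hypothetical.

```python
# Illustrative sketch of bilinear similarity learning on sparse data.
# S_M(x, x') = sum_{i,j} x_i * M[(i, j)] * x'_j, with x, x' as sparse dicts
# mapping feature index -> value, and M as a sparse dict over index pairs.
# The update rule is a plain perceptron-style step, NOT the paper's method.

def bilinear_similarity(M, x, xp):
    """Compute x^T M x' for sparse feature dicts x and xp."""
    return sum(vx * M.get((i, j), 0.0) * vxp
               for i, vx in x.items()
               for j, vxp in xp.items())

def pairwise_update(M, x, xp, label, margin=0.0, step=1.0):
    """If the pair (x, xp) with label in {+1, -1} violates the margin,
    nudge M toward satisfying label * S_M(x, xp) > margin."""
    if label * bilinear_similarity(M, x, xp) <= margin:
        for i, vx in x.items():
            for j, vxp in xp.items():
                M[(i, j)] = M.get((i, j), 0.0) + step * label * vx * vxp
    return M

# Toy usage: two "similar" documents sharing word 0 (sparse bags of words).
M = {}
x1 = {0: 1.0, 3: 1.0}
x2 = {0: 1.0, 7: 1.0}
M = pairwise_update(M, x1, x2, label=+1)
print(bilinear_similarity(M, x1, x2))  # similarity is now positive
```

Because only the nonzero coordinates of each pair are touched, M stays sparse when the data are sparse, which is exactly the regime this line of work targets.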
