Similarity Learning for High-Dimensional Sparse Data

Oct-21-2015–arXiv.org Machine Learning

In many applications, such as text processing, computer vision or biology, data is represented as very highdimensional but sparse vectors. The ability to compute meaningful similarity scores between these objects is crucial to many tasks, such as classification, clustering or ranking. However, handcrafting a relevant similarity measure for such data is challenging because it is usually the case that only a small, often unknown subset of features is actually relevant to the task at hand. For instance, in drug discovery, chemical compounds can be represented as sparse features describing their 3D properties, and only a few of them play an role in determining whether the compound will bind to a target receptor (Guyon et al., 2004). In text classification, where each document is represented as a sparse bag of words, only a small subset of the words is generally sufficient to discriminate among documents of different topics. A principled way to obtain a similarity measure tailored to the problem of interest is to learn it from data. This line of research, known as similarity and distance metric learning, has been successfully applied to many application domains (see Kulis, 2012; Bellet et al., 2013, for recent surveys). The basic idea is to learn the parameters of a similarity (or distance) function such that it satisfies proximity-based constraints, requiring for instance that some data instance x be more similar to y than to z according to the learned function.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

Oct-21-2015

arXiv.org PDF

Add feedback

Country:
- North America > United States > California (0.14)

Genre:
- Research Report (0.64)

Industry:
- Government
  - Military (0.93)
  - Regional Government > North America Government
    - United States Government (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Representation & Reasoning > Optimization (0.93)
  - Machine Learning > Statistical Learning
    - Support Vector Machines (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found