Structural Similarity and Distance in Learning
Wang, Joseph, Saligrama, Venkatesh, Castañón, David A.
The notion of distance, or more generally, similarity between observations, is at the root of most learning algorithms such as manifold learning, unsupervised clustering, semi-supervised learning, and anomaly detection. Most methods, at a basic level, are based on some function of Euclidean distance, such as the radial basis functions prevalent in supervised classification. Graphbased learning methods employ Euclidean distances to describe local neighborhoods for observations. K-nearest neighbors and ǫ-neighborhoods based on Euclidean distances are used in manifold learning (Isomap [1] and LLE [2]), spectral clustering [3], anomaly detection algorithms [4], [5] and in label propagation algorithms for semi-supervised learning [6]. In many cases, notably sparsely sampled sets of data, Euclidean neighborhoods are not sufficient to represent underlying structure. Consider data drawn from two independent structures. Ideally, a graph would capture the structure of the data with minimal connection between observations on separate structures.
Oct-26-2011