Many algorithms, whether supervised or unsupervised, make use of distance measures. These measures, such as euclidean distance or cosine similarity, can often be found in algorithms such as k-NN, UMAP, HDBSCAN, etc. Understanding the field of distance measures is more important than you might realize. Take k-NN for example, a technique often used for supervised learning. As a default, it often uses euclidean distance. However, what if your data is highly dimensional?

At the core of cluster analysis is the concept of measuring distances between a variety of different data point dimensions. For example, when considering k-means clustering, there is a need to measure a) distances between individual data point dimensions and the corresponding cluster centroid dimensions of all clusters, and b) distances between cluster centroid dimensions and all resulting cluster member data point dimensions. While k-means, the simplest and most prominent clustering algorithm, generally uses Euclidean distance as its similarity distance measurement, contriving innovative or variant clustering algorithms which, among other alterations, utilize different distance measurements is not a stretch. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.