Chehreghani, Morteza Haghir (Xerox Research Centre Europe)

Minimax distance measures provide an effective way to capture the unknown underlying patterns and classes of the data in a non-parametric way. We develop a general-purpose framework to employ Minimax distances with any classification method that performs on numerical data. For this purpose, we establish a two-step strategy. First, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space are equal to their Minimax distances in the original space. We also consider the cases where multiple pairwise Minimax matrices are given, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition. We experimentally validate our framework on different synthetic and real-world datasets.

Chehreghani, Morteza Haghir (Xerox Research Centre Europe)

Minimax distance measures provide an effective way to capture the unknown underlying patterns and classes of the data in a non-parametric way. We develop a general-purpose framework to employ Minimax distances with any classification method that performs on numerical data. For this purpose, we establish a two-step strategy. First, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space are equal to their Minimax distances in the original space. We also consider the cases where multiple pairwise Minimax matrices are given, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition.

Chehreghani, Morteza Haghir, Chehreghani, Mostafa Haghir

We study nonparametric feature extraction from hierarchies. The commonly used Minimax distance measures correspond to building a dendrogram with single linkage criterion, with the definition of specific forms of a level function and a distance function over that. Therefore, we develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep learning models. Finally, we demonstrate the effectiveness of our approach via numerical studies.

Kim, Kye-Hyeon (Pohang University of Science and Technology) | Choi, Seungjin (Pohang University of Science and Technology)

Link-based dissimilarity measures, such as shortest path or Euclidean commute time distance, base their distance on paths between nodes of a weighted graph. These measures are known to be better suited to data manifold with nonconvex-shaped clusters, compared to Euclidean distance, so that k -nearest neighbor (NN) search is improved in such metric spaces. In this paper we present a new link-based dissimilarity measure based on minimax paths between nodes. Two main benefits of minimax path-based dissimilarity measure are: (1) only a subset of paths is considered to make it scalable, while Euclidean commute time distance considers all possible paths; (2) it better captures nonconvex-shaped cluster structure, compared to shortest path distance. We define the total cost assigned to a path between nodes as L p norm of intermediate costs of edges involving the path, showing that minimax path emerges from our L p norm over paths framework. We also define minimax distance as the intermediate cost of the longest edge on the minimax path, then present a greedy algorithm to compute k smallest minimax distances between a query and N data points in O(log N + k log k) time. Numerical experiments demonstrate that our minimax k-NN algorithm reduce the search time by several orders of magnitude, compared to existing methods, while the quality of k -NN search is significantly improved over Euclidean distance.

Koolen, Wouter M., Malek, Alan, Bartlett, Peter L.

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance. We derive the minimax solutions for the case where the prediction and action spaces are the simplex (this setup is sometimes called the Brier game) and the $\ell_2$ ball (this setup is related to Gaussian density estimation). We show that in both cases the value of each sub-game is a quadratic function of a simple statistic of the state, with coefficients that can be efficiently computed using an explicit recurrence relation. The resulting deterministic minimax strategy and randomized maximin strategy are linear functions of the statistic.