We investigate the use of Minimax distances to extract in a nonparametric way the features that capture the unknown underlying patterns and structures in the data. We develop a general-purpose framework to employ Minimax distances with many machine learning methods that perform on numerical data. For this purpose, first, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space equal to the pairwise Minimax distances in the original space. In the following, we study the case of having multiple pairwise Minimax matrices, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition. Finally, we perform several experimental studies to illustrate the effectiveness of our framework.
We provide a unified treatment of a broad class of noisy structure recovery problems, known as structured normal means problems. In this setting, the goal is to identify, from a finite collection of Gaussian distributions with different means, the distribution that produced some observed data. Recent work has studied several special cases including sparse vectors, biclusters, and graph-based structures. We establish nearly matching upper and lower bounds on the minimax probability of error for any structured normal means problem, and we derive an optimality certificate for the maximum likelihood estimator, which can be applied to many instantiations. We also consider an experimental design setting, where we generalize our minimax bounds and derive an algorithm for computing a design strategy with a certain optimality property. We show that our results give tight minimax bounds for many structure recovery problems and consider some consequences for interactive sampling.
Minimax distance measures provide an effective way to capture the unknown underlying patterns and classes of the data in a non-parametric way. We develop a general-purpose framework to employ Minimax distances with any classification method that performs on numerical data. For this purpose, we establish a two-step strategy. First, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space are equal to their Minimax distances in the original space. We also consider the cases where multiple pairwise Minimax matrices are given, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition. We experimentally validate our framework on different synthetic and real-world datasets.
We consider the problem of accurately estimating the reliability of workers based on noisy labels they provide, which is a fundamental question in crowdsourcing. We propose a novel lower bound on the minimax estimation error which applies to any estimation procedure. We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers. TE has low complexity, may be implemented in a streaming setting when labels are provided by workers in real time, and does not rely on an iterative procedure. We prove that TE is minimax optimal and matches our lower bound. We conclude by assessing the performance of TE and other state-of-the-art algorithms on both synthetic and real-world data.
The inverse covariance matrix provides considerable insight for understanding statistical models in the multivariate setting. In particular, when the distribution over variables is assumed to be multivariate normal, the sparsity pattern in the inverse covariance matrix, commonly referred to as the precision matrix, corresponds to the adjacency matrix representation of the Gauss-Markov graph, which encodes conditional independence statements between variables. Minimax results under the spectral norm have previously been established for covariance matrices, both sparse and banded, and for sparse precision matrices. We establish minimax estimation bounds for estimating banded precision matrices under the spectral norm. Our results greatly improve upon the existing bounds; in particular, we find that the minimax rate for estimating banded precision matrices matches that of estimating banded covariance matrices.