Fotakis, Dimitris


Efficient Algorithms for Learning from Coarse Labels

arXiv.org Machine Learning

For many learning problems one may not have access to fine-grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set $\mathcal{Z}$, we observe coarse labels corresponding to a partition of $\mathcal{Z}$ (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine-grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine-grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels $|\mathcal{Z}|$. We also investigate the case of (infinitely many) real-valued labels, focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.
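As a toy illustration of the linear-inversion idea behind such a reduction (a minimal sketch under illustrative assumptions, not the paper's algorithm): when the coarsening is a known mixture of partitions, the coarse-label distribution is a linear image of the fine-label distribution, so fine-grained statistics can be recovered by solving the corresponding linear system, with accuracy governed by how well-conditioned the coarsening is. The specific partitions and distribution below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4                                        # fine labels Z = {0, 1, 2, 3}
p_true = np.array([0.1, 0.2, 0.3, 0.4])      # unknown fine-label distribution

# Mixture of 3 partitions, each drawn w.p. 1/3: partition i splits Z into
# {i} and Z \ {i}.  Rows 2i and 2i+1 of A hold Pr[coarse | fine] for the
# two parts, so the coarse-label distribution is q = A @ p.
A = np.zeros((6, k))
for i in range(3):
    A[2 * i, i] = 1.0
    A[2 * i + 1] = 1.0 - A[2 * i]
A /= 3.0

n = 100_000
fine = rng.choice(k, size=n, p=p_true)
coarse = np.array([rng.choice(6, p=A[:, z]) for z in fine])
q_hat = np.bincount(coarse, minlength=6) / n

# Invert the coarsening by least squares; the blow-up in sample complexity
# is controlled by the conditioning of A (the "information distortion").
p_hat, *_ = np.linalg.lstsq(A, q_hat, rcond=None)
print("estimated fine-label distribution:", p_hat.round(3))
```

Any statistic of the fine labels can then be answered from `p_hat`; the paper's SQ reduction makes this quantitative, with the required number of coarse labels polynomial in the distortion and in $|\mathcal{Z}|$.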


Aggregating Incomplete and Noisy Rankings

arXiv.org Machine Learning

Aggregating a collection of (possibly noisy and incomplete) ranked preferences into a complete ranking over a set of alternatives is a fundamental and extensively studied problem with numerous applications. Ranking aggregation has received considerable research attention in several fields, over decades and from virtually every angle. Most relevant to this work, Statistics investigates the properties of ranking distributions, which provide principled ways to generate noisy rankings from structural information about the alternatives' relative order. Best known among them are the distance-based model of Mallows [1957] and the parametric models of Thurstone [1927], Smith [1950], Bradley and Terry [1952], Plackett [1975] and Luce [2012]. Moreover, Machine Learning and Statistical Learning Theory aim to develop (statistically and computationally) efficient ways of retrieving the true ordering of the alternatives from noisy (and possibly incomplete) rankings (see, e.g., [Xia, 2019] and the references therein). Virtually all previous work in the latter research direction assumes that the input is a collection of either complete rankings (chosen adversarially, e.g., Ailon et al. [2008], Kenyon-Mathieu and Schudy [2007], or drawn from an unknown ranking distribution, e.g., [Caragiannis et al., 2013, Busa-Fekete et al., 2019]), or outcomes of noisy pairwise comparisons (see, e.g., [Feige et al., 1994, Mao et al., 2018a]). Due to a significant volume of relatively recent research, the computational and statistical complexity of determining the best ranking from either complete rankings or pairwise comparisons is well understood. However, in most modern applications of ranking aggregation, the input consists of incomplete rankings of more than two alternatives. Consider, e.g., e-commerce or media streaming services with huge collections of alternatives, which generate personalized recommendations based on rankings aggregated from user ratings (see also Hajek et al. [2014]).
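For concreteness, the distance-based Mallows model mentioned above can be sampled with the standard repeated-insertion method, and incomplete rankings can be combined with a simple positional heuristic. This is an illustrative sketch only; the naive aggregator below is a baseline, not the estimator studied in this line of work.

```python
import math
import random

def sample_mallows(center, beta, rng=random):
    """Repeated-insertion sampling from a Mallows distribution with
    central ranking `center` and spread beta (phi = exp(-beta))."""
    phi = math.exp(-beta)
    ranking = []
    for j, item in enumerate(center, start=1):
        # Inserting the j-th item at position i (1-indexed) creates j - i
        # inversions w.r.t. the center, hence weight phi ** (j - i).
        weights = [phi ** (j - i) for i in range(1, j + 1)]
        ranking.insert(rng.choices(range(j), weights=weights)[0], item)
    return ranking

def naive_aggregate(partial_rankings, alternatives):
    """Borda-style positional aggregation of (possibly incomplete) rankings."""
    score = {a: 0.0 for a in alternatives}
    for r in partial_rankings:
        for rank, a in enumerate(r):
            score[a] += len(r) - rank        # earlier position, higher score
    return sorted(alternatives, key=lambda a: -score[a])

center = list("abcde")
votes = [sample_mallows(center, beta=1.0)[:3] for _ in range(500)]  # top-3 votes
print(naive_aggregate(votes, center))
```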


Efficient Parameter Estimation of Truncated Boolean Product Distributions

arXiv.org Machine Learning

We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S \subset \{0, 1\}^d$ accessible through a membership oracle. This is the first time that the computational and statistical complexity of learning from truncated samples is considered in a discrete setting. We introduce a natural notion of fatness of the truncation set $S$, under which truncated samples reveal enough information about the true distribution. We show that if the truncation set is sufficiently fat, samples from the true distribution can be generated from truncated samples. A striking consequence is that virtually any statistical task (e.g., learning in total variation distance, parameter estimation, uniformity or identity testing) that can be performed efficiently for Boolean product distributions can also be performed from truncated samples, with a small increase in sample complexity. We generalize our approach to ranking distributions over $d$ alternatives, where we show how fatness implies efficient parameter estimation of Mallows models from truncated samples. Exploring the limits of learning discrete models from truncated samples, we identify three natural conditions that are necessary for efficient identifiability: (i) the truncation set $S$ should be rich enough; (ii) $S$ should be accessible through membership queries; and (iii) the truncation by $S$ should leave enough randomness in all directions. By carefully adapting the Stochastic Gradient Descent approach of Daskalakis et al. (FOCS 2018), we show that these conditions are also sufficient for efficient learning of truncated Boolean product distributions.
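A minimal sketch of the SGD idea in the last step, under illustrative assumptions (a hypothetical "fat" threshold truncation set and a logistic parameterization; this is a toy rendition, not the paper's exact procedure or guarantees): the gradient of the truncated log-likelihood at a sample $x$ is $x$ minus the mean of the current model conditioned on $S$, which can be estimated with one rejection-sampled point per step using the membership oracle.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
p_true = rng.uniform(0.2, 0.8, size=d)   # unknown product-distribution parameters

def in_S(x):
    # Hypothetical "fat" truncation set: keep vectors with many ones.
    return x.sum() >= d // 2

def draw(p, size):
    # i.i.d. samples from the Boolean product distribution with mean vector p.
    return (rng.random((size, d)) < p).astype(float)

def truncated_samples(p, n):
    # Truncated data: samples from the true model conditioned on S.
    out = []
    while len(out) < n:
        x = draw(p, 1)[0]
        if in_S(x):
            out.append(x)
    return np.array(out)

data = truncated_samples(p_true, 5000)

# SGD on the truncated log-likelihood in logit coordinates theta, using
# grad log-lik(x) = x - E_{p_theta | S}[x]; the conditional expectation is
# estimated by one rejection-sampled point per step via the membership oracle.
theta = np.zeros(d)
for t, x in enumerate(data, start=1):
    p = 1.0 / (1.0 + np.exp(-theta))
    while True:
        y = draw(p, 1)[0]
        if in_S(y):
            break
    theta += (0.1 / np.sqrt(t)) * (x - y)

p_hat = 1.0 / (1.0 + np.exp(-theta))
print("max parameter error:", np.abs(p_hat - p_true).max())
```

Fatness is what keeps the rejection loop cheap and the truncated likelihood well behaved; without it (condition (iii) above), the conditional model can lose randomness in some direction and the parameters become unidentifiable.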


Optimal Learning of Mallows Block Model

arXiv.org Machine Learning

The Mallows model, introduced in the seminal paper of Mallows (1957), is one of the most fundamental ranking distributions over the symmetric group $S_m$. To analyze more complex ranking data, several studies have considered the Generalized Mallows model defined by Fligner and Verducci (1986). Despite the significant research interest in ranking distributions, the exact sample complexity of estimating the parameters of a Mallows or a Generalized Mallows model is not well understood. The main result of the paper is a tight sample complexity bound for learning the Mallows and Generalized Mallows models. We approach the learning problem by analyzing a more general model that interpolates between the single-parameter Mallows model and the $m$-parameter Mallows model. We call our model the Mallows Block Model, in reference to the block models that are popular in theoretical statistics. Our sample complexity analysis gives a tight bound for learning the Mallows Block Model with any number of blocks. We provide essentially matching lower bounds for our sample complexity results. As a corollary of our analysis, it turns out that, if the central ranking is known, a single sample from the Mallows Block Model is sufficient to estimate the spread parameters with error that goes to zero as the size of the permutations goes to infinity. In addition, we calculate the exact rate of the parameter estimation error.
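To illustrate the single-sample claim (a hedged sketch; the paper's estimator and rates may differ): with the central ranking known, the Kendall distance of one sample concentrates around its expectation as $m$ grows, so a method-of-moments estimate of the spread parameter can be obtained by inverting the closed-form expected distance.

```python
import math
import random
from itertools import combinations

def kendall_distance(ranking, center):
    """Number of pairs ordered differently in `ranking` and `center`."""
    pos = {a: i for i, a in enumerate(ranking)}
    return sum(1 for a, b in combinations(center, 2) if pos[a] > pos[b])

def expected_distance(beta, m):
    """E[d(pi, center)] under Mallows(beta) on S_m; with q = e^{-beta},
    E[d] = m q / (1 - q) - sum_{j=1}^{m} j q^j / (1 - q^j)."""
    q = math.exp(-beta)
    return m * q / (1 - q) - sum(j * q**j / (1 - q**j) for j in range(1, m + 1))

def estimate_beta(ranking, center, lo=1e-6, hi=10.0):
    """Method-of-moments estimate: E[d] is decreasing in beta, so solve
    E[d](beta) = observed distance by bisection."""
    d = kendall_distance(ranking, center)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if expected_distance(mid, len(center)) > d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

center = list(range(100))
pi = center[:]
random.shuffle(pi)                 # a uniform permutation corresponds to beta ~ 0
print(estimate_beta(pi, center))   # close to 0; a true Mallows sample would
                                   # recover its beta as m grows
```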


The Power of Verification for Greedy Mechanism Design

Journal of Artificial Intelligence Research

Greedy algorithms are known to provide, in polynomial time, near-optimal approximation guarantees for Combinatorial Auctions (CAs) with multidimensional bidders. It is also known that truthful greedy-like mechanisms for CAs with multi-minded bidders do not achieve good approximation guarantees. In this work, we seek a deeper understanding of greedy mechanism design and investigate under which general assumptions we can obtain efficient and truthful greedy mechanisms for CAs. Towards this goal, we use the framework of priority algorithms together with weak and strong verification, under which bidders are not allowed to overbid on their winning set or on any subset of this set, respectively. We provide a complete characterization of the power of weak verification, showing that it is necessary and sufficient for any fixed-priority greedy algorithm to become truthful, with or without money, depending on the ordering of the bids. Moreover, we show that strong verification is necessary and sufficient to obtain a 2-approximate truthful mechanism with money, based on a known greedy algorithm, for submodular CAs in finite bidding domains. Our proof is based on an interesting structural analysis of the strongly connected components of the declaration graph.
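As background for the kind of algorithm involved (an illustrative sketch, not the paper's mechanism): a fixed-priority greedy for submodular CAs in the style of Lehmann, Lehmann, and Nisan (2001) assigns items one at a time to the bidder with the largest declared marginal value, which yields a 2-approximation; verification then limits how bidders may overstate these values on sets they win. The coverage valuations below are a standard submodular example, made up for this sketch.

```python
def greedy_allocate(items, bidders, marginal):
    """Assign each item, in a fixed priority order, to the bidder with the
    largest declared marginal value for it given her current bundle."""
    bundles = {b: set() for b in bidders}
    for item in items:                       # fixed priority over items
        best = max(bidders, key=lambda b: marginal(b, bundles[b], item))
        if marginal(best, bundles[best], item) > 0:
            bundles[best].add(item)
    return bundles

# Toy coverage valuations (submodular): bidder b covers covers[b][item].
covers = {
    "alice": {"x": {1, 2}, "y": {2, 3}, "z": {7}},
    "bob":   {"x": {1},    "y": {4, 5}, "z": {5, 6}},
}

def marginal(b, bundle, item):
    covered = set().union(*(covers[b][i] for i in bundle)) if bundle else set()
    return len(covers[b][item] - covered)

print(greedy_allocate(["x", "y", "z"], ["alice", "bob"], marginal))
```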