
Collaborating Authors

Karzand, Mina


Score Design for Multi-Criteria Incentivization

arXiv.org Artificial Intelligence

We present a framework for designing scores to summarize performance metrics. Our design has two multi-criteria objectives: (1) improving scores should improve all performance metrics, and (2) achieving Pareto-optimal scores should achieve Pareto-optimal metrics. We formulate the design problem as minimizing the dimensionality of the scores while satisfying both objectives. We give algorithms for designing scores that are provably minimal under mild assumptions on the structure of the performance metrics. The framework draws motivation from real-world practice in hospital rating systems, where misalignment between scores and performance metrics leads to unintended consequences.
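
To make objective (1) concrete, here is a minimal sketch, assuming a linear scalar score and randomly sampled metric vectors (the `aligned` helper and all names are illustrative, not the paper's construction): it checks whether a higher score ever coincides with a regression in some individual metric.

```python
import numpy as np

# Hypothetical check of objective (1): if a candidate score improves
# between two outcomes, no individual performance metric may regress.
# The helper name, the linear score, and the random data are assumptions.

def aligned(score_fn, metrics, tol=1e-9):
    """True iff a strictly higher score never comes with a strictly
    lower value in any metric, over all pairs of sampled outcomes."""
    scores = np.array([score_fn(m) for m in metrics])
    for i in range(len(metrics)):
        for j in range(len(metrics)):
            if scores[i] > scores[j] + tol and np.any(metrics[i] < metrics[j] - tol):
                return False
    return True

rng = np.random.default_rng(0)
metrics = rng.random((50, 3))                      # 50 outcomes, 3 metrics
score = lambda m: m @ np.array([0.5, 0.3, 0.2])    # one scalar summary
print(aligned(score, metrics))                     # typically False
```

On unstructured metric vectors the check typically fails for any single scalar score, which is precisely the misalignment that motivates multi-dimensional scores of minimal dimensionality.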


Active Learning in the Overparameterized and Interpolating Regime

arXiv.org Artificial Intelligence

Overparameterized models that interpolate training data often display surprisingly good generalization properties. Specifically, minimum-norm solutions have been shown to generalize well in the overparameterized, interpolating regime. This paper introduces a new framework for active learning based on the notion of minimum-norm interpolators. We analytically study its properties and behavior in the kernel-based setting, and present experimental studies with kernel methods and neural networks. In general, active learning algorithms adaptively select examples for labeling that (1) rule out as many (incompatible) classifiers as possible at each step and/or (2) discover cluster structure in unlabeled data and label representative examples from each cluster. We show that our new active learning approach based on a minimum-norm heuristic automatically exploits both of these strategies. The success of deep learning systems has sparked interest in understanding how and why overparameterized models that interpolate the training data often display surprisingly good generalization properties [27, 11, 38, 12, 8, 1, 10, 23]. Notably, it is now understood that minimum-norm solutions have the potential to generalize well in the overparameterized, interpolating regime [11, 9, 23, 25].
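
As a rough sketch of the idea, not the paper's exact selection criterion: the code below fits a minimum-norm kernel interpolator to the labeled points and queries the pool point where the interpolant is closest to zero. The RBF kernel, the regularization jitter, and the synthetic pool are assumptions made for this example.

```python
import numpy as np

# Minimal sketch of active learning with a minimum-norm kernel
# interpolator, assuming an RBF kernel and binary +/-1 labels.
# Querying the point where the interpolant is nearest zero is a
# simplified stand-in for the paper's min-norm selection criterion.

def rbf(A, B, gamma=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def min_norm_interpolator(X, y, gamma=1.0):
    # small jitter keeps the kernel matrix numerically invertible
    alpha = np.linalg.solve(rbf(X, X, gamma) + 1e-8 * np.eye(len(X)), y)
    return lambda Z: rbf(Z, X, gamma) @ alpha

rng = np.random.default_rng(1)
pool = rng.uniform(-1, 1, size=(200, 2))
labels = np.sign(pool[:, 0])                 # hidden ground truth
queried = [0, 1]                             # seed with two labeled points

for _ in range(10):
    f = min_norm_interpolator(pool[queried], labels[queried])
    scores = np.abs(f(pool))
    scores[queried] = np.inf                 # never re-query a point
    queried.append(int(np.argmin(scores)))   # most ambiguous point next
```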


Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering

arXiv.org Machine Learning

We consider an online model for recommendation systems, with each user being recommended an item at each time-step and providing 'like' or 'dislike' feedback. A latent variable model specifies the user preferences: both users and items are clustered into types. All users of a given type have identical preferences for the items, and similarly, items of a given type are either all liked or all disliked by a given user. The model captures structure in both the item and user spaces, and in this paper, we assume that the type preference matrix is randomly generated. We describe two algorithms inspired by user-user and item-item collaborative filtering (CF), modified to explicitly make exploratory recommendations, and prove performance guarantees in terms of their expected regret. For two regimes of model parameters, with structure only in item space or only in user space, we prove information-theoretic lower bounds on regret that match our upper bounds up to logarithmic factors. Our analysis elucidates system operating regimes in which existing CF algorithms are nearly optimal.
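
The toy sketch below conveys the flavor of item-item CF with explicit exploration under this feedback model (+1 like, -1 dislike, NaN unseen); the epsilon-greedy exploration step, the co-feedback similarity, and the data are simplifications assumed here, not the algorithms analyzed in the paper.

```python
import numpy as np

# Toy item-item collaborative filtering with explicit exploration
# (an epsilon-greedy simplification assumed for illustration).
# Feedback matrix F: +1 like, -1 dislike, NaN if not yet recommended.

def recommend(F, user, eps, rng):
    unseen = np.where(np.isnan(F[user]))[0]
    if rng.random() < eps:                 # exploratory recommendation
        return int(rng.choice(unseen))
    R = np.nan_to_num(F)                   # unseen entries count as 0
    sim = R.T @ R                          # item-item agreement counts
    np.fill_diagonal(sim, 0)
    pred = R[user] @ sim                   # similarity-weighted preference
    return int(unseen[np.argmax(pred[unseen])])

rng = np.random.default_rng(2)
F = np.full((2, 6), np.nan)
F[0] = [1, 1, -1, 1, -1, -1]               # user 0 has rated every item
F[1, :3] = [1, 1, -1]                      # user 1: same type as user 0
print(recommend(F, user=1, eps=0.0, rng=rng))   # -> 3, an item user 0 liked
```

In the example, user 1 shares a type with user 0, so the item-item scores point to item 3, which user 0 liked.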


Learning a Tree-Structured Ising Model in Order to Make Predictions

arXiv.org Machine Learning

We study the problem of learning a tree graphical model from samples such that low-order marginals are accurate. We define a distance ("small set TV" or ssTV) between distributions $P$ and $Q$ by taking the maximum, over all subsets $\mathcal{S}$ of a given size, of the total variation between the marginals of $P$ and $Q$ on $\mathcal{S}$. Approximating a distribution to within small ssTV allows making predictions based on partial observations. Focusing on pairwise marginals and tree-structured Ising models on $p$ nodes with maximum edge strength $\beta$, we prove that $\max\{e^{2\beta}\log p, \eta^{-2}\log(p/\eta)\}$ i.i.d. samples suffice to obtain a distribution (from the same class) with ssTV at most $\eta$ from the one generating the samples.
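
A standard algorithm for this setting is the classical Chow-Liu procedure: build a maximum-weight spanning tree on empirical pairwise correlations. The sketch below is a hedged illustration on a toy Ising chain, assuming SciPy is available; the sampling scheme and parameters are chosen for the example and the code is not the paper's analysis.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# Minimal sketch of the classical Chow-Liu procedure: estimate pairwise
# correlations from +/-1 samples and return a maximum-weight spanning
# tree on |correlation|. Toy chain and parameters are assumptions.

def chow_liu_tree(samples):
    """samples: (n, p) array of +/-1 spins; returns the tree's edge list."""
    mu = np.abs(samples.T @ samples) / len(samples)   # |E[x_i x_j]| estimates
    np.fill_diagonal(mu, 0)
    # scipy minimizes, so negate the weights to get a max-weight tree
    T = minimum_spanning_tree(-mu)
    return [(int(i), int(j)) for i, j in zip(*T.nonzero())]

# Toy chain x0 - x1 - x2 with edge strength beta
rng = np.random.default_rng(3)
beta, n, p = 0.8, 5000, 3
X = np.empty((n, p))
X[:, 0] = rng.choice([-1, 1], size=n)
for j in range(1, p):
    flip = rng.random(n) < 1.0 / (1.0 + np.exp(2 * beta))  # P(x_j != x_{j-1})
    X[:, j] = np.where(flip, -X[:, j - 1], X[:, j - 1])
print(chow_liu_tree(X))   # expected: edges (0, 1) and (1, 2)
```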