Gregory Valiant
Learning Populations of Parameters
Kevin Tian, Weihao Kong, Gregory Valiant
Learning Overcomplete HMMs
Vatsal Sharan, Sham M. Kakade, Percy S. Liang, Gregory Valiant
We study the problem of learning overcomplete HMMs--those that have many hidden states but a small output alphabet. Despite their significant practical importance, such HMMs are poorly understood, with no known positive or negative results for efficient learning. In this paper, we present several new results--both positive and negative--which help define the boundaries between the tractable and intractable settings. Specifically, we show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned, and have small probability mass on short cycles. On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs which can capture long-term dependencies.
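To make the overcomplete regime concrete, the following is a minimal sketch of sampling from an HMM with many hidden states but only a two-symbol output alphabet, using a sparse random transition matrix. All parameters here (state counts, sparsity, sequence length) are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative overcomplete HMM: many hidden states, tiny output alphabet.
# (Hypothetical parameters for illustration; not the paper's construction.)
n_states, n_outputs, seq_len = 16, 2, 20

# Sparse transition matrix: each state transitions to only a few successors.
T = np.zeros((n_states, n_states))
for s in range(n_states):
    succ = rng.choice(n_states, size=3, replace=False)
    T[s, succ] = rng.dirichlet(np.ones(3))

# Emission matrix: each state emits one of only n_outputs symbols.
O = rng.dirichlet(np.ones(n_outputs), size=n_states)

def sample(T, O, length, rng):
    """Sample an output sequence from the HMM."""
    state = rng.integers(len(T))
    out = []
    for _ in range(length):
        out.append(int(rng.choice(len(O[state]), p=O[state])))
        state = rng.choice(len(T), p=T[state])
    return out

seq = sample(T, O, seq_len, rng)
```

Because the output alphabet is much smaller than the state space, short output windows cannot uniquely identify the hidden state, which is what makes this regime hard.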
Estimating Learnability in the Sublinear Data Regime
Weihao Kong, Gregory Valiant
Making AI Forget You: Data Deletion in Machine Learning
Antonio Ginart, Melody Guan, Gregory Valiant, James Y. Zou
Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used -- the EU's Right To Be Forgotten regulation is an example of this effort. In this paper we initiate a framework for studying what to do when it is no longer permissible to deploy models derived from specific user data. In particular, we formulate the problem of efficiently deleting individual data points from trained machine learning models. For many standard ML models, the only way to completely remove an individual's data is to retrain the whole model from scratch on the remaining data, which is often not computationally practical. We investigate algorithmic principles that enable efficient data deletion in ML. For the specific setting of k-means clustering, we propose two provably efficient deletion algorithms which achieve an average of over 100x improvement in deletion efficiency across 6 datasets, while producing clusters of comparable statistical quality to a canonical k-means++ baseline.
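A simplified sketch of one idea that enables efficient deletion in clustering: shard the data, cluster each shard independently, and on a deletion request re-cluster only the shard containing the deleted point. This is a toy illustration of the divide-and-conquer principle, not the paper's exact algorithm; the shard counts and the plain Lloyd's-iteration k-means below are assumptions:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's iteration k-means (toy stand-in for a real clusterer)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

class ShardedKMeans:
    """Cluster each shard separately so deletion touches only one shard."""
    def __init__(self, X, k, n_shards):
        self.k = k
        self.shards = np.array_split(X, n_shards)
        self.centers = [kmeans(S, k) for S in self.shards]

    def delete(self, shard_id, row):
        # Remove one point and re-cluster only its shard, leaving the
        # other shards' clusterings untouched.
        S = np.delete(self.shards[shard_id], row, axis=0)
        self.shards[shard_id] = S
        self.centers[shard_id] = kmeans(S, self.k)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
model = ShardedKMeans(X, k=3, n_shards=5)
model.delete(shard_id=2, row=0)
```

Deletion cost here scales with the shard size rather than the full dataset, which is the source of the efficiency gain over retraining from scratch.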
A Spectral View of Adversarially Robust Features
Shivam Garg, Vatsal Sharan, Brian Zhang, Gregory Valiant
Given the apparent difficulty of learning models that are robust to adversarial perturbations, we propose tackling the simpler problem of developing adversarially robust features. Specifically, given a dataset and metric of interest, the goal is to return a function (or multiple functions) that 1) is robust to adversarial perturbations, and 2) has significant variation across the datapoints. We establish strong connections between adversarially robust features and a natural spectral property of the geometry of the dataset and metric of interest. This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset. Finally, we provide empirical evidence that the adversarially robust features given by this spectral approach can be fruitfully leveraged to learn a robust (and accurate) model.
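The spectral connection can be illustrated with a minimal sketch: connect datapoints that are close under the metric, form the graph Laplacian, and take a low eigenvector as a candidate feature. Such a vector varies across the data yet changes slowly between neighboring points, a proxy for robustness to small perturbations. The distance threshold, cluster parameters, and use of the unnormalized Laplacian are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy clusters (illustrative data).
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])

# Adjacency: edge between points within distance 1 under the Euclidean metric.
D = np.linalg.norm(X[:, None] - X[None], axis=-1)
A = (D < 1.0).astype(float) - np.eye(len(X))   # drop self-loops
L = np.diag(A.sum(1)) - A                      # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
feature = eigvecs[:, 1]                        # second-smallest eigenvector
```

With clusters this far apart the graph splits into two components, so the low eigenvectors are constant on each component; such a feature cannot be flipped by a small perturbation of any single datapoint.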
Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction
Jacob Steinhardt, Gregory Valiant, Moses Charikar
We consider a crowdsourcing model in which n workers are asked to rate the quality of n items previously generated by other workers. An unknown set of αn workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also manually evaluate the quality of a small number of items, and wishes to curate together almost all of the high-quality items with at most an ε fraction of low-quality items.
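A toy instantiation of this model may help fix the setting: n workers rate n binary-quality items, an unknown α fraction rate reliably, and the rest behave adversarially (here, by inverting every rating). The specific noise-free reliable workers and the inversion strategy are illustrative assumptions, not part of the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 100, 0.6
quality = rng.integers(0, 2, size=n)        # true item quality in {0, 1}

reliable = rng.random(n) < alpha            # which workers are reliable
                                            # (unknown to the manager)
ratings = np.empty((n, n), dtype=int)
ratings[reliable] = quality                 # reliable workers rate truthfully
ratings[~reliable] = 1 - quality            # adversaries invert every rating

# The manager may manually spot-check a small number of items:
spot_checked = rng.choice(n, size=5, replace=False)
ground_truth = quality[spot_checked]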