Zou, James
Rich Component Analysis
Ge, Rong, Zou, James
In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might have one set of molecular observations and one set of physiological observations on the same group of individuals, and we want to quantify molecular patterns that are uncorrelated with physiology. Despite being a common problem, this is highly challenging when the correlations come from complex distributions. In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution. We introduce algorithms based on cumulant extraction that provably learn each of the components without having to model the other components. We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks. Our method makes it possible to learn latent variable models when we don't have samples from the true model but only samples after complex perturbations.
Intersecting Faces: Non-negative Matrix Factorization With New Guarantees
Ge, Rong, Zou, James
Non-negative matrix factorization (NMF) is a natural model of admixture and is widely used in science and engineering. A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work. Recently a surge of research have focused on a very restricted class of NMFs, called separable NMF, where provably correct algorithms have been developed. In this paper, we propose the notion of subset-separable NMF, which substantially generalizes the property of separability. We show that subset-separability is a natural necessary condition for the factorization to be unique or to have minimum volume. We developed the Face-Intersect algorithm which provably and efficiently solves subset-separable NMF under natural conditions, and we prove that our algorithm is robust to small noise. We explored the performance of Face-Intersect on simulations and discuss settings where it empirically outperformed the state-of-art methods. Our work is a step towards finding provably correct algorithms that solve large classes of NMF problems.
Threats and Trade-Offs in Resource Critical Crowdsourcing Tasks Over Networks
Nath, Swaprava (Indian Institute of Science, Bangalore) | Dayama, Pankaj (Global General Motors R&D โ India Science Lab) | Garg, Dinesh (IBM India Research Lab) | Narahari, Y. (Indian Institute of Science) | Zou, James (Harvard University)
In recent times, crowdsourcing over social networks has emerged as an active tool for complex task execution. In this paper, we address the problem faced by a planner to incentivize agents in the network to execute a task and also help in recruiting other agents for this purpose. We study this mechanism design problem under two natural resource optimization settings: (1) cost critical tasks, where the planner's goal is to minimize the total cost, and (2) time critical tasks, where the goal is to minimize the total time elapsed before the task is executed. We define a set of fairness properties that should be ideally satisfied by a crowdsourcing mechanism. We prove that no mechanism can satisfy all these properties simultaneously. We relax some of these properties and define their approximate counterparts. Under appropriate approximate fairness criteria, we obtain a non-trivial family of payment mechanisms. Moreover, we provide precise characterizations of cost critical and time critical mechanisms.
Tolerable Manipulability in Dynamic Assignment without Money
Zou, James (Harvard University) | Gujar, Sujit (Indian Institute of Science) | Parkes, David (Harvard University)
We study a problem of dynamic allocation without money. Agents have arrivals and departures and strict preferences over items. Strategyproofness requires the use of an arrival-priority serial-dictatorship (APSD) mechanism, which is ex post Pareto efficient but has poor ex ante efficiency as measured through average rank efficiency. We introduce the scoring-rule (SR) mechanism, which biases in favor of allocating items that an agent values above the population consensus. The SR mechanism is not strategyproof but has tolerable manipulability in the sense that: (i) if every agent optimally manipulates, it reduces to APSD, and (ii) it significantly outperforms APSD for rank efficiency when only a fraction of agents are strategic. The performance of SR is also robust to mistakes by agents that manipulate on the basis of inaccurate information about the popularity of items.