
Cooperative Multi-player Bandit Optimization

Neural Information Processing Systems

Consider a team of cooperative players that take actions in a networked environment. At each turn, each player chooses an action and receives a reward that is an unknown function of all the players' actions. The goal of the team is to jointly learn to play the action profile that maximizes the sum of their rewards. However, players cannot observe the actions or rewards of other players, and can only obtain this information by communicating with their neighbors. We design a distributed learning algorithm that overcomes the informational bias players have towards maximizing the rewards of nearby players, about whom they have more information. We assume twice continuously differentiable reward functions and convex, compact action sets. Our communication graph is a random time-varying graph that follows an ergodic Markov chain. We prove that even if at every turn players take actions based only on the small random subset of the players' rewards that they know, our algorithm converges with probability 1 to the set of stationary points of (projected) gradient ascent on the sum-of-rewards function. Hence, if the sum of rewards is concave, the algorithm converges with probability 1 to the optimal action profile.
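As an illustration of the convergence target, here is a minimal centralized projected-gradient-ascent sketch on a hypothetical concave sum-of-rewards function. The reward function, the box action set, and the step size are all illustrative assumptions; this is the baseline dynamic the abstract says the distributed algorithm tracks, not the paper's algorithm itself.

```python
import numpy as np

C = np.array([0.3, -0.2, 0.5])  # hypothetical per-player reward centers (assumed)

def sum_reward(a):
    # Illustrative concave sum of rewards: a quadratic per-player term
    # plus a coupling term across all players' actions.
    return -np.sum((a - C) ** 2) - 0.1 * np.sum(a) ** 2

def grad_sum_reward(a):
    # Gradient of sum_reward with respect to the action profile a.
    return -2.0 * (a - C) - 0.2 * np.sum(a) * np.ones_like(a)

def project_box(a, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^d, a compact convex action set.
    return np.clip(a, lo, hi)

def projected_gradient_ascent(a0, step=0.1, iters=500):
    # Iterate a <- Proj(a + step * grad); fixed points are exactly the
    # stationary points of projected gradient ascent mentioned in the abstract.
    a = np.asarray(a0, dtype=float)
    for _ in range(iters):
        a = project_box(a + step * grad_sum_reward(a))
    return a
```

Because this sum of rewards is concave, its only stationary point is the maximizer, which is the action profile the distributed algorithm is proved to reach with probability 1.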


Decoupling "when to update" from "how to update"

Eran Malach, Shai Shalev-Shwartz

Neural Information Processing Systems

A useful approach to obtaining data is to be creative and mine it from various sources that were created for different purposes. Unfortunately, this approach often leads to noisy labels. In this paper, we propose a meta-algorithm for tackling the noisy-labels problem. The key idea is to decouple "when to update" from "how to update". We demonstrate the effectiveness of our algorithm by mining data for gender classification, combining the Labeled Faces in the Wild (LFW) face recognition dataset with a textual genderizing service, which yields a noisy dataset. While our approach is very simple to implement, it leads to state-of-the-art results. We also analyze some convergence properties of the proposed algorithm.
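A minimal sketch of the decoupling idea, using linear predictors in place of the paper's networks (the perceptron-style step, the learning rate, and the example data are illustrative assumptions): keep two predictors and let their disagreement decide *when* to update, while the base rule decides *how*.

```python
import numpy as np

def predict(w, x):
    # Linear classifier returning a label in {-1, +1}.
    return 1 if w @ x >= 0 else -1

def decoupled_pass(w1, w2, X, y, lr=0.1):
    """One pass of the decoupling rule: the pair (w1, w2) decides WHEN
    to update (only on points where the two predictors disagree), and a
    perceptron-style step decides HOW. Points where both predictors agree
    are skipped regardless of the (possibly noisy) label."""
    n_updates = 0
    for x, label in zip(X, y):
        if predict(w1, x) != predict(w2, x):   # when: disagreement only
            w1 = w1 + lr * label * x           # how: base update rule
            w2 = w2 + lr * label * x
            n_updates += 1
    return w1, w2, n_updates
```

In the paper the two predictors are networks trained by gradient steps; the disagreement test that gates the updates carries over unchanged.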





Reviews: Curvilinear Distance Metric Learning

Neural Information Processing Systems

Originality: The method is new and provides a direct generalization of linear distance metric learning. Quality: The theorems are clearly of interest for validating the methodology. The fitting-capacity result (Theorem 2) ensures that there exists a curvilinear metric that separates the data well. The generalization bound ensures that the empirical loss converges to the expected loss. However, it is unclear whether this ensures that the algorithm converges to the/a distance introduced by Theorem 2 (the distance that separates the data well).

