Learnability of Linear Thresholds from Label Proportions

Neural Information Processing Systems 

We study the problem of properly learning linear threshold functions (LTFs) in the learning from label proportions (LLP) framework. In this, the learning is on a collection of bags of feature-vectors with only the proportion of labels available for each bag. First, we provide an algorithm that, given a collection of such bags each of size at most two whose label proportions are consistent with (i.e., the bags are satisfied by) an unknown LTF, efficiently produces an LTF that satisfies at least (2/5) -fraction of the bags. If all the bags are non-monochromatic (i.e., bags of size two with differently labeled feature-vectors) the algorithm satisfies at least (1/2) -fraction of them. For the special case of OR over the d -dimensional boolean vectors, we give an algorithm which computes an LTF achieving an additional \Omega(1/d) in accuracy for the two cases.Our main result provides evidence that these algorithmic bounds cannot be significantly improved, even for learning monotone ORs using LTFs.