Aggregating Data for Optimal and Private Learning
Agarwal, Sushant, Makhija, Yukti, Saket, Rishi, Raghuveer, Aravindan
arXiv.org Artificial Intelligence
In many applications, however, instance-wise labels may not be available due to a lack of instrumentation or annotators [Chen et al., 2004, Dery et al., 2017], or privacy constraints [Rueping, 2010]. Instead, the dataset is partitioned into disjoint sets, or bags, of instances, and for each bag only one bag-label is available to the learner. The bag-label is derived from the undisclosed instance-labels present in the bag via some aggregation function depending on the scenario. The goal is to train a model predicting the labels of individual instances. We call this paradigm learning from aggregate labels; it directly generalizes traditional supervised learning, the latter being the special case of unit-sized bags. The two formalizations of our focus are (i) multiple instance regression (MIR), where the bag-label is one of the instance-labels of the bag and the instance whose label is chosen as the bag-label is not revealed, and (ii) learning from label proportions (LLP), in which the bag-label is the average of the bag's instance-labels. In MIR as well as in LLP, our work considers real-valued instance-labels with regression as the underlying instance-level task.
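As a rough illustration of the two aggregation schemes described above, the following minimal Python sketch builds bags from a toy list of real-valued instance-labels and derives one bag-label per bag. The helper names (`make_bags`, `llp_bag_label`, `mir_bag_label`), the toy labels, and the consecutive partitioning are hypothetical choices made for this example. In particular, the text only says that the MIR bag-label is one of the instance-labels; picking it uniformly at random is an assumption of this sketch, not something the source specifies.

```python
import random
from statistics import mean


def make_bags(labels, bag_size):
    """Partition instance-labels into disjoint bags of consecutive items (hypothetical helper)."""
    return [labels[i:i + bag_size] for i in range(0, len(labels), bag_size)]


def llp_bag_label(bag):
    """LLP: the bag-label is the average of the bag's instance-labels."""
    return mean(bag)


def mir_bag_label(bag, rng):
    """MIR: the bag-label is one of the bag's instance-labels.

    ASSUMPTION: the label is drawn uniformly at random; the source does not
    say how it is chosen. The chosen index is deliberately not returned,
    since the learner is not told which instance supplied the label.
    """
    return rng.choice(bag)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
instance_labels = [0.5, 1.5, 2.0, 4.0, 3.0, 5.0]  # toy data, not from the paper

bags = make_bags(instance_labels, bag_size=2)
llp_labels = [llp_bag_label(b) for b in bags]
mir_labels = [mir_bag_label(b, rng) for b in bags]

# Unit-sized bags: both schemes reduce to ordinary instance-level supervision.
unit_bags = make_bags(instance_labels, bag_size=1)
unit_llp_labels = [llp_bag_label(b) for b in unit_bags]
unit_mir_labels = [mir_bag_label(b, rng) for b in unit_bags]
```

With unit-sized bags, both `unit_llp_labels` and `unit_mir_labels` coincide with `instance_labels`, which matches the remark that traditional supervised learning is the special case of bags of size one.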
Nov-28-2024