Aggregating Data for Optimal and Private Learning

Agarwal, Sushant, Makhija, Yukti, Saket, Rishi, Raghuveer, Aravindan

Nov-28-2024–arXiv.org Artificial Intelligence

In many applications however, due to lack of instrumentation or annotators [ Chen et al., 2004, Dery et al., 2017 ], or privacy constraints [ Rueping, 2010 ], instance-wise labels may not be available. Instead, the dat aset is partitioned into disjoint sets or bags of instances, and for each bag only one bag-label is available to the learner. The bag-label is derived from th e undisclosed instance-labels present in the bag via some agg regation function depending on the scenario. The goal is to train a model predicting the labels of individual i nstances. We call this paradigm as learning from aggregate labels, which directly generalizes traditional supervised learning, the latter being the special case of unit-sized bags. The two formalizations of our focus are ( i) multiple instance regression (MIR) where the bag-label is one of the instance-labels of the bag, and the in stance whose label is chosen as the bag-label is not revealed, and (ii) learning from label proportions (LLP) in which the bag-label is the average of the bag's instance-labels. In MIR as well as in LLP, our work considers real-valued instance-labels with regression as the underlying instance-level task.

artificial intelligence, machine learning, random 0, (17 more...)

arXiv.org Artificial Intelligence

Nov-28-2024

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Security & Privacy (0.67)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)