Linear Regression with Shuffled Labels

Abid, Abubakar, Poon, Ada, Zou, James

arXiv.org Machine Learning 

Since at least the 19th century, linear regression has been widely used in statistics to infer the relationship between one or more explanatory variables (or input features) and a continuous dependent variable (or label) [1, 2]. In the classical setting, linear regression is used on supervised datasets that are fully and individually labeled. Not all data fit this criterion, so in recent years the question of inference from weakly-supervised datasets has drawn attention in the machine learning community [3, 4, 5]. In weakly-supervised datasets, data are neither entirely labeled nor entirely unlabeled; a subset of the data may be labeled, as is the case in semi-supervised learning, or the data may be implicitly labeled, as occurs, for example, in multi-instance learning [6, 7]. Weakly-supervised datasets naturally arise in situations where obtaining labels for individual data points is expensive or difficult; often, it is significantly easier to conduct experiments that provide partial information. In this paper, we study one specific case of weakly-supervised data: shuffled data, in which all of the labels are observed, but the mutual ordering between the input features and the labels is unknown. Shuffled linear regression, then, can be described as a variant of traditional linear regression in which the labels are additionally perturbed by an unknown permutation.
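To make the setting concrete, the sketch below simulates shuffled data and recovers the weights by exhaustive search over permutations. It is only an illustration of the problem statement, not the method proposed in the paper. The notation (features X, weights w, shuffled labels), the function names `simulate_shuffled` and `brute_force_fit`, and the choice of a noiseless toy problem with n = 6 and d = 2 are all assumptions made for this example.

```python
import itertools
import numpy as np


def simulate_shuffled(n, d, noise_std, rng):
    """Draw (X, y) from a linear model, then shuffle the labels.

    Hypothetical helper: the Gaussian design and weights are assumptions
    of this sketch, not choices taken from the paper.
    """
    X = rng.normal(size=(n, d))
    w = rng.normal(size=d)
    y = X @ w + noise_std * rng.normal(size=n)
    perm = rng.permutation(n)  # the unknown ordering
    return X, y[perm], w


def brute_force_fit(X, y_shuffled):
    """Try every reordering of the labels and keep the best least-squares fit.

    This costs n! least-squares solves, so it is only feasible for tiny n.
    """
    n = X.shape[0]
    best_w, best_rss = None, np.inf
    for perm in itertools.permutations(range(n)):
        y_try = y_shuffled[list(perm)]
        w, *_ = np.linalg.lstsq(X, y_try, rcond=None)
        rss = float(np.sum((X @ w - y_try) ** 2))
        if rss < best_rss:
            best_w, best_rss = w, rss
    return best_w, best_rss


rng = np.random.default_rng(0)
X, y_shuffled, w_true = simulate_shuffled(n=6, d=2, noise_std=0.0, rng=rng)
w_hat, rss = brute_force_fit(X, y_shuffled)
print("true weights:     ", w_true)
print("recovered weights:", w_hat)
```

The factorial cost of this search is what makes the shuffled problem harder than ordinary least squares, where the pairing between each row of X and its label is given.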
