Linear Regression with Shuffled Labels

arXiv.org Machine Learning, May 4, 2017

Since at least the 19th century, linear regression has been widely used in statistics to infer the relationship between one or more explanatory variables (or input features) and a continuous dependent variable (or label) [1, 2]. In the classical setting, linear regression is applied to supervised datasets that are fully and individually labeled. Not all data meet this criterion, and in recent years the question of inference from weakly-supervised datasets has drawn attention in the machine learning community [3, 4, 5]. In weakly-supervised datasets, data are neither entirely labeled nor entirely unlabeled: a subset of the data may be labeled, as in semi-supervised learning, or the data may be labeled only implicitly, as in multi-instance learning [6, 7]. Weakly-supervised datasets arise naturally when obtaining labels for individual data points is expensive or difficult; it is often significantly easier to conduct experiments that provide only partial information. In this paper, we study one specific case of weakly-supervised data: shuffled data, in which all of the labels are observed but the correspondence between the input features and the labels is unknown. Shuffled linear regression can therefore be described as a variant of traditional linear regression in which the labels are additionally perturbed by an unknown permutation.
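To make the last sentence concrete, here is a minimal sketch assuming the noiseless model y = ΠXw, with two input features and only a handful of samples. It is a brute-force least-squares estimator, not the method proposed in the paper: it tries every reordering of the observed labels, fits ordinary least squares to each, and keeps the reordering with the smallest residual sum of squares. Reordering y is equivalent to reordering the rows of X because a permutation is invertible. Exhaustive search over all n! orderings is only feasible for tiny n, and the helper names (`solve_2x2`, `least_squares`, `shuffled_least_squares`) are hypothetical.

```python
import itertools
import random


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ w = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - b[0] * a[1][0]) / det,
    ]


def least_squares(X, y):
    """Ordinary least squares for two features via the normal equations.

    Returns the weight vector and the residual sum of squares (RSS).
    """
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    w = solve_2x2(xtx, xty)
    rss = sum((yi - (r[0] * w[0] + r[1] * w[1])) ** 2 for r, yi in zip(X, y))
    return w, rss


def shuffled_least_squares(X, y):
    """Brute-force shuffled regression (hypothetical helper).

    Tries every reordering of the labels y, fits least squares to each,
    and returns (weights, rss, ordering) for the smallest RSS.
    """
    best = None
    for perm in itertools.permutations(range(len(y))):
        y_perm = [y[k] for k in perm]
        w, rss = least_squares(X, y_perm)
        if best is None or rss < best[1]:
            best = (w, rss, perm)
    return best


# Synthetic example: noiseless labels whose order is hidden by a shuffle.
rng = random.Random(0)
n = 6
w_true = [1.5, -2.0]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y_clean = [r[0] * w_true[0] + r[1] * w_true[1] for r in X]
order = list(range(n))
rng.shuffle(order)
y = [y_clean[k] for k in order]  # all label values observed, order unknown

w_naive, rss_naive = least_squares(X, y)  # ignores the shuffle entirely
w_hat, rss, perm = shuffled_least_squares(X, y)
print("naive fit:", w_naive, rss_naive)
print("shuffled fit:", w_hat, rss)
```

With noiseless Gaussian features, the true reordering drives the residual to essentially zero, so the search recovers w; with noisy labels or larger n, practical estimators must avoid enumerating all orderings.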
