Minimax rate of consistency for linear models with missing values

Ayme, Alexis, Boyer, Claire, Dieuleveut, Aymeric, Scornet, Erwan

Feb-3-2022–arXiv.org Machine Learning

Missing values become increasingly common as datasets grow. They can occur for a variety of reasons, such as sensor failures, refusals to answer poll questions, or the aggregation of data from different sources (with different data collection methods). Several missing-value mechanisms may coexist in the same dataset, which makes data cleaning difficult or even impossible without introducing large biases. In his seminal work, Rubin [1976] distinguishes three missing-value scenarios: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), depending on the links between the observed variables, the missing ones, and the missing pattern. In the linear regression framework, most of the literature focuses on parameter estimation [Little, 1992, Jones, 1996], sometimes using a sparse prior, which leads to the Lasso estimator [Loh and Wainwright, 2012] or the Dantzig selector [Rosenbaum and Tsybakov, 2010]. Note that the robust estimation literature [Dalalyan and Thompson, 2019, Chen and Caramanis, 2013] could also be used to handle missing values, since missing values can be reinterpreted as multiplicative noise in linear models.
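
The multiplicative-noise reading of missing values can be illustrated with a small sketch. It is not taken from the paper: the helper names (`mcar_mask`, `apply_mask`, `zero_impute`, `multiplicative_noise`), the Gaussian covariates, and the 30% missingness rate are assumptions chosen for illustration only. Under an MCAR mechanism each entry is dropped independently of the data, and replacing the missing entries by zero gives the same matrix as multiplying the complete data entrywise by a 0/1 mask.

```python
import random

def mcar_mask(n_rows, n_cols, p_missing, rng):
    """Hypothetical helper: 0/1 mask where 0 marks a missing entry.
    MCAR assumption: each entry is dropped independently of the data."""
    return [[0 if rng.random() < p_missing else 1 for _ in range(n_cols)]
            for _ in range(n_rows)]

def apply_mask(X, M):
    """Observed data: entries with mask 0 are replaced by None."""
    return [[x if m == 1 else None for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(X, M)]

def zero_impute(X_obs):
    """Replace every missing entry (None) by 0.0."""
    return [[0.0 if x is None else x for x in row] for row in X_obs]

def multiplicative_noise(X, M):
    """Entrywise product of the complete data and the 0/1 mask."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(X, M)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]  # complete covariates
M = mcar_mask(5, 3, 0.3, rng)                                # illustrative 30% rate
X_obs = apply_mask(X, M)

# Zero imputation of the observed data coincides with X multiplied by the mask.
same = zero_impute(X_obs) == multiplicative_noise(X, M)
print(same)
```

The comparison only shows the algebraic identity behind the reinterpretation; it says nothing about MAR or MNAR mechanisms, where the mask would depend on the data.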

Country:
- North America > United States
  - California (0.04)
  - Georgia > Fulton County
    - Atlanta (0.04)
- Europe
  - Denmark (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)