Model Selection Techniques -- An Overview

Jie Ding, Vahid Tarokh, Yuhong Yang

arXiv.org Machine Learning 

Abstract--In the era of "big data", analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and is thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. Model selection techniques have a long history, arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on the theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.

Vast developments in hardware storage, precision instrument manufacture, economic globalization, etc. have generated huge volumes of data that can be analyzed to extract useful information. Typical statistical inference or machine learning procedures learn from and make predictions on data by fitting parametric or nonparametric models (in a broad sense). However, there exists no model that is universally suitable for any data and goal. Therefore, a crucial step in a typical data analysis is to consider a set of candidate models (referred to as the model class), and then select the most appropriate one. In other words, model selection is the task of selecting a statistical model from a model class, given a set of data. There have been many overview papers on model selection scattered across the communities of signal processing [1], statistics [2], machine learning [3], epidemiology [4], chemometrics [5], and ecology and evolution [6]. Despite the abundant literature on model selection, existing overviews usually focus on derivations, descriptions, or applications of particular model selection principles.

This research was funded in part by the Defense Advanced Research Projects Agency (DARPA) under grant number W911NF-18-1-0134. J. Ding and Y. Yang are with the School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, United States. V. Tarokh is with the Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708, United States.
