Blocked Clusterwise Regression

Cytrynbaum, Max

arXiv.org Machine Learning 

Such models have been shown to allow estimation and inference by regression clustering methods. This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspecified, even when the panel has significant discrete cross-sectional structure. To address this issue, we generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly-correlated latent variables that describe its response-type to different covariates. We give inference results for a k-means style estimator of our model and develop information criteria to jointly select the number clusters for each latent variable. Monte Carlo simulations confirm our theoretical results and give intuition about the finite-sample performance of estimation and model selection. We also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. Our results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found