Scatter Matrix Concordance: A Diagnostic for Regressions on Subsets of Data
Kane, Michael J., Lewis, Bryan, Tatikonda, Sekhar, Urbanek, Simon
Linear regression models depend directly on the design matrix and its properties. Techniques that efficiently estimate model coefficients by partitioning rows of the design matrix are increasingly popular for large-scale problems because they fit well with modern parallel computing architectures. We propose a simple measure of {\em concordance} between a design matrix and a subset of its rows that estimates how well a subset captures the variance-covariance structure of a larger data set. We illustrate the use of this measure in a heuristic method for selecting row partition sizes that balance statistical and computational efficiency goals in real-world problems.
Jul-12-2015
- Country:
- North America > United States (0.47)
- Genre:
- Research Report (0.64)
- Industry:
- Consumer Products & Services > Travel (0.46)
- Transportation > Passenger (0.46)
- Technology: