Learning from Learning Curves
This is a follow-up to my earlier post on learning curves. A learning curve is a plot of predictive error for the training and validation sets over a range of training set sizes. Here we're using simulated data to explore some fundamental relationships between training set size, model complexity, and prediction error. The input columns are named X1, X2, etc.; these are all categorical variables, with single capital letters representing the different categories. Cardinality is the number of possible values in a column; our default cardinality of 10 means we sample from the capital letters A through J. Next we'll add an outcome variable (y) with a base level of 100, which is increased by 10 whenever the values in the first two X variables (X1 and X2) are equal.
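The data-generating rules above can be sketched in a few lines. This is a minimal illustration only, not the code used for the original analysis. The helper name `simulate_data`, its parameters, and the number of input columns are hypothetical. It also assumes that y is exactly 100 or 110 with no added noise, because no noise term is described above.

```python
import random
import string


def simulate_data(n_rows, n_cols=2, cardinality=10, seed=None):
    """Hypothetical helper: build n_rows records with categorical inputs
    X1..X{n_cols} and an outcome y, following the rules described above.
    Assumption: no random noise is added to y."""
    if not 1 <= cardinality <= 26:
        raise ValueError("cardinality must be between 1 and 26")
    if n_cols < 2:
        raise ValueError("need at least two input columns (X1 and X2)")

    rng = random.Random(seed)  # seeded generator for reproducible samples
    # Cardinality 10 -> the capital letters 'A' through 'J'
    levels = string.ascii_uppercase[:cardinality]

    rows = []
    for _ in range(n_rows):
        # Each input column is sampled uniformly from the allowed letters
        row = {f"X{i}": rng.choice(levels) for i in range(1, n_cols + 1)}
        # Base level of 100, increased by 10 when X1 and X2 are equal
        row["y"] = 100 + (10 if row["X1"] == row["X2"] else 0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = simulate_data(1000, n_cols=4, seed=42)
    print(data[0])
    # With cardinality 10, X1 == X2 about one time in ten,
    # so the mean of y should land near 100 + 10 * 0.1 = 101
    print(sum(r["y"] for r in data) / len(data))
```

Because X1 and X2 are drawn independently and uniformly, they match with probability 1/cardinality. Raising the cardinality therefore makes the "+10" cases rarer and harder for a model to learn from a small training set.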
Mar-31-2016, 02:55:47 GMT