A Coefficient of Determination for Probabilistic Topic Models
This research proposes a new (old) metric for evaluating goodness of fit in topic models: the coefficient of determination, or R². Within the context of topic modeling, R² has the same interpretation that it does when used in a broader class of statistical models. Reporting R² with topic models addresses two current problems in topic modeling: a lack of standard cross-contextual evaluation metrics and the difficulty of communicating results to lay audiences. The author proposes that R² be reported as a standard metric when constructing topic models.

INTRODUCTION

According to an often-quoted but never cited definition, "the goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question."¹ Goodness of fit measures vary with the goals of those constructing the statistical model. Inferential goals may emphasize in-sample fit, while predictive goals may emphasize out-of-sample fit. Prior information may be included in the goodness of fit measure for Bayesian models, or it may not. Goodness of fit measures may include methods to correct for model overfitting. In short, goodness of fit measures the performance of a statistical model against the ground truth of observed data. Fitting the data well is generally a necessary, though not sufficient, condition for trust in a statistical model, whatever its goals. Of course, goodness of fit is only one concern in statistical modeling.
Nov-25-2019
- Country:
- Asia > Middle East
- Jordan (0.05)
- North America > United States
- California (0.04)
- Illinois > Cook County
- Chicago (0.04)
- South America > Paraguay
- Genre:
- Research Report (0.64)
- Industry: