Goto

Collaborating Authors

 predictive accuracy


Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

arXiv.org Machine Learning

Nonlinear machine-learning models are increasingly used to discover causal relationships in time-series data, yet the interpretation of their outputs remains poorly understood. In particular, causal scores produced by regularized neural autoregressive models are often treated as analogues of regression coefficients, leading to misleading claims of statistical significance. In this paper, we argue that causal relevance in nonlinear time-series models should be evaluated through forecast necessity rather than coefficient magnitude, and we present a practical evaluation procedure for doing so. We present an interpretable evaluation framework based on systematic edge ablation and forecast comparison, which tests whether a candidate causal relationship is required for accurate prediction. Using Neural Additive Vector Autoregression as a case study model, we apply this framework to a real-world case study of democratic development, modeled as a multivariate time series of panel data - democracy indicators across 139 countries. We show that relationships with similar causal scores can differ dramatically in their predictive necessity due to redundancy, temporal persistence, and regime-specific effects. Our results demonstrate how forecast-necessity testing supports more reliable causal reasoning in applied AI systems and provides practical guidance for interpreting nonlinear time-series models in high-stakes domains.


Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition

Neural Information Processing Systems

As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy. However, these trends take the perspective of a single model-provider in isolation, while in reality providers often compete with each other for users. In this work, we demonstrate that competition can fundamentally alter the behavior of these scaling trends, even causing overall predictive accuracy across users to be non-monotonic or decreasing with scale. We define a model of competition for classification tasks, and use data representations as a lens for studying the impact of increases in scale. We find many settings where improving data representation quality (as measured by Bayes risk) decreases the overall predictive accuracy across users (i.e., social welfare) for a marketplace of competing model-providers. Our examples range from closed-form formulas in simple settings to simulations with pretrained representations on CIFAR-10. At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare in marketplaces with multiple model providers.


A Divergence-Based Method for Weighting and Averaging Model Predictions

arXiv.org Machine Learning

This paper uses a minimum divergence framework to introduce a new way of calculating model weights that can be used to average probabilistic predictions from statistical and machine learning models. The method is general and can be applied regardless of whether the models under consideration are fit to data using frequentist, Bayesian, or some other fitting method. The proposed method is motivated in two different ways and is shown empirically to perform better than or on a par with standard model averaging methods, including model stacking and model averaging that relies on Akaike-style negative exponentiated model weighting, especially when the sample size is small. Our theoretical analysis explains why the method has a small-sample advantage.


Appendix

Neural Information Processing Systems

AAbout Equation (1) As we discussed in Section 3, label smoothing and focal loss are equivalent to the standard CE loss with an additional maximum-entropy regularizer (see in Equation (1) and (2) in the main text). The proof of Equation (2) can be found in the corresponding paper [4]. SVHN is an image dataset which consists of 32 32 colored images of 0 9 digits. CIFAR-10 and CIFAR-100 consist of 32 32 colored natural images arranged in 10 and 100 classes, respectively. For 20Newsgroups, we use the GloVe word embedding [7] for text representation before the 1D-CNN model and set the embedding dimension as 100.