Are Foundation Models Useful for Bankruptcy Prediction?

Kostrzewa, Marcin, Furman, Oleksii, Furman, Roman, Tomczak, Sebastian, Zięba, Maciej

arXiv.org Artificial Intelligence

Foundation models have shown promise across various financial applications, yet their effectiveness for corporate bankruptcy prediction remains systematically unevaluated against established methods. We study bankruptcy forecasting using Llama-3.3-70B-Instruct and TabPFN, evaluated on large, highly imbalanced datasets of over one million company records from the Visegrád Group. We provide the first systematic comparison of foundation models against classical machine learning baselines for this task. Our results show that models such as XGBoost and CatBoost consistently outperform foundation models across all prediction horizons. LLM-based approaches suffer from unreliable probability estimates, undermining their use in risk-sensitive financial settings. TabPFN, while competitive with simpler baselines, requires substantial computational resources with costs not justified by performance gains. These findings suggest that, despite their generality, current foundation models remain less effective than specialized methods for bankruptcy forecasting.
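The reliability issue the abstract raises can be made concrete with a proper scoring rule. Below is a minimal sketch (not the paper's pipeline; the 5% base rate and the two toy predictors are assumptions for illustration) showing how the Brier score penalizes uninformative probability estimates on a highly imbalanced outcome, such as bankruptcy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, highly imbalanced labels (~5% positives), loosely mimicking
# the rarity of bankruptcy events; not the paper's actual data.
y = (rng.random(100_000) < 0.05).astype(float)

def brier(p, y):
    """Mean squared error between predicted probabilities and outcomes."""
    return float(np.mean((p - y) ** 2))

# A predictor calibrated to the true base rate.
p_calibrated = np.full_like(y, 0.05)

# An unreliable predictor that hedges at 0.5, as poorly calibrated
# score outputs often do.
p_uninformative = np.full_like(y, 0.5)

print(brier(p_calibrated, y))    # close to 0.05 * 0.95 ≈ 0.0475
print(brier(p_uninformative, y)) # exactly 0.25
```

In a risk-sensitive setting, this gap is what makes the difference between a usable default-probability estimate and one that must be discarded.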



Results are shown for an "improved" version of the Tanh(16) model which uses more convolutional filters per layer (32 instead of 25).

Neural Information Processing Systems

We chose the "Distributionally Adversarial Attack" (DAA) by Zheng et al. as it appears atop the leaderboards for … Results for the original Tanh(16) model are shown in Table 1. These changes improve the robustness of the model. The term "white-box" means that the adversary knows everything about the model, i.e. the full ensemble with all layers … Intuition indeed suggests this code should be more powerful. In general, though, we find robustness asymptotes with increasing code length; this appears related to the "rank" of the … However, adversarial (and "Random") inputs are (way) off this manifold. We do not aim to achieve carefully calibrated probability estimates; we … Our results in Figures 3(a)-(c) indicate our probability estimates are still far better than those of conventional models. Some corrective action is needed (such as Platt scaling). By contrast, our approach appears well-behaved even off the training manifold. Compared to typical architectures, ours does not induce widening of the network. The Reviewer's point is still well-taken, and in the revision we will reword this section to consider … We were unclear about the meaning of the Reviewer's comment on "the relationship between adversarial constraints at …" Table 1: Accuracies against various attacks; "-": experiment was not run.
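The Platt scaling mentioned above as a corrective action can be sketched in a few lines: fit a one-dimensional logistic regression that maps raw classifier scores to calibrated probabilities. This is a generic illustration (the Gaussian score distributions are an assumption), not the model discussed in the response:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic uncalibrated scores: one Gaussian score distribution per class
# (an assumption for illustration only).
scores_neg = rng.normal(-1.0, 1.0, 500)
scores_pos = rng.normal(1.5, 1.0, 500)
scores = np.concatenate([scores_neg, scores_pos])
labels = np.concatenate([np.zeros(500), np.ones(500)])

# Platt scaling: a sigmoid fitted to (score, label) pairs, ideally on
# held-out data, turns raw scores into probabilities.
platt = LogisticRegression()
platt.fit(scores.reshape(-1, 1), labels)

probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]
```

The fitted sigmoid is monotone in the score, so the ranking of inputs is preserved; only the probability values change.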



Split Conformal Classification with Unsupervised Calibration

Mazuelas, Santiago

arXiv.org Machine Learning

Methods for split conformal prediction leverage calibration samples to transform any prediction rule into a set-prediction rule that complies with a target coverage probability. Existing methods provide remarkably strong performance guarantees with minimal computational cost. However, they require calibration samples composed of labeled examples distinct from those used for training. This requirement can be highly inconvenient, as it prevents the use of all labeled examples for training and may require acquiring additional labels solely for calibration. This paper presents an effective methodology for split conformal prediction with unsupervised calibration for classification tasks. In the proposed approach, set-prediction rules are obtained using unsupervised calibration samples together with the supervised training samples previously used to learn the classification rule. Theoretical and experimental results show that the presented methods can achieve performance comparable to that with supervised calibration, at the expense of a moderate degradation in performance guarantees and computational efficiency.
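For context, the standard supervised-calibration baseline that this paper relaxes can be sketched as follows: score a held-out calibration set, take a finite-sample-corrected quantile, and emit the set of labels passing the threshold. The dataset, classifier, and split sizes are assumptions for illustration, not the paper's setup:

```python
import numpy as np
from math import ceil
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic multi-class data (illustrative only).
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, y_train = X[:1500], y[:1500]
X_cal, y_cal = X[1500:2250], y[1500:2250]
X_test, y_test = X[2250:], y[2250:]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Split conformal: nonconformity score = 1 - p_hat(true label | x).
alpha = 0.1
cal_probs = clf.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

# Finite-sample-corrected (1 - alpha) quantile of calibration scores.
n = len(scores)
k = ceil((n + 1) * (1 - alpha))
qhat = np.sort(scores)[k - 1]

# Prediction set: every label whose score falls below the threshold.
test_probs = clf.predict_proba(X_test)
pred_sets = test_probs >= 1.0 - qhat  # boolean array (n_test, n_classes)

coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
print(f"empirical coverage: {coverage:.3f}")  # should be close to 0.90
```

Note that the calibration labels `y_cal` are essential here; the paper's contribution is precisely to replace them with unsupervised calibration samples.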





Classification Filtering

Bayram, Ilker

arXiv.org Artificial Intelligence

We consider a streaming signal in which each sample is linked to a latent class. We assume that multiple classifiers are available, each providing class probabilities with varying degrees of accuracy. These classifiers are employed according to a straightforward, fixed policy. In this setting, we consider the problem of fusing the classifiers' outputs while incorporating the temporal aspect to improve classification accuracy. We propose a state-space model and develop a filter tailored for real-time execution. We demonstrate the effectiveness of the proposed filter in an activity classification application based on inertial measurement unit (IMU) data from a wearable device.
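One simple way to realize this idea, sketched below under assumptions that are not the paper's (a discrete HMM-style forward filter, classifier outputs treated directly as observation likelihoods, and a 0.95 persistence probability), is recursive Bayesian filtering over the latent class:

```python
import numpy as np

def forward_filter(class_probs, transition, prior):
    """Recursive Bayesian filtering over a latent class sequence.

    class_probs: (T, K) per-sample class probabilities from a classifier,
                 treated here as observation likelihoods (a simplification).
    transition:  (K, K), transition[i, j] = P(class_t = j | class_{t-1} = i).
    prior:       (K,) initial class distribution.
    """
    belief = np.asarray(prior, dtype=float)
    filtered = []
    for lik in class_probs:
        belief = transition.T @ belief  # predict step
        belief = belief * lik           # update with classifier output
        belief /= belief.sum()          # normalize to a distribution
        filtered.append(belief.copy())
    return np.array(filtered)

# "Sticky" dynamics: the latent activity class rarely switches
# (the 0.95 persistence value is an assumption for illustration).
T = np.array([[0.95, 0.05],
              [0.05, 0.95]])

# Mostly confident class-0 frames with one noisy frame at index 5.
obs = np.array([[0.9, 0.1]] * 5 + [[0.3, 0.7]] + [[0.9, 0.1]] * 4)

beliefs = forward_filter(obs, T, prior=np.array([0.5, 0.5]))
# The single noisy frame is smoothed out by temporal context:
print(beliefs.argmax(axis=1))
```

Taking the per-frame argmax of `obs` alone would mislabel the noisy frame; the filter's temporal prior overrides the single contradictory observation.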


Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models

Grefsrud, Aurora, Blaser, Nello, Buanes, Trygve

arXiv.org Machine Learning

Rigorous statistical methods, including parameter estimation with accompanying uncertainties, underpin the validity of scientific discovery, especially in the natural sciences. With increasingly complex data models such as deep learning techniques, uncertainty quantification has become exceedingly difficult and a plethora of techniques have been proposed. In this case study, we use the unifying framework of approximate Bayesian inference combined with empirical tests on carefully created synthetic classification datasets to investigate qualitative properties of six different probabilistic machine learning algorithms for class probability and uncertainty estimation: (i) a neural network ensemble, (ii) neural network ensemble with conflictual loss, (iii) evidential deep learning, (iv) a single neural network with Monte Carlo Dropout, (v) Gaussian process classification and (vi) a Dirichlet process mixture model. We check if the algorithms produce uncertainty estimates which reflect commonly desired properties, such as being well calibrated and exhibiting an increase in uncertainty for out-of-distribution data points. Our results indicate that all algorithms are well calibrated, but none of the deep learning based algorithms provide uncertainties that consistently reflect lack of experimental evidence for out-of-distribution data points. We hope our study may serve as a clarifying example for researchers developing new methods of uncertainty estimation for scientific data-driven modeling.
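The calibration check the study performs is commonly summarized by the expected calibration error (ECE): bin predictions by confidence and compare mean predicted probability to the empirical positive rate per bin. A minimal binary-classification sketch with hand-built data (the bin count and toy values are assumptions for illustration):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary classification: bin predictions for the positive
    class, then average |mean prediction - empirical positive rate|,
    weighted by the fraction of samples in each bin."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

# Hand-built example: predictions of 0.2 and 0.8 whose empirical positive
# rates match exactly, so the ECE is essentially zero.
probs = [0.2] * 5 + [0.8] * 5
labels = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
print(expected_calibration_error(probs, labels))  # essentially 0

# Overconfident predictions on the same labels give a large ECE (~0.7).
print(expected_calibration_error([0.9] * 5 + [0.1] * 5, labels))
```

A low ECE alone does not imply useful uncertainties, which is exactly the study's point: a model can be well calibrated in-distribution yet fail to flag out-of-distribution inputs.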