

 percentile



Appendix A: Analysis of variance of uncertainty estimators

Neural Information Processing Systems

We list the raw data sources used across all experiments in Table 17: the MNIST dataset (Creative Commons Attribution-Share Alike 3.0 license), the arithmetic expressions dataset from Kusner et al. [4], and the ZINC data (see also https://zinc.docking.org/).


Tools for Verifying Neural Models' Training Data

Neural Information Processing Systems

It is important that consumers and regulators can verify the provenance of large neural models to evaluate their capabilities and risks. We introduce the concept of a "Proof-of-Training-Data": any protocol that allows a model trainer to convince a Verifier of the training data that produced a set of model weights. Such protocols could verify the amount and kind of data and compute used to train the model, including whether it was trained on specific harmful or beneficial data sources. We explore efficient verification strategies for Proof-of-Training-Data that are compatible with most current large-model training procedures. These include a method for the model trainer to verifiably pre-commit to a random seed used in training, and a method that exploits models' tendency to temporarily overfit to training data in order to detect whether a given data point was included in training. We show experimentally that our verification procedures can catch a wide variety of attacks, including all known attacks from the Proof-of-Learning literature.
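The idea of pre-committing to a random seed can be illustrated with a generic hash commitment. The sketch below is an assumption for illustration only, not the protocol from the paper: the function names `commit`, `make_commitment`, and `verify` are hypothetical, and SHA-256 with a random nonce is simply one standard way to build a commitment. The trainer publishes the digest before training and reveals the seed and nonce afterwards.

```python
import hashlib
import secrets


def commit(seed: int, nonce: bytes) -> str:
    """Return a SHA-256 hex digest binding the trainer to (seed, nonce)."""
    # The nonce hides the seed until it is revealed; the hash binds the trainer to it.
    payload = nonce + seed.to_bytes(8, "big")
    return hashlib.sha256(payload).hexdigest()


def make_commitment(seed: int) -> tuple[str, bytes]:
    """Pick a fresh random nonce and return (commitment, nonce)."""
    nonce = secrets.token_bytes(32)
    return commit(seed, nonce), nonce


def verify(commitment: str, seed: int, nonce: bytes) -> bool:
    """Check that the revealed seed and nonce match the earlier commitment."""
    return secrets.compare_digest(commitment, commit(seed, nonce))
```

In this sketch the Verifier stores the commitment string up front and later calls `verify` with the revealed values; a trainer who changes the seed after the fact would fail the check.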



Quantum Amplitude Estimation for Catastrophe Insurance Tail-Risk Pricing: Empirical Convergence and NISQ Noise Analysis

arXiv.org Machine Learning

Classical Monte Carlo methods for pricing catastrophe insurance tail risk converge at $O(1/\sqrt{N})$, requiring large simulation budgets to resolve upper-tail percentiles of the loss distribution. This sample-sparsity problem can lead to AI models trained on impoverished tail data, producing poorly calibrated risk estimates where insolvency risk is greatest. Quantum Amplitude Estimation (QAE), following Montanaro, achieves convergence approaching $O(1/N)$ in oracle queries, a quadratic speedup that, at scale, would enable high-resolution tail estimation within practical budgets. We validate this advantage empirically using a Qiskit Aer simulator with genuine Grover amplification. A complete pipeline encodes fitted lognormal catastrophe distributions into quantum oracles via amplitude encoding, producing small readout probabilities that enable safe Grover amplification with up to k=16 iterations. Seven experiments on synthetic and real (NOAA Storm Events, 58,028 records) data yield three main findings: that QAE holds an advantage in the oracle model, that strong classical baselines win when analytical access is available, and that discretisation, not estimation, is the current bottleneck.
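The classical baseline described above can be sketched as a plain Monte Carlo estimate of a lognormal tail probability. This is a minimal illustration under stated assumptions, not the paper's pipeline: the function names, the parameters `mu` and `sigma`, and the threshold are hypothetical, and the closed form is the standard lognormal survival function. The reported standard error, sqrt(p(1-p)/N), is what shrinks at the reciprocal-root-N rate.

```python
import math
import random


def exact_tail_prob(threshold: float, mu: float, sigma: float) -> float:
    """Closed-form P(L > threshold) for a lognormal loss L with parameters (mu, sigma)."""
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mc_tail_prob(threshold: float, mu: float, sigma: float, n: int, seed: int = 0):
    """Monte Carlo estimate of P(L > threshold) and its binomial standard error."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    hits = sum(1 for _ in range(n) if rng.lognormvariate(mu, sigma) > threshold)
    p_hat = hits / n
    # Standard error of a sample proportion: falls off as 1 / sqrt(n).
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err
```

Because the standard error scales as one over the square root of `n`, halving it takes roughly four times as many samples, which is why deep tail percentiles demand large simulation budgets.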