 minimisation



Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation

Neural Information Processing Systems

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner's (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.
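The kind of failure described in the abstract can be illustrated with a small numerical sketch. It rests on an assumption not stated in the abstract: that the second-order loss is the expectation of a first-order log-loss under the predicted second-order distribution. The function names and the two-point distribution are hypothetical and chosen only for illustration. Under that assumption, a prediction that spreads its mass over several probabilities never scores better in expectation than a point mass at the same mean, so the loss gives no reward for expressing epistemic uncertainty.

```python
import math

def log_loss(p, y):
    """First-order log-loss for a Bernoulli probability p of class 1."""
    return -math.log(p if y == 1 else 1.0 - p)

def second_order_loss(support, weights, y):
    """ASSUMED form: average first-order loss under a discrete second-order prediction."""
    return sum(w * log_loss(p, y) for p, w in zip(support, weights))

def expected_loss(support, weights, true_p):
    """Expected second-order loss when the label is 1 with probability true_p."""
    return (true_p * second_order_loss(support, weights, 1)
            + (1.0 - true_p) * second_order_loss(support, weights, 0))

# A spread-out prediction with mean 0.7 versus a point mass at 0.7.
spread = expected_loss([0.5, 0.9], [0.5, 0.5], true_p=0.7)
dirac = expected_loss([0.7], [1.0], true_p=0.7)
print(spread, dirac)
```

Because the log-loss is convex in the predicted probability, Jensen's inequality implies the point mass is at least as good as any spread with the same mean, which is what the two printed values show.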


minimisation was previously successful but has yet to be combined with modern feature learning techniques, because 4

Neural Information Processing Systems

We thank the reviewers for their extensive comments. Where is the novelty (R2+R4) / What is the point of the new proofs (R2)? ... However, our primary result is to show why it works. ... Newton's method with a more stable trust-region based method gave rise to a more stable fixed-point (line 131), and ... Given this, partial derivatives and full derivatives coincide. ... 'Wiberg optimisation is alternation (see [4]), and an inappropriate description for our work' (R6).


questions raised by each reviewer separately

Neural Information Processing Systems

We thank the reviewers for their close reading, detailed comments, and overall positive assessment. We will improve the flow and formatting of the paper, and fix the references in the final version. ... As we can see, ADE consistently achieves comparable or the best performance. ... We are exploring alternative sampling algorithm embeddings, e.g., ... ADE limitations and how to overcome. See Appendix C for details. ... ADE, then the parameter tuning requirements for ADE and GANs are comparable, i.e., we tune the inner optimization ... Re: "[the authors] further conduct T vanilla HMC steps to approximately solve it."



Prevention of Overfitting on Mesh-Structured Data Regressions with a Modified Laplace Operator

Bigarella, Enda D. V.

arXiv.org Artificial Intelligence

This document reports on a method for detecting and preventing overfitting on data regressions, herein applied to mesh-like data structures. The mesh structure allows for the straightforward computation of the Laplace-operator second-order derivatives in a finite-difference fashion for noiseless data. Derivatives of the training data are computed on the original training mesh to serve as a true label of the entropy of the training data. Derivatives of the trained data are computed on a staggered mesh to identify oscillations in the interior of the original training mesh cells. The loss of the Laplace-operator derivatives is used for hyperparameter optimisation, achieving a reduction of unwanted oscillation through the minimisation of the entropy of the trained model. In this setup, testing does not require the splitting of points from the training data, and training is thus directly performed on all available training points. The Laplace operator applied to the trained data on a staggered mesh serves as a surrogate testing metric based on diffusion properties.
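As a rough illustration of the finite-difference Laplace operator mentioned in the abstract, the sketch below applies the standard five-point stencil to values on a uniform 2D mesh. It does not reproduce the paper's modified operator or its staggered-mesh evaluation; the function name `laplacian_2d` and the test field are hypothetical.

```python
def laplacian_2d(u, h):
    """Five-point finite-difference Laplacian on the interior of a uniform mesh.

    u is a list of rows of sampled values, h is the mesh spacing in both directions.
    """
    rows, cols = len(u), len(u[0])
    out = []
    for i in range(1, rows - 1):
        row = []
        for j in range(1, cols - 1):
            stencil = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            row.append(stencil / (h * h))
        out.append(row)
    return out

# Hypothetical smooth field f(x, y) = x^2 + y^2, whose exact Laplacian is 4 everywhere.
h = 0.5
field = [[(i * h) ** 2 + (j * h) ** 2 for j in range(5)] for i in range(5)]
lap = laplacian_2d(field, h)
```

A smooth field gives a small, slowly varying Laplacian, whereas a fit that oscillates between mesh points would produce large values, which is the intuition behind using the operator as an overfitting signal.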


Learning with Symmetric Label Noise: The Importance of Being Unhinged

Neural Information Processing Systems

Convex potential minimisation is the de facto approach to binary classification. However, Long and Servedio [2008] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing. This ostensibly shows that convex losses are not SLN-robust. In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust. The loss avoids the Long and Servedio [2008] result by virtue of being negatively unbounded.
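The robustness claim can be checked numerically on a small example. The sketch below assumes the loss in question is the unhinged loss, 1 - y*v for a label y in {-1, +1} and a score v; the formula is not stated in the abstract, and the helper names are hypothetical. With flip probability rho < 1/2, the expected loss on noisy labels is an affine function of the clean loss with a positive slope, so both are minimised by the same scorer.

```python
def unhinged(y, v):
    """Unhinged loss (assumed form): linear in the margin y * v, unbounded below."""
    return 1.0 - y * v

def expected_noisy_loss(y, v, rho):
    """Expected loss when the true label y is flipped with probability rho."""
    return (1.0 - rho) * unhinged(y, v) + rho * unhinged(-y, v)

def affine_of_clean_loss(y, v, rho):
    """The same quantity written as slope * clean loss + constant."""
    return (1.0 - 2.0 * rho) * unhinged(y, v) + 2.0 * rho

rho = 0.2
pairs = [(y, v) for y in (-1, 1) for v in (-2.0, -0.5, 0.0, 0.5, 3.0)]
gaps = [abs(expected_noisy_loss(y, v, rho) - affine_of_clean_loss(y, v, rho)) for y, v in pairs]
```

The loss takes arbitrarily negative values for large margins, which matches the abstract's remark that it is negatively unbounded.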


Minimisation of Polyak-Łojasewicz Functions Using Random Zeroth-Order Oracles

Farzin, Amir Ali, Shames, Iman

arXiv.org Artificial Intelligence

The application of a zeroth-order scheme for minimising Polyak-Łojasewicz (PL) functions is considered. The framework is based on exploiting a random oracle to estimate the function gradient. Convergence of the algorithm to a global minimum in the unconstrained case, and to a neighbourhood of the global minimum in the constrained case, is established along with the corresponding complexity bounds. The theoretical results are demonstrated via numerical examples.
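A minimal sketch of the general idea, not the authors' algorithm: the gradient is estimated from two function evaluations along a random Gaussian direction and then used in a plain descent step. The estimator form, step size, smoothing parameter `mu`, and all function names are assumptions made for illustration. The test function is a simple quadratic, which is strongly convex and therefore satisfies the PL condition.

```python
import random

def zeroth_order_gradient(f, x, rng, mu=1e-4):
    """Two-point random estimate of the gradient of f at x, using function values only."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(shifted) - f(x)) / mu  # directional finite difference along u
    return [scale * ui for ui in u]

def minimise(f, x0, step=0.02, iters=2000, seed=0):
    """Unconstrained descent driven by the random gradient estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zeroth_order_gradient(f, x, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def quadratic(x):
    """Hypothetical test objective with its global minimum at the origin."""
    return sum(xi * xi for xi in x)

x_final = minimise(quadratic, [1.0, -2.0, 0.5])
```

The step size has to be small relative to the dimension because the estimate is noisy; handling constraints, as in the abstract's second case, would need an additional projection step that is not shown here.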


Promoting Counterfactual Robustness through Diversity

Leofante, Francesco, Potyka, Nico

arXiv.org Artificial Intelligence

Counterfactual explanations shed light on the decisions of black-box models by explaining how an input can be altered to obtain a favourable decision from the model (e.g., when a loan application has been rejected). However, as noted recently, counterfactual explainers may lack robustness in the sense that a minor change in the input can cause a major change in the explanation. This can cause confusion on the user side and open the door for adversarial attacks. In this paper, we study some sources of non-robustness. While there are fundamental reasons why an explainer that returns a single counterfactual cannot be robust in all instances, we show that some interesting robustness guarantees can be given by reporting multiple counterfactuals rather than a single one. Unfortunately, the number of counterfactuals that need to be reported for the theoretical guarantees to hold can be prohibitively large. We therefore propose an approximation algorithm that uses a diversity criterion to select a feasible number of most relevant explanations and study its robustness empirically. Our experiments indicate that our method improves on the state of the art in generating robust explanations, while maintaining other desirable properties and providing competitive computational performance.
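One simple way to picture diversity-based selection is a greedy farthest-point rule, sketched below. This is an illustrative stand-in, not the approximation algorithm from the paper: the relevance measure (Euclidean distance to the query), the diversity criterion (maximising the minimum distance to already chosen points), and the name `select_diverse` are all assumptions.

```python
import math

def select_diverse(candidates, query, k):
    """Pick up to k counterfactual candidates: the closest to the query first,
    then repeatedly the one farthest from everything chosen so far."""
    remaining = list(candidates)
    first = min(remaining, key=lambda c: math.dist(c, query))
    chosen = [first]
    remaining.remove(first)
    while remaining and len(chosen) < k:
        farthest = max(remaining, key=lambda c: min(math.dist(c, s) for s in chosen))
        chosen.append(farthest)
        remaining.remove(farthest)
    return chosen

# Hypothetical candidate counterfactuals for a query point at the origin.
candidates = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (-3.0, 0.0)]
picked = select_diverse(candidates, query=(0.0, 0.0), k=2)
```

Near-duplicate candidates such as (1.0, 0.0) and (1.1, 0.0) are unlikely to be selected together, so a small set can still cover distinct ways of changing the input.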