

We thank the reviewers for their positive feedback, mentioning DTSIL as an effective novel method (R2, R3, R4) for a significant …

Neural Information Processing Systems

We will incorporate the suggestions. More details were provided in Appendix B.1; we will add these pointers and more descriptions in the main text to clarify our algorithm. We will make the connection between DTSIL and prior work clearer, especially for the imitation learning part. Pseudocode for organizing clusters is given in Appendix A.3. DTSIL+EXP without SL performs worse on Montezuma's Revenge. Assume the agent's location in the state embeddings is normalized to … We will add this comparison and more discussion of off-policy and model-based exploration methods.


Adversarial Training for Graph Neural Networks: Pitfalls, Solutions, and New Directions

Neural Information Processing Systems

Despite its success in the image domain, adversarial training has not (yet) stood out as an effective defense for Graph Neural Networks (GNNs) against graph structure perturbations. In pursuit of fixing adversarial training, (1) we show and overcome fundamental theoretical as well as practical limitations of the graph learning setting adopted in prior work; (2) we reveal that flexible GNNs based on learnable graph diffusion are able to adjust to adversarial perturbations, while the learned message passing scheme is naturally interpretable; (3) we introduce the first attack for structure perturbations that, while targeting multiple nodes at once, is capable of handling global (graph-level) as well as local (node-level) constraints. With these contributions, we demonstrate that adversarial training is a state-of-the-art defense against adversarial structure perturbations.
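
For readers unfamiliar with the setting, the sketch below illustrates what adversarial training against structure perturbations under a global and a local budget can look like. The tiny GCN, the sign-gradient attack, and the heuristic budget projection are simplifications chosen for illustration; they are not the paper's attack or its diffusion-based architectures, and all names and constants are assumptions.

```python
import torch
import torch.nn.functional as F

class TinyGCN(torch.nn.Module):
    """Very small 2-layer GCN operating on a dense adjacency matrix."""
    def __init__(self, d_in, n_cls):
        super().__init__()
        self.w1 = torch.nn.Linear(d_in, 16)
        self.w2 = torch.nn.Linear(16, n_cls)

    def forward(self, A, X):
        A_hat = A + torch.eye(A.shape[0])                  # add self-loops
        deg_inv = A_hat.sum(1).clamp(min=1e-6).pow(-0.5)
        A_norm = deg_inv[:, None] * A_hat * deg_inv[None, :]
        return self.w2(torch.relu(A_norm @ self.w1(A_norm @ X)))

def attack_structure(model, A, X, y, global_budget, local_budget, steps=10, lr=0.1):
    """PGD-style attack on (soft) edge flips, heuristically projected onto
    a global budget and a per-node (local) budget."""
    delta = torch.zeros_like(A, requires_grad=True)
    for _ in range(steps):
        A_pert = A + (1 - 2 * A) * delta                   # flip edges softly
        loss = F.cross_entropy(model(A_pert, X), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += lr * grad.sign()                      # ascend on the loss
            delta.clamp_(0, 1)
            row = delta.sum(1, keepdim=True)               # local (per-node) budget
            delta *= torch.where(row > local_budget, local_budget / row, torch.ones_like(row))
            total = delta.sum()
            if total > global_budget:                      # global budget
                delta *= global_budget / total
    return (A + (1 - 2 * A) * delta).detach()

def adversarial_training_step(model, opt, A, X, y):
    """Train on the perturbed graph produced by the inner attack."""
    A_adv = attack_structure(model, A, X, y, global_budget=5.0, local_budget=2.0)
    opt.zero_grad()
    loss = F.cross_entropy(model(A_adv, X), y)
    loss.backward()
    opt.step()
    return loss.item()

# toy usage on a random 10-node graph
torch.manual_seed(0)
A = (torch.rand(10, 10) < 0.2).float()
A = ((A + A.T) > 0).float().fill_diagonal_(0)
X, y = torch.randn(10, 8), torch.randint(0, 3, (10,))
model = TinyGCN(8, 3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
print(adversarial_training_step(model, opt, A, X, y))
```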


Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Neural Information Processing Systems

Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning method that selects actions by randomizing the risk criterion without losing the risk-neutral objective. We provide a perturbed distributional Bellman optimality operator obtained by distorting the risk measure. We also prove the convergence and optimality of the proposed method under a weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments, including 55 Atari games.
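
As a rough illustration of acting under a randomized risk criterion, the sketch below assumes a quantile-based return distribution (as in QR-DQN) and draws a random CVaR level per decision; the distortion family and the names `cvar_value` and `select_action_random_risk` are our own choices, not necessarily the authors'.

```python
import numpy as np

def cvar_value(quantiles, alpha):
    """Conditional value-at-risk of a discrete quantile estimate.

    `quantiles` holds N equally weighted quantile locations of the return
    distribution for one action; CVaR_alpha averages the lowest alpha-fraction.
    alpha = 1.0 recovers the risk-neutral mean.
    """
    q = np.sort(quantiles)
    n = max(1, int(np.ceil(alpha * len(q))))
    return q[:n].mean()

def select_action_random_risk(quantiles_per_action, rng):
    """Pick an action by randomizing the risk criterion per decision.

    Instead of always acting on the mean (risk-neutral) or on an optimistic
    statistic, draw a risk level alpha ~ U(0, 1] and act greedily under the
    corresponding distorted (CVaR) objective.
    """
    alpha = rng.uniform(low=np.finfo(float).tiny, high=1.0)
    scores = [cvar_value(q, alpha) for q in quantiles_per_action]
    return int(np.argmax(scores))

# toy usage: 3 actions, 8 quantile estimates each
rng = np.random.default_rng(0)
quantiles_per_action = rng.normal(size=(3, 8))
print(select_action_random_risk(quantiles_per_action, rng))
```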


The Pitfalls of Regularization in Off-Policy TD Learning

Neural Information Processing Systems

Temporal Difference (TD) learning is ubiquitous in reinforcement learning, where it is often combined with off-policy sampling and function approximation. Unfortunately, learning with this combination (known as the deadly triad) exhibits instability and unbounded error. To account for this, modern Reinforcement Learning methods often implicitly (or sometimes explicitly) assume that regularization is sufficient to mitigate the problem in practice; indeed, the standard deadly triad examples from the literature can be "fixed" via proper regularization. In this paper, we introduce a series of new counterexamples to show that the instability and unbounded error of TD methods is not solved by regularization. We demonstrate that, in the off-policy setting with linear function approximation, TD methods can fail to learn a non-trivial value function under any amount of regularization; we further show that regularization can induce divergence under common conditions; and we show that one of the most promising methods to mitigate this divergence (Emphatic TD algorithms) may also diverge under regularization. We further demonstrate such divergence when using neural networks as function approximators. Thus, we argue that the role of regularization in TD methods needs to be reconsidered, given that it is insufficient to prevent divergence and may itself introduce instability. There needs to be much more care in the practical and theoretical application of regularization to Reinforcement Learning methods.
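
The sketch below shows where an L2 (ridge) penalty enters a semi-gradient off-policy TD(0) update with linear features. The two-state example, the behaviour distribution, and all constants are illustrative rather than the paper's counterexamples, though even with these numbers the weight drifts upward instead of settling, in line with the abstract's message.

```python
import numpy as np

# Minimal off-policy TD(0) with linear function approximation and an L2
# penalty on the weights. Only meant to show where regularization enters
# the update; the features and behaviour distribution are illustrative.

def td0_regularized(phi, phi_next, rewards, mu, gamma=0.99, lr=0.01,
                    reg=0.1, steps=5000, seed=0):
    """phi, phi_next: (n_states, d) features of each state and of its successor
    under the target policy; rewards: (n_states,); mu: behaviour-state
    distribution used for off-policy sampling."""
    rng = np.random.default_rng(seed)
    w = np.ones(phi.shape[1])
    for _ in range(steps):
        s = rng.choice(len(mu), p=mu)          # sample state from the behaviour dist.
        td_error = rewards[s] + gamma * phi_next[s] @ w - phi[s] @ w
        # semi-gradient TD step with ridge (L2) regularization
        w += lr * (td_error * phi[s] - reg * w)
    return w

# two-state example: features [1] and [2], zero rewards, both states
# transition towards the second, behaviour mass concentrated on the first
phi = np.array([[1.0], [2.0]])
phi_next = np.array([[2.0], [2.0]])
rewards = np.zeros(2)
mu = np.array([0.9, 0.1])
print(td0_regularized(phi, phi_next, rewards, mu))
```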


Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

Neural Information Processing Systems

While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, they have even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of both the bisimulation measurements and the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning in the offline RL setting, which helps prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in the representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL.
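
For concreteness, here is a minimal sketch of the expectile loss that the expectile operator is built on; how exactly the authors plug it into the MICo or SimSR distance targets is not reproduced here, and the function name and tau values are illustrative.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss underlying the expectile operator.

    `diff` is a prediction error (e.g., target minus estimate). For tau > 0.5
    positive errors are weighted more heavily than negative ones, which biases
    the estimate toward an upper expectile instead of the plain mean.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

# tau = 0.5 recovers the ordinary (scaled) squared error
errors = np.array([-1.0, 0.5, 2.0])
print(expectile_loss(errors, tau=0.5), expectile_loss(errors, tau=0.9))
```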


Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation

Neural Information Processing Systems

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner's (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.
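
A small numerical sketch of what such a second-order loss can look like, assuming a Dirichlet-valued predictor and the expected log loss as the training objective (one common choice, not necessarily the exact losses analysed in the paper); the function name is our own.

```python
import numpy as np
from scipy.special import digamma

def expected_nll(alpha, y):
    """Expected negative log-likelihood of label y under p ~ Dirichlet(alpha).

    A second-order predictor outputs alpha (a distribution over class
    probability vectors p) rather than p itself; one popular training loss
    averages the first-order log loss over that distribution:
        E_{p ~ Dir(alpha)}[-log p_y] = psi(alpha_0) - psi(alpha_y).
    """
    alpha = np.asarray(alpha, dtype=float)
    return digamma(alpha.sum()) - digamma(alpha[y])

# same predicted mean (2:1), but 100x larger concentration gives a lower loss
print(expected_nll([2.0, 1.0], y=0), expected_nll([200.0, 100.0], y=0))
```

With this particular loss, for a fixed predicted mean the value keeps shrinking as the Dirichlet concentration grows, so minimisation pushes the learner toward reporting near-zero epistemic uncertainty, which illustrates the kind of failure the abstract describes.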


Navigating the Pitfalls of Active Learning Evaluation: A Systematic Framework for Meaningful Performance Assessment

Neural Information Processing Systems

Active Learning (AL) aims to reduce the labeling burden by interactively selecting the most informative samples from a pool of unlabeled data. While there has been extensive research on improving AL query methods in recent years, some studies have questioned the effectiveness of AL compared to emerging paradigms such as semi-supervised (Semi-SL) and self-supervised learning (Self-SL), or a simple optimization of classifier configurations. Thus, today's AL literature presents an inconsistent and contradictory landscape, leaving practitioners uncertain about whether and how to use AL in their tasks. In this work, we make the case that this inconsistency arises from a lack of systematic and realistic evaluation of AL methods. Specifically, we identify five key pitfalls in the current literature that reflect the delicate considerations required for AL evaluation. Further, we present an evaluation framework that overcomes these pitfalls and thus enables meaningful statements about the performance of AL methods. To demonstrate the relevance of our protocol, we present a large-scale empirical study and benchmark for image classification spanning various data sets, query methods, AL settings, and training paradigms. Our findings clarify the inconsistent picture in the literature and enable us to give hands-on recommendations for practitioners.
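
For context, a minimal pool-based active-learning loop with entropy-based uncertainty sampling is sketched below; the dataset, model, query size, and number of rounds are arbitrary choices and are unrelated to the paper's benchmark protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Minimal pool-based active-learning loop with uncertainty (entropy) sampling.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=20, replace=False))   # initial label set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(5):                              # 5 query rounds of 20 labels each
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    query = np.argsort(entropy)[-20:]           # positions of the most uncertain points
    for idx in sorted(query, reverse=True):     # pop from the back to keep indices valid
        labeled.append(pool.pop(idx))
print("labeled set size:", len(labeled))
```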


The Pitfalls of Simplicity Bias in Neural Networks

Neural Information Processing Systems

Several works have proposed Simplicity Bias (SB)---the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models---to justify why neural networks generalize well [Arpit et al. 2017, Nakkiran et al. 2019, Valle-Perez et al. 2019]. However, the precise notion of simplicity remains vague. Furthermore, previous settings [Soudry et al. 2018, Gunasekar et al. 2018] that use SB to theoretically justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks---a widely observed phenomenon in practice [Goodfellow et al. 2014, Jo and Bengio 2017]. We attempt to reconcile SB and the superior standard generalization of neural networks with the non-robustness observed in practice by introducing piecewise-linear and image-based datasets, which (a) incorporate a precise notion of simplicity, (b) comprise multiple predictive features with varying levels of simplicity, and (c) capture the non-robustness of neural networks trained on real data. Using theory and empirics on these datasets, we make four observations: (i) SB of SGD and variants can be extreme: neural networks can exclusively rely on the simplest feature and remain invariant to all predictive complex features.
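
In the spirit of such datasets, here is a toy generator with one simple (linearly separable) feature and one complex (slab-like) feature, both fully predictive of the label; the paper's actual constructions differ in detail, and the function name and slab layout are assumptions.

```python
import numpy as np

def simple_plus_complex_dataset(n=1000, seed=0):
    """Toy dataset with two coexisting predictive features of different simplicity.

    Both coordinates fully determine the label, but one is 'simple' (a single
    linear threshold suffices) and one is 'complex' (the classes occupy
    interleaved, disjoint slabs, so several thresholds are needed).
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n) * 2 - 1                   # labels in {-1, +1}
    # simple feature: sign matches the label, separable with a margin
    x_simple = y * rng.uniform(0.1, 1.0, size=n)
    # complex feature: the label decides which set of disjoint slabs it falls in
    slab = rng.integers(0, 3, size=n)                        # 3 slabs per class
    x_complex = np.where(y > 0, 2 * slab, 2 * slab + 1) + rng.uniform(0, 0.5, size=n)
    return np.stack([x_simple, x_complex], axis=1), y

X, y = simple_plus_complex_dataset()
print(X.shape, y[:5])
```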


HybSpecNet: A Critical Analysis of Architectural Instability in Hybrid-Domain Spectral GNNs

Goksu, Huseyin

arXiv.org Artificial Intelligence

Spectral Graph Neural Networks offer a principled approach to graph filtering but face a fundamental "Stability-vs-Adaptivity" trade-off. This trade-off is dictated by the choice of spectral domain. Filters in the finite [-1, 1] domain (e.g., ChebyNet) are numerically stable at high polynomial degrees (K) but are static and low-pass, causing them to fail on heterophilic graphs. Conversely, filters in the semi-infinite [0, ∞) domain (e.g., KrawtchoukNet) are highly adaptive and achieve SOTA results on heterophily by learning non-low-pass responses. However, as we demonstrate, these adaptive filters can also suffer from numerical instability, leading to catastrophic performance collapse at high K. In this paper, we propose to resolve this trade-off by designing a hybrid-domain GNN, HybSpecNet, which combines a stable `ChebyNet` branch with an adaptive `KrawtchoukNet` branch. We first demonstrate that a "naive" hybrid architecture, which fuses the branches via concatenation, successfully unifies performance at low K, achieving strong results on both homophilic and heterophilic benchmarks. However, we then prove that this naive architecture fails the stability test. Our K-ablation experiments show that this architecture catastrophically collapses at K=25, exactly mirroring the collapse of its unstable `KrawtchoukNet` branch. We identify this critical finding as "Instability Poisoning," where `NaN`/`Inf` gradients from the adaptive branch destroy the training of the model. Finally, we propose and validate an advanced architecture that uses "Late Fusion" to completely isolate the gradient pathways. We demonstrate that this successfully solves the instability problem, remaining perfectly stable up to K=30 while retaining its SOTA performance across all graph types. This work identifies a critical architectural pitfall in hybrid GNN design and provides a robust architectural solution.
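
One way to read the proposed late fusion is sketched below: two polynomial-filter branches that share no downstream parameters and are fused only by averaging per-branch logits. The generic Chebyshev-style branch stands in for both the ChebyNet and KrawtchoukNet bases, and the exact isolation mechanism the authors use (e.g., separate per-branch losses or a detached fusion) may well differ; class and parameter names are our own.

```python
import torch
import torch.nn as nn

class PolyFilterBranch(nn.Module):
    """Generic order-K polynomial spectral filter: sum_k theta_k * T_k(L_hat) X W.
    Stands in for either the stable or the adaptive branch; the paper's actual
    basis polynomials are not reproduced here."""
    def __init__(self, in_dim, out_dim, K):
        super().__init__()
        self.theta = nn.Parameter(torch.ones(K + 1) / (K + 1))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, L_hat, X):
        Tx_prev, Tx = X, L_hat @ X                      # T_0, T_1 of the recurrence
        out = self.theta[0] * Tx_prev + self.theta[1] * Tx
        for k in range(2, len(self.theta)):
            Tx_prev, Tx = Tx, 2 * L_hat @ Tx - Tx_prev  # T_k = 2 L_hat T_{k-1} - T_{k-2}
            out = out + self.theta[k] * Tx
        return self.lin(out)

class LateFusionHybrid(nn.Module):
    """Each branch has its own head; fusion is a parameter-free average of the
    per-branch logits, so the branches share no downstream weights."""
    def __init__(self, in_dim, n_classes, K=10):
        super().__init__()
        self.stable = PolyFilterBranch(in_dim, n_classes, K)
        self.adaptive = PolyFilterBranch(in_dim, n_classes, K)

    def forward(self, L_hat, X):
        return 0.5 * (self.stable(L_hat, X) + self.adaptive(L_hat, X))

# toy usage on a random 5-node graph
L_hat = torch.eye(5) - torch.rand(5, 5).softmax(dim=1)   # stand-in for a scaled Laplacian
X = torch.randn(5, 8)
print(LateFusionHybrid(in_dim=8, n_classes=3)(L_hat, X).shape)
```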


Gini-based Model Monitoring: A General Framework with an Application to Non-life Insurance Pricing

Brauer, Alexej, Menzel, Paul

arXiv.org Machine Learning

In a dynamic landscape where portfolios and environments evolve, maintaining the accuracy of pricing models is critical. To the best of our knowledge, this is the first study to systematically examine concept drift in non-life insurance pricing. We (i) provide an overview of the relevant literature and commonly used methodologies, clarify the distinction between virtual drift and concept drift, and explain their implications for long-run model performance; (ii) review and formalize common performance measures, including the Gini index and deviance loss, and articulate their interpretation; (iii) derive the asymptotic distribution of the Gini index, enabling valid inference and hypothesis testing; and (iv) present a standardized monitoring procedure that indicates when refitting is warranted. We illustrate the framework using a modified real-world portfolio with induced concept drift and discuss practical considerations and pitfalls.
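
As a reference point, one common convention for the Gini index in non-life pricing (twice the area between the diagonal and the Lorenz curve of losses ordered by predicted premium) can be computed as below; the paper's formalization and its asymptotic distribution are not reproduced here, and the function name and toy numbers are illustrative.

```python
import numpy as np

def gini_index(y_true, y_pred, exposure=None):
    """Gini index as commonly used in non-life pricing: order risks by the
    model's predicted premium, build the Lorenz curve of accumulated observed
    losses, and take twice the area between that curve and the diagonal."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    w = np.ones_like(y_true) if exposure is None else np.asarray(exposure, float)
    order = np.argsort(y_pred)                                # least to most risky
    loss_cum = np.append(0.0, np.cumsum(y_true[order] * w[order]))
    loss_cum /= loss_cum[-1]                                  # Lorenz curve of losses
    expo_cum = np.append(0.0, np.cumsum(w[order])) / w.sum()  # cumulative exposure share
    return 1.0 - 2.0 * np.trapz(loss_cum, expo_cum)

# toy check: a model that ranks the risks correctly yields a positive Gini
losses = np.array([0.0, 0.0, 1.0, 3.0])
premiums = np.array([0.1, 0.2, 0.8, 2.5])
print(round(gini_index(losses, premiums), 3))
```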