cfr
Solving Large Sequential Games with the Excessive Gap Technique
Christian Kroer, Gabriele Farina, Tuomas Sandholm
There has been tremendous recent progress on equilibrium-finding algorithms for zero-sum imperfect-information extensive-form games, but there has been a puzzling gap between theory and practice. First-order methods have significantly better theoretical convergence rates than any counterfactual-regret minimization (CFR) variant. Despite this, CFR variants have been favored in practice. Experiments with first-order methods have only been conducted on small-and medium-sized games because those methods are complicated to implement in this setting, and because CFR variants have been enhanced extensively for over a decade they perform well in practice. In this paper we show that a particular first-order method, a state-ofthe-art variant of the excessive gap technique--instantiated with the dilated entropy distance function--can efficiently solve large real-world problems competitively with CFR and its variants. We show this on large endgames encountered by the Libratus poker AI, which recently beat top human poker specialist professionals at no-limit Texas hold'em. We show experimental results on our variant of the excessive gap technique as well as a prior version. We introduce a numerically friendly implementation of the smoothed best response computation associated with first-order methods for extensive-form game solving.
Analysis of Bluffing by DQN and CFR in Leduc Hold'em Poker
Zaciragic, Tarik, Plaat, Aske, Batenburg, K. Joost
In the game of poker, being unpredictable, or bluffing, is an essential skill. When humans play poker, they bluff. However, most works on computer-poker focus on performance metrics such as win rates, while bluffing is overlooked. In this paper we study whether two popular algorithms, DQN (based on reinforcement learning) and CFR (based on game theory), exhibit bluffing behavior in Leduc Hold'em, a simplified version of poker. We designed an experiment where we let the DQN and CFR agent play against each other while we log their actions. We find that both DQN and CFR exhibit bluffing behavior, but they do so in different ways. Although both attempt to perform bluffs at different rates, the percentage of successful bluffs (where the opponent folds) is roughly the same. This suggests that bluffing is an essential aspect of the game, not of the algorithm. Future work should look at different bluffing styles and at the full game of poker.
We thank reviewers (R1
Therefore, our paper could have stronger implications than we expect. We disagree with R2 that the tabular form of JPS indeed has theoretical guarantees, as appreciated by other reviewers. Full game AI is a future work. This change leads to very different (and novel) theoretical insights. It leads to policy-change decomposition in Thm. 1, We will add comparisons in the next version.
Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery
Protected natural areas play a vital role in ecological balance and ecosystem services. Monitoring these regions at scale using satellite imagery and machine learning is promising, but current methods often lack interpretability and uncertainty-awareness, and do not address how uncertainty affects naturalness assessment. In contrast, we propose Confidence-Filtered Relevance (CFR), a data-centric framework that combines LRP Attention Rollout with Deep Deterministic Uncertainty (DDU) estimation to analyze how model uncertainty influences the interpretability of relevance heatmaps. CFR partitions the dataset into subsets based on uncertainty thresholds, enabling systematic analysis of how uncertainty shapes the explanations of naturalness in satellite imagery. Applied to the AnthroProtect dataset, CFR assigned higher relevance to shrublands, forests, and wetlands, aligning with other research on naturalness assessment. Moreover, our analysis shows that as uncertainty increases, the interpretability of these relevance heatmaps declines and their entropy grows, indicating less selective and more ambiguous attributions. CFR provides a data-centric approach to assess the relevance of patterns to naturalness in satellite imagery based on their associated certainty.
The Importance of Being Lazy: Scaling Limits of Continual Learning
Graldi, Jacopo, Breccia, Alessandro, Lanzillotta, Giulia, Hofmann, Thomas, Noci, Lorenzo
Despite recent efforts, neural networks still struggle to learn in non-stationary environments, and our understanding of catastrophic forgetting (CF) is far from complete. In this work, we perform a systematic study on the impact of model scale and the degree of feature learning in continual learning. We reconcile existing contradictory observations on scale in the literature, by differentiating between lazy and rich training regimes through a variable parameterization of the architecture. We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. Using the framework of dynamical mean field theory, we then study the infinite width dynamics of the model in the feature learning regime and characterize CF, extending prior theoretical results limited to the lazy regime. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks. We identify a transition modulated by task similarity where the model exits an effectively lazy regime with low forgetting to enter a rich regime with significant forgetting. Finally, our findings reveal that neural networks achieve optimal performance at a critical level of feature learning, which depends on task non-stationarity and transfers across model scales. This work provides a unified perspective on the role of scale and feature learning in continual learning.
Feature Subset Weighting for Distance-based Supervised Learning through Choquet Integration
Theerens, Adnan, Saeys, Yvan, Cornelis, Chris
This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights should be calculated each time instead of calculating all feature subset weights ($2^m$), where $m$ is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbors (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.