Goto

Collaborating Authors

 Statistical Learning




Neural Auto-Curricula

Neural Information Processing Systems

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper1, we introduce a novel framework--Neural Auto-Curricula (NAC)--that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the bestresponse module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.


Appendix: Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM) AAnalysis of the DAMGate Function Dynamics During Training

Neural Information Processing Systems

In this section, we theoretically analyze the dynamics of the DAM mask gi at the i-th layer as the training process unfolds. The loss function for training the neural network for the target task can then be denoted as L= L(f(x,ฮ˜,ฮฒi)) (e.g., cross-entropy loss for supervised structured pruning problems and reconstruction error for representation learning problems), where xdenotes the input features to the neural network. Using gradient descent methods with a learning rate of ฮท, the expected update formula of ฮฒi in DAM is given by: ฮฒi = ฮทEx Dtr [ ฮฒiL(f(x,ฮ˜,ฮฒi)) + ฮป ฮฒiฮฒi/(l 1)] (2) = ฮทEx Dtr [ ฮฒiL(f(x,ฮ˜,ฮฒi))] ฮทฮป/(l 1) (3) Let hi be the layer output before applying the DAM mask, and the masked output be represented as oi = hi gi after applying the gate. For the j-th neuron, gij/ ฮฒi = 0 if and only if ฮพj(ฮฒi)/ ฮฒi = 0. Since tanh(z) has non-zero gradients for z >0, the gradient of ฮพj(ฮฒi) is 0 only when kj/ni + ฮฒi 0, i.e., the mask value of the neuron is 0 (or in other words, it is deactivated or dead). Let us denote the set of all neuron indices with non-zero mask values (also referred to as active neurons) as J. Equation 4 can then be simplified as: ฮฒiL(f(x,ฮ˜,ฮฒi)) = ฮฑi X We can make the following two observations: (i) only those neurons that are active (i.e., have non-zero mask values) have a contribution towards updating ฮฒi and moving the gate function. We name these neurons as support neurons and their position in the ordering of neurons as the transitioning zone of the gate function.




Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

Neural Information Processing Systems

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.


Locally private online change point detection

Neural Information Processing Systems

We study online change point detection problems under the constraint of local differential privacy (LDP) where, in particular, the statistician does not have access to the raw data. As a concrete problem, we study a multivariate nonparametric regression problem. At each time point t, the raw data are assumed to be of the form (Xt,Yt), where Xt is a d-dimensional feature vector and Yt is a response variable. Our primary aim is to detect changes in the regression function mt(x) = E(Yt|Xt = x) as soon as the change occurs. We provide algorithms which respect the LDP constraint, which control the false alarm probability, and which detect changes with a minimal (minimax rate-optimal) delay. To quantify the cost of privacy, we also present the optimal rate in the benchmark, non-private setting. These non-private results are also new to the literature and thus are interesting per se. In addition, we study the univariate mean online change point detection problem, under privacy constraints. This serves as the blueprint of studying more complicated private change point detection problems.


Adversarially Robust Learning with Uncertain Perturbation Sets

Neural Information Processing Systems

In many real-world settings exact perturbation sets to be used by an adversary are not plausibly available to a learner. While prior literature has studied both scenarios with completely known and completely unknown perturbation sets, we propose an in-between setting of learning with respect to a class of perturbation sets. We show that in this setting we can improve on previous results with completely unknown perturbation sets, while still addressing the concerns of not having perfect knowledge of these sets in real life. In particular, we give the first positive results for the learnability of infinite Littlestone classes when having access to a perfect-attack oracle. We also consider a setting of learning with abstention, where predictions are considered robustness violations, only when the wrong label prediction is made within the perturbation set. We show there are classes for which perturbation-set unaware learning without query access is possible, but abstention is required.