

Supplementary Materials: Improving Deep Learning Interpretability by Saliency Guided Training

Neural Information Processing Systems

This would be particularly useful for large datasets like ImageNet. Table 2 shows the area under the accuracy-drop curve (AUC) on MNIST (Figure 4) for the gradient when training traditionally, training with the saliency guided procedure, and fine-tuning (a smaller AUC indicates better performance).
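As a rough illustration of the metric (not the paper's code), the sketch below computes the area under an accuracy-vs-fraction-removed curve with the trapezoidal rule; the function name, fractions, and accuracies are hypothetical.

```python
# Minimal sketch of the accuracy-drop AUC metric (illustrative, not the
# paper's implementation).  Assumes `accuracies[i]` is the test accuracy
# after masking `fractions[i]` of the most-salient features; a smaller AUC
# means accuracy falls faster, i.e. the saliency map ranks truly important
# features first.
def accuracy_drop_auc(fractions, accuracies):
    auc = 0.0
    for i in range(1, len(fractions)):
        # trapezoidal rule on the accuracy-vs-fraction-removed curve
        auc += 0.5 * (accuracies[i] + accuracies[i - 1]) * (fractions[i] - fractions[i - 1])
    return auc

# Hypothetical curve: accuracy after removing 0%, 25%, 50%, 75%, 100% of pixels.
print(accuracy_drop_auc([0.0, 0.25, 0.5, 0.75, 1.0],
                        [0.99, 0.60, 0.35, 0.20, 0.10]))
```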


Scale-Invariant Fast Convergence in Games

Tsuchiya, Taira, Luo, Haipeng, Ito, Shinji

arXiv.org Machine Learning

Scale-invariance in games has recently emerged as a widely valued desirable property. Yet, almost all fast convergence guarantees in learning in games require prior knowledge of the utility scale. To address this, we develop learning dynamics that achieve fast convergence while being both scale-free, requiring no prior information about utilities, and scale-invariant, remaining unchanged under positive rescaling of utilities. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$, where $A_{\mathrm{diff}}$ is the payoff range, which implies an $\tilde{O}(A_{\mathrm{diff}} / T)$ convergence rate to Nash equilibrium after $T$ rounds. For multiplayer general-sum games with $n$ players and $m$ actions, we obtain scale-free and scale-invariant dynamics with swap regret bounded by $O(U_{\mathrm{max}} \log T)$, where $U_{\mathrm{max}}$ is the range of the utilities, ignoring the dependence on the number of players and actions. This yields an $O(U_{\mathrm{max}} \log T / T)$ convergence rate to correlated equilibrium. Our learning dynamics are based on optimistic follow-the-regularized-leader with an adaptive learning rate that incorporates the squared path length of the opponents' gradient vectors, together with a new stopping-time analysis that exploits negative terms in regret bounds without scale-dependent tuning. For general-sum games, scale-free learning is enabled also by a technique called doubling clipping, which clips observed gradients based on past observations.
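The flavor of the update can be sketched as follows. This is a hedged illustration of optimistic FTRL over the simplex with a path-length-adaptive step size, not the paper's exact dynamics (which additionally rely on a stopping-time analysis and, for general-sum games, doubling clipping); the function names and the precise step-size rule are assumptions.

```python
# Hedged sketch: optimistic FTRL over the probability simplex (entropy
# regularizer, so the solution has multiplicative-weights form) with a
# step size adapted to the squared path length of observed gradients.
import numpy as np

def simplex_ftrl(cum_grad, eta):
    """FTRL solution over the simplex for cumulative loss gradient `cum_grad`."""
    z = -eta * cum_grad
    z -= z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

def optimistic_ftrl_play(grads):
    """Next strategy given the history of observed loss-gradient vectors."""
    n_actions = grads[-1].shape[0] if grads else 2
    # squared path length of the observed gradients (data-dependent scale)
    path_len_sq = sum(np.sum((g - h) ** 2)
                      for g, h in zip(grads[1:], grads[:-1]))
    eta = 1.0 / np.sqrt(1.0 + path_len_sq)
    cum = sum(grads) if grads else np.zeros(n_actions)
    hint = grads[-1] if grads else np.zeros(n_actions)   # optimistic prediction
    return simplex_ftrl(cum + hint, eta)
```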




Constrained Optimization to Train Neural Networks on Critical and Under-Represented Classes

Neural Information Processing Systems

As a consequence, removing the error P would reduce the loss more than removing the error N. Moreover, it is clear that this difference in error weighing increases with the level of imbalance between the classes.
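A small numeric sketch of this point (assuming P denotes the over-represented class, N the under-represented class, and a unit cost per misclassified example): under a standard unweighted average loss, fixing the majority-class errors reduces the loss more than fixing the minority-class errors, by exactly the imbalance ratio.

```python
# Illustrative sketch only; the P/majority and N/minority mapping and the
# 10% per-class error rate are assumptions made for the example.
def avg_loss(n_majority_err, n_minority_err, n_total):
    return (n_majority_err + n_minority_err) / n_total

for imbalance in (2, 10, 100):
    n_min, n_maj = 100, 100 * imbalance
    e_maj, e_min = 0.1 * n_maj, 0.1 * n_min          # 10% errors in each class
    total = n_min + n_maj
    base = avg_loss(e_maj, e_min, total)
    drop_maj = base - avg_loss(0, e_min, total)      # fix all majority errors
    drop_min = base - avg_loss(e_maj, 0, total)      # fix all minority errors
    print(f"imbalance {imbalance:>3}: drop(majority)={drop_maj:.4f} "
          f"drop(minority)={drop_min:.4f} ratio={drop_maj / drop_min:.1f}")
```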


A and Model Statistics

Neural Information Processing Systems

We use 9 datasets and pre-trained models provided in Chen et al. (2019b), which can be downloaded. Methods on the bottom-left corner are better. For completeness, we include verification results (Chen et al., 2019b; Wang et al., 2020) in


Collaborative Causal Discovery with Atomic Interventions

Neural Information Processing Systems

As interventions are expensive (they require carefully controlled experiments) and performing multiple interventions is time-consuming, an important goal in causal discovery is to design algorithms that use few and simple (preferably single-variable) interventions [Shanmugam et al., 2015]. However, when there are latent or unobserved variables in the system, in the worst case it is not possible to learn the exact causal DAG without intervening on every variable at least once.




On Non-Linear Operators for Geometric Deep Learning

Neural Information Processing Systems

This work studies operators mapping vector and scalar fields defined over a manifold $\mathcal{M}$, and which commute with its group of diffeomorphisms $\text{Diff}(\mathcal{M})$. We prove that in the case of scalar fields $L^p_\omega(\mathcal{M},\mathbb{R})$, those operators correspond to point-wise non-linearities, recovering and extending known results on $\mathbb{R}^d$. In the context of Neural Networks defined over $\mathcal{M}$, it indicates that point-wise non-linear operators are the only universal family that commutes with any group of symmetries, and justifies their systematic use in combination with dedicated linear operators commuting with specific symmetries. In the case of vector fields $L^p_\omega(\mathcal{M},T\mathcal{M})$, we show that those operators are solely the scalar multiplication. It indicates that $\text{Diff}(\mathcal{M})$ is too rich and that there is no universal class of non-linear operators to motivate the design of Neural Networks over the symmetries of $\mathcal{M}$.
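For concreteness, the easy direction of the scalar-field statement can be written out as follows (the notation is illustrative, not taken from the paper): a point-wise operator automatically commutes with the action of any diffeomorphism; the paper's contribution is the converse, namely that these are the only such operators.

```latex
% Point-wise non-linearities commute with Diff(M); notation is illustrative.
\[
  (M f)(x) = \rho\big(f(x)\big), \qquad
  \rho : \mathbb{R} \to \mathbb{R}, \quad
  f \in L^p_\omega(\mathcal{M}, \mathbb{R}).
\]
\[
  \big(M (f \circ \phi)\big)(x)
  = \rho\big(f(\phi(x))\big)
  = (M f)\big(\phi(x)\big)
  = \big((M f) \circ \phi\big)(x),
  \qquad \forall\, \phi \in \mathrm{Diff}(\mathcal{M}).
\]
```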