 variance


Data Quality in Imitation Learning

Neural Information Processing Systems

In supervised learning, the question of data quality and curation has been overshadowed in recent years by increasingly powerful and expressive models that can ingest internet-scale data.


Debiasing Conditional Stochastic Optimization

Lie He

Neural Information Processing Systems

The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity for convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias.
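To make the bias concrete, here is a minimal sketch on a toy nested objective. Everything in it is an assumption for illustration: the objective F(x) = f(E[g(x)]) with f(y) = y^3 and g(x) = x + noise, the function names, and the two-batch-size (Richardson-style) extrapolation, which stands in for the stochastic extrapolation technique the abstract mentions and is not claimed to be the paper's exact scheme.

```python
import random

def grad_estimate(x, m, rng, sigma=1.0):
    """Plug-in gradient of F(x) = f(E[g(x)]) with f(y) = y**3, g(x) = x + noise.

    Averages m noisy inner samples, then applies f'(y) = 3*y**2 (dg/dx = 1).
    """
    y_bar = sum(x + rng.gauss(0.0, sigma) for _ in range(m)) / m
    return 3.0 * y_bar ** 2

def extrapolated_grad(x, m, rng, sigma=1.0):
    """Hypothetical two-batch-size extrapolation: 2*G(2m) - G(m).

    The O(1/m) bias terms of the two plug-in estimates cancel.
    """
    return 2.0 * grad_estimate(x, 2 * m, rng, sigma) - grad_estimate(x, m, rng, sigma)

def mean_bias(estimator, x, m, trials, seed):
    """Monte Carlo estimate of E[estimator] minus the true gradient 3*x**2."""
    rng = random.Random(seed)
    true_grad = 3.0 * x ** 2
    total = sum(estimator(x, m, rng) for _ in range(trials))
    return total / trials - true_grad

plain_bias = mean_bias(grad_estimate, x=1.0, m=4, trials=20000, seed=0)
extrap_bias = mean_bias(extrapolated_grad, x=1.0, m=4, trials=20000, seed=0)
print(f"plain bias:        {plain_bias:+.3f}  (theory: 3*sigma^2/m = +0.750)")
print(f"extrapolated bias: {extrap_bias:+.3f}  (theory: 0)")
```

In this toy case the plug-in estimator has expectation 3(x^2 + sigma^2/m), so its bias shrinks only as 1/m, while the extrapolated combination is unbiased at the cost of higher variance and three times as many inner samples.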




Supplementary Material: 1 Decoding using automatic differentiation variational inference (ADVI)

Neural Information Processing Systems

In the method section of our paper, we describe the general encoding-decoding paradigm. We provide a brief overview of our data preprocessing pipeline, which involves the following steps. We employ the method of Boussard et al. (2021) to estimate the location of ... Decentralized registration (Windolf et al., 2022) is applied to track and correct ... Figure 6: Motion drift in "good" and "bad" sorting recordings. ... "bad" sorting example, which is still affected by drift even after registration. To decode binary behaviors, such as the mouse's left or right choices, we utilize ... In this section, we provide visualizations to gain insights into the effectiveness of our proposed decoder.



The Best of Both Worlds in Network Population Games: Reaching Consensus & Convergence to Equilibrium

Neural Information Processing Systems

Reaching consensus and convergence to equilibrium are two major challenges of multi-agent systems. Although each has attracted significant attention, relatively few studies address both challenges at the same time. This paper examines the connection between the notions of consensus and equilibrium in a multi-agent system where multiple interacting sub-populations coexist. We argue that consensus can be seen as an intricate component of intra-population stability, whereas equilibrium can be seen as encoding inter-population stability. We show that smooth fictitious play, a well-known learning model in game theory, can achieve both consensus and convergence to equilibrium in diverse multi-agent settings. Moreover, we show that the consensus formation process plays a crucial role in the seminal thorny problem of equilibrium selection in multi-agent learning.
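For readers unfamiliar with smooth fictitious play, the following is a minimal sketch in the simplest possible setting: two players in matching pennies, each playing a logit (softmax) response to the other's empirical mixed strategy. The game, the temperature, the function names, and the deterministic expected-update form are all assumptions chosen for illustration; the network population setting studied in the paper is not modeled here.

```python
import math

def logit_response(payoffs, temperature):
    """Smoothed (softmax) best response to a vector of expected payoffs."""
    top = max(payoffs)
    weights = [math.exp((p - top) / temperature) for p in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def expected_payoffs(matrix, opponent_mix):
    """Expected payoff of each own action against the opponent's mixed strategy."""
    return [sum(a * q for a, q in zip(row, opponent_mix)) for row in matrix]

def smooth_fictitious_play(A, B, x, y, steps=5000, temperature=1.0):
    """A[i][j]: row player's payoff; B[j][i]: column player's payoff.

    x and y are the running empirical mixed strategies of the two players.
    """
    for t in range(1, steps + 1):
        rx = logit_response(expected_payoffs(A, y), temperature)
        ry = logit_response(expected_payoffs(B, x), temperature)
        step = 1.0 / (t + 1)  # running-average update of the empirical strategy
        x = [xi + step * (ri - xi) for xi, ri in zip(x, rx)]
        y = [yi + step * (ri - yi) for yi, ri in zip(y, ry)]
    return x, y

A = [[1.0, -1.0], [-1.0, 1.0]]   # row player wants to match
B = [[-1.0, 1.0], [1.0, -1.0]]   # column player wants to mismatch
x_final, y_final = smooth_fictitious_play(A, B, x=[0.9, 0.1], y=[0.2, 0.8])
print(x_final, y_final)
```

In this zero-sum example both empirical strategies drift toward the uniform mix, which is the logit equilibrium of matching pennies; lowering the temperature makes the response closer to an exact best response.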


Doubly Robust Augmented Transfer for Meta-Reinforcement Learning

Anonymous Authors

Neural Information Processing Systems

... RL problems through the idea of "learning to learn". Current meta-RL methods can be classified into two categories. ... These methods mainly differ in their ways of inference [3, 4, 20]. The other line follows the technique of relabeling that enables sample reuse across tasks, i.e., learning a task ... Packer et al. apply hindsight relabeling for meta-RL, and propose hindsight task relabeling (HTR) to relabel the trajectories ... Taking a step further than hindsight relabeling, Wan et al. additionally introduce foresight ... Huang et al. derive a general form of policy gradient from DR value estimator [29], whereas a DR off-policy actor-critic ... Kallus et al. propose the doubly robust method to find a robust policy that can ... Depending on the knowledge to be transferred, these methods in RL can be roughly divided into classes including sampled transitions [32, 33], learned policies or value networks [34, 35, 36, 37], features [38, 39, 40], and skills [41, 42]. Doubly Robust Property for Direct Use of Doubly Robust Estimator: We show the doubly robust property of the DR estimator for the value function in Eq. (5) in the main text, as follows.
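The doubly robust property can be checked numerically in a one-step (bandit) setting. This is a sketch under stated assumptions, not the paper's Eq. (5), which is not reproduced in this snippet: the two-action problem, the policies, the reward values, and the function name `dr_expectation` are all hypothetical. The expectation over the behavior policy is computed exactly, so no sampling noise is involved.

```python
def dr_expectation(pi, mu, mu_hat, r, q_hat):
    """Exact expectation, over a ~ mu, of the one-step doubly robust estimate.

    Per-sample estimate:
        sum_b pi[b] * q_hat[b] + (pi[a] / mu_hat[a]) * (r[a] - q_hat[a])
    """
    baseline = sum(p * q for p, q in zip(pi, q_hat))
    correction = sum(
        mu[a] * (pi[a] / mu_hat[a]) * (r[a] - q_hat[a]) for a in range(len(r))
    )
    return baseline + correction

pi = [0.8, 0.2]          # target policy (assumed)
mu = [0.3, 0.7]          # true behavior policy (assumed)
r = [1.0, 0.0]           # true rewards per action (assumed)
true_value = sum(p * ri for p, ri in zip(pi, r))

wrong_q = [0.4, 0.9]     # misspecified reward model
wrong_mu = [0.5, 0.5]    # misspecified behavior-policy estimate

v_good_weights = dr_expectation(pi, mu, mu, r, wrong_q)      # weights right, model wrong
v_good_model = dr_expectation(pi, mu, wrong_mu, r, r)        # model right, weights wrong
v_both_wrong = dr_expectation(pi, mu, wrong_mu, r, wrong_q)  # both wrong

print(true_value, v_good_weights, v_good_model, v_both_wrong)
```

The estimate matches the true value whenever either the importance weights or the reward model is correct, and is biased only when both are misspecified, which is the sense in which the estimator is "doubly" robust.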



SANFlow: Semantic-Aware Normalizing Flow for Anomaly Detection and Localization

Neural Information Processing Systems

However, previous NF-based methods forcibly transform the distribution of all features into a single distribution (e.g., unit normal distribution), even when the features can have locally distinct semantic information and thus follow different