Local Curvature Smoothing with Stein's Identity for Efficient Score Matching

Neural Information Processing Systems

The training of score-based diffusion models (SDMs) is based on score matching. The challenge of score matching is that its objective includes a computationally expensive Jacobian trace. While several methods have been proposed to avoid this computation, each has drawbacks, such as instability during training or replacing the learning target with a denoising vector field rather than the true score. We propose a novel score matching variant, local curvature smoothing with Stein's identity (LCSS). LCSS bypasses the Jacobian trace by applying Stein's identity, enabling both effective regularization and efficient computation. We show that LCSS surpasses existing methods in sample generation performance and matches denoising score matching, the variant adopted by most SDMs, on evaluations such as FID, Inception score, and bits per dimension. Furthermore, we show that LCSS enables realistic image generation even at a high resolution of 1024 × 1024.
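For context, here is a minimal PyTorch sketch (not the authors' code) of the standard score matching objective whose Jacobian trace term LCSS is designed to bypass; `score_net` is a hypothetical network mapping inputs to estimated scores.

```python
# Minimal sketch of the standard score matching loss
#   J(theta) = E_x[ tr(ds_theta(x)/dx) + 0.5 * ||s_theta(x)||^2 ]
# whose exact Jacobian trace requires one backward pass per input dimension,
# the O(d) cost that trace-free variants such as LCSS aim to avoid.
import torch

def exact_score_matching_loss(score_net, x):
    """x: (batch, d). score_net: hypothetical module mapping (batch, d) -> (batch, d)."""
    x = x.requires_grad_(True)
    s = score_net(x)                              # estimated score, shape (batch, d)
    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):                   # one autograd pass per dimension
        grad_i = torch.autograd.grad(s[:, i].sum(), x, create_graph=True)[0]
        trace = trace + grad_i[:, i]              # accumulate ds_i/dx_i
    return (trace + 0.5 * (s ** 2).sum(dim=1)).mean()
```

This loop is what makes exact score matching impractical in high dimensions; LCSS replaces the trace term via Stein's identity rather than computing it.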


Interaction-Grounded Learning with Action-Inclusive Feedback

Neural Information Processing Systems

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, and it must use this information to optimize a policy with respect to a latent reward function. Previously analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's applicability in many potential scenarios, such as brain-computer interface (BCI) or human-computer interface (HCI) applications. We address this by developing an algorithm and analysis that allow IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.
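As a concrete illustration of the protocol described above, here is a hypothetical Python sketch of one IGL interaction step in which the feedback vector deliberately encodes the action (the hard case this paper addresses); the environment, dimensions, and latent reward are invented for illustration.

```python
# Hypothetical IGL interaction loop: the learner never observes the latent
# reward, only a feedback vector that here leaks the chosen action into one
# of its coordinates.
import numpy as np

rng = np.random.default_rng(0)
n_actions, d_context, d_feedback = 5, 10, 8

def env_step(context, action):
    latent_reward = float(action == context.argmax() % n_actions)  # hidden from learner
    feedback = rng.normal(size=d_feedback) + latent_reward         # reward-correlated signal
    feedback[0] = action                                           # action encoded in feedback
    return feedback, latent_reward

context = rng.normal(size=d_context)
action = rng.integers(n_actions)            # placeholder for the learned policy
feedback, _ = env_step(context, action)     # learner sees only (context, action, feedback)
```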



Rethinking Exploration in Reinforcement Learning with Effective Metric-Based Exploration Bonus

Neural Information Processing Systems

Enhancing exploration in reinforcement learning (RL) through the incorporation of intrinsic rewards, specifically by leveraging state discrepancy measures within various metric spaces as exploration bonuses, has emerged as a prevalent strategy for encouraging agents to visit novel states. The critical factor lies in how to quantify the difference between adjacent states as a measure of novelty that promotes effective exploration.
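The generic recipe this paragraph describes can be sketched as follows (a minimal illustration, not this paper's specific bonus), assuming some state embedding `phi` and the Euclidean metric as one possible choice of metric space.

```python
# Generic metric-based exploration bonus: the intrinsic reward is the
# distance between adjacent states in some embedding/metric space.
import numpy as np

def exploration_bonus(phi_s, phi_s_next, scale=1.0):
    """phi_s, phi_s_next: embeddings of consecutive states.
    Euclidean distance is just one choice of metric."""
    return scale * np.linalg.norm(phi_s_next - phi_s)

def shaped_reward(r_ext, phi_s, phi_s_next, beta=0.1):
    """Total reward the agent optimizes: extrinsic plus intrinsic novelty bonus."""
    return r_ext + beta * exploration_bonus(phi_s, phi_s_next)
```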


Appendix

Neural Information Processing Systems

The format is the same as in Figures 2c and 2d. The peak correlation vs. segment duration curve tended to approach an asymptotic value at long segment durations (see Figure 2d). For simplicity, we estimated this asymptotic value for each unit by measuring the peak cross-context correlation across lag for the longest segment duration tested (2.48 seconds), i.e., the rightmost values in the curves shown in Figure 2d. Convolutional layers have a maximum value of 1, as expected since they have a well-defined upper bound on their integration window. The LSTM layers also showed high maximum values (the median correlation value across units was above 0.93 for all layers), indicating a mostly context-invariant response.
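The asymptote estimate described above can be sketched as follows (illustrative NumPy code, not the authors' implementation), assuming 1-D response time courses of one unit to the same segments embedded in two different contexts.

```python
# Peak cross-context correlation across lag: correlate a unit's responses to
# the same segments in two contexts at each lag, then take the maximum.
import numpy as np

def peak_cross_context_corr(resp_a, resp_b, max_lag):
    """resp_a, resp_b: 1-D response time courses (same length)."""
    corrs = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = resp_a[lag:], resp_b[:len(resp_b) - lag]
        else:
            a, b = resp_a[:lag], resp_b[-lag:]
        corrs.append(np.corrcoef(a, b)[0, 1])   # Pearson correlation at this lag
    return np.max(corrs)
```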


Methods

Neural Information Processing Systems

This section describes the image dataset, simulations, network architectures, and neural data used in our work. Our simulations were coded in Python, using Keras [65] and TensorFlow [66]. For our image dataset, we randomly sample 12 million "natural" color images from the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M) [67], which contains ~100 million images uploaded by users to Flickr between 2004 and 2014. Images are unlabeled (i.e., no content information) and need not contain a recognizable object. We resize each RGB image to 112 × 112 pixels, randomly cropping the image to have the same number of row and column pixels. We choose this dataset primarily to ensure that our training images are different from those in ImageNet [68], as the DNNs that we use for features and simulated responses are trained on ImageNet images.

We test to what extent gaudy images improve the prediction of generalized linear models (GLMs) when the ground truth model is also a GLM. For the ground truth model (Figure 1a), we use a 112 × 112 Gabor filter with spatial frequency 0.1 cycles/pixel, bandwidth 0.5, orientation 45°, and location at the center of the image. The output response (a single variable) is computed by taking a dot product between the input image and the Gabor filter, which is then passed through an activation function (linear, relu, or sigmoid). For the sigmoid activation function, we first normalize the dot product (dividing by a constant factor of 1,000) before passing it through the sigmoid, to ensure responses fall within a reasonable range (i.e., not simply 0 or 1). We do not add noise to the ground truth outputs, as we already see training improvements without noise; however, adding output noise leads to similar improvements in prediction when training on gaudy images.

To predict ground truth responses, we consider GLMs with three different activation functions: linear, relu, and sigmoid (Figure 1b-d). The activation function of the GLM always matches that of the ground truth Gabor filter model; this ensures that we uphold the assumption made by the active learning theory in Eqn. 1 (i.e., fitting a linear mapping to a ground truth linear function). The GLM takes as input 112 × 112 re-centered images (i.e., 110 is subtracted from each pixel intensity).
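A minimal NumPy sketch of the ground truth model described above (single-channel for simplicity; the Gabor envelope width `sigma` is our assumption, since the text specifies bandwidth 0.5 rather than an explicit sigma):

```python
# Ground truth model: centered 112x112 Gabor (0.1 cycles/pixel, 45 deg),
# dot product with the image, then an activation function.
import numpy as np

def gabor_filter(size=112, freq=0.1, theta_deg=45.0, sigma=15.0):
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    theta = np.deg2rad(theta_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)    # coordinate along the grating
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

def ground_truth_response(image, gabor, activation="relu"):
    """image: (112, 112) array (single-channel sketch of the RGB case)."""
    z = float(np.dot(image.ravel(), gabor.ravel()))
    if activation == "linear":
        return z
    if activation == "relu":
        return max(z, 0.0)
    z /= 1000.0                                   # normalization used before the sigmoid
    return 1.0 / (1.0 + np.exp(-z))
```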


High-contrast "gaudy" images improve the training of deep neural network models of visual cortex

Neural Information Processing Systems

A key challenge in understanding the sensory transformations of the visual system is to obtain a highly predictive model that maps natural images to neural responses. Deep neural networks (DNNs) provide a promising candidate for such a model. However, DNNs require orders of magnitude more training data than neuroscientists can collect, because experimental recording time is severely limited. This motivates us to find images that can train highly predictive DNNs with as little training data as possible. We propose high-contrast, binarized versions of natural images, termed gaudy images, to efficiently train DNNs to predict higher-order visual cortical responses. In simulation experiments and analyses of real neural data, we find that training DNNs with gaudy images substantially reduces the number of training images needed to accurately predict responses to natural images. We also find that gaudy images, chosen before training, outperform images chosen during training by active learning algorithms. Thus, gaudy images overemphasize the features of natural images that are the most important for efficiently training DNNs. We believe gaudy images will aid in the modeling of visual cortical neurons, potentially opening new scientific questions about visual processing.
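One simple construction consistent with the abstract's description of gaudy images is sketched below; thresholding each pixel at the per-image mean intensity is our assumption, not necessarily the paper's exact recipe.

```python
# Binarize a natural image into a high-contrast "gaudy" version by pushing
# every pixel value to the 0/255 extremes around the image mean (assumed threshold).
import numpy as np

def gaudify(image):
    """image: (H, W, 3) uint8 RGB array."""
    threshold = image.mean()
    return np.where(image > threshold, 255, 0).astype(np.uint8)
```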


Author Feedback

Neural Information Processing Systems

We thank the reviewers for their thorough and constructive reviews. We note that the Supplemental Material is optional and thus not guaranteed to be peer-reviewed for NeurIPS. We respectfully ask reviewers to increase their score if they agree. We will update the text to address all of the reviewers' comments. R1: We agree about linear mappings and will discuss them in broader scope.