Methods
This section describes the image dataset, simulations, network architectures, and neural data used in our work. Our simulations were coded in Python, using Keras [65] and TensorFlow [66]. For our image dataset, we randomly sample 12 million "natural" color images from the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M) [67], which contains ~100 million images uploaded by users to Flickr between 2004 and 2014. Images are unlabeled (i.e., no content information) and need not contain a recognizable object. We randomly crop each RGB image to a square (i.e., the same number of row and column pixels) and resize it to 112 × 112 pixels. We choose this dataset primarily to ensure that our training images are different from those in ImageNet [68], as the DNNs that we choose for features and simulated responses are trained on ImageNet images.

We test to what extent gaudy images improve the prediction of generalized linear models (GLMs) when the ground truth model is also a GLM. For the ground truth model (Figure 1a), we use a 112 × 112 Gabor filter with spatial frequency 0.1 cycles/pixel, bandwidth 0.5, orientation angle 45°, and location at the center of the image. The output response (a single variable) is computed by taking the dot product between the input image and the Gabor filter, which is then passed through an activation function (linear, relu, or sigmoid). For the sigmoid activation function, we first normalize the dot product (dividing by a constant factor of 1,000) before passing it through the sigmoid, ensuring that responses fall within a reasonable range (i.e., not simply 0 or 1). We do not add noise to the ground truth outputs, as we already see training improvements without noise; however, adding output noise leads to similar improvements in prediction when training on gaudy images.

To predict ground truth responses, we consider GLMs with three different activation functions: linear, relu, and sigmoid (Figure 1b-d). The activation function of the GLM always matches that of the ground truth Gabor filter model; this ensures that we uphold the assumption made by the active learning theory in Eqn. 1 (i.e., fitting a linear mapping to a ground truth linear function). The GLM takes as input 112 × 112 re-centered images (i.e., 110 is subtracted from each pixel intensity).
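As a concrete illustration of the cropping and resizing step described above, the sketch below crops an image to a square at a random offset and resizes it to 112 × 112 pixels. It assumes images are loaded with Pillow; the bilinear resampling is our assumption, since the text does not specify a method.

```python
import numpy as np
from PIL import Image

def crop_and_resize(img, size=112, rng=None):
    """Randomly crop a PIL image to a square, then resize to size x size.

    The crop offset is sampled uniformly along the longer dimension, so the
    result has the same number of row and column pixels, as described above.
    """
    if rng is None:
        rng = np.random.default_rng()
    w, h = img.size                      # PIL reports (width, height)
    side = min(w, h)
    left = rng.integers(0, w - side + 1)
    top = rng.integers(0, h - side + 1)
    square = img.crop((left, top, left + side, top + side))
    # Bilinear resampling is an assumption; the paper does not specify it.
    return np.asarray(square.resize((size, size), Image.BILINEAR),
                      dtype=np.float32)
```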
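The ground truth Gabor model can be sketched as follows. The paper gives only the parameter values, so the exact Gabor parameterization (the standard mapping from octave bandwidth to Gaussian envelope width) and the handling of color channels (the filter applied identically to each channel, then summed) are our assumptions.

```python
import numpy as np

def gabor_filter(size=112, freq=0.1, bandwidth=0.5, theta=np.pi / 4):
    """Centered size x size Gabor filter.

    freq is in cycles/pixel and theta is the orientation (45 degrees here).
    The envelope width sigma is derived from the octave bandwidth via the
    standard relation -- an assumption, since the paper lists only values.
    """
    sigma = ((1.0 / (np.pi * freq)) * np.sqrt(np.log(2) / 2)
             * (2.0 ** bandwidth + 1) / (2.0 ** bandwidth - 1))
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(x_r**2 + y_r**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * freq * x_r))

def ground_truth_response(image, gabor, activation="linear"):
    """Dot product with the Gabor filter followed by an activation.

    For RGB inputs the filter is applied to each channel and the results
    summed (our assumption). For the sigmoid, the dot product is first
    divided by 1,000, as in the text, so responses are not simply 0 or 1.
    """
    if image.ndim == 3:
        z = float((image * gabor[:, :, None]).sum())
    else:
        z = float(np.dot(image.ravel(), gabor.ravel()))
    if activation == "relu":
        return max(z, 0.0)
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-z / 1000.0))
    return z  # linear
```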
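Finally, a minimal Keras sketch of the prediction GLMs, assuming RGB inputs; the mean-squared-error loss and Adam optimizer are illustrative choices, not settings taken from the paper.

```python
import tensorflow as tf

def make_glm(activation="linear", size=112):
    """GLM mapping a re-centered image to a single response.

    The input is re-centered by subtracting 110 from each pixel, as
    described above, then flattened and passed through a single dense
    unit whose activation (linear, relu, or sigmoid) matches the ground
    truth model.
    """
    inputs = tf.keras.Input(shape=(size, size, 3))
    x = tf.keras.layers.Flatten()(inputs - 110.0)
    response = tf.keras.layers.Dense(1, activation=activation)(x)
    model = tf.keras.Model(inputs, response)
    # Loss/optimizer are illustrative; the paper does not specify them here.
    model.compile(optimizer="adam", loss="mse")
    return model
```

For example, `make_glm("sigmoid")` yields a GLM whose activation matches the sigmoid ground truth model, as required by the matched-activation setup above.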