A Experimental Settings
Neural Information Processing Systems
All experiments were conducted on a single NVIDIA RTX 3090 GPU. The obtained text features were also projected into the CLIP latent space via a fully connected (FC) layer. The test images followed the same process, except that center cropping was used. In addition, classification accuracy is adopted for Adience.

Image Aesthetics Assessment. An ImageNet pre-trained VGG-16 was used as the image encoder.
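To make three of the operations named in this section concrete (test-time center cropping, projection of features through an FC layer, and classification accuracy), the following is a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the nested-list image representation, and all sizes and weights are hypothetical choices made purely for illustration, and the source does not state the crop size or feature dimensions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def center_crop(image: Matrix, crop_h: int, crop_w: int) -> Matrix:
    """Return the central crop_h x crop_w window of a 2D image (list of rows).

    Assumption: a single-channel image stored as nested lists; a real
    pipeline would operate on multi-channel tensors instead.
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size must not exceed image size")
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def fc_project(features: Sequence[float], weight: Matrix,
               bias: Sequence[float]) -> List[float]:
    """Apply one fully connected layer, y = W x + b.

    Each row of `weight` produces one output dimension; the weights here
    are hypothetical placeholders, not learned parameters.
    """
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight, bias)
    ]


def classification_accuracy(predictions: Sequence[int],
                            labels: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the ground-truth labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy usage with made-up values.
toy_image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
cropped = center_crop(toy_image, 2, 2)
projected = fc_project([1.0, 2.0],
                       [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                       [0.0, 0.5, -1.0])
accuracy = classification_accuracy([0, 1, 2, 2], [0, 1, 1, 2])
```

In practice these steps would typically be handled by a deep learning framework's built-in transforms and linear layers; the sketch only spells out the arithmetic behind them.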