image category
Stable diffusion models reveal a persisting human and AI gap in visual creativity
Rondini, Silvia, Alvarez-Martin, Claudia, Angermair-Barkai, Paula, Penacchio, Olivier, Paz, M., Pelowski, Matthew, Dediu, Dan, Rodriguez-Fornells, Antoni, Cerda-Company, Xim
While recent research suggests Large Language Models match human creative performance in divergent thinking tasks, visual creativity remains underexplored. This study compared image generation in human participants (Visual Artists and Non Artists) and using an image generation AI model (two prompting conditions with varying human input: high for Human Inspired, low for Self Guided). Human raters (N=255) and GPT4o evaluated the creativity of the resulting images. We found a clear creativity gradient, with Visual Artists being the most creative, followed by Non Artists, then Human Inspired generative AI, and finally Self Guided generative AI. Increased human guidance strongly improved GenAI's creative output, bringing its productions close to those of Non Artists. Notably, human and AI raters also showed vastly different creativity judgment patterns. These results suggest that, in contrast to language centered tasks, GenAI models may face unique challenges in visual domains, where creativity depends on perceptual nuance and contextual sensitivity, distinctly human capacities that may not be readily transferable from language models.
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- (12 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
EMCNet : Graph-Nets for Electron Micrographs Classification
Srinivas, Sakhinana Sagar, Sarkar, Rajat Kumar, Runkana, Venkataramana
Characterization of materials via electron micrographs is an important and challenging task in several materials processing industries. Classification of electron micrographs is complex due to the high intra-class dissimilarity, high inter-class similarity, and multi-spatial scales of patterns. However, existing methods are ineffective in learning complex image patterns. We propose an effective end-to-end electron micrograph representation learning-based framework for nanomaterial identification to overcome the challenges. We demonstrate that our framework outperforms the popular baselines on the open-source datasets in nanomaterials-based identification tasks. The ablation studies are reported in great detail to support the efficacy of our approach.
- North America > United States > District of Columbia > Washington (0.05)
- Asia > India > Maharashtra > Pune (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
Large Scale Category Structure Aware Image Categorization
Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)
Fast Template Evaluation with Vector Quantization
Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy.
Adaptive Multi-Modality Prompt Learning
Wu, Zongqian, Liu, Yujing, Zhan, Mengmeng, Shen, Jialie, Hu, Ping, Zhu, Xiaofeng
Although current prompt learning methods have successfully been designed to effectively reuse the large pre-trained models without fine-tuning their large number of parameters, they still have limitations to be addressed, i.e., without considering the adverse impact of meaningless patches in every image and without simultaneously considering in-sample generalization and out-of-sample generalization. In this paper, we propose an adaptive multi-modality prompt learning to address the above issues. To do this, we employ previous text prompt learning and propose a new image prompt learning. The image prompt learning achieves in-sample and out-of-sample generalization, by first masking meaningless patches and then padding them with the learnable parameters and the information from texts. Moreover, each of the prompts provides auxiliary information to each other, further strengthening these two kinds of generalization. Experimental results on real datasets demonstrate that our method outperforms SOTA methods, in terms of different downstream tasks.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Large-Scale Category Structure Aware Image Categorization
Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories.
MONet: Multi-scale Overlap Network for Duplication Detection in Biomedical Images
Sabir, Ekraam, Nandi, Soumyaroop, AbdAlmageed, Wael, Natarajan, Prem
Manipulation of biomedical images to misrepresent experimental results has plagued the biomedical community for a while. Recent interest in the problem led to the curation of a dataset and associated tasks to promote the development of biomedical forensic methods. Of these, the largest manipulation detection task focuses on the detection of duplicated regions between images. Traditional computer-vision based forensic models trained on natural images are not designed to overcome the challenges presented by biomedical images. We propose a multi-scale overlap detection model to detect duplicated image regions. Our model is structured to find duplication hierarchically, so as to reduce the number of patch operations. It achieves state-of-the-art performance overall and on multiple biomedical image categories.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
The perceptual boost of visual attention is task-dependent in naturalistic settings
Smith, Freddie Bickford, Luo, Xiaoliang, Roads, Brett D., Love, Bradley C.
Attentional modulation of neural representations is known to enhance processing of task-relevant visual information. Is the resulting perceptual boost task-dependent in naturalistic settings? We aim to answer this with a large-scale computational experiment. First we design a series of visual tasks, each consisting of classifying images from a particular task set (group of image categories). The nature of a given task is determined by which categories are included in the task set. Then on each task we compare the accuracy of an attention-augmented neural network to that of an attention-free counterpart. We show that, all else being equal, the performance impact of attention is stronger with increasing task-set difficulty, weaker with increasing task-set size, and weaker with increasing perceptual similarity within a task set.
Large-Scale Category Structure Aware Image Categorization
Zhao, Bin, Li, Fei, Xing, Eric P.
Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories.
Create Your Own Computer Vision Sandbox - KDnuggets
Together we will be creating a CNN sandbox that can collect images, augment data, and easily alter architectures for quick and nimble pivoting for a variety of projects. The process was written for someone with very little CNN building experience, so will abstract away higher-level details for the benefit of being able to get a macro view on some of the fundamental components. As well several non-essential steps have been added in order to visualize outputs and provide transparency to the process. Each of these steps were written to be directly transitioned to create a model entirely of your choice, containing any amount of classifications; dogs vs cats, country flags, hotdogs vs not hotdogs. However accuracy will vary greatly dependent on the amount of classes and similarity between them.