On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators
Frkatovic, Jasmin, Malemath, Akash, Kankeu, Ivan, Werner, Yannick, Tschöpe, Matthias, Rey, Vitor Fortes, Suh, Sungho, Lukowicz, Paul, Palaiodimopoulos, Nikolaos, Kiefer-Emmanouilidis, Maximilian
Over the past decade, advancements in model architectures, the availability of larger datasets, and improvements in hardware--among other factors--have significantly enhanced the capabilities of generative machine learning models [1-3]. At the same time, ongoing progress toward scalable quantum hardware has sparked growing interest in the development of quantum machine learning (QML) algorithms [4, 5], which aim to leverage quantum properties--such as superposition and entanglement--to enhance the efficiency and expressivity of classical machine learning approaches. Although large-scale fault-tolerant quantum hardware is not yet realizable, many QML algorithms are specifically designed to operate within the constraints of the noisy intermediate-scale quantum (NISQ) era [6-8]. In image generation tasks, several classical deep learning architectures have demonstrated notable effectiveness. Variational Autoencoders (VAEs) are particularly useful for tasks like image denoising [9] and anomaly detection [10] due to their structured latent spaces.
Thinking with Generated Images
Chern, Ethan, Hu, Zhulin, Chern, Steffi, Kou, Siqi, Su, Jiadi, Ma, Yan, Deng, Zhijie, Liu, Pengfei
We present Thinking with Generated Images, a novel paradigm that fundamentally transforms how large multimodal models (LMMs) engage with visual reasoning by enabling them to natively think across text and vision modalities through spontaneous generation of intermediate visual thinking steps. Current visual reasoning with LMMs is constrained to either processing fixed user-provided images or reasoning solely through text-based chain-of-thought (CoT). Thinking with Generated Images unlocks a new dimension of cognitive capability where models can actively construct intermediate visual thoughts, critique their own visual hypotheses, and refine them as integral components of their reasoning process. We demonstrate the effectiveness of our approach through two complementary mechanisms: (1) vision generation with intermediate visual subgoals, where models decompose complex visual tasks into manageable components that are generated and integrated progressively, and (2) vision generation with self-critique, where models generate an initial visual hypothesis, analyze its shortcomings through textual reasoning, and produce refined outputs based on their own critiques. Our experiments on vision generation benchmarks show substantial improvements over baseline approaches, with our models achieving up to 50% (from 38% to 57%) relative improvement in handling complex multi-object scenarios. From biochemists exploring novel protein structures, and architects iterating on spatial designs, to forensic analysts reconstructing crime scenes, and basketball players envisioning strategic plays, our approach enables AI models to engage in the kind of visual imagination and iterative refinement that characterizes human creative, analytical, and strategic thinking. We release our open-source suite at https://github.com/GAIR-NLP/thinking-with-generated-images.
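The self-critique mechanism described above can be sketched as a simple control loop: generate a visual hypothesis, critique it textually, and regenerate with the critique folded in. The functions below are toy stand-ins of our own devising (not the paper's API); the "quality" scoring is a placeholder for the model's actual multimodal judgment.

```python
from typing import Optional

def generate_image(prompt: str) -> dict:
    # Stand-in generator: "quality" grows with the amount of critique
    # feedback folded back into the prompt.
    return {"prompt": prompt, "quality": prompt.count("[fix]")}

def critique_image(image: dict, target_quality: int) -> Optional[str]:
    # Stand-in textual critique: None means the visual hypothesis is accepted.
    if image["quality"] >= target_quality:
        return None
    return "objects overlap; separate them"

def refine_prompt(prompt: str, critique: str) -> str:
    # Fold the textual critique into the next generation step.
    return f"{prompt} [fix] {critique}"

def think_with_generated_images(prompt: str,
                                target_quality: int = 2,
                                max_steps: int = 5) -> dict:
    image = generate_image(prompt)
    for _ in range(max_steps):
        critique = critique_image(image, target_quality)
        if critique is None:
            break
        prompt = refine_prompt(prompt, critique)
        image = generate_image(prompt)
    return image

result = think_with_generated_images("a cat under a table")
```

The loop terminates either when the critique step accepts the hypothesis or when the step budget is exhausted, mirroring the iterative refinement the abstract describes.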
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Existing vision-language understanding benchmarks largely consist of images of objects in their usual contexts. As a consequence, recent multimodal large language models can perform well with only a shallow visual understanding by relying on background language biases. Thus, strong performance on these benchmarks does not necessarily correlate with strong visual understanding. In this paper, we release JourneyBench, a comprehensive human-annotated benchmark of generated images designed to assess models' fine-grained multimodal reasoning abilities across five tasks: complementary multimodal chain of thought, multi-image VQA, imaginary image captioning, VQA with hallucination triggers, and fine-grained retrieval with sample-specific distractors. Unlike existing benchmarks, JourneyBench explicitly requires fine-grained multimodal reasoning in unusual imaginary scenarios where language bias and holistic image gist are insufficient. We benchmark state-of-the-art models on JourneyBench and analyze performance along a number of fine-grained dimensions. Results across all five tasks show that JourneyBench is exceptionally challenging for even the best models, indicating that models' visual reasoning abilities are not as strong as they first appear. We discuss the implications of our findings and propose avenues for further research.
Efficient Differentially Private Fine-Tuning of Diffusion Models
Liu, Jing, Lowy, Andrew, Koike-Akino, Toshiaki, Parsons, Kieran, Wang, Ye
The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples. Recent work showed that the synthetic samples generated by the diffusion model, which is pre-trained on public data and fully fine-tuned with differential privacy on private data, can train a downstream classifier, while achieving a good privacy-utility tradeoff. However, fully fine-tuning such large diffusion models with DP-SGD can be very resource-demanding in terms of memory usage and computation. In this work, we investigate Parameter-Efficient Fine-Tuning (PEFT) of diffusion models using Low-Dimensional Adaptation (LoDA) with Differential Privacy. We evaluate the proposed method with the MNIST and CIFAR-10 datasets and demonstrate that such efficient fine-tuning can also generate useful synthetic samples for training downstream classifiers, with guaranteed privacy protection of fine-tuning data. Our source code will be made available on GitHub.
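The core mechanics of DP parameter-efficient fine-tuning can be sketched in a few lines: freeze the pre-trained weight, train only a low-rank adapter, and apply DP-SGD (per-example gradient clipping plus calibrated Gaussian noise) to the adapter parameters. The toy model, dimensions, clip norm, and noise multiplier below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 4, 2            # input dim, output dim, adapter rank
W = rng.normal(size=(d, k))  # frozen pre-trained weight
A = np.zeros((d, r))         # low-rank adapter factors: the only
B = rng.normal(size=(r, k)) * 0.01  # parameters that are fine-tuned

X = rng.normal(size=(32, d))                    # "private" fine-tuning data
Y = X @ (W + rng.normal(size=(d, k)) * 0.1)

clip_norm, noise_mult, lr = 1.0, 1.0, 0.05

for _ in range(50):
    grads_A, grads_B = [], []
    for x, y in zip(X, Y):
        err = x @ (W + A @ B) - y        # per-example residual (squared loss)
        gA = np.outer(x, err) @ B.T      # dL/dA
        gB = A.T @ np.outer(x, err)      # dL/dB
        g = np.concatenate([gA.ravel(), gB.ravel()])
        g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip to C
        grads_A.append(g[: d * r].reshape(d, r))
        grads_B.append(g[d * r:].reshape(r, k))
    # DP-SGD step: average clipped per-example gradients, then add
    # Gaussian noise with std sigma * C, scaled by the batch size.
    noise = noise_mult * clip_norm / len(X)
    A -= lr * (np.mean(grads_A, axis=0) + rng.normal(size=A.shape) * noise)
    B -= lr * (np.mean(grads_B, axis=0) + rng.normal(size=B.shape) * noise)
```

In practice one would use an accounting library (e.g. Opacus in PyTorch) to track the (epsilon, delta) budget rather than hand-rolling the mechanism; the sketch only shows why PEFT helps: the clipped-and-noised gradient vector covers only the small adapter, not the full model.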
Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
Fuchi, Masane, Takagi, Tomohiro
Generating images from text has become easier because of the scaling of diffusion models and advancements in the field of vision and language. These models are trained on vast amounts of data from the Internet, so they often contain undesirable content such as copyrighted material. As it is challenging to remove such data and retrain the models, methods for erasing specific concepts from pre-trained models have been investigated. We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning with only a few real images. Discussion of what the model generates after a concept is erased has been lacking: while some methods specify a transition destination for an erased concept, the validity of the specified destination is unclear. Our method handles this implicitly by transitioning to the latent concepts inherent in the model or the images, and it can erase a concept within 10 seconds, making concept erasure more accessible than ever before. Implicitly transitioning to related concepts also leads to more natural concept erasure. We applied the proposed method to various concepts and confirmed that concept erasure can be achieved tens to hundreds of times faster than with current methods. By varying the parameters to be updated, we obtained results suggesting that, as in previous research, knowledge is primarily accumulated in the feed-forward networks of the text encoder.
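The few-shot unlearning idea can be illustrated on a drastically simplified stand-in for the text encoder: a single embedding table. For a handful of image features depicting the target concept, we step the concept token's embedding *down* the cosine similarity toward those features, so prompts for the concept no longer point at it. The encoder, the "image features," and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16
# Toy "text encoder": one embedding per token.
text_embeddings = {"cat": rng.normal(size=dim), "dog": rng.normal(size=dim)}
concept = "cat"
# Features of a few real images of the concept (few-shot set).
few_shot_feats = [text_embeddings["cat"] + rng.normal(size=dim) * 0.1
                  for _ in range(4)]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

before = np.mean([cos(text_embeddings[concept], f) for f in few_shot_feats])

lr = 0.5
e = text_embeddings[concept]
for _ in range(100):
    for f in few_shot_feats:
        # Gradient of cos(e, f) w.r.t. e; step against it to unlearn.
        c = cos(e, f)
        grad = f / (np.linalg.norm(e) * np.linalg.norm(f)) - c * e / (e @ e)
        e = e - lr * grad
text_embeddings[concept] = e

after = np.mean([cos(e, f) for f in few_shot_feats])
```

After the update the concept embedding has drifted away from the few-shot image features (`after < before`), while every other entry in the table is untouched, which is the appeal of editing only the text encoder.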
Annotated Hands for Generative Models
Yang, Yue, Gandhi, Atith N, Turk, Greg
Generative models such as GANs and diffusion models have demonstrated impressive image generation capabilities. Despite these successes, these systems are surprisingly poor at creating images with hands. We propose a novel training framework for generative models that substantially improves the ability of such systems to create hand images. Our approach is to augment the training images with three additional channels that provide annotations for the hands in the image. These annotations provide additional structure that coaxes the generative model to produce higher-quality hand images. We demonstrate this approach on two different generative models, a generative adversarial network and a diffusion model, evaluating both on a new synthetic dataset of hand images and on real photographs that contain hands. We measure the improved quality of the generated hands through higher confidence in finger joint identification using an off-the-shelf hand detector.
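The data augmentation described above amounts to stacking three extra annotation planes onto each RGB training image. The specific channel semantics in this sketch (binary hand mask, joint heatmap, skeleton map) are our assumption about what such annotations might look like, not the paper's exact definition.

```python
import numpy as np

def annotate_hands(rgb: np.ndarray,
                   hand_mask: np.ndarray,
                   joint_heatmap: np.ndarray,
                   skeleton_map: np.ndarray) -> np.ndarray:
    """Stack an HxWx3 image with three HxW annotation channels -> HxWx6."""
    assert rgb.ndim == 3 and rgb.shape[2] == 3
    extra = np.stack([hand_mask, joint_heatmap, skeleton_map], axis=-1)
    assert extra.shape[:2] == rgb.shape[:2]
    return np.concatenate([rgb, extra.astype(rgb.dtype)], axis=-1)

h, w = 64, 64
img = np.zeros((h, w, 3), dtype=np.float32)
mask = np.zeros((h, w), dtype=np.float32)
mask[20:40, 20:40] = 1.0            # hand region
heat = np.zeros((h, w), dtype=np.float32)
heat[30, 30] = 1.0                  # a finger joint location
skel = np.zeros((h, w), dtype=np.float32)
augmented = annotate_hands(img, mask, heat, skel)
```

The generator and discriminator (or the diffusion U-Net) would then take 6-channel inputs during training; at sampling time, the annotation channels can be dropped or synthesized.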
Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes
The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deepfakes. The prevalence of deepfakes in today's society is a serious concern, and our findings can help mitigate their negative impact.