Generation result
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
M$^{3}$T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark
Text-to-image models are known to struggle with generating images that perfectly align with textual prompts. Several previous studies have focused on evaluating image-text alignment in text-to-image generation. However, these evaluations either address overly simple scenarios, especially overlooking the difficulty of prompts with multiple different instances belonging to the same category, or they introduce metrics that do not correlate well with human evaluation. To address this, we introduce M$^{3}$T2IBench, a large-scale multi-category, multi-instance, multi-relation text-to-image benchmark. Our findings reveal that current open-source text-to-image models perform poorly on this challenging benchmark. Additionally, we propose Revise-Then-Enforce, a training-free post-editing approach that improves image-text alignment across a broad range of diffusion models.

Text-to-Image (T2I) models have demonstrated impressive performance in generating high-quality, realistic images (Betker et al., 2023; Esser et al., 2024). Despite this success, T2I models continue to struggle to accurately interpret and follow user prompts: they may fail to generate objects with the correct number, attributes, or relationships (Li et al., 2024). Assessing the alignment between the text and the generated image has remained a longstanding challenge. There are generally three approaches to evaluating image-text alignment. The first uses pretrained image-text models to produce an overall alignment score; CLIP Score (Hessel et al., 2021) is a widely used metric of this kind, and VQAScore (Lin et al., 2024) is an improved version of it. However, these metrics have several limitations, including their inability to accurately reflect the true alignment between the image and the text (Li et al., 2024) and their failure to provide explainable evaluation results.

Figure 1: A failure case generated by Stable-Diffusion-3.
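The first evaluation approach above has a simple closed form: CLIP Score is the rescaled cosine similarity between CLIP's image and text embeddings, clipped at zero (Hessel et al., 2021). Below is a minimal sketch of that idea, assuming the Hugging Face `transformers` CLIP API and the `openai/clip-vit-base-patch32` checkpoint as the backbone; it is not the benchmark's own evaluation code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, prompt: str, w: float = 2.5) -> float:
    """CLIPScore-style alignment: w * max(cos(image_emb, text_emb), 0)."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    cos = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
    return w * max(cos, 0.0)
```

Because the whole prompt collapses into a single embedding, a score like this is cheap to compute but coarse, which is exactly why it struggles with the multi-instance, multi-relation prompts the benchmark targets.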
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California (0.04)
- Europe > Monaco (0.04)
- (2 more...)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
Wang, Junyou; Chen, Zehua; Yuan, Binjie; Zheng, Kaiwen; Li, Chang; Jiang, Yuxuan; Zhu, Jun
Guidance methods have demonstrated significant improvements in cross-modal audio generation, including text-to-audio (T2A) and video-to-audio (V2A) generation. The popularly adopted method, classifier-free guidance (CFG), steers generation by emphasizing condition alignment, enhancing fidelity but often at the cost of diversity. Recently, autoguidance (AG) has been explored for audio generation, encouraging the sampling process to faithfully reconstruct the target distribution and showing increased diversity. Despite these advances, these methods usually rely on a single guiding principle, e.g., condition alignment in CFG or score accuracy in AG, leaving the full potential of guidance for audio generation untapped. In this work, we explore enriching the composition of the guidance method and present a mixture-of-guidance framework, AudioMoG. Within its design space, AudioMoG can exploit the complementary advantages of distinct guiding principles by realizing their cumulative benefits. In its reduced form, AudioMoG can consider parallel complements or recover a single guiding principle without sacrificing generality. We experimentally show that, at the same inference speed, AudioMoG consistently outperforms single guidance in T2A generation across sampling steps, while also showing advantages in V2A, text-to-music, and image generation. These results highlight a "free lunch" in current cross-modal audio generation systems: higher quality can be achieved through mixed guiding principles at the sampling stage without sacrificing inference efficiency. Demo samples are available at: https://audio-mog.github.io.
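The abstract does not spell out how the guiding principles are combined, so the sketch below only mixes the two standard update rules, classifier-free guidance and autoguidance, on a denoiser's noise prediction. The weights `w_cfg` and `w_ag` and the convex mixing factor `alpha` are illustrative assumptions, not AudioMoG's actual rule.

```python
def mixed_guidance(eps_cond, eps_uncond, eps_weak, w_cfg=3.0, w_ag=1.5, alpha=0.5):
    """Schematic mixture of two guidance signals on a diffusion denoiser's output.

    eps_cond   -- noise prediction of the main model given the condition (e.g., text)
    eps_uncond -- prediction of the main model with the condition dropped
    eps_weak   -- prediction of a weaker "guiding" model (used by autoguidance)
    """
    # Classifier-free guidance: extrapolate from the unconditional prediction
    # toward the conditional one to strengthen condition alignment.
    eps_cfg = eps_uncond + w_cfg * (eps_cond - eps_uncond)
    # Autoguidance: extrapolate away from the weaker model's prediction to
    # correct its systematic errors and better cover the target distribution.
    eps_ag = eps_weak + w_ag * (eps_cond - eps_weak)
    # Illustrative combination: a convex blend of the two guided predictions.
    return alpha * eps_cfg + (1.0 - alpha) * eps_ag
```

Setting `alpha` to 1 or 0 recovers pure CFG or pure autoguidance, which mirrors the abstract's claim that the framework can reduce to a single guiding principle without extra inference cost.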
- Media > Music (0.94)
- Leisure & Entertainment (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Steerable Scene Generation with Post Training and Inference-Time Search
Pfaff, Nicholas; Dai, Hongkai; Zakharov, Sergey; Iwase, Shun; Tedrake, Russ
Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method enables goal-directed scene synthesis that respects physical feasibility and scales across scene types. We introduce a novel MCTS-based inference-time search strategy for diffusion models, enforce feasibility via projection and simulation, and release a dataset of over 44 million SE(3) scenes spanning five diverse environments. Website with videos, code, data, and model weights: https://steerable-scene-generation.github.io/
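The abstract names an MCTS-based inference-time search strategy but does not describe it, so the following is only a generic Monte Carlo tree search over denoising trajectories. It is written against two assumed callables, `denoise_step(x, t, rng)` for one reverse-diffusion step and `score(x0)` for the downstream objective on a fully denoised scene, and is meant to illustrate steering sampling toward a task objective rather than reproduce the authors' implementation.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    latent: object                 # partially denoised scene latent
    t: int                         # remaining denoising steps
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0             # sum of rollout scores backed up through this node

def mcts_denoise(x_T, T, denoise_step, score, n_iters=128, branch=4, c=1.4):
    """Generic MCTS over reverse-diffusion trajectories (assumed interfaces)."""
    root = Node(latent=x_T, t=T)
    best_x, best_s = None, float("-inf")
    for _ in range(n_iters):
        rng = random.Random()      # fresh randomness for this simulation
        # 1. Selection: descend by UCB while the current node is fully expanded.
        node, path = root, [root]
        while node.t > 0 and len(node.children) >= branch:
            node = max(
                node.children,
                key=lambda ch: ch.value / (ch.visits + 1e-9)
                + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
            )
            path.append(node)
        # 2. Expansion: take one more stochastic denoising step.
        if node.t > 0:
            child = Node(latent=denoise_step(node.latent, node.t, rng), t=node.t - 1)
            node.children.append(child)
            node = child
            path.append(node)
        # 3. Rollout: finish denoising, then score the completed scene.
        x, t = node.latent, node.t
        while t > 0:
            x = denoise_step(x, t, rng)
            t -= 1
        s = score(x)
        if s > best_s:
            best_x, best_s = x, s
        # 4. Backup: propagate the rollout score along the visited path.
        for n in path:
            n.visits += 1
            n.value += s
    return best_x, best_s
```

In this sketch, physical feasibility could be folded into `score`, for example by projecting the scene to a collision-free configuration or returning a large penalty when simulation rejects it, in the spirit of the projection-and-simulation step mentioned above.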
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
A Closed-form Token-level Decomposition
The typos do not affect the related conclusions. For the unsupervised LCG experiments, we use Yelp Reviews (Cho et al., 2018) and the WMT News section; please refer to the official website of the WMT dataset (Bojar et al., 2017) for more information. For the MT experiments, we load MarianMT from the es-en checkpoint provided on Hugging Face. All hyperparameters are tuned on the development set. We simply report the results after the maximum number of training epochs (usually 20). For more implementation details and tricks, please refer to our code.
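As a point of reference, here is a minimal loading sketch for the MT setup described above using the `transformers` library; the exact checkpoint name is not given in the text, so `Helsinki-NLP/opus-mt-es-en` (the standard Spanish-to-English MarianMT model on the Hugging Face Hub) is an assumption.

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed checkpoint: the standard es-en MarianMT model on the Hugging Face Hub.
checkpoint = "Helsinki-NLP/opus-mt-es-en"
tokenizer = MarianTokenizer.from_pretrained(checkpoint)
model = MarianMTModel.from_pretrained(checkpoint)

# Translate a small Spanish batch into English.
batch = tokenizer(
    ["Todos los hiperparámetros se ajustan en el conjunto de desarrollo."],
    return_tensors="pt",
    padding=True,
)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```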