

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

Neural Information Processing Systems

The introduction of Domain Adaptation (DA) techniques aims to mitigate the performance drop that occurs when a trained agent encounters a different environment. By bridging the distribution gap between source and target domains, DA methods have shown their effectiveness in many cross-domain tasks such as classification [27, 18], segmentation [19, 22, 23] and detection [3].




Spatial Conditioning Without Bubble Artifacts

Neural Information Processing Systems

Let us begin by recalling how SPADE works and studying where its defects come from. Its statistics are calculated via averages over examples and all spatial dimensions. In Figure 4, we can see that SPADE exhibits these droplet artifacts as well. Despite the rationale behind this idea, we could not find settings where a decrease in distortion was not accompanied by a drastic decrease in quality. SSNs are trained on FFHQ at 256×256 resolution.
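The normalization step described above can be made concrete with a minimal sketch. This is not the paper's implementation: in real SPADE the per-pixel scale and shift are predicted from the segmentation map by a small convolutional network, whereas here they are simply looked up per class (`gamma_maps`, `beta_maps` and the function name are our own illustrative choices). The key point it shows is the batch-norm-style statistics, averaged over both examples and all spatial dimensions.

```python
import numpy as np

def spade_modulate(x, segmap, gamma_maps, beta_maps, eps=1e-5):
    """Sketch of SPADE-style spatial conditioning (illustrative names).

    x:          (N, C, H, W) feature maps
    segmap:     (N, H, W) integer label map
    gamma_maps, beta_maps: (num_classes, C) per-class modulation params
    """
    # Normalize with statistics averaged over examples AND spatial
    # dimensions, as described in the text (batch-norm style).
    mu = x.mean(axis=(0, 2, 3), keepdims=True)        # (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)

    # Spatially varying affine parameters driven by the segmentation map.
    gamma = gamma_maps[segmap].transpose(0, 3, 1, 2)  # (N, C, H, W)
    beta = beta_maps[segmap].transpose(0, 3, 1, 2)
    return gamma * x_hat + beta
```

With identity modulation (gamma = 1, beta = 0) the output is simply the normalized feature map, which makes the role of the segmentation-driven affine parameters easy to isolate.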



Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation

Chigot, Estelle, Wilson, Dennis G., Ghrib, Meriem, Oberlin, Thomas

arXiv.org Artificial Intelligence

Semantic segmentation models trained on synthetic data often perform poorly on real-world images due to domain gaps, particularly in adverse conditions where labeled data is scarce. Yet, recent foundation models make it possible to generate realistic images without any additional training. This paper proposes to leverage such diffusion models to improve the performance of vision models trained on synthetic data. We introduce two novel techniques for semantically consistent style transfer using diffusion models: Class-wise Adaptive Instance Normalization and Cross-Attention (CACTI) and its extension with selective attention Filtering (CACTIF). CACTI applies statistical normalization selectively based on semantic classes, while CACTIF further filters cross-attention maps based on feature similarity, preventing artifacts in regions with weak cross-attention correspondences. Our methods transfer style characteristics while preserving semantic boundaries and structural coherence, unlike approaches that apply global transformations or generate content without constraints. Experiments using GTA5 as source and Cityscapes/ACDC as target domains show that our approach produces higher quality images with lower FID scores and better content preservation. Our work demonstrates that class-aware diffusion-based style transfer effectively bridges the synthetic-to-real domain gap even with minimal target domain data, advancing robust perception systems for challenging real-world applications. The source code is available at: https://github.com/echigot/cactif.
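The class-wise normalization idea behind CACTI can be sketched in a few lines. This is a simplification under our own assumptions (single-channel feature maps, hard class masks; function names are ours, not from the released code): standard AdaIN matches global feature statistics, while the class-wise variant matches statistics only between regions of the same semantic class in the content and style images.

```python
import numpy as np

def adain(x, mu_t, sigma_t, eps=1e-5):
    """Shift the statistics of x toward a target mean and std (AdaIN)."""
    mu, sigma = x.mean(), x.std()
    return sigma_t * (x - mu) / (sigma + eps) + mu_t

def classwise_adain(content, style, content_seg, style_seg, eps=1e-5):
    """Per-class AdaIN over a single-channel feature map (sketch).

    content, style:          (H, W) feature maps
    content_seg, style_seg:  (H, W) integer class maps
    Classes present in both maps get their own style statistics;
    classes missing from either map are left unchanged.
    """
    out = content.copy()
    shared = np.intersect1d(np.unique(content_seg), np.unique(style_seg))
    for c in shared:
        c_mask = content_seg == c
        s_mask = style_seg == c
        out[c_mask] = adain(content[c_mask],
                            style[s_mask].mean(), style[s_mask].std(), eps)
    return out
```

Restricting the statistics to matching classes is what keeps, say, road pixels from inheriting the color statistics of sky pixels, which is the failure mode of global AdaIN that the abstract alludes to.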


Reviewers found our performance remarkable (R1, R2, R3, R4, R5) and identified our contribution to this challenging OSUDA problem

Neural Information Processing Systems

We thank the reviewers for their thoughtful feedback! We are pleased to receive a positive average score, with R2, R4 and R5 giving positive feedback. We will incorporate all feedback in the revision. Here we'd like to emphasize our motivation for ASM again. One concern raised was that RAIN seems to be only a complex version of AdaIN, and is therefore not very attractive.


StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis

Li, Yinghao Aaron, Han, Cong, Mesgarani, Nima

arXiv.org Artificial Intelligence

Text-to-Speech (TTS) has recently seen great progress in synthesizing high-quality speech owing to the rapid development of parallel TTS systems, but producing speech with naturalistic prosodic variations, speaking styles and emotional tones remains challenging. Moreover, since duration and speech are generated separately, parallel TTS models still have problems finding the best monotonic alignments that are crucial for naturalistic speech synthesis. Here, we propose StyleTTS, a style-based generative model for parallel TTS that can synthesize diverse speech with natural prosody from a reference speech utterance. With novel Transferable Monotonic Aligner (TMA) and duration-invariant data augmentation schemes, our method significantly outperforms state-of-the-art models on both single and multi-speaker datasets in subjective tests of speech naturalness and speaker similarity. Through self-supervised learning of the speaking styles, our model can synthesize speech with the same prosodic and emotional tone as any given reference speech without the need for explicitly labeling these categories.
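The abstract's point that "duration and speech are generated separately" in parallel TTS can be illustrated with the standard length-regulation step such systems share. This is a generic sketch, not StyleTTS's aligner (the TMA and the augmentation scheme are not reproduced here; the function name is ours): per-phoneme features are upsampled to frame level by repeating each one for its predicted duration, which by construction yields a hard monotonic alignment.

```python
import numpy as np

def length_regulate(phoneme_feats, durations):
    """Generic parallel-TTS duration upsampling (sketch).

    phoneme_feats: (T_phon, D) per-phoneme features
    durations:     (T_phon,) integer frame counts per phoneme
    Returns (sum(durations), D) frame-level features. Each phoneme maps
    to a contiguous run of frames, so the alignment is monotonic.
    """
    return np.repeat(phoneme_feats, durations, axis=0)
```

The difficulty the abstract highlights is that the durations themselves must be predicted well for this expansion to sound natural, which is where alignment quality becomes crucial.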


Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals

Sun, Susu, Woerner, Stefano, Maier, Andreas, Koch, Lisa M., Baumgartner, Christian F.

arXiv.org Artificial Intelligence

Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.
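The two-stage structure described above (class-specific attribution maps, then a simple logistic regression on them) can be sketched as follows. Only the second stage is shown, under our own simplifying assumptions: the counterfactual generator that produces the attribution maps is treated as given, and all names are illustrative rather than taken from the Attri-Net code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_from_attributions(attr_maps, weights, biases):
    """Second stage of an Attri-Net-style classifier (sketch).

    attr_maps: (num_classes, H, W) class-specific attribution maps
               produced by the (omitted) counterfactual generator
    weights:   (num_classes, H*W) logistic-regression weights, one row
               per class
    biases:    (num_classes,)
    Returns per-class probabilities for multi-label prediction. Each
    prediction depends only on its own attribution map, which is what
    keeps the decision transparent.
    """
    k = attr_maps.shape[0]
    flat = attr_maps.reshape(k, -1)          # (num_classes, H*W)
    logits = (flat * weights).sum(axis=1) + biases
    return sigmoid(logits)
```

Because the classifier sees nothing but the attribution maps, the explanation is the input to the decision rather than a post-hoc rationalization, which is the distinction the abstract draws against post-hoc techniques.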


Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis

Li, Jiguo, Liu, Xiaobin, Zheng, Lirong

arXiv.org Artificial Intelligence

Prior work on text-to-image synthesis typically concatenated the sentence embedding with the noise vector, yet the sentence embedding and the noise vector are two different factors that control different aspects of the generation. Simply concatenating them entangles the latent factors and encumbers the generative model. In this paper, we attempt to decompose these two factors and propose Factor Decomposed Generative Adversarial Networks (FDGAN). To achieve this, we first generate images from the noise vector and then apply the sentence embedding in the normalization layer for both the generator and the discriminators. We also design an additive norm layer to align and fuse the text-image features. The experimental results show that decomposing the noise and the sentence embedding can disentangle latent factors in text-to-image synthesis and make the generative model more efficient. Compared with the baseline, FDGAN achieves better performance while using fewer parameters.
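The core idea of applying the sentence embedding in a normalization layer rather than concatenating it with the noise can be sketched as conditional instance normalization. This is our own illustrative rendering, not FDGAN's exact layer (the additive norm layer is not reproduced, and the parameter names are assumptions): features are generated from the noise factor alone, then the text factor enters only as a learned per-channel scale and shift.

```python
import numpy as np

def conditional_norm(x, sent_emb, w_gamma, w_beta, eps=1e-5):
    """Conditional-normalization sketch for text conditioning.

    x:        (N, C, H, W) generator features produced from noise alone
    sent_emb: (N, D) sentence embedding
    w_gamma, w_beta: (D, C) linear projections of the embedding to
                     per-channel scale and shift
    """
    # Instance-normalize the features computed from the noise factor.
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = x.std(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / (sigma + eps)

    # Inject the text factor only as an affine modulation, keeping the
    # two factors in separate pathways instead of one concatenated input.
    gamma = (sent_emb @ w_gamma)[:, :, None, None]   # (N, C, 1, 1)
    beta = (sent_emb @ w_beta)[:, :, None, None]
    return (1.0 + gamma) * x_hat + beta
```

Keeping the noise in the input pathway and the text in the normalization pathway is one way to realize the factor decomposition the abstract argues for: neither factor has to be disentangled from the other after the fact.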