style encoder
MimicParts: Part-aware Style Injection for Speech-Driven 3D Motion Generation
Liu, Lianlian, He, YongKang, Chu, Zhaojie, Xing, Xiaofen, Xu, Xiangmin
Generating stylized 3D human motion from speech signals presents substantial challenges, primarily due to the intricate and fine-grained relationships among speech signals, individual styles, and the corresponding body movements. Current style encoding approaches either oversimplify stylistic diversity or ignore regional motion style differences (e.g., upper vs. lower body), limiting motion realism. Additionally, motion style should dynamically adapt to changes in speech rhythm and emotion, but existing methods often overlook this. To address these issues, we propose MimicParts, a novel framework designed to enhance stylized motion generation based on part-aware style injection and a part-aware denoising network. It divides the body into different regions to encode localized motion styles, enabling the model to capture fine-grained regional differences. Furthermore, our part-aware attention block allows rhythm and emotion cues to guide each body region precisely, ensuring that the generated motion aligns with variations in speech rhythm and emotional state. Experimental results show that our method outperforms existing methods, producing natural and expressive 3D human motion sequences.
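To make the part-aware style injection described above concrete, here is a minimal PyTorch sketch of one plausible reading: each body region gets its own style encoder, and the resulting region style modulates that region's motion features via FiLM-style scaling and shifting. All class names, dimensions, and the two-region split are illustrative assumptions, not the MimicParts implementation.

```python
# Hypothetical sketch of part-aware style injection (not the authors' code).
import torch
import torch.nn as nn

class PartStyleEncoder(nn.Module):
    """Encodes a style vector for one body region from its motion features."""
    def __init__(self, in_dim, style_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, style_dim),
        )
    def forward(self, x):                 # x: (batch, frames, in_dim)
        return self.net(x.mean(dim=1))    # temporal pooling -> (batch, style_dim)

class PartAwareInjection(nn.Module):
    """FiLM-style modulation of per-region motion features by the region style."""
    def __init__(self, feat_dim, style_dim):
        super().__init__()
        self.to_scale = nn.Linear(style_dim, feat_dim)
        self.to_shift = nn.Linear(style_dim, feat_dim)
    def forward(self, h, s):              # h: (batch, frames, feat_dim), s: (batch, style_dim)
        return h * (1 + self.to_scale(s).unsqueeze(1)) + self.to_shift(s).unsqueeze(1)

# Usage: separate encoders/injectors for upper- and lower-body regions.
upper_feats = torch.randn(2, 60, 96)      # placeholder upper-body joint features
lower_feats = torch.randn(2, 60, 96)
enc_u, enc_l = PartStyleEncoder(96, 64), PartStyleEncoder(96, 64)
inj_u, inj_l = PartAwareInjection(96, 64), PartAwareInjection(96, 64)
styled_upper = inj_u(upper_feats, enc_u(upper_feats))
styled_lower = inj_l(lower_feats, enc_l(lower_feats))
```

In the paper's setting, a part-aware denoising network would consume these styled regional features together with rhythm and emotion cues; the sketch stops at the injection step.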
DiffStyleTS: Diffusion Model for Style Transfer in Time Series
Nagda, Mayank, Ostheimer, Phil, Arweiler, Justus, Jungjohann, Indra, Werner, Jennifer, Wagner, Dennis, Muraleedharan, Aparna, Jafari, Pouya, Schmid, Jochen, Jirasek, Fabian, Burger, Jakob, Bortz, Michael, Hasse, Hans, Mandt, Stephan, Kloft, Marius, Fellenz, Sophie
Style transfer combines the content of one signal with the style of another. It supports applications such as data augmentation and scenario simulation, helping machine learning models generalize in data-scarce domains. While well developed in vision and language, style transfer methods for time series data remain limited. We introduce DiffTSST, a diffusion-based framework that disentangles a time series into content and style representations via convolutional encoders and recombines them through a self-supervised attention-based diffusion process. At inference, encoders extract content and style from two distinct series, enabling conditional generation of novel samples to achieve style transfer. We demonstrate both qualitatively and quantitatively that DiffTSST achieves effective style transfer. We further validate its real-world utility by showing that data augmentation with DiffTSST improves anomaly detection in data-scarce regimes.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California (0.04)
- (8 more...)
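The abstract's content/style split lends itself to a small sketch: two convolutional encoders summarize a series into separate content and style vectors, and swapping the style source at inference yields the conditioning signal for generation. Encoder depths, dimensions, and the concatenation step are assumptions for illustration; the diffusion denoiser itself is omitted.

```python
# Minimal sketch of the content/style split described above; layer sizes and
# the conditioning scheme are assumptions, not the DiffTSST code.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.proj = nn.Linear(64, out_dim)
    def forward(self, x):                 # x: (batch, 1, length)
        h = self.conv(x).mean(dim=-1)     # global average pooling over time
        return self.proj(h)               # (batch, out_dim)

content_enc, style_enc = ConvEncoder(64), ConvEncoder(16)

series_a = torch.randn(4, 1, 256)   # provides content
series_b = torch.randn(4, 1, 256)   # provides style
cond = torch.cat([content_enc(series_a), style_enc(series_b)], dim=-1)
# `cond` would then condition the diffusion denoiser to generate a series
# with A's content rendered in B's style.
```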
Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
Yang, Jianing, Li, Sheng, Shinozaki, Takahiro, Saito, Yuki, Saruwatari, Hiroshi
Current emotional Text-To-Speech (TTS) and style transfer methods rely on reference encoders to control global style or emotion vectors, but do not capture the nuanced acoustic details of the reference speech. To address this, we propose a novel emotional TTS method that enables fine-grained phoneme-level emotion embedding prediction while disentangling intrinsic attributes of the reference speech. The proposed method employs a style disentanglement strategy to guide two feature extractors, reducing the mutual information between timbre and emotion features and effectively separating distinct style components from the reference speech. Experimental results demonstrate that our method outperforms baseline TTS systems in generating natural and emotionally rich speech. This work highlights the potential of disentangled and fine-grained representations in advancing the quality and flexibility of emotional TTS systems.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
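As a hedged illustration of "reducing mutual information between timbre and emotion features": the abstract does not describe the paper's MI estimator, so the sketch below substitutes a simple cross-correlation penalty that pushes the two embeddings toward statistical independence in the same spirit.

```python
# Illustrative stand-in for MI-guided disentanglement: penalize the
# cross-correlation between timbre and emotion embeddings. This is an
# editor's proxy, not the estimator used in the paper.
import torch

def cross_correlation_penalty(timbre, emotion, eps=1e-8):
    # timbre: (batch, d1), emotion: (batch, d2)
    t = (timbre - timbre.mean(0)) / (timbre.std(0) + eps)
    e = (emotion - emotion.mean(0)) / (emotion.std(0) + eps)
    corr = t.T @ e / timbre.size(0)        # (d1, d2) cross-correlation matrix
    return (corr ** 2).mean()              # push all cross-correlations to zero

timbre = torch.randn(8, 64, requires_grad=True)
emotion = torch.randn(8, 32, requires_grad=True)
loss = cross_correlation_penalty(timbre, emotion)
loss.backward()   # gradients nudge the two style components apart
```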
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
Kim, Nam-Gyu, Cho, Deok-Hyeon, Kim, Seung-Bin, Lee, Seong-Whan
Recent advances in expressive text-to-speech (TTS) have introduced diverse methods based on style embedding extracted from reference speech. However, synthesizing high-quality expressive speech remains challenging. We propose Spotlight-TTS, which exclusively emphasizes style via voiced-aware style extraction and style direction adjustment. Voiced-aware style extraction focuses on voiced regions highly related to style while maintaining continuity across different speech regions to improve expressiveness. We adjust the direction of the extracted style for optimal integration into the TTS model, which improves speech quality. Experimental results demonstrate that Spotlight-TTS achieves superior performance compared to baseline models in terms of expressiveness, overall speech quality, and style transfer capability. Our audio samples are publicly available.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
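A minimal sketch of the voiced-aware idea, assuming voicing is flagged by non-zero F0: style features are pooled only over voiced frames, and "direction adjustment" is reduced here to fixing the embedding's magnitude while keeping its direction. Both simplifications are the editor's assumptions, not the Spotlight-TTS recipe.

```python
# Hedged sketch: pool style features over voiced frames only, then keep the
# style vector's direction while normalizing its magnitude.
import torch

def voiced_aware_pool(feats, f0):
    # feats: (batch, frames, dim); f0: (batch, frames), zero on unvoiced frames
    mask = (f0 > 0).float().unsqueeze(-1)          # (batch, frames, 1)
    summed = (feats * mask).sum(dim=1)
    count = mask.sum(dim=1).clamp(min=1.0)
    return summed / count                          # mean over voiced frames only

def adjust_direction(style):
    # One simple notion of "direction adjustment": preserve the embedding's
    # direction but fix its magnitude before injecting it into the TTS model.
    return style / style.norm(dim=-1, keepdim=True)

feats = torch.randn(2, 100, 128)
f0 = torch.rand(2, 100) * (torch.rand(2, 100) > 0.4)   # zeros ~ unvoiced frames
style = adjust_direction(voiced_aware_pool(feats, f0))
```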
GSA-TTS: Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
We present the gradual style adaptor TTS (GSA-TTS) with a novel style encoder that gradually encodes speaking styles from an acoustic reference for zero-shot speech synthesis. The local styles are then combined by self-attention to obtain a global style condition. This semantic and hierarchical encoding strategy provides a robust and rich style representation for an acoustic model. We test GSA-TTS on unseen speakers and obtain promising results regarding naturalness, speaker similarity, and intelligibility. Additionally, we explore the potential of GSA in terms of interpretability and controllability, which stems from its hierarchical structure. Audio samples are available at https://gsattsdemo.github.io/
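The self-attention combination step above admits a short sketch: given already-extracted local style tokens, one attention layer followed by mean pooling yields a single global style condition. The token count and dimensions are illustrative, not the paper's configuration.

```python
# Sketch of combining local style tokens into a global style via self-attention.
import torch
import torch.nn as nn

class GlobalStyleCombiner(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, local_styles):         # (batch, n_local, dim)
        attended, _ = self.attn(local_styles, local_styles, local_styles)
        return attended.mean(dim=1)          # (batch, dim) global style condition

local_styles = torch.randn(2, 12, 256)       # e.g. 12 local style tokens
global_style = GlobalStyleCombiner(256)(local_styles)
```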
Domain-invariant feature learning in brain MR imaging for content-based image retrieval
Tobari, Shuya, Tomoshige, Shuhei, Muraki, Hayato, Oishi, Kenichi, Iyatomi, Hitoshi
In large-scale studies that collect brain MR images from multiple facilities, differences in imaging equipment and protocols at each site cannot be ignored, and this domain gap has become a significant issue in recent years. In this study, we propose a new low-dimensional representation (LDR) acquisition method called style encoder adversarial domain adaptation (SE-ADA) to realize content-based image retrieval (CBIR) of brain MR images. SE-ADA reduces domain differences while preserving pathological features by separating domain-specific information from the LDR and minimizing domain differences using adversarial learning. In evaluation experiments comparing SE-ADA with recent domain harmonization methods on eight public brain MR datasets (ADNI1/2/3, OASIS1/2/3/4, PPMI), SE-ADA effectively removed domain information while preserving key aspects of the original brain structure and demonstrated the highest disease search accuracy.
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.71)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.33)
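Since the abstract hinges on adversarial removal of domain information from the LDR, here is the standard gradient-reversal construction that such methods typically build on. SE-ADA's actual architecture and losses are not reproduced; the site count and dimensions below are placeholders.

```python
# Minimal gradient-reversal sketch of adversarial domain removal.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None   # flip gradients w.r.t. the encoder

class DomainClassifier(nn.Module):
    def __init__(self, dim, n_domains):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_domains))
    def forward(self, z, lam=1.0):
        return self.net(GradReverse.apply(z, lam))

# Training the classifier to predict the acquisition site while the encoder
# receives reversed gradients pushes site information out of the LDR.
z = torch.randn(8, 64, requires_grad=True)   # stand-in low-dimensional representation
logits = DomainClassifier(64, 8)(z)          # e.g. 8 sites/datasets
loss = nn.functional.cross_entropy(logits, torch.randint(0, 8, (8,)))
loss.backward()
```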