AITopics

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Canada (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Daan Wynen, Cordelia Schmid, Julien Mairal

Unsupervised Learning of Artistic Styles with Archetypal Style Analysis

Neural Information Processing Systems, Feb-12-2026, 04:28:28 GMT

This enables us to interpret which archetypal styles are present in the input image, and in which proportion.
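The decomposition described above, where an image's style is expressed through archetypal styles and their proportions, can be sketched as a convex-combination fit: find non-negative weights that sum to 1 and best reconstruct a style vector from the archetypes. This is a minimal sketch, assuming the archetypes are already learned and that a style is a plain feature vector; the function names, the toy archetypes, and the projected-gradient solver are illustrative choices, not the authors' implementation.

```python
# Minimal sketch: express one style vector as a convex combination of
# already-learned archetype vectors (weights >= 0, summing to 1).
# Assumptions: archetypes are given; styles are plain feature vectors;
# the solver (projected gradient descent) is illustrative, not the authors'.


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for i, ui in enumerate(u):
        cumsum += ui
        t = (cumsum - 1.0) / (i + 1)
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that stays positive
    return [max(vi - theta, 0.0) for vi in v]


def archetype_weights(x, archetypes, steps=500):
    """Minimize 0.5 * ||x - sum_k w_k z_k||^2 over the probability simplex."""
    k_count, dim = len(archetypes), len(x)
    # Step size 1/L; the sum of squared archetype norms upper-bounds
    # the Lipschitz constant of the gradient Z^T (Z w - x).
    lipschitz = sum(zj * zj for z in archetypes for zj in z)
    if lipschitz == 0.0:
        raise ValueError("archetypes must not all be zero vectors")
    step = 1.0 / lipschitz
    w = [1.0 / k_count] * k_count  # start from uniform proportions
    for _ in range(steps):
        recon = [sum(w[k] * archetypes[k][j] for k in range(k_count)) for j in range(dim)]
        resid = [recon[j] - x[j] for j in range(dim)]
        grad = [sum(archetypes[k][j] * resid[j] for j in range(dim)) for k in range(k_count)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(k_count)])
    return w


# Toy example with three hypothetical 2-D "archetypal styles".
weights = archetype_weights([0.3, 0.5], [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print([round(a, 3) for a in weights])
```

The returned weights are the "proportions" of each archetypal style; because they lie on the simplex, they can be read directly as a mixture.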

artificial intelligence, machine learning, representation

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Daan Wynen, Cordelia Schmid, Julien Mairal

Unsupervised Learning of Artistic Styles with Archetypal Style Analysis

Neural Information Processing Systems, Nov-20-2025, 14:13:01 GMT

In this paper, we introduce an unsupervised learning approach to automatically discover, summarize, and manipulate artistic styles from large collections of paintings.

archetype, artificial intelligence, machine learning

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.61)

arXiv.org Artificial Intelligence, Oct-16-2025

MimicParts: Part-aware Style Injection for Speech-Driven 3D Motion Generation

Liu, Lianlian, He, YongKang, Chu, Zhaojie, Xing, Xiaofen, Xu, Xiangmin

Generating stylized 3D human motion from speech signals presents substantial challenges, primarily due to the intricate and fine-grained relationships among speech signals, individual styles, and the corresponding body movements. Current style encoding approaches either oversimplify stylistic diversity or ignore regional motion style differences (e.g., upper vs. lower body), limiting motion realism. Additionally, motion style should dynamically adapt to changes in speech rhythm and emotion, but existing methods often overlook this. To address these issues, we propose MimicParts, a novel framework designed to enhance stylized motion generation based on part-aware style injection and a part-aware denoising network. It divides the body into different regions to encode localized motion styles, enabling the model to capture fine-grained regional differences. Furthermore, our part-aware attention block allows rhythm and emotion cues to guide each body region precisely, ensuring that the generated motion aligns with variations in speech rhythm and emotional state. Experimental results show that our method outperforms existing methods, producing natural and expressive 3D human motion sequences.

diffusion model, machine learning, natural language

2510.13208

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Neural Information Processing Systems, Oct-2-2025, 19:17:50 GMT

Multi-mapping Image-to-Image Translation via Learning Disentanglement

Xiaoming Yu, Yuanqi Chen, Shan Liu, Thomas Li, Ge Li

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, translation

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Neural Information Processing Systems, Oct-2-2025, 19:17:35 GMT

be addressed point by point. Confusion about the LPIPS metric: we apologize for the lack of explanation of LPIPS

Without reasonable representations, it is difficult for these methods to produce high-quality images.

artificial intelligence, different domain, representation

Technology: Information Technology > Artificial Intelligence (0.32)

arXiv.org Artificial Intelligence, Sep-30-2025

Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription

Zeng, Wei, Zhao, Junchuan, Wang, Ye

Expressive performance rendering (EPR) and automatic piano transcription (APT) are fundamental yet inverse tasks in music information retrieval: EPR generates expressive performances from symbolic scores, while APT recovers scores from performances. Despite their dual nature, prior work has addressed them independently. In this paper we propose a unified framework that jointly models EPR and APT by disentangling note-level score content and global performance style representations from both paired and unpaired data. Our framework is built on a transformer-based sequence-to-sequence architecture and is trained using only sequence-aligned data, without requiring fine-grained note-level alignment. To automate the rendering process while ensuring stylistic compatibility with the score, we introduce an independent diffusion-based performance style recommendation module that generates style embeddings directly from score content. This modular component supports both style transfer and flexible rendering across a range of expressive styles. Experimental results from both objective and subjective evaluations demonstrate that our framework achieves competitive performance on EPR and APT tasks, while enabling effective content-style disentanglement, reliable style transfer, and stylistically appropriate rendering. Demos are available at https://jointpianist.github.io/epr-apt/

large language model, machine learning, natural language

2509.23878

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Speech (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

arXiv.org Artificial Intelligence, Jul-21-2025

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Nguyen, Quang-Binh, Luu, Minh, Nguyen, Quang, Tran, Anh, Nguyen, Khoi

Disentangling content and style from a single image, known as content-style decomposition (CSD), enables recontextualization of extracted content and stylization of extracted styles, offering greater creative flexibility in visual synthesis. While recent personalization methods have explored the decomposition of explicit content style, they remain tailored for diffusion models. Meanwhile, Visual Autoregressive Modeling (VAR) has emerged as a promising alternative with a next-scale prediction paradigm, achieving performance comparable to that of diffusion models. In this paper, we explore VAR as a generative framework for CSD, leveraging its scale-wise generation process for improved disentanglement. To this end, we propose CSD-VAR, a novel method that introduces three key innovations: (1) a scale-aware alternating optimization strategy that aligns content and style representation with their respective scales to enhance separation, (2) an SVD-based rectification method to mitigate content leakage into style representations, and (3) an Augmented Key-Value (K-V) memory enhancing content identity preservation. To benchmark this task, we introduce CSD-100, a dataset specifically designed for content-style decomposition, featuring diverse subjects rendered in various artistic styles. Experiments demonstrate that CSD-VAR outperforms prior approaches, achieving superior content preservation and stylization fidelity.

large language model, machine learning, natural language

2507.13984

Country: Europe > Italy (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence, Apr-18-2025

ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models

Du, Linkang, Zhu, Zheng, Chen, Min, Su, Zhou, Ji, Shouling, Cheng, Peng, Chen, Jiming, Zhang, Zhikun

Text-to-image models based on diffusion processes, such as DALL-E, Stable Diffusion, and Midjourney, are capable of transforming texts into detailed images and have widespread applications in art and design. As such, amateur users can easily imitate professional-level paintings by collecting an artist's work and fine-tuning the model, leading to concerns about artworks' copyright infringement. To tackle these issues, previous studies either add visually imperceptible perturbation to the artwork to change its underlying styles (perturbation-based methods) or embed post-training detectable watermarks in the artwork (watermark-based methods). However, when the artwork or the model has already been published online, i.e., when modifying the original artwork or retraining the model is not feasible, these strategies might not be viable. To this end, we propose a novel method for data-use auditing in text-to-image generation models. The general idea of ArtistAuditor is to identify whether a suspicious model has been fine-tuned using the artworks of specific artists by analyzing style-related features. Concretely, ArtistAuditor employs a style extractor to obtain multi-granularity style representations and treats artworks as samples of an artist's style. Then, ArtistAuditor queries a trained discriminator to obtain the auditing decisions. The experimental results on six combinations of models and datasets show that ArtistAuditor can achieve high AUC values (> 0.937). By studying ArtistAuditor's transferability and core modules, we provide valuable insights into its practical implementation. Finally, we demonstrate the effectiveness of ArtistAuditor in real-world cases on the online platform Scenario. ArtistAuditor is open-sourced at https://github.com/Jozenn/ArtistAuditor.
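The AUC values reported above summarize how well the discriminator's scores separate cases that should be flagged from cases that should not, across all decision thresholds. A minimal sketch of how ROC AUC can be computed from such scores, using the Mann-Whitney pairwise formulation; the function name `roc_auc` and the example scores are hypothetical and are not taken from the ArtistAuditor code.

```python
# Minimal sketch: ROC AUC from discriminator scores via the Mann-Whitney
# pairwise formulation. Function name and scores are hypothetical.


def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: positives should be flagged, negatives are clean cases.
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # → 0.75
```

The pairwise loop is O(P·N), which is fine for small evaluation sets; larger sets are usually handled with an equivalent sort-and-rank computation.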

artificial intelligence, deep learning, machine learning

2504.13061

Country: Asia > China > Zhejiang Province (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law > Intellectual Property & Technology Law (0.86)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Ostheimer, Phil, Kloft, Marius, Fellenz, Sophie

Challenging Assumptions in Learning Generic Text Style Embeddings

arXiv.org Artificial Intelligence, Jan-27-2025

Recent advancements in language representation learning primarily emphasize language modeling for deriving meaningful representations, often neglecting style-specific considerations. This study addresses this gap by creating generic, sentence-level style embeddings crucial for style-centric tasks. Our approach is grounded in the premise that low-level text style changes can compose any high-level style. We hypothesize that applying this concept to representation learning enables the development of versatile text style embeddings. By fine-tuning a general-purpose text encoder using contrastive learning and standard cross-entropy loss, we aim to capture these low-level style shifts, anticipating that they offer insights applicable to high-level text styles. The outcomes prompt us to reconsider the underlying assumptions, as the results do not always show that the learned style representations capture high-level text styles.
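The contrastive objective mentioned above pulls an anchor embedding toward a positive (e.g. a sentence with the same low-level style shift) and away from negatives. A minimal sketch of one common form, an InfoNCE-style loss, which is itself a cross-entropy over temperature-scaled cosine similarities; the loss form, the temperature of 0.1, the function names, and the toy 2-D vectors are all assumptions, and the paper's exact objective may differ.

```python
import math

# Minimal sketch of an InfoNCE-style contrastive loss: cross-entropy over
# temperature-scaled cosine similarities, with the positive at index 0.
# Assumption: the paper's exact loss, temperature, and sampling may differ.


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax probability of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Hypothetical 2-D embeddings: a well-aligned pair vs. a mismatched pair.
same_style = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
wrong_pair = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(same_style < wrong_pair)  # → True
```

A lower temperature sharpens the softmax, so the loss penalizes hard negatives more strongly.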

artificial intelligence, machine learning, natural language

2501.16073

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.34)