
Collaborating Author: Yan, Yichao


Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

arXiv.org Artificial Intelligence

Significant progress has been made in speech-driven 3D face animation, but most works focus on learning the motion of the mesh/geometry, ignoring the impact of dynamic texture. In this work, we reveal that dynamic texture plays a key role in rendering high-fidelity talking avatars, and introduce a high-resolution 4D dataset, TexTalk4D, consisting of 100 minutes of audio-synced scan-level meshes with detailed 8K dynamic textures from 100 subjects. Based on the dataset, we explore the inherent correlation between motion and texture, and propose a diffusion-based framework, TexTalker, to simultaneously generate facial motions and dynamic textures from speech. Furthermore, we propose a novel pivot-based style injection strategy to capture the complexity of different texture and motion styles, which allows disentangled control. TexTalker, as the first method to generate audio-synced facial motion with dynamic texture, not only outperforms prior art in synthesising facial motions but also produces realistic textures that are consistent with the underlying facial movements. Project page: https://xuanchenli.github.io/TexTalk/.
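To make the idea concrete, below is a minimal sketch of how an audio-conditioned denoiser might jointly predict motion and texture noise with a style "pivot" code; the module names, dimensions, and wiring are assumptions for illustration, not TexTalker's actual implementation.

# Hypothetical sketch of a joint audio-conditioned denoiser for motion and
# texture latents; shapes and module choices are assumptions, not TexTalker's code.
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    def __init__(self, audio_dim=768, motion_dim=256, tex_dim=256, style_dim=64, hidden=512):
        super().__init__()
        self.cond = nn.Linear(audio_dim + style_dim, hidden)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=4)
        self.in_proj = nn.Linear(motion_dim + tex_dim, hidden)
        self.motion_head = nn.Linear(hidden, motion_dim)   # predicts motion noise
        self.tex_head = nn.Linear(hidden, tex_dim)         # predicts texture noise

    def forward(self, noisy_motion, noisy_tex, audio_feat, style_pivot):
        # Concatenate the two noisy latent streams so a single backbone can model
        # their correlation; condition on audio features plus a style pivot code.
        x = self.in_proj(torch.cat([noisy_motion, noisy_tex], dim=-1))
        c = self.cond(torch.cat([audio_feat, style_pivot], dim=-1))
        h = self.backbone(x + c)
        return self.motion_head(h), self.tex_head(h)

# Toy usage: a batch of 2 sequences, 50 frames each.
model = JointDenoiser()
B, T = 2, 50
eps_m, eps_t = model(torch.randn(B, T, 256), torch.randn(B, T, 256),
                     torch.randn(B, T, 768), torch.randn(B, T, 64))
print(eps_m.shape, eps_t.shape)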


Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

arXiv.org Artificial Intelligence

Generating sewing patterns in garment design is receiving increasing attention due to its CG-friendly and flexibly editable nature. Previous sewing pattern generation methods can produce exquisite clothing but struggle to design complex garments with detailed control. To address these issues, we propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches. Initially, we extend the original vector representation of sewing patterns into a more comprehensive one that covers more intricate details, and then compress it into a compact latent space. To learn the sewing pattern distribution in the latent space, we design a two-step training strategy to inject the multi-modal conditions, i.e., body shapes, text prompts, and garment sketches, into a diffusion model, ensuring the generated garments are body-suited and detail-controlled. Comprehensive qualitative and quantitative experiments show the effectiveness of our proposed method, which significantly surpasses previous approaches in complex garment design and adaptability to various body shapes. Our project page: https://shengqiliu1.github.io/SewingLDM.
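For intuition, the snippet below sketches one way multi-modal conditions (text, body shape, sketch) could be fused and injected into a latent-diffusion noise predictor; the encoders, dimensions, and the note about the two-step schedule are assumptions, not SewingLDM's actual code.

# Minimal sketch of multi-modal condition injection for a latent diffusion
# denoiser; encoder choices and dimensions are assumptions, not SewingLDM's code.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, latent_dim=128, text_dim=512, body_dim=10, sketch_dim=256, hidden=256):
        super().__init__()
        # Fuse the three conditions into one context vector.
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + body_dim + sketch_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden))
        self.net = nn.Sequential(
            nn.Linear(latent_dim + hidden + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim))   # predicts noise on the pattern latent

    def forward(self, z_t, t, text_emb, body_shape, sketch_emb):
        ctx = self.fuse(torch.cat([text_emb, body_shape, sketch_emb], dim=-1))
        t = t.float().unsqueeze(-1) / 1000.0            # crude timestep embedding
        return self.net(torch.cat([z_t, ctx, t], dim=-1))

# Assumed two-step idea: train with the body-shape condition first (text and
# sketch embeddings zeroed out), then enable all three conditions.
model = ConditionedDenoiser()
eps = model(torch.randn(4, 128), torch.randint(0, 1000, (4,)),
            torch.randn(4, 512), torch.randn(4, 10), torch.randn(4, 256))
print(eps.shape)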


PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

arXiv.org Artificial Intelligence

Large text-to-image diffusion models Saharia et al. (2022); Pernias et al. (2024); Podell et al. (2024); Ramesh et al. (2022) have demonstrated significant capabilities in generating photorealistic images based on given textual prompts, facilitating both the creation and editing of real images. Current research Cao et al. (2023); Brack et al. (2024); Ju et al. (2024); Parmar et al. (2023); Wu & De la Torre (2022); Xu et al. (2024) highlights three main challenges in image editing: controllability, background preservation, and efficiency. Specifically, the edited parts must align with the target prompt's concepts, while unedited regions should remain unchanged. Additionally, the editing process must be sufficiently efficient to support interactive tasks. There are two mainstream categories of image editing approaches, namely inversion-based and inversion-free methods, as illustrated in Figure 1. Inversion-based approaches Song et al. (2021a); Mokady et al. (2023); Wu & De la Torre (2022); Huberman-Spiegelglas et al. (2024) progressively add noise to a clean image and then remove the noise conditioned on a given target prompt, utilizing large text-to-image diffusion models (e.g., Stable Diffusion Rombach et al. (2022)) to obtain the edited image. However, directly inverting the diffusion sampling process (e.g., DDIM Song et al. (2021a)) for reconstruction introduces bias from the initial image due to errors accumulated by an unconditional score term, as discussed in classifier-free guidance (CFG) Ho & Salimans (2022) and proven in the appendix.
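As background for the inversion-based branch, the snippet below sketches plain DDIM inversion (Song et al., 2021a) with a placeholder noise predictor; it is a simplified illustration of where reconstruction error can accumulate, not PostEdit's pipeline, and the alpha schedule is an assumption.

# Sketch of plain DDIM inversion; `eps_model` is a placeholder for any
# pretrained noise predictor, and the schedule below is a toy assumption.
import torch

def ddim_invert(x0, eps_model, alphas_cumprod, prompt_emb):
    """Map a clean latent x0 to x_T by running DDIM updates in reverse."""
    x = x0
    for t in range(len(alphas_cumprod) - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        eps = eps_model(x, t, prompt_emb)
        # Predicted clean latent at step t; reusing this eps for the next step is
        # the standard approximation whose accumulated error biases reconstruction.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x

# Toy run with a dummy noise predictor.
alphas = torch.linspace(0.999, 0.01, 50)
eps_model = lambda x, t, c: torch.zeros_like(x)
x_T = ddim_invert(torch.randn(1, 4, 64, 64), eps_model, alphas, prompt_emb=None)
print(x_T.shape)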


HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

arXiv.org Artificial Intelligence

Generating human-object interactions (HOIs) is critical given the tremendous advances in digital avatars. Existing datasets are typically limited to humans interacting with a single object, neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body humans interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08M 3D HOI frames. We also annotate HIMO with detailed textual descriptions and temporal segments, benchmarking two novel tasks of HOI synthesis conditioned on either the whole text prompt or segmented text prompts for fine-grained timeline control. To address these novel tasks, we propose a dual-branch conditional diffusion model with a mutual interaction module for HOI synthesis. In addition, an auto-regressive generation pipeline is designed to obtain smooth transitions between HOI segments. Experimental results demonstrate generalization to unseen object geometries and temporal compositions.
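The snippet below sketches a mutual interaction module as cross-attention between a human branch and an object branch; layer sizes and wiring are illustrative assumptions rather than the authors' implementation.

# Hypothetical mutual interaction module: each branch attends to the other so
# that human motion and object motion stay consistent over the sequence.
import torch
import torch.nn as nn

class MutualInteraction(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.h2o = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.o2h = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, human, objects):
        # Cross-attention in both directions, with residual connections.
        human2, _ = self.h2o(human, objects, objects)
        objects2, _ = self.o2h(objects, human, human)
        return human + human2, objects + objects2

B, T, D = 2, 60, 256
mi = MutualInteraction(D)
h, o = mi(torch.randn(B, T, D), torch.randn(B, T, D))
print(h.shape, o.shape)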


Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

arXiv.org Artificial Intelligence

Text-to-image (T2I) customization aims to create images that embody specific visual concepts delineated in textual descriptions. However, existing works still face a main challenge: concept overfitting. To tackle this challenge, we first analyze overfitting, categorizing it into concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which confines customization to limited modalities, i.e., backgrounds, layouts, and styles. To evaluate the degree of overfitting, we further introduce two metrics, the Latent Fisher divergence and the Wasserstein metric, to measure the distribution changes of non-customized and customized concepts, respectively. Drawing on this analysis, we propose Infusion, a T2I customization method that enables target concepts to be learned without being constrained by limited training modalities, while preserving non-customized knowledge. Infusion achieves this with remarkable efficiency, requiring a mere 11 KB of trained parameters. Extensive experiments also demonstrate that our approach outperforms state-of-the-art methods in both single- and multi-concept customized generation.
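To illustrate the kind of distribution-shift measurement involved, the snippet below computes a 2-Wasserstein distance between Gaussian fits of latent features before and after customization; this is a simplified, assumed stand-in, not the paper's exact Latent Fisher divergence or Wasserstein metric.

# Illustrative only: Gaussian 2-Wasserstein distance between two latent feature
# sets, a crude way to quantify how far a customized model's latent distribution
# drifts from the original model's.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(feats_a, feats_b):
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):            # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    d2 = np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * covmean)
    return float(np.sqrt(max(d2, 0.0)))

rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=(500, 16))   # latents before customization (toy data)
after = rng.normal(0.3, 1.1, size=(500, 16))    # latents after customization (toy data)
print(gaussian_w2(before, after))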


ReGenNet: Towards Human Action-Reaction Synthesis

arXiv.org Artificial Intelligence

In this paper, we focus on human action-reaction synthesis, i.e., generating human reactions given the action sequence of another person as the condition. We believe this task will contribute to many applications in AR/VR, games, human-robot interaction, and embodied AI. Modeling human-human interactions is a challenging task with the following features: 1) Asymmetric, i.e., the actor and reactor play asymmetric roles during a causal interaction, where one person acts and the other reacts [78]; 2) Dynamic, i.e., during the interaction period, the two people constantly wave their body parts, move close or away, and change relative orientations, spatially and temporally; 3) Synchronous, i.e., one person typically responds to the other instantly, such as an immediate evasion when someone throws a punch, so online generation is required; 4) Detailed, i.e., the interaction between humans involves not only coarse body movements together with relative position ...

Existing generative models mainly target static scenes and objects, while dynamic human action-reaction synthesis for ubiquitous causal human-human interactions is less explored. Human-human interactions can be regarded as asymmetric, with actors and reactors in atomic interaction periods. We comprehensively analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions and propose the first multi-setting human action-reaction synthesis benchmark to generate human reactions conditioned on given human actions. To begin with, we annotate the actor-reactor order of the interaction sequences for the NTU120, InterHuman, and Chi3D datasets. Based on them, a diffusion-based generative model with a Transformer decoder architecture, called ReGenNet, together with an explicit distance-based interaction loss, is proposed to predict human reactions in an online manner, where the future states of actors are unavailable to reactors.
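The snippet below sketches one plausible form of an explicit distance-based interaction loss, penalizing errors in actor-reactor relative joint distances; the exact formulation in ReGenNet may differ, so treat this as an illustrative assumption.

# Assumed form of a distance-based interaction loss: match the pairwise
# distances between reactor joints and actor joints to the ground truth.
import torch

def interaction_loss(pred_reactor, gt_reactor, actor):
    """All tensors have shape (batch, frames, joints, 3)."""
    d_pred = torch.cdist(pred_reactor.flatten(0, 1), actor.flatten(0, 1))
    d_gt = torch.cdist(gt_reactor.flatten(0, 1), actor.flatten(0, 1))
    return torch.nn.functional.l1_loss(d_pred, d_gt)

B, T, J = 2, 30, 22
loss = interaction_loss(torch.randn(B, T, J, 3), torch.randn(B, T, J, 3),
                        torch.randn(B, T, J, 3))
print(loss.item())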


Video Summarization via Semantic Attended Networks

AAAI Conferences

The goal of video summarization is to distill a raw video into a more compact form without losing much semantic information. However, previous methods mainly consider the diversity, representativeness, and interestingness of the obtained summary, and they seldom pay sufficient attention to the semantic information of the resulting frame set, especially long-range temporal semantics. To explicitly address this issue, we propose a novel technique which is able to extract the most semantically relevant video segments (i.e., those that remain relevant over a long temporal duration) and assemble them into an informative summary. To this end, we develop a semantic attended video summarization network (SASUM), which consists of a frame selector and a video descriptor, to select an appropriate number of video shots by minimizing the distance between the generated description sentence of the summarized video and the human-annotated text of the original video. Extensive experiments show that our method achieves superior performance over previous methods on two benchmark datasets.
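The snippet below sketches the selector/descriptor objective in a simplified form: score frames, pool a summary feature, and pull its description embedding toward the annotated-text embedding. The encoders and the soft selection are assumptions, not the SASUM architecture.

# Simplified sketch of a selector/descriptor training objective; the frame
# encoder and sentence descriptor are stand-ins, not the paper's components.
import torch
import torch.nn as nn

class Selector(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats):
        w = torch.sigmoid(self.score(frame_feats))      # per-frame keep score
        summary_feat = (w * frame_feats).sum(1) / w.sum(1).clamp(min=1e-6)
        return w, summary_feat

selector = Selector()
describe = nn.Linear(512, 300)      # stand-in for a sentence-generating descriptor

frame_feats = torch.randn(2, 120, 512)          # 120 frames of visual features (toy data)
gt_text_emb = torch.randn(2, 300)               # embedding of the annotated description
w, summary_feat = selector(frame_feats)
loss = nn.functional.mse_loss(describe(summary_feat), gt_text_emb)
loss.backward()
print(float(loss))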