Collaborating Authors

 Ge, Yanhao


Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics

arXiv.org Artificial Intelligence

While humans effortlessly discern intrinsic dynamics and adapt to new scenarios, modern AI systems often struggle. Current methods for visual grounding of dynamics either use pure neural-network-based simulators (black box), which may violate physical laws, or traditional physical simulators (white box), which rely on expert-defined equations that may not fully capture actual dynamics. We propose the Neural Material Adaptor (NeuMA), which integrates existing physical laws with learned corrections, facilitating accurate learning of actual dynamics while maintaining the generalizability and interpretability of physical priors. Additionally, we propose Particle-GS, a particle-driven 3D Gaussian Splatting variant that bridges simulation and observed images, allowing image gradients to be back-propagated to optimize the simulator. Comprehensive experiments on various dynamics, evaluating grounded-particle accuracy, dynamic rendering quality, and generalization ability, demonstrate that NeuMA can accurately capture intrinsic dynamics.
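The core idea of the abstract — keep a known physical prior and learn only a residual correction from observations — can be illustrated with a minimal sketch. This is not NeuMA's released code: the prior here is free fall (a = -9.8), the true dynamics add an unknown linear drag term, and the residual coefficient is recovered by least squares on observed velocities, standing in for NeuMA's optimization of the correction through rendered images via Particle-GS. All names and numbers are illustrative assumptions.

```python
import numpy as np

def rollout(theta, v0=0.0, steps=50, dt=0.1):
    """Simulate velocities under a = physical prior (-9.8) + theta * v."""
    v, traj = v0, [v0]
    for _ in range(steps):
        v = v + (-9.8 + theta * v) * dt  # prior plus learned residual
        traj.append(v)
    return np.array(traj)

# "Observed" trajectory from the true dynamics (drag coefficient -0.5,
# unknown to the learner).
obs = rollout(theta=-0.5)

# Residual acceleration implied by the observations after subtracting the
# physical prior: r_n = (v_{n+1} - v_n)/dt + 9.8 = theta * v_n, so theta
# follows from a one-parameter least-squares fit.
residual = (obs[1:] - obs[:-1]) / 0.1 + 9.8
theta_hat = float(np.dot(residual, obs[:-1]) / np.dot(obs[:-1], obs[:-1]))
print(round(theta_hat, 3))  # -0.5
```

The split mirrors the abstract's claim: the prior term carries the generalizable, interpretable physics, while the learned residual absorbs only the mismatch with the actual dynamics.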


PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

arXiv.org Artificial Intelligence

Large text-to-image diffusion models Saharia et al. (2022); Pernias et al. (2024); Podell et al. (2024); Ramesh et al. (2022) have demonstrated significant capabilities in generating photorealistic images based on given textual prompts, facilitating both the creation and editing of real images. Current research Cao et al. (2023); Brack et al. (2024); Ju et al. (2024); Parmar et al. (2023); Wu & De la Torre (2022); Xu et al. (2024) highlights three main challenges in image editing: controllability, background preservation, and efficiency. Specifically, the edited parts must align with the target prompt's concepts, while unedited regions should remain unchanged. Additionally, the editing process must be sufficiently efficient to support interactive tasks. There are two mainstream categories of image editing approaches, namely inversion-based and inversion-free methods, as illustrated in Figure 1. Inversion-based approaches Song et al. (2021a); Mokady et al. (2023); Wu & De la Torre (2022); Huberman-Spiegelglas et al. (2024) progressively add noise to a clean image and then remove the noise conditioned on a given target prompt, utilizing large text-to-image diffusion models (e.g., Stable Diffusion Rombach et al. (2022)) to obtain the edited image. However, directly inverting the diffusion sampling process (e.g., DDIM Song et al. (2021a)) for reconstruction introduces bias from the initial image due to errors accumulated by an unconditional score term, as discussed in classifier-free guidance (CFG) Ho & Salimans (2022) and proven in App.
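The accumulation of inversion error can be seen in a hedged toy model (1-D standard-normal data with an analytic optimal noise predictor, not any paper's actual network). DDIM's deterministic update is run forward (inversion: clean to noisy) and backward (reconstruction), and because each discrete step evaluates the predictor at a single point, the round trip recovers the input only approximately; the bias that CFG's unconditional score term contributes in real editing pipelines is a further error source not modeled here. The schedule and step counts below are illustrative assumptions.

```python
import numpy as np

def eps_pred(x, a):
    # Optimal noise predictor when x0 ~ N(0, 1): E[eps | x_t] = sqrt(1-a) * x_t
    return np.sqrt(1 - a) * x

def ddim_step(x, a_from, a_to):
    """One deterministic DDIM update between cumulative-alpha levels."""
    e = eps_pred(x, a_from)
    x0_hat = (x - np.sqrt(1 - a_from) * e) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_hat + np.sqrt(1 - a_to) * e

def roundtrip_error(x0, steps):
    abar = np.linspace(0.999, 0.01, steps)  # toy cumulative-alpha schedule
    x = x0
    for t in range(steps - 1):              # inversion: clean -> noisy
        x = ddim_step(x, abar[t], abar[t + 1])
    for t in reversed(range(steps - 1)):    # reconstruction: noisy -> clean
        x = ddim_step(x, abar[t + 1], abar[t])
    return abs(x - x0)

coarse, fine = roundtrip_error(1.7, 20), roundtrip_error(1.7, 200)
print(coarse > fine > 0.0)  # finer discretization -> smaller accumulated error
```

The reconstruction error never vanishes at a finite step count, which is why inversion-based editors either spend many steps (hurting efficiency) or tolerate bias in the unedited background — exactly the trade-off the paragraph describes.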