AITopics | Jin, Xiaogang

Jin, Xiaogang

DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

Zhang, Yuqing, Liu, Yuan, Xie, Zhiyu, Yang, Lei, Liu, Zhongyuan, Yang, Mengzhou, Zhang, Runze, Kou, Qilong, Lin, Cheng, Wang, Wenping, Jin, Xiaogang

arXiv.org Artificial Intelligence, May-27-2024

2D diffusion model, which often contains unwanted baked-in shading effects and results in unrealistic rendering effects in downstream applications. Generating Physically Based Rendering (PBR) materials instead of just RGB textures would be a promising solution. However, directly distilling the PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition, such as baked-in shading effects in albedo. We introduce DreamMat, an approach that resolves this problem and generates high-quality PBR materials from text descriptions. We find that the main reason for the incorrect material distillation is that large-scale 2D diffusion models are only trained to generate final shading colors, resulting in insufficient constraints on material decomposition during distillation. To tackle this problem, we first finetune a new light-aware 2D diffusion model to condition on a given lighting environment and generate the shading results under this specific lighting condition. Then, by applying the same environment lights in the material distillation, DreamMat can generate high-quality PBR materials that are not only consistent with the given geometry but also free from any baked-in shading effects in albedo. Extensive experiments demonstrate that the materials produced by our method exhibit greater visual appeal to users and achieve significantly superior rendering quality compared to baseline methods, making them preferable for downstream tasks such as game and film production.

artificial intelligence, diffusion model, machine learning, (15 more...)

arXiv: 2405.17176

Country: Asia > China > Zhejiang Province (0.14)

Genre: Research Report > Promising Solution (0.68)

Industry:

Energy > Oil & Gas (0.34)
Media > Film (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents

Xiang, Wei, Yin, Haoteng, Wang, He, Jin, Xiaogang

arXiv.org Artificial Intelligence, Feb-27-2024

Pedestrian trajectory prediction is a key technology in many applications for providing insights into human behavior and anticipating future human motions. Most existing empirical models are explicitly formulated from observed human behaviors using explicable mathematical terms with a deterministic nature, while recent work has focused on developing hybrid models combined with learning-based techniques for powerful expressiveness while maintaining explainability. However, the deterministic nature of the learned steering behaviors from the empirical models limits the models' practical performance. To address this issue, this work proposes the social conditional variational autoencoder (SocialCVAE) for predicting pedestrian trajectories, which employs a CVAE to explore behavioral uncertainty in human motion decisions. SocialCVAE learns socially reasonable motion randomness by utilizing a socially explainable interaction energy map as the CVAE's condition, which illustrates the future occupancy of each pedestrian's local neighborhood area. The energy map is generated using an energy-based interaction model, which anticipates the energy cost (i.e., repulsion intensity) of pedestrians' interactions with neighbors. Experimental results on two public benchmarks including 25 scenes demonstrate that SocialCVAE significantly improves prediction accuracy compared with the state-of-the-art methods, with up to 16.85% improvement in Average Displacement Error (ADE) and 69.18% improvement in Final Displacement Error (FDE).
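
The two metrics named in this abstract can be illustrated with a minimal sketch. It uses the common definitions (ADE as the mean Euclidean distance over all predicted time steps, FDE as the distance at the final step) for a single 2D trajectory; the function name `displacement_errors` and the toy trajectories are illustrative assumptions, not taken from the paper, whose exact evaluation protocol may differ.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    Hypothetical helper for illustration only: ADE is the mean per-step
    Euclidean distance, FDE is the distance at the last step.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between prediction and ground truth.
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(dists) / len(dists)
    fde = dists[-1]
    return ade, fde


# Toy data (assumed): the prediction drifts away from the true path.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
```

Because FDE only looks at the last step, a model whose error grows over time will typically report an FDE larger than its ADE, as in this toy case.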

artificial intelligence, machine learning, trajectory, (16 more...)

arXiv: 2402.17339

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.93)
Information Technology > Graphics (0.93)

A General Implicit Framework for Fast NeRF Composition and Rendering

Gao, Xinyu, Yang, Ziyi, Zhao, Yunlu, Sun, Yuxiang, Jin, Xiaogang, Zou, Changqing

arXiv.org Artificial Intelligence, Jan-4-2024

A variety of Neural Radiance Fields (NeRF) methods have recently achieved remarkable success in high render speed. However, current accelerating methods are specialized and incompatible with various implicit methods, preventing real-time composition over various types of NeRF works. Because NeRF relies on sampling along rays, it is possible to provide general guidance for acceleration. To that end, we propose a general implicit pipeline for composing NeRF objects quickly. Our method enables the casting of dynamic shadows within or between objects using analytical light sources while allowing multiple NeRF objects to be seamlessly placed and rendered together with arbitrary rigid transformations. Mainly, our work introduces a new surface representation known as Neural Depth Fields (NeDF) that quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure. Our proposed method is the first to enable both the progressive and interactive composition of NeRF objects. It also serves as a previewing plugin for a range of existing NeRF works.

artificial intelligence, machine learning, nerf, (16 more...)

arXiv: 2308.04669

Country:

North America > United States (0.47)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

On Optimal Sampling for Learning SDF Using MLPs Equipped with Positional Encoding

Lin, Guying, Yang, Lei, Liu, Yuan, Zhang, Congyi, Hou, Junhui, Jin, Xiaogang, Komura, Taku, Keyser, John, Wang, Wenping

arXiv.org Artificial Intelligence, Jan-2-2024

Neural implicit fields, such as the neural signed distance field (SDF) of a shape, have emerged as a powerful representation for many applications, e.g., encoding a 3D shape and performing collision detection. Typically, implicit fields are encoded by Multi-layer Perceptrons (MLP) with positional encoding (PE) to capture high-frequency geometric details. However, a notable side effect of such PE-equipped MLPs is the noisy artifacts present in the learned implicit fields. While increasing the sampling rate could in general mitigate these artifacts, in this paper we aim to explain this adverse phenomenon through the lens of Fourier analysis. We devise a tool to determine the appropriate sampling rate for learning an accurate neural implicit field without undesirable side effects. Specifically, we propose a simple yet effective method to estimate the intrinsic frequency of a given network with randomized weights based on the Fourier analysis of the network's responses. It is observed that a PE-equipped MLP has an intrinsic frequency much higher than the highest frequency component in the PE layer. Sampling against this intrinsic frequency following the Nyquist-Shannon sampling theorem allows us to determine an appropriate training sampling rate. We empirically show in the setting of SDF fitting that this recommended sampling rate is sufficient to secure accurate fitting results, while further increasing the sampling rate would not noticeably reduce the fitting error. Training PE-equipped MLPs simply with our sampling strategy leads to performance superior to the existing methods.
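
The sampling-theorem step in this abstract can be sketched as simple arithmetic. The sketch below assumes the intrinsic frequency is already known (the paper's Fourier-based estimation procedure is not reproduced here) and only applies the Nyquist-Shannon condition that the sampling rate must exceed twice the highest frequency. The function names, the frequency of 64 cycles per unit length, and the domain length of 2 are all illustrative assumptions.

```python
import math


def min_sampling_rate(intrinsic_frequency):
    """Nyquist rate: sampling must be strictly faster than this value.

    `intrinsic_frequency` is in cycles per unit length (assumed given).
    """
    if intrinsic_frequency <= 0:
        raise ValueError("intrinsic_frequency must be positive")
    return 2.0 * intrinsic_frequency


def samples_per_axis(intrinsic_frequency, domain_length):
    """Smallest number of evenly spaced samples (endpoints included)
    whose spacing is strictly below 1 / (2 * intrinsic_frequency)."""
    intervals = math.floor(min_sampling_rate(intrinsic_frequency) * domain_length) + 1
    return intervals + 1


# Assumed example: frequency 64 on a domain such as [-1, 1] (length 2).
rate = min_sampling_rate(64.0)
count = samples_per_axis(64.0, 2.0)
```

For a 3D field the per-axis count would apply along each dimension, so the total number of grid samples grows with the cube of this value under these assumptions.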

artificial intelligence, frequency, machine learning, (17 more...)

arXiv: 2401.01391

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

A Locality-based Neural Solver for Optical Motion Capture

Pan, Xiaoyu, Zheng, Bowen, Jiang, Xinwei, Xu, Guanglong, Gu, Xianli, Li, Jingxiang, Kou, Qilong, Wang, He, Shao, Tianjia, Zhou, Kun, Jin, Xiaogang

arXiv.org Artificial Intelligence, Sep-4-2023

We present a novel locality-based learning method for cleaning and solving optical motion capture data. Given noisy marker data, we propose a new heterogeneous graph neural network which treats markers and joints as different types of nodes, and uses graph convolution operations to extract the local features of markers and joints and transform them to clean motions. To deal with anomalous markers (e.g. occluded or with large tracking errors), the key insight is that a marker's motion shows strong correlations with the motions of its immediate neighboring markers but less so with other markers, a.k.a. locality, which enables us to efficiently fill missing markers (e.g. due to occlusion). Additionally, we identify marker outliers due to tracking errors by investigating their acceleration profiles. We further propose a training regime based on representation learning and data augmentation, by training the model on data with masking. The masking schemes aim to mimic the occluded and noisy markers often observed in the real data. Finally, we show that our method achieves high accuracy on multiple metrics across various datasets. Extensive comparison shows our method outperforms state-of-the-art methods in terms of prediction accuracy of occluded marker position error by approximately 20%, which leads to a further error reduction on the reconstructed joint rotations and positions by 30%. The code and data for this paper are available at https://github.com/non-void/LocalMoCap.
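
The idea of spotting tracking errors from acceleration profiles can be illustrated with a generic sketch. This is not the paper's detector: it simply estimates acceleration with a central finite difference and flags frames above a fixed threshold. The function names, the threshold value, and the toy marker track are all assumptions made for illustration.

```python
import math


def acceleration_magnitudes(positions, dt):
    """Central finite-difference acceleration magnitude for interior frames.

    `positions` is a list of (x, y, z) tuples for one marker, `dt` the
    frame interval. Returns one value per interior frame (1 .. n-2).
    """
    magnitudes = []
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        accel = [(n - 2.0 * c + p) / (dt * dt) for p, c, n in zip(prev, cur, nxt)]
        magnitudes.append(math.hypot(*accel))
    return magnitudes


def flag_outlier_frames(positions, dt, threshold):
    """Indices of interior frames whose acceleration exceeds `threshold`
    (a hypothetical, hand-picked value)."""
    return [
        i + 1
        for i, a in enumerate(acceleration_magnitudes(positions, dt))
        if a > threshold
    ]


# Toy track (assumed): constant velocity along x with a spike at frame 3.
track = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
    (3.0, 5.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0),
]
flagged = flag_outlier_frames(track, dt=1.0, threshold=6.0)
```

A single bad frame also raises the acceleration estimate of its two neighbors, so the threshold has to be chosen with that spill-over in mind; a lower threshold in this toy case would flag frames 2 and 4 as well.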

artificial intelligence, machine learning, occlusion, (16 more...)

doi: 10.1145/3610548.3618148

arXiv: 2309.00428

Country: Asia > China (0.47)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.74)

Diverse facial inpainting guided by exemplars

Lu, Wanglong, Zhao, Hanli, Jiang, Xianta, Jin, Xiaogang, Wang, Min, Lyu, Jiankai, Shi, Kaijie

arXiv.org Artificial Intelligence, Feb-15-2022

Facial image inpainting is the task of filling in visually realistic and semantically meaningful content for missing or masked pixels in a face image. Although existing methods have made significant progress in achieving high visual quality, the controllable diversity of facial image inpainting remains an open problem in this field. This paper introduces EXE-GAN, a novel diverse and interactive facial inpainting framework, which can not only preserve the high-quality visual effect of the whole image but also complete the face image with exemplar-like facial attributes. The proposed facial inpainting is achieved based on generative adversarial networks by leveraging the global style of the input image, the stochastic style, and the exemplar style of the exemplar image. A novel attribute similarity metric is introduced to encourage networks to learn the style of facial attributes from the exemplar in a self-supervised way. To guarantee the natural transition across the boundary of inpainted regions, a novel spatial variant gradient backpropagation technique is designed to adjust the loss gradients based on the spatial location. A variety of experimental results and comparisons on the public CelebA-HQ and FFHQ datasets are presented to demonstrate the superiority of the proposed method in terms of both the quality and diversity of facial inpainting.

artificial intelligence, diverse facial, exemplar

arXiv: 2202.06358

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.93)
