AITopics | Xu, Yuanlu

Collaborating Authors

Xu, Yuanlu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs

Frühstück, Anna, Sarafianos, Nikolaos, Xu, Yuanlu, Wonka, Peter, Tung, Tony

arXiv.org Artificial IntelligenceMar-28-2023

We introduce VIVE3D, a novel approach that extends the capabilities of image-based 3D GANs to video editing and is able to represent the input video in an identity-preserving and temporally consistent way. We propose two new building blocks. First, we introduce a novel GAN inversion technique specifically tailored to 3D GANs by jointly embedding multiple frames and optimizing for the camera parameters. Second, besides traditional semantic face edits (e.g. for age and expression), we are the first to demonstrate edits that show novel views of the head enabled by the inherent properties of 3D GANs and our optical flow-guided compositing technique to combine the head with the background video. Our experiments demonstrate that VIVE3D generates high-fidelity face edits at consistent quality from a range of camera viewpoints which are composited with the original video in a temporally and spatially consistent manner.

artificial intelligence, machine learning, video, (17 more...)

arXiv.org Artificial Intelligence

2303.15893

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

Fang, Hao-Shu (Shanghai Jiao Tong University) | Xu, Yuanlu (University of California, Los Angeles) | Wang, Wenguan (Beijing Institute of Technology) | Liu, Xiaobai (San Diego State University) | Zhu, Song-Chun (University of California, Los Angeles)

AAAI ConferencesFeb-8-2018

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNN) on the top to explicitly incorporate a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol working on cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty under such setting while our method can well handle such challenges.

deep learning, pose estimation, video understanding, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > California (0.28)
Asia (0.28)

Genre: Research Report (0.48)

Industry: Health & Medicine (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.74)

Add feedback

Scene-Centric Joint Parsing of Cross-View Videos

Qi, Hang (University of California, Los Angeles) | Xu, Yuanlu (University of California, Los Angeles) | Yuan, Tao (University of California, Los Angeles) | Wu, Tianfu (NC State University) | Zhu, Song-Chun (University of California, Los Angeles)

AAAI ConferencesFeb-8-2018

Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing framework that integrates view-centric proposals into scene-centric parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of views embed rich appearance and geometry correlations and that knowledge fragments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed joint parsing framework represents such correlations and constraints explicitly and generates semantic scene-centric parse graphs. Quantitative experiments show that scene-centric predictions in the parse graph outperform view-centric predictions.

Add feedback

Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing

Xu, Yuanlu (University of California, Los Angeles) | Liu, Xiaobai (San Diego State University) | Qin, Lei (Chinese Academy of Sciences) | Zhu, Song-Chun (University of California, Los Angeles)

AAAI ConferencesFeb-14-2017

In this paper, we propose a Spatio-temporal Attributed Parse Graph (ST-APG) to integrate semantic attributes with trajectories for cross-view people tracking. Given videos from multiple cameras with overlapping field of view (FOV), our goal is to parse the videos and organize the trajectories of all targets into a scene-centered representation. We leverage rich semantic attributes of human, e.g., facing directions, postures and actions, to enhance cross-view tracklet associations, besides frequently used appearance and geometry features in the literature.In particular, the facing direction of a human in 3D, once detected, often coincides with his/her moving direction or trajectory. Similarly, the actions of humans, once recognized, provide strong cues for distinguishing one subject from the others. The inference is solved by iteratively grouping tracklets with cluster sampling and estimating people semantic attributes by dynamic programming.In experiments, we validate our method on one public dataset and create another new dataset that records people's daily life in public, e.g., food court, office reception and plaza, each of which includes 3-4 cameras. We evaluate the proposed method on these challenging videos and achieve promising multi-view tracking results.

Add feedback