Yu, Hong-Xing
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Zhang, Tianyuan, Yu, Hong-Xing, Wu, Rundi, Feng, Brandon Y., Zheng, Changxi, Snavely, Noah, Wu, Jiajun, Freeman, William T.
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects, such as stiffness, and grounding the 3D motion prediction on these properties. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.
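To make the distillation idea concrete, here is a minimal sketch that recovers a hidden material parameter by matching a toy differentiable simulator to a reference motion; the `simulate` function and `target_motion` are stand-in assumptions (in the paper the reference motion would be distilled from a video generation model and the simulator would be physics-based), not the paper's actual pipeline.

```python
# Minimal sketch (assumptions throughout): fit a hidden stiffness so a toy differentiable
# simulator reproduces a reference motion; stand-ins for the paper's simulator and video prior.
import torch

def simulate(log_stiffness: torch.Tensor, steps: int = 60) -> torch.Tensor:
    """Toy differentiable response to a unit poke: stiffer objects deflect less."""
    k = torch.exp(log_stiffness)                      # stiffness > 0
    t = torch.linspace(0.0, 2.0, steps)
    return (1.0 / k) * torch.exp(-t)                  # displacement over time

with torch.no_grad():                                 # stand-in for motion distilled from video
    target_motion = simulate(torch.log(torch.tensor(25.0)))

log_k = torch.zeros(1, requires_grad=True)            # unknown (log) stiffness
opt = torch.optim.Adam([log_k], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(simulate(log_k), target_motion)
    loss.backward()
    opt.step()
print(torch.exp(log_k).item())                        # should approach the target stiffness of 25
```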
Fluid Simulation on Neural Flow Maps
Deng, Yitong, Yu, Hong-Xing, Zhang, Diyang, Wu, Jiajun, Zhu, Bo
We introduce Neural Flow Maps, a novel simulation method bridging the emerging paradigm of implicit neural representations with fluid simulation based on the theory of flow maps, to achieve state-of-the-art simulation of inviscid fluid phenomena. We devise a novel hybrid neural field representation, Spatially Sparse Neural Fields (SSNF), which fuses small neural networks with a pyramid of overlapping, multi-resolution, and spatially sparse grids, to compactly represent long-term spatiotemporal velocity fields at high accuracy. With this neural velocity buffer in hand, we compute long-term, bidirectional flow maps and their Jacobians in a mechanistically symmetric manner, to facilitate drastic accuracy improvement over existing solutions. These long-range, bidirectional flow maps enable high advection accuracy with low dissipation, which in turn facilitates high-fidelity incompressible flow simulations that manifest intricate vortical structures. We demonstrate the efficacy of our neural fluid simulation in a variety of challenging simulation scenarios, including leapfrogging vortices, colliding vortices, vortex reconnections, as well as vortex generation from moving obstacles and density differences. Our examples show increased performance over existing methods in terms of energy conservation, visual complexity, adherence to experimental observations, and preservation of detailed vortical structures.
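As a rough illustration of long-range flow-map advection, the sketch below backtraces a backward flow map with RK2 substeps through a velocity buffer and advects a scalar by sampling it at the backtracked positions; the analytic velocity field is an assumption standing in for the SSNF buffer, and this toy is not the paper's solver.

```python
# Minimal sketch (assumptions, not the paper's solver): backtrace a long-range backward flow
# map with RK2 substeps, then advect a scalar by the method of characteristics.
import numpy as np

def velocity(x, t):
    """Stand-in for the neural velocity buffer: a time-varying rotating field on [0, 1]^2."""
    r = x - np.array([0.5, 0.5])
    return (1.0 + 0.2 * np.sin(2 * np.pi * t)) * np.stack([-r[..., 1], r[..., 0]], axis=-1)

def backtrace_flow_map(x_end, t_end, n_steps, dt):
    """Backward flow map: maps end-time positions back to their start-time positions."""
    x = x_end.copy()
    for i in range(n_steps):
        t = t_end - i * dt
        v1 = velocity(x, t)                           # RK2 (midpoint) backward integration
        v2 = velocity(x - 0.5 * dt * v1, t - 0.5 * dt)
        x = x - dt * v2
    return x

# Advect a scalar blob across many substeps with one long-range lookup.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64), indexing="ij"), axis=-1)
x0 = backtrace_flow_map(grid, t_end=1.0, n_steps=100, dt=0.01)
scalar0 = lambda p: np.exp(-80.0 * np.sum((p - np.array([0.7, 0.5])) ** 2, axis=-1))
advected = scalar0(x0)                                # scalar at t=1 is scalar at t=0 composed with the map
print(advected.shape, float(advected.max()))
```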
3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection
Ge, Yunhao, Yu, Hong-Xing, Zhao, Cheng, Guo, Yuliang, Huang, Xinyu, Ren, Liu, Itti, Laurent, Wu, Jiajun
A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise for improving both the diversity and quantity of objects, it has remained elusive due to the lack of an effective 3D object insertion method for complex, real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, sizes) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach that automatically copies virtual objects and pastes them into real scenes. The resulting objects have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to avoid collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination at the insertion location, enabling immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object detection models and achieves state-of-the-art performance. For the first time, we demonstrate that physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements on discriminative downstream tasks such as monocular 3D object detection. Project website: https://gyhandy.github.io/3D-Copy-Paste/
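The collision-avoidance step can be illustrated with a simple rejection sampler over floor-plane footprints; the box-overlap test and the room/furniture layout below are illustrative assumptions, not the released pipeline (which also estimates pose and spatially-varying lighting).

```python
# Minimal sketch (assumption-level illustration): sample a floor location for a virtual object,
# rejecting positions whose 2D footprint overlaps existing furniture footprints.
import random

def overlaps(a, b):
    """a, b = (xmin, ymin, xmax, ymax) footprints on the floor plane."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def sample_placement(room, existing, obj_size, tries=1000):
    """room = (xmin, ymin, xmax, ymax); obj_size = (width, depth). Returns a footprint or None."""
    w, d = obj_size
    for _ in range(tries):
        x = random.uniform(room[0], room[2] - w)
        y = random.uniform(room[1], room[3] - d)
        cand = (x, y, x + w, y + d)
        if all(not overlaps(cand, box) for box in existing):
            return cand
    return None  # no collision-free spot found

placement = sample_placement(room=(0, 0, 5, 4),
                             existing=[(0, 0, 1, 2), (3, 2.5, 5, 4)],
                             obj_size=(0.6, 0.6))
print(placement)
```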
Are These the Same Apple? Comparing Images Based on Object Intrinsics
Kotar, Klemen, Tian, Stephen, Yu, Hong-Xing, Yamins, Daniel L. K., Wu, Jiajun
The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step to understanding and improving artificial vision systems is to measure image similarity purely based on intrinsic object properties that define object identity. This problem has been studied in the computer vision literature as re-identification, though mostly restricted to specific object categories such as people and cars. We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics. To benchmark such measurements, we collect the Common paired objects Under differenT Extrinsics (CUTE) dataset of 18,000 images of 180 objects under different extrinsic factors such as lighting, poses, and imaging conditions. While existing methods such as LPIPS and CLIP scores do not measure object intrinsics well, we find that combining deep features learned from contrastive self-supervised learning with foreground filtering is a simple yet effective approach to approximating the similarity. We conduct an extensive survey of pre-trained features and foreground extraction methods to arrive at a strong baseline that best measures intrinsic object-centric image similarity among current methods. Finally, we demonstrate that our approach can aid in downstream applications such as acting as an analog for human subjects and improving generalizable re-identification. Please see our project website at https://s-tian.github.io/projects/cute/ for visualizations of the data and demos of our metric.
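A minimal version of the "foreground filtering + pretrained features" baseline can be sketched as follows; the ResNet-50 backbone is a stand-in for the contrastive self-supervised encoder used in the paper, and the masks are assumed to come from some off-the-shelf foreground extractor.

```python
# Minimal sketch (assumed recipe, not the paper's exact setup): mask out the background, embed
# both images with a pretrained backbone, and compare pooled features with cosine similarity.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)   # in practice: a self-supervised checkpoint
backbone.fc = torch.nn.Identity()                      # keep pooled features, drop the classifier
backbone.eval()

def intrinsic_similarity(img_a, mask_a, img_b, mask_b):
    """img_*: (3, H, W) floats in [0, 1]; mask_*: (1, H, W) foreground masks in {0, 1}."""
    with torch.no_grad():
        fa = backbone((img_a * mask_a).unsqueeze(0))
        fb = backbone((img_b * mask_b).unsqueeze(0))
    return torch.nn.functional.cosine_similarity(fa, fb).item()

# Toy usage with random tensors; real masks would come from a foreground extraction model.
img_a, img_b = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
mask = torch.ones(1, 224, 224)
print(intrinsic_similarity(img_a, mask, img_b, mask))
```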
Unsupervised Discovery and Composition of Object Light Fields
Smith, Cameron, Yu, Hong-Xing, Zakharov, Sergey, Durand, Fredo, Tenenbaum, Joshua B., Wu, Jiajun, Sitzmann, Vincent
Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled radiance fields and thus noisy renderings, poor framerates, and high memory and time complexity during training and rendering. Here, we propose to represent objects in an object-centric, compositional scene representation as light fields. We propose a novel light field compositor module that reconstructs the global light field from a set of object-centric light fields. Dubbed Compositional Object Light Fields (COLF), our method enables unsupervised learning of object-centric neural scene representations, state-of-the-art reconstruction and novel view synthesis performance on standard datasets, and rendering and training speeds orders of magnitude faster than existing 3D approaches.
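The compositor idea can be sketched with an assumed per-object interface that maps a ray to a color and a blending logit, with a softmax combining the per-object colors into the global light field; this is a toy reading of the module, not the released model.

```python
# Minimal sketch (assumed interface): each object light field maps a ray to a color plus a
# scalar logit; a compositor blends per-object colors with a softmax over the logits.
import torch
import torch.nn as nn

class ObjectLightField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))   # 3 color channels + 1 blend logit

    def forward(self, rays):                              # rays: (N, 6) = origin ++ direction
        out = self.net(rays)
        return torch.sigmoid(out[:, :3]), out[:, 3]

def composite(object_fields, rays):
    colors, logits = zip(*[f(rays) for f in object_fields])
    colors = torch.stack(colors, dim=0)                   # (K, N, 3)
    weights = torch.softmax(torch.stack(logits, dim=0), dim=0).unsqueeze(-1)  # (K, N, 1)
    return (weights * colors).sum(dim=0)                  # (N, 3) blended ray colors

rays = torch.rand(1024, 6)
print(composite([ObjectLightField() for _ in range(3)], rays).shape)  # torch.Size([1024, 3])
```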
Multi-Object Manipulation via Object-Centric Neural Scattering Functions
Tian, Stephen, Cai, Yancheng, Yu, Hong-Xing, Zakharov, Sergey, Liu, Katherine, Gaidon, Adrien, Li, Yunzhu, Wu, Jiajun
Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet, it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they struggle with precise modeling and manipulation under challenging lighting conditions because they encode appearance tied to a specific illumination. In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. OSFs model per-object light transport, enabling compositional scene re-rendering under object rearrangement and varying lighting conditions. By combining this approach with inverse parameter estimation and graph-based neural dynamics models, we demonstrate improved model-predictive control performance and generalization in compositional multi-object environments, even in previously unseen scenarios and under harsh lighting conditions.
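The planning loop itself is standard model-predictive control; below is a generic random-shooting sketch in which a toy `dynamics` function stands in for the learned graph-based dynamics over OSF object states, so the code illustrates the control framework rather than the paper's model.

```python
# Minimal sketch (generic random-shooting MPC with a toy learned-dynamics stand-in): sample
# action sequences, roll them out, and execute the first action of the lowest-cost sequence.
import numpy as np

def dynamics(state, action):
    """Stand-in learned model: 2D object positions, with the action nudging object 0."""
    nxt = state.copy()
    nxt[0] += 0.1 * action
    return nxt

def cost(state, goal):
    return np.sum((state - goal) ** 2)

def mpc_step(state, goal, horizon=5, n_samples=256, rng=np.random.default_rng(0)):
    best_cost, best_first_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1, 1, size=(horizon, 2))
        s, c = state, 0.0
        for a in actions:
            s = dynamics(s, a)
            c += cost(s, goal)
        if c < best_cost:
            best_cost, best_first_action = c, actions[0]
    return best_first_action

state = np.array([[0.0, 0.0], [1.0, 1.0]])     # two objects
goal = np.array([[0.5, 0.5], [1.0, 1.0]])
print(mpc_step(state, goal))
```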
Differentiable Physics Simulation of Dynamics-Augmented Neural Objects
Le Cléac'h, Simon, Yu, Hong-Xing, Guo, Michelle, Howell, Taylor A., Gao, Ruohan, Wu, Jiajun, Manchester, Zachary, Schwager, Mac
We present a differentiable pipeline for simulating the motion of objects that represent their geometry as a continuous density field parameterized as a deep network. This includes Neural Radiance Fields (NeRFs) and other related models. From the density field, we estimate the dynamical properties of the object, including its mass, center of mass, and inertia matrix. We then introduce a differentiable contact model based on the density field for computing the normal and friction forces resulting from collisions. This allows a robot to autonomously build object models that are visually and dynamically accurate from still images and videos of objects in motion. The resulting Dynamics-Augmented Neural Objects (DANOs) are simulated with an existing differentiable simulation engine, Dojo, interacting with other standard simulation objects, such as spheres, planes, and robots specified as URDFs. A robot can use this simulation to optimize grasps and manipulation trajectories of neural objects, or to improve the neural object models through gradient-based real-to-simulation transfer. We demonstrate the pipeline by learning the coefficient of friction of a bar of soap from a real video of the soap sliding on a table. We also learn the coefficient of friction and mass of a Stanford bunny through interactions with a Panda robot arm from synthetic data, and we optimize trajectories in simulation for the Panda arm to push the bunny to a goal location.
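The mass-property step is straightforward to illustrate: sample the density field on a regular grid and integrate mass, center of mass, and the inertia matrix. In the sketch below an analytic uniform-density ball stands in for a trained neural density field; the grid resolution and units are illustrative assumptions.

```python
# Minimal sketch (assumed density interface): estimate mass, center of mass, and inertia of a
# neural object by sampling its density field on a grid; an analytic ball replaces the network.
import numpy as np

def density(p):                                   # p: (N, 3) query points
    return (np.linalg.norm(p, axis=-1) < 0.3).astype(np.float64) * 1000.0   # kg / m^3

n = 64
axis = np.linspace(-0.5, 0.5, n)
pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
dV = (axis[1] - axis[0]) ** 3                     # volume of one grid cell
rho = density(pts)

mass = np.sum(rho) * dV
com = np.sum(rho[:, None] * pts, axis=0) * dV / mass
r = pts - com
inertia = np.sum(rho[:, None, None] * (np.sum(r**2, axis=-1)[:, None, None] * np.eye(3)
                                       - r[:, :, None] * r[:, None, :]), axis=0) * dV
print(mass, com, inertia.diagonal())              # roughly a 113 kg ball with (2/5) m r^2 on the diagonal
```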
Unsupervised Discovery of Object Radiance Fields
Yu, Hong-Xing, Guibas, Leonidas J., Wu, Jiajun
We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision. Most existing methods on scene decomposition lack one or more of these characteristics, due to the fundamental challenge of integrating the complex 3D-to-2D image formation process into powerful inference schemes like deep networks. In this paper, we propose unsupervised discovery of Object Radiance Fields (uORF), integrating recent progress in neural 3D scene representations and rendering with deep inference networks for unsupervised 3D scene decomposition. Trained on multi-view RGB images without annotations, uORF learns to decompose complex scenes with diverse, textured backgrounds from a single image. We show that uORF performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.
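A common way to combine per-object radiance fields into one rendering is density-weighted color mixing followed by standard alpha compositing along each ray; the sketch below assumes that interface and is an illustration rather than uORF's exact renderer.

```python
# Minimal sketch (assumed interface): composite K per-object radiance fields along one ray by
# density-weighted color mixing, then alpha-composite the samples as in volume rendering.
import torch

def composite_render(sigmas, colors, deltas):
    """sigmas: (K, S) per-object densities, colors: (K, S, 3), deltas: (S,) sample spacings."""
    sigma = sigmas.sum(dim=0)                                  # (S,) combined density
    w = sigmas / sigma.clamp(min=1e-8)                         # (K, S) per-object mixing weights
    color = (w.unsqueeze(-1) * colors).sum(dim=0)              # (S, 3) mixed color per sample
    alpha = 1.0 - torch.exp(-sigma * deltas)                   # (S,) opacity per sample
    trans = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    return (trans * alpha).unsqueeze(-1).mul(color).sum(dim=0)  # (3,) rendered ray color

K, S = 4, 64                                                    # e.g. 1 background slot + 3 object slots
sigmas, colors = torch.rand(K, S), torch.rand(K, S, 3)
print(composite_render(sigmas, colors, torch.full((S,), 0.02)))
```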
DSRGAN: Explicitly Learning Disentangled Representation of Underlying Structure and Rendering for Image Generation without Tuple Supervision
Hao, Guang-Yuan, Yu, Hong-Xing, Zheng, Wei-Shi
We focus on explicitly learning a disentangled representation for natural image generation, where the underlying spatial structure and the rendering on that structure can be controlled independently, without using tuple supervision. The setting is significant since tuple supervision is costly and sometimes even unavailable. However, the task is highly unconstrained and thus ill-posed. To address this problem, we introduce an auxiliary domain that shares a common underlying-structure space with the target domain, and we make a partially shared latent space assumption. The key idea is to encourage the partially shared latent variable to represent the similar underlying spatial structures in both domains, while the two domain-specific latent variables are left to represent the renderings of their respective domains. This is achieved by designing two parallel generative networks with a common Progressive Rendering Architecture (PRA), which constrains both networks to model the shared underlying structure and the spatially dependent relation between rendering and underlying structure. We thus propose DSRGAN (GANs for Disentangling Underlying Structure and Rendering) to instantiate our method. We also propose a quantitative criterion, Normalized Disentanglability, to quantify disentanglability. Comparison to state-of-the-art methods shows that DSRGAN significantly outperforms them in disentanglability.
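The partially shared latent space can be illustrated with two generators that consume the same structure code plus their own domain-specific rendering code, so structure and rendering can be resampled independently; the small MLP generators below are placeholders, not the paper's PRA.

```python
# Minimal sketch (illustration of the partially shared latent space, not DSRGAN's architecture):
# two domain generators share a structure code z_s and each adds its own rendering code.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, dim_s=32, dim_r=32, out=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_s + dim_r, 256), nn.ReLU(), nn.Linear(256, out))

    def forward(self, z_s, z_r):
        return torch.tanh(self.net(torch.cat([z_s, z_r], dim=-1))).view(-1, 3, 32, 32)

gen_target, gen_aux = Generator(), Generator()        # one generator per domain
z_s = torch.randn(8, 32)                              # shared underlying-structure code
x_target = gen_target(z_s, torch.randn(8, 32))        # same structure, target-domain rendering
x_aux = gen_aux(z_s, torch.randn(8, 32))              # same structure, auxiliary-domain rendering
print(x_target.shape, x_aux.shape)
```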
MIXGAN: Learning Concepts from Different Domains for Mixture Generation
Hao, Guang-Yuan, Yu, Hong-Xing, Zheng, Wei-Shi
In this work, we present an attempt at mixture generation: absorbing different image concepts (e.g., content and style) from different domains and generating a new domain with the learned concepts. In particular, we propose a mixture generative adversarial network (MIXGAN). MIXGAN learns the concepts of content and style from two different domains and can join them for mixture generation in a new domain, i.e., generating images with content from one domain and style from another. MIXGAN overcomes the limitation of current GAN-based models, which either generate new images only in the domain observed during training or require off-the-shelf content templates for transfer or translation. Extensive experimental results demonstrate the effectiveness of MIXGAN compared to related state-of-the-art GAN-based models.
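One simple way to picture content-style mixing across domains is AdaIN-style feature re-normalization, which re-renders content features from one domain with channel statistics taken from another; this is a named stand-in for illustration only, not MIXGAN's actual architecture.

```python
# Minimal sketch (AdaIN-style mixing as an illustration, not MIXGAN): keep content features from
# domain A but match the channel-wise mean/std of style features from domain B.
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """content_feat, style_feat: (N, C, H, W) feature maps."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

mixed = adain(torch.randn(4, 64, 16, 16), torch.randn(4, 64, 16, 16))  # content from A, style from B
print(mixed.shape)
```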