Collaborating Authors

 Zheng, Changxi


Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

arXiv.org Artificial Intelligence

Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups and significantly restricting their utility in the wild as well as in embodied AI applications. In this paper, we propose $\textbf{GCD}$, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors to generate, given a video of any scene, a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters. Our model does not require depth as input and does not explicitly model 3D scene geometry, instead performing end-to-end video-to-video translation in order to achieve its goal efficiently. Despite the model being trained only on synthetic multi-view video data, zero-shot real-world generalization experiments show promising results in multiple domains, including robotics, object permanence, and driving environments. We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
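
The pipeline described above conditions a video-to-video diffusion model on relative camera pose parameters rather than on depth or explicit 3D geometry. As a purely illustrative sketch, and not the paper's implementation, such conditioning is commonly realized by embedding the relative rotation and translation into a feature vector that modulates the diffusion backbone; the module name, pose parameterization, and feature dimension below are assumptions.

# Hypothetical sketch: embed a relative camera pose (R, t) into a conditioning
# vector that a video diffusion backbone could add to its per-frame features.
import torch
import torch.nn as nn

class PoseConditioning(nn.Module):
    """Maps a relative camera pose to a conditioning feature vector."""
    def __init__(self, dim: int = 256):
        super().__init__()
        # 9 rotation-matrix entries + 3 translation entries = 12 inputs
        self.mlp = nn.Sequential(nn.Linear(12, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, rotation: torch.Tensor, translation: torch.Tensor) -> torch.Tensor:
        # rotation: (B, 3, 3), translation: (B, 3)
        pose = torch.cat([rotation.flatten(1), translation], dim=-1)  # (B, 12)
        return self.mlp(pose)  # (B, dim), broadcast over frames downstream

cond = PoseConditioning(dim=256)
R = torch.eye(3).unsqueeze(0)        # identity rotation
t = torch.tensor([[0.5, 0.0, 0.0]])  # sideways "dolly" move
print(cond(R, t).shape)              # torch.Size([1, 256])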


PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

arXiv.org Artificial Intelligence

Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.
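
The core difficulty named above is recovering physical material parameters such as stiffness so that simulated motion agrees with a learned motion prior. The toy sketch below only illustrates that inverse problem in one dimension: a scalar stiffness is optimized by gradient descent through a differentiable simulation so that its trajectory matches a reference. In PhysDreamer's setting the reference signal would come from a video generation model rather than a fixed trajectory, and the simulator, parameters, and loss here are all stand-ins.

# Toy illustration (not the paper's method): recover a stiffness parameter by
# differentiating through a damped-spring simulation to match a reference motion.
import torch

def simulate(stiffness, steps=100, dt=0.01):
    """Damped 1D spring: x'' = -k * x - 0.5 * x'. Returns the displacement trajectory."""
    x = torch.tensor(1.0)
    v = torch.tensor(0.0)
    traj = []
    for _ in range(steps):
        a = -stiffness * x - 0.5 * v
        v = v + dt * a
        x = x + dt * v
        traj.append(x)
    return torch.stack(traj)

# Reference motion from a stiffer material, standing in for the video prior.
target = simulate(torch.tensor(4.0)).detach()

log_k = torch.tensor(0.0, requires_grad=True)   # optimize log-stiffness for positivity
opt = torch.optim.Adam([log_k], lr=0.05)
for _ in range(300):
    loss = torch.mean((simulate(log_k.exp()) - target) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"recovered stiffness: {log_k.exp().item():.2f}")  # should approach 4.0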


Implicit Neural Spatial Representations for Time-dependent PDEs

arXiv.org Artificial Intelligence

Implicit Neural Spatial Representation (INSR) has emerged as an effective representation of spatially-dependent vector fields. This work explores solving time-dependent PDEs with INSR. Classical PDE solvers introduce both temporal and spatial discretizations. Common spatial discretizations include meshes and meshless point clouds, where each degree-of-freedom corresponds to a location in space. While these explicit spatial correspondences are intuitive to model and understand, these representations are not necessarily optimal for accuracy, memory usage, or adaptivity. Keeping the classical temporal discretization unchanged (e.g., explicit/implicit Euler), we explore INSR as an alternative spatial discretization, where spatial information is implicitly stored in the neural network weights. The network weights then evolve over time via time integration. Our approach does not require any training data generated by existing solvers because our approach is the solver itself. We validate our approach on various PDEs with examples involving large elastic deformations, turbulent fluids, and multi-scale phenomena. While slower to compute than traditional representations, our approach exhibits higher accuracy and lower memory consumption. Whereas classical solvers can dynamically adapt their spatial representation only by resorting to complex remeshing algorithms, our INSR approach is intrinsically adaptive. By tapping into the rich literature of classic time integrators, e.g., operator-splitting schemes, our method enables challenging simulations in contact mechanics and turbulent flows where previous neural-physics approaches struggle. Videos and codes are available on the project page: http://www.cs.columbia.edu/cg/INSR-PDE/
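
Because the spatial field lives entirely in network weights, each time step amounts to fitting a fresh set of weights to the time-integrated field, with no pre-generated training data. The sketch below illustrates that idea for a 1D heat equation with explicit Euler in PyTorch; the architecture, step sizes, and optimization schedule are illustrative assumptions, not the paper's solver.

# Sketch (assumptions: 1D heat equation u_t = alpha * u_xx, explicit Euler):
# the network stores u(x) at one instant, and a time step refits new weights.
import copy
import torch
import torch.nn as nn

class Field(nn.Module):
    """MLP storing the spatial field u(x) at a single time instant."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                                 nn.Linear(64, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

def laplacian(u, x):
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    return torch.autograd.grad(du.sum(), x, create_graph=True)[0]

def advance(u_old, dt=1e-3, alpha=0.1, iters=200):
    """One explicit Euler step: fit new weights so that
    u_new(x) ~ u_old(x) + dt * alpha * u_old_xx(x) at sampled collocation points."""
    u_new = copy.deepcopy(u_old)
    opt = torch.optim.Adam(u_new.parameters(), lr=1e-3)
    for _ in range(iters):
        x = torch.rand(256, 1, requires_grad=True) * 2 - 1   # collocation points in [-1, 1]
        target = (u_old(x) + dt * alpha * laplacian(u_old(x), x)).detach()
        loss = torch.mean((u_new(x) - target) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return u_new

u = Field()      # in practice, first fitted to the initial condition u(x, 0)
u = advance(u)   # the weights now represent the field at t = dt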


Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape

arXiv.org Artificial Intelligence

Synthesizing novel 3D models that resemble the input example has long been pursued by researchers and artists in computer graphics. In this paper, we present Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape and generates high-quality variations with fine geometry and texture details. Training a diffusion model directly in 3D would induce large memory and computational cost. Therefore, we first compress the input into a lower-dimensional latent space and then train a diffusion model on it. Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input. The denoising network of our diffusion model has a limited receptive field to avoid overfitting, and uses triplane-aware 2D convolution blocks to improve the result quality. Aside from randomly generating new samples, our model also facilitates applications such as retargeting, outpainting and local editing. Through extensive qualitative and quantitative evaluation, we show that our model can generate 3D shapes of various types with better quality than prior methods.
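
The training recipe described above (compress the shape into a triplane latent, then train a diffusion model on it) can be illustrated with a generic denoising-diffusion training step. The sketch below is not the paper's code: the shapes, noise schedule, and the shallow convolutional denoiser, whose three 3x3 layers loosely mirror the limited receptive field mentioned in the abstract, are placeholder choices, and timestep conditioning of the denoiser is omitted for brevity.

# Sketch: one DDPM-style noise-prediction step on a stand-in triplane latent.
import torch
import torch.nn as nn

C, H, W = 16, 64, 64
triplanes = torch.randn(1, 3 * C, H, W)   # stand-in for the encoded triplane latent

denoiser = nn.Sequential(                  # three 3x3 convs: a deliberately small receptive field
    nn.Conv2d(3 * C, 128, 3, padding=1), nn.SiLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.SiLU(),
    nn.Conv2d(128, 3 * C, 3, padding=1),
)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(x0):
    """Standard noise-prediction objective on the latent (timestep conditioning omitted)."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward diffusion
    return torch.mean((denoiser(x_t) - noise) ** 2)

loss = training_step(triplanes)
loss.backward()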


Learning to Generate 3D Shapes from a Single Example

arXiv.org Artificial Intelligence

Existing generative models for 3D shapes are typically trained on a large 3D dataset, often of a specific object category. In this paper, we investigate a deep generative model that learns from only a single reference 3D shape. Specifically, we present a multi-scale GAN-based model designed to capture the input shape's geometric features across a range of spatial scales. To avoid the large memory and computational cost induced by operating on the 3D volume, we build our generator atop the tri-plane hybrid representation, which requires only 2D convolutions. We train our generative model on a voxel pyramid of the reference shape, without the need for any external supervision or manual annotation. Once trained, our model can generate diverse and high-quality 3D shapes, possibly of different sizes and aspect ratios. The resulting shapes present variations across different scales, and at the same time retain the global structure of the reference shape. Through extensive evaluation, both qualitative and quantitative, we demonstrate that our model can generate 3D shapes of various types.
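
The tri-plane hybrid representation mentioned above stores a 3D feature field as three axis-aligned 2D feature maps, so that a 3D query reduces to 2D lookups. The sketch below shows one common way to query such a representation, bilinearly sampling the XY, XZ, and YZ planes and summing the results; the fusion rule and shapes are assumptions for illustration, not necessarily the paper's choices.

# Sketch: query a tri-plane feature field at a batch of 3D points.
import torch
import torch.nn.functional as F

def query_triplane(planes, points):
    """planes: (3, C, H, W) feature maps; points: (N, 3) in [-1, 1]^3."""
    xy = points[:, [0, 1]]
    xz = points[:, [0, 2]]
    yz = points[:, [1, 2]]
    feats = []
    for plane, coords in zip(planes, (xy, xz, yz)):
        # grid_sample expects input (1, C, H, W) and grid (1, H_out, W_out, 2)
        grid = coords.view(1, -1, 1, 2)
        sampled = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
        feats.append(sampled.squeeze(0).squeeze(-1).T)   # (N, C)
    return sum(feats)                                     # fuse by summation

planes = torch.randn(3, 32, 64, 64)
pts = torch.rand(1000, 3) * 2 - 1
print(query_triplane(planes, pts).shape)   # torch.Size([1000, 32])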


Linear Semantics in Generative Adversarial Networks

arXiv.org Artificial Intelligence

Generative Adversarial Networks (GANs) are able to generate high-quality images, but it remains difficult to explicitly specify the semantics of synthesized images. In this work, we aim to better understand the semantic representation of GANs, and thereby enable semantic control in the GAN generation process. Interestingly, we find that a well-trained GAN encodes image semantics in its internal feature maps in a surprisingly simple way: a linear transformation of feature maps suffices to extract the generated image semantics. To verify this simplicity, we conduct extensive experiments on various GANs and datasets; and thanks to this simplicity, we are able to learn a semantic segmentation model for a trained GAN from a small number (e.g., 8) of labeled images. Last but not least, leveraging our findings, we propose two few-shot image editing approaches, namely Semantic-Conditional Sampling and Semantic Image Editing. Given a trained GAN and as few as eight semantic annotations, the user is able to generate diverse images subject to a user-provided semantic layout, and control the synthesized image semantics. We have made the code publicly available.
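
The central claim above is that a purely linear map over a GAN's internal feature maps already recovers semantic segmentation. A minimal sketch of such a linear probe is shown below, with upsampled feature maps concatenated and classified by a single 1x1 convolution; the layer choices and tensor shapes are placeholders rather than any specific GAN's internals.

# Sketch: a linear (1x1 convolution) probe over upsampled generator features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def stack_features(feature_maps, out_size=256):
    """Upsample generator feature maps to a common resolution and concatenate."""
    upsampled = [F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
                 for f in feature_maps]
    return torch.cat(upsampled, dim=1)   # (B, sum(C_i), S, S)

# Toy stand-ins for intermediate GAN activations at two layers
fmaps = [torch.randn(1, 512, 16, 16), torch.randn(1, 256, 64, 64)]
feats = stack_features(fmaps)

num_classes = 8
probe = nn.Conv2d(feats.shape[1], num_classes, kernel_size=1)   # purely linear, per-pixel
logits = probe(feats)                                           # (1, 8, 256, 256)
print(logits.shape)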


Resisting Adversarial Attacks by $k$-Winners-Take-All

arXiv.org Artificial Intelligence

We propose a simple change to the current neural network structure for defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of $k$-Winners-Take-All ($k$-WTA) activation, a $C^0$ discontinuous function that purposely invalidates the neural network model's gradient at densely distributed input data points. Our proposal is theoretically rationalized. We show why the discontinuities in $k$-WTA networks can largely prevent gradient-based search of adversarial examples and why they at the same time remain innocuous to the network training. This understanding is also empirically backed. Even without notoriously expensive adversarial training, the robustness performance of our networks is comparable to that of conventional ReLU networks optimized by adversarial training. Furthermore, when additionally optimized through adversarial training, our networks outperform state-of-the-art methods under white-box attacks on the various datasets we experimented with.
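
The activation itself is simple to state concretely: keep the $k$ largest activations of each sample and zero out the rest, which introduces the $C^0$ discontinuities described above wherever the set of winners changes. A minimal PyTorch sketch follows; the sparsity ratio and per-sample flattening scheme are illustrative choices.

# Sketch of a k-Winners-Take-All activation as a drop-in replacement for ReLU.
import torch
import torch.nn as nn

class KWTA(nn.Module):
    def __init__(self, sparsity: float = 0.1):
        super().__init__()
        self.sparsity = sparsity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.flatten(1)                               # (B, N)
        k = max(1, int(self.sparsity * flat.shape[1]))
        kth = flat.topk(k, dim=1).values[:, -1:]          # k-th largest value per row
        return (flat * (flat >= kth).float()).view_as(x)  # zero out the losers

# Example use: nn.Sequential(nn.Linear(784, 256), KWTA(0.2), nn.Linear(256, 10))
print(KWTA(0.25)(torch.randn(2, 8)))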


Rethinking Generative Coverage: A Pointwise Guaranteed Approach

arXiv.org Machine Learning

All generative models have to combat missing modes. The conventional wisdom is to reduce a statistical distance (such as an f-divergence) between the generated distribution and the provided data distribution through training. We defy this wisdom. We show that even a small statistical distance does not imply plausible mode coverage, because such a distance measures a global similarity between two distributions, not their similarity in local regions, which is what complete mode coverage requires. From a starkly different perspective, we view the battle against missing modes as a two-player game between a player choosing a data point and an adversary choosing a generator aiming to cover that data point. Enlightened by von Neumann's minimax theorem, we see that if a generative model can approximate a data distribution moderately well under a global statistical distance measure, then we should be able to find a mixture of generators that collectively covers every data point, and thus every mode, with a lower-bounded probability density. A constructive realization of this minimax duality, namely our proposed algorithm for finding the mixture of generators, is connected to a multiplicative weights update rule. We prove the pointwise coverage guarantee of our algorithm, and our experiments on real and synthetic data confirm better mode coverage over recent approaches that also use a mixture of generators but focus on global statistical distances.
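
The minimax construction above suggests an iterative scheme: repeatedly reweight the data points that the current mixture covers poorly, train a generator under that weighting, and add it to the mixture via a multiplicative weights update. The sketch below is only a schematic of that loop; train_generator and coverage are hypothetical placeholders, and the learning rate, round count, and weighting rule are illustrative rather than the paper's algorithm.

# Schematic multiplicative-weights loop for building a mixture of generators.
import numpy as np

def build_mixture(data, train_generator, coverage, rounds=10, eta=0.5):
    """train_generator(data, weights) -> generator; coverage(g, data) -> per-point scores in [0, 1]."""
    n = len(data)
    weights = np.ones(n) / n                      # distribution over data points
    mixture = []
    for _ in range(rounds):
        g = train_generator(data, weights)        # emphasize heavily weighted (uncovered) points
        cov = coverage(g, data)                   # how well g covers each data point
        mixture.append(g)
        weights = weights * np.exp(-eta * cov)    # down-weight points that are now covered
        weights = weights / weights.sum()
    return mixture                                # at use time, sample a generator uniformly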


BourGAN: Generative Networks with Metric Embeddings

Neural Information Processing Systems

This paper addresses mode collapse in generative adversarial networks (GANs). We view modes as a geometric structure of the data distribution in a metric space. Under this geometric lens, we embed subsamples of the dataset from an arbitrary metric space into the L2 space, while preserving their pairwise distance distribution. Not only does this metric embedding determine the dimensionality of the latent space automatically, it also enables us to construct a mixture of Gaussians to draw latent space random vectors. We use the Gaussian mixture model in tandem with a simple augmentation of the objective function to train GANs. Every major step of our method is supported by theoretical analysis, and our experiments on real and synthetic data confirm that the generator is able to produce samples spreading over most of the modes while avoiding unwanted samples, outperforming several recent GAN variants on a number of metrics and offering new features.
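
The latent-sampling idea described above replaces the usual single-Gaussian prior with a Gaussian mixture fitted to distance-preserving embeddings of data subsamples. The sketch below illustrates that idea end to end on toy data, using scikit-learn's MDS as a simple stand-in for the paper's Bourgain-style metric embedding; all dimensions and component counts are illustrative.

# Sketch: fit a Gaussian mixture in an embedded latent space and sample from it.
import numpy as np
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data with two modes, standing in for an arbitrary metric-space dataset
data = np.concatenate([rng.normal(-3, 0.3, (200, 5)),
                       rng.normal(+3, 0.3, (200, 5))])

subsample = data[rng.choice(len(data), 100, replace=False)]
embedded = MDS(n_components=8, random_state=0).fit_transform(subsample)  # distance-preserving-ish

gmm = GaussianMixture(n_components=2, random_state=0).fit(embedded)
z, _ = gmm.sample(64)     # latent batch for the generator, drawn from the mixture
print(z.shape)            # (64, 8)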