
Collaborating Authors: Cao, Tianshi


Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

arXiv.org Artificial Intelligence

We introduce Cosmos-Transfer1, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities, such as segmentation, depth, and edge. The spatial conditioning scheme is adaptive and customizable by design: different conditional inputs can be weighted differently at different spatial locations. This enables highly controllable world generation and supports various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy that achieves real-time world generation on an NVIDIA GB200 NVL72 rack.
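
To make the adaptive spatial conditioning idea concrete, here is a minimal sketch (not the released Cosmos-Transfer1 implementation) of blending per-modality control features with per-location weight maps; `fuse_controls`, the modality names, and all tensor shapes are illustrative assumptions.

```python
import torch

def fuse_controls(branch_feats, weight_maps):
    # Weighted sum of control features: each modality's (B, C, H, W)
    # features are scaled by a (B, 1, H, W) spatial weight map.
    fused = None
    for name, feat in branch_feats.items():
        contrib = weight_maps[name] * feat
        fused = contrib if fused is None else fused + contrib
    return fused

B, C, H, W = 1, 64, 32, 32
feats = {"depth": torch.randn(B, C, H, W), "edge": torch.randn(B, C, H, W)}

# Hypothetical example: trust the edge control on the left half of the
# frame and the depth control on the right half.
w_edge = torch.zeros(B, 1, H, W)
w_edge[..., : W // 2] = 1.0
weights = {"edge": w_edge, "depth": 1.0 - w_edge}

fused = fuse_controls(feats, weights)  # (B, C, H, W), fed to the generator
```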


LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

arXiv.org Artificial Intelligence

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt [21, 39]. Amortized methods like ATT3D [26] optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce Latte3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is 1) building a scalable architecture and 2) leveraging 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to achieve robustness to diverse and complex training prompts. Latte3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. Latte3D generates 3D objects in 400ms, and can be further enhanced with fast test-time optimization.
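
To illustrate what amortization buys, the toy sketch below (assumed, not the LATTE3D architecture) trains a single network over batches of prompt embeddings so that test-time generation is one forward pass; the placeholder loss stands in for the paper's 3D-aware diffusion priors and shape regularization.

```python
import torch
import torch.nn as nn

class AmortizedText2Rep(nn.Module):
    """Toy amortized generator: text embedding -> flattened 3D representation."""
    def __init__(self, text_dim=512, rep_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, rep_dim),
        )

    def forward(self, text_emb):
        # A single forward pass per prompt; speed comes from amortizing
        # the optimization over a large prompt set during training.
        return self.net(text_emb)

model = AmortizedText2Rep()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):
    text_emb = torch.randn(8, 512)  # stand-in for a batch of prompt embeddings
    rep = model(text_emb)
    # Placeholder objective; the paper instead scores renders of the 3D
    # representation with diffusion priors plus shape regularization.
    loss = rep.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```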


TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

arXiv.org Artificial Intelligence

We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects through a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture synthesis that employs regular diffusion model sampling on different 2D rendered views. Specifically, we leverage latent diffusion models, apply the diffusion model's denoiser on a set of 2D renders of the 3D object, and aggregate the different denoising predictions on a shared latent texture map. Final output RGB textures are produced by optimizing an intermediate neural color field on the decodings of 2D renders of the latent texture. We thoroughly validate TexFusion and show that we can efficiently generate diverse, high-quality, and globally coherent textures. We achieve state-of-the-art text-guided texture synthesis performance using only image diffusion models, while avoiding the pitfalls of previous distillation-based methods. The text conditioning offers detailed control, and we do not rely on any ground-truth 3D textures for training. This makes our method versatile and applicable to a broad range of geometry and texture types. We hope that TexFusion will advance AI-based texturing of 3D assets for applications in virtual reality, game design, simulation, and more.
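
A minimal sketch of the aggregation step described above, under simplifying assumptions: the latent texture is a flat (T, C) array, and each view comes with precomputed pixel-to-texel indices standing in for rasterized UV lookups. `aggregate_views` and the stub denoiser are illustrative, not the authors' implementation.

```python
import torch

def aggregate_views(latent_tex, uv_index_per_view, denoise_fn):
    # latent_tex:        (T, C) flattened latent texture map (T texels).
    # uv_index_per_view: list of (P,) long tensors mapping each rendered
    #                    pixel of a view to the texel it samples.
    # denoise_fn:        per-view denoiser acting on (P, C) latent renders.
    T, _ = latent_tex.shape
    accum = torch.zeros_like(latent_tex)
    counts = torch.zeros(T, 1)
    for uv_idx in uv_index_per_view:
        render = latent_tex[uv_idx]        # "render" the view by sampling texels
        pred = denoise_fn(render)          # denoising prediction for this view
        accum.index_add_(0, uv_idx, pred)  # scatter predictions back to texture
        counts.index_add_(0, uv_idx, torch.ones(uv_idx.shape[0], 1))
    covered = counts.squeeze(1) > 0
    out = latent_tex.clone()
    out[covered] = accum[covered] / counts[covered]  # average over covering views
    return out

# Four random views over a 1024-texel texture, with a stub denoiser.
tex = torch.randn(1024, 4)
views = [torch.randint(0, 1024, (256,)) for _ in range(4)]
tex = aggregate_views(tex, views, denoise_fn=lambda z: 0.9 * z)
```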


A Benchmark of Medical Out of Distribution Detection

arXiv.org Machine Learning

Motivation: Deep learning models deployed on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However, it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain, so such images should be flagged by an OoDD method prior to diagnosis. Our approach: This paper defines three categories of OoD examples and benchmarks popular OoDD methods in three domains of medical imaging: chest X-ray, fundus imaging, and histology slides. Results: Our experiments show that although the methods yield good results on some categories of out-of-distribution samples, they fail to recognize images close to the training distribution. Conclusion: We find that a simple binary classifier on the feature representation has the best accuracy and AUPRC on average. Users of diagnostic tools that employ these OoDD methods should remain vigilant: images very close to, yet outside, the training distribution can still yield unexpected results.
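
To make the concluding baseline concrete, here is a toy version of a binary classifier on feature representations, assuming penultimate-layer features have already been extracted; the synthetic Gaussian features below merely stand in for real in- and out-of-distribution data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
# Stand-ins for penultimate-layer features: label 0 = in-distribution,
# label 1 = out-of-distribution.
feats_in = rng.normal(0.0, 1.0, size=(500, 128))
feats_out = rng.normal(0.5, 1.2, size=(500, 128))

X = np.vstack([feats_in, feats_out])
y = np.concatenate([np.zeros(500), np.ones(500)])

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # per-image OoD score
# Toy evaluation on the training set; a real benchmark would hold out data.
print("AUPRC:", average_precision_score(y, scores))
```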


A Theoretical Analysis of the Number of Shots in Few-Shot Learning

arXiv.org Machine Learning

Few-shot classification is the task of predicting the category of an example from a set of few labeled examples. The number of labeled examples per category is called the number of shots (or shot number). Recent works tackle this task through meta-learning, where a meta-learner extracts information from observed tasks during meta-training to quickly adapt to new tasks during meta-testing. In this formulation, the number of shots used during meta-training affects recognition performance at meta-test time; generally, the meta-training shot number should match the meta-testing one to obtain the best performance. We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method. Based on our analysis, we propose a simple method that is robust to the choice of the meta-training shot number, a crucial hyperparameter. Our model, trained with an arbitrary meta-training shot number, performs well across different meta-testing shot numbers. We experimentally demonstrate our approach on different few-shot classification benchmarks.
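
For reference, the sketch below shows the core episodic step of Prototypical Networks on random embeddings: average the K support embeddings of each class into a prototype, then score queries by negative squared Euclidean distance. The function name and dimensions are illustrative, and the sketch omits the embedding network and the paper's proposed robustness modification.

```python
import torch

def prototypical_logits(support, support_labels, queries, n_way):
    # support: (n_way * k_shot, D) embedded support examples.
    # queries: (Q, D) embedded query examples.
    D = support.shape[1]
    protos = torch.zeros(n_way, D)
    for c in range(n_way):
        protos[c] = support[support_labels == c].mean(dim=0)  # class prototype
    # Logits are negative squared Euclidean distances to each prototype.
    return -torch.cdist(queries, protos).pow(2)

# A 5-way, 3-shot episode with random 64-d embeddings.
n_way, k_shot, dim = 5, 3, 64
support = torch.randn(n_way * k_shot, dim)
labels = torch.arange(n_way).repeat_interleave(k_shot)
queries = torch.randn(10, dim)
preds = prototypical_logits(support, labels, queries, n_way).argmax(dim=1)
```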