AITopics | Chai, Yuning

Collaborating Authors

Chai, Yuning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generative Data Mining with Longtail-Guided Diffusion

Hayden, David S., Ye, Mao, Garipov, Timur, Meyer, Gregory P., Vondrick, Carl, Chen, Zhao, Chai, Yuning, Wolff, Eric, Srinivasa, Siddhartha S.

arXiv.org Artificial IntelligenceFeb-3-2025

It is difficult to anticipate the myriad challenges that a predictive model will encounter once deployed. Common practice entails a reactive, cyclical approach: model deployment, data mining, and retraining. We instead develop a proactive longtail discovery process by imagining additional data during training. In particular, we develop general model-based longtail signals, including a differentiable, single forward pass formulation of epistemic uncertainty that does not impact model parameters or predictive performance but can flag rare or hard inputs. We leverage these signals as guidance to generate additional training data from a latent diffusion model in a process we call Longtail Guidance (LTG). Crucially, we can perform LTG without retraining the diffusion model or the predictive model, and we do not need to expose the predictive model to intermediate diffusion states. Data generated by LTG exhibit semantically meaningful variation, yield significant generalization improvements on image classification benchmarks, and can be analyzed to proactively discover, explain, and address conceptual gaps in a predictive model.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.0198

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

DriveGPT: Scaling Autoregressive Behavior Models for Driving

Huang, Xin, Wolff, Eric M., Vernaza, Paul, Phan-Minh, Tung, Chen, Hongge, Hayden, David S., Edmonds, Mark, Pierce, Brian, Chen, Xinxin, Jacob, Pratik Elias, Chen, Xiaobai, Tairbekov, Chingiz, Agarwal, Pratik, Gao, Tianshi, Chai, Yuning, Srinivasa, Siddhartha

arXiv.org Artificial IntelligenceDec-18-2024

We present DriveGPT, a scalable behavior model for autonomous driving. We model driving as a sequential decision making task, and learn a transformer model to predict future agent states as tokens in an autoregressive fashion. We scale up our model parameters and training data by multiple orders of magnitude, enabling us to explore the scaling properties in terms of dataset size, model parameters, and compute. We evaluate DriveGPT across different scales in a planning task, through both quantitative metrics and qualitative examples including closed-loop driving in complex real-world scenarios. In a separate prediction task, DriveGPT outperforms a state-of-the-art baseline and exhibits improved performance by pretraining on a large-scale dataset, further validating the benefits of data scaling.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.14415

Genre: Research Report > New Finding (1.00)

Industry:

Energy (0.48)
Transportation > Ground > Road (0.36)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback

VLMine: Long-Tail Data Mining with Vision Language Models

Ye, Mao, Meyer, Gregory P., Zhang, Zaiwei, Park, Dennis, Mustikovela, Siva Karthik, Chai, Yuning, Wolff, Eric M

arXiv.org Artificial IntelligenceSep-23-2024

Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of unlabeled data. We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM). Our approach utilizes a VLM to summarize the content of an image into a set of keywords, and we identify rare examples based on keyword frequency. We find that the VLM offers a distinct signal for identifying long-tail examples when compared to conventional methods based on model uncertainty. Therefore, we propose a simple and general approach for integrating signals from multiple mining algorithms. We evaluate the proposed method on two diverse tasks: 2D image classification, in which inter-class variation is the primary source of data diversity, and on 3D object detection, where intra-class variation is the main concern. Furthermore, through the detection task, we demonstrate that the knowledge extracted from 2D images is transferable to the 3D domain. Our experiments consistently show large improvements (between 10\% and 50\%) over the baseline techniques on several representative benchmarks: ImageNet-LT, Places-LT, and the Waymo Open Dataset.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2409.15486

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving

Xie, Yichen, Chen, Hongge, Meyer, Gregory P., Lee, Yong Jae, Wolff, Eric M., Tomizuka, Masayoshi, Zhan, Wei, Chai, Yuning, Huang, Xin

arXiv.org Artificial IntelligenceFeb-23-2024

Due to the lack of depth cues in images, multi-frame inputs are important for the success of vision-based perception, prediction, and planning in autonomous driving. Observations from different angles enable the recovery of 3D object states from 2D image inputs if we can identify the same instance in different input frames. However, the dynamic nature of autonomous driving scenes leads to significant changes in the appearance and shape of each instance captured by the camera at different time steps. To this end, we propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence robust to the change in distance and perspective. The learned representation aids in instance-level correspondence across multiple input frames in downstream tasks. In the pretraining stage, the raw point clouds from LiDAR sensors are utilized to construct the long-term temporal correspondence for each instance, which serves as guidance for the extraction of instance-level representation from the vision-based bird's eye-view (BEV) feature map. Cohere3D encourages a consistent representation for the same instance at different frames but distinguishes between representations of different instances. We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks. Results show a notable improvement in both data efficiency and task performance.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2402.15583

Country: North America > United States > California (0.14)

Genre: Research Report (0.70)

Industry:

Transportation > Ground > Road (0.92)
Information Technology > Robotics & Automation (0.82)
Automobiles & Trucks (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Making Large Multimodal Models Understand Arbitrary Visual Prompts

Cai, Mu, Liu, Haotian, Mustikovela, Siva Karthik, Meyer, Gregory P., Chai, Yuning, Park, Dennis, Lee, Yong Jae

arXiv.org Artificial IntelligenceDec-1-2023

While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual prompts. This allows users to intuitively mark images and interact with the model using natural cues like a "red bounding box" or "pointed arrow". Our simple design directly overlays visual markers onto the RGB image, eliminating the need for complex region encodings, yet achieves state-of-the-art performance on region-understanding tasks like Visual7W, PointQA, and Visual Commonsense Reasoning benchmark. Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain. Code, data, and model are publicly available.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.00784

Country:

North America > United States > Wisconsin (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors

Chen, Hongge, Chen, Zhao, Meyer, Gregory P., Park, Dennis, Vondrick, Carl, Shrivastava, Ashish, Chai, Yuning

arXiv.org Artificial IntelligenceSep-11-2023

We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors. In safety-critical applications like autonomous driving, discovering such novel challenging objects can offer insight into unknown vulnerabilities of 3D detectors. By representing objects with a signed distanced function (SDF), we show that gradient error signals allow us to smoothly deform the shape or pose of a 3D object in order to confuse a downstream 3D detector. Importantly, the objects generated by SHIFT3D physically differ from the baseline object yet retain a semantically recognizable shape. Our approach provides interpretable failure modes for modern 3D object detectors, and can aid in preemptive discovery of potential safety risks within 3D perception systems before these risks become critical failures.

artificial intelligence, machine learning, shift3d, (15 more...)

arXiv.org Artificial Intelligence

2309.0581

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.68)
Information Technology (0.67)
Automobiles & Trucks (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.35)

Add feedback

MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

Chai, Yuning, Sapp, Benjamin, Bansal, Mayank, Anguelov, Dragomir

arXiv.org Machine LearningOct-11-2019

Predicting human behavior is a difficult and crucial task required for motion planning. It is challenging in large part due to the highly uncertain and multi-modal set of possible outcomes in real-world domains such as autonomous driving. Beyond single MAP trajectory prediction, obtaining an accurate probability distribution of the future is an area of active interest. We present MultiPath, which leverages a fixed set of future state-sequence anchors that correspond to modes of the trajectory distribution. At inference, our model predicts a discrete distribution over the anchors and, for each anchor, regresses offsets from anchor waypoints along with uncertainties, yielding a Gaussian mixture at each time step. Our model is efficient, requiring only one forward inference pass to obtain multi-modal future distributions, and the output is parametric, allowing compact communication and analytical probabilistic queries. We show on several datasets that our model achieves more accurate predictions, and compared to sampling baselines, does so with an order of magnitude fewer trajectories.

deep learning, neural network, trajectory, (17 more...)

arXiv.org Machine Learning

1910.05449

Country: Asia > Japan (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.36)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback