Meyer, Gregory P.
Generative Data Mining with Longtail-Guided Diffusion
Hayden, David S., Ye, Mao, Garipov, Timur, Meyer, Gregory P., Vondrick, Carl, Chen, Zhao, Chai, Yuning, Wolff, Eric, Srinivasa, Siddhartha S.
It is difficult to anticipate the myriad challenges that a predictive model will encounter once deployed. Common practice entails a reactive, cyclical approach: model deployment, data mining, and retraining. We instead develop a proactive longtail discovery process by imagining additional data during training. In particular, we develop general model-based longtail signals, including a differentiable, single forward pass formulation of epistemic uncertainty that does not impact model parameters or predictive performance but can flag rare or hard inputs. We leverage these signals as guidance to generate additional training data from a latent diffusion model in a process we call Longtail Guidance (LTG). Crucially, we can perform LTG without retraining the diffusion model or the predictive model, and we do not need to expose the predictive model to intermediate diffusion states. Data generated by LTG exhibit semantically meaningful variation, yield significant generalization improvements on image classification benchmarks, and can be analyzed to proactively discover, explain, and address conceptual gaps in a predictive model.
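The guidance step described above can be pictured as a small modification to ordinary diffusion sampling. Below is a minimal, hypothetical sketch, not the paper's implementation: predictive entropy of a frozen classifier stands in for the longtail signal, and `decode`, `denoise_step`, and `guidance_scale` are placeholder names for the latent decoder, the diffusion update, and the guidance strength.

```python
# Hypothetical sketch of longtail-guided sampling: at each diffusion step the latent
# is nudged toward inputs the frozen classifier finds "hard", here proxied by
# predictive entropy. decode / denoise_step / classifier are placeholder callables.
import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Differentiable hardness proxy computed in a single forward pass."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def guided_step(latent, t, denoise_step, decode, classifier, guidance_scale=1.0):
    latent = latent.detach().requires_grad_(True)
    image = decode(latent)                       # map latent to image space
    logits = classifier(image)                   # frozen predictive model
    hardness = predictive_entropy(logits).sum()  # scalar longtail signal
    grad = torch.autograd.grad(hardness, latent)[0]
    with torch.no_grad():
        latent = denoise_step(latent, t)             # ordinary diffusion update
        latent = latent + guidance_scale * grad      # push toward harder examples
    return latent
```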
VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision
Xu, Yi, Hu, Yuxin, Zhang, Zaiwei, Meyer, Gregory P., Mustikovela, Siva Karthik, Srinivasa, Siddhartha, Wolff, Eric M., Huang, Xin
Human drivers rely on commonsense reasoning to navigate diverse and dynamic real-world scenarios. Existing end-to-end (E2E) autonomous driving (AD) models are typically optimized to mimic driving patterns observed in data, without capturing the underlying reasoning processes. This limitation constrains their ability to handle challenging driving scenarios. To close this gap, we propose VLM-AD, a method that leverages vision-language models (VLMs) as teachers to enhance training by providing additional supervision that incorporates unstructured reasoning information and structured action labels. Such supervision enhances the model's ability to learn richer feature representations that capture the rationale behind driving patterns. Importantly, our method does not require a VLM during inference, making it practical for real-time deployment. When integrated with state-of-the-art methods, VLM-AD achieves significant improvements in planning accuracy and reductions in collision rates on the nuScenes dataset.
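One way to picture the extra supervision is as auxiliary heads trained against offline VLM outputs and dropped at inference time. The sketch below is illustrative only and makes assumptions (embedding-regression for free-form reasoning, a categorical action label, and the dimensions shown); it is not the paper's architecture.

```python
# Minimal sketch of VLM-derived auxiliary supervision: the planner's feature passes
# through two extra heads trained against precomputed VLM targets -- a text-embedding
# regression for free-form reasoning and a categorical action label. Both heads are
# discarded at inference, so no VLM is needed at runtime.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryVLMHeads(nn.Module):
    def __init__(self, feat_dim: int, text_dim: int = 768, num_actions: int = 8):
        super().__init__()
        self.reason_head = nn.Linear(feat_dim, text_dim)      # regress VLM text embedding
        self.action_head = nn.Linear(feat_dim, num_actions)   # predict VLM action label

    def forward(self, planner_feat, vlm_text_emb, vlm_action):
        reason_pred = self.reason_head(planner_feat)
        # Cosine distance to the (precomputed) VLM reasoning embedding.
        reason_loss = 1.0 - F.cosine_similarity(reason_pred, vlm_text_emb, dim=-1).mean()
        action_loss = F.cross_entropy(self.action_head(planner_feat), vlm_action)
        return reason_loss + action_loss
```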
VLMine: Long-Tail Data Mining with Vision Language Models
Ye, Mao, Meyer, Gregory P., Zhang, Zaiwei, Park, Dennis, Mustikovela, Siva Karthik, Chai, Yuning, Wolff, Eric M.
Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of unlabeled data. We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM). Our approach utilizes a VLM to summarize the content of an image into a set of keywords, and we identify rare examples based on keyword frequency. We find that the VLM offers a distinct signal for identifying long-tail examples when compared to conventional methods based on model uncertainty. Therefore, we propose a simple and general approach for integrating signals from multiple mining algorithms. We evaluate the proposed method on two diverse tasks: 2D image classification, in which inter-class variation is the primary source of data diversity, and 3D object detection, where intra-class variation is the main concern. Furthermore, through the detection task, we demonstrate that the knowledge extracted from 2D images is transferable to the 3D domain. Our experiments consistently show large improvements (between 10% and 50%) over the baseline techniques on several representative benchmarks: ImageNet-LT, Places-LT, and the Waymo Open Dataset.
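The keyword-frequency idea and the signal combination can be sketched in a few lines. This is an illustrative toy, assuming a placeholder `vlm_keywords` captioning function and a simple rank-sum combination rule, neither of which is claimed to match the paper's exact formulation.

```python
# Illustrative keyword-frequency mining: an image is flagged as rare if its rarest
# VLM keyword is infrequent in the corpus; VLM and uncertainty signals are then
# combined by summing per-signal ranks. vlm_keywords() is a placeholder.
from collections import Counter
from typing import Callable, Dict, List

def mine_longtail(images: List[str],
                  vlm_keywords: Callable[[str], List[str]],
                  uncertainty: Dict[str, float],
                  top_k: int = 100) -> List[str]:
    keywords = {img: vlm_keywords(img) for img in images}
    counts = Counter(kw for kws in keywords.values() for kw in kws)

    # VLM signal: inverse frequency of the rarest keyword attached to each image.
    vlm_score = {img: (1.0 / min(counts[kw] for kw in kws)) if kws else 0.0
                 for img, kws in keywords.items()}

    def ranks(scores: Dict[str, float]) -> Dict[str, int]:
        order = sorted(scores, key=scores.get, reverse=True)
        return {img: r for r, img in enumerate(order)}

    r_vlm, r_unc = ranks(vlm_score), ranks(uncertainty)
    combined = sorted(images, key=lambda img: r_vlm[img] + r_unc[img])
    return combined[:top_k]
```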
Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving
Xie, Yichen, Chen, Hongge, Meyer, Gregory P., Lee, Yong Jae, Wolff, Eric M., Tomizuka, Masayoshi, Zhan, Wei, Chai, Yuning, Huang, Xin
Due to the lack of depth cues in images, multi-frame inputs are important for the success of vision-based perception, prediction, and planning in autonomous driving. Observations from different angles enable the recovery of 3D object states from 2D image inputs if we can identify the same instance in different input frames. However, the dynamic nature of autonomous driving scenes leads to significant changes in the appearance and shape of each instance captured by the camera at different time steps. To address this, we propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence robust to the change in distance and perspective. The learned representation aids in instance-level correspondence across multiple input frames in downstream tasks. In the pretraining stage, the raw point clouds from LiDAR sensors are utilized to construct the long-term temporal correspondence for each instance, which serves as guidance for the extraction of instance-level representation from the vision-based bird's-eye-view (BEV) feature map. Cohere3D encourages a consistent representation for the same instance at different frames but distinguishes between representations of different instances. We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks. Results show a notable improvement in both data efficiency and task performance.
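The "same instance across frames is a positive, everything else is a negative" objective can be written compactly. The sketch below is a generic InfoNCE formulation under simplifying assumptions: `feats_t` and `feats_tp` already hold BEV features for the same N instances at two time steps, matched via the LiDAR correspondence (matching code omitted); it is not claimed to be Cohere3D's exact loss.

```python
# Instance-level temporal contrastive loss (InfoNCE sketch): row i of feats_t and
# row i of feats_tp are the same instance at two time steps (positive pair); all
# other rows act as negatives.
import torch
import torch.nn.functional as F

def temporal_instance_nce(feats_t: torch.Tensor,
                          feats_tp: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    z_t = F.normalize(feats_t, dim=-1)      # (N, D) features at frame t
    z_tp = F.normalize(feats_tp, dim=-1)    # (N, D) matched features at frame t'
    logits = z_t @ z_tp.t() / temperature   # (N, N) pairwise similarity
    targets = torch.arange(z_t.size(0), device=z_t.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```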
Making Large Multimodal Models Understand Arbitrary Visual Prompts
Cai, Mu, Liu, Haotian, Mustikovela, Siva Karthik, Meyer, Gregory P., Chai, Yuning, Park, Dennis, Lee, Yong Jae
While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual prompts. This allows users to intuitively mark images and interact with the model using natural cues like a "red bounding box" or "pointed arrow". Our simple design directly overlays visual markers onto the RGB image, eliminating the need for complex region encodings, yet achieves state-of-the-art performance on region-understanding tasks such as Visual7W, PointQA, and the Visual Commonsense Reasoning benchmark. Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain. Code, data, and model are publicly available.
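The overlay design is simple enough to illustrate directly. The toy sketch below assumes a hypothetical `multimodal_model` callable and standard Pillow drawing; it only demonstrates the idea of rendering the prompt into the pixels rather than encoding the region separately.

```python
# Toy illustration of the overlay idea: draw the visual prompt (here a red box)
# directly onto the RGB pixels before handing the image to the multimodal model,
# so no dedicated region-encoding pathway is required.
from PIL import Image, ImageDraw

def overlay_visual_prompt(image_path: str, box, color="red", width=4) -> Image.Image:
    """Return a copy of the image with a bounding-box prompt drawn on it."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    draw.rectangle(box, outline=color, width=width)  # box = (x0, y0, x1, y1)
    return image

# Hypothetical usage:
# prompted = overlay_visual_prompt("scene.jpg", (64, 80, 220, 240))
# answer = multimodal_model(prompted, "What is inside the red bounding box?")
```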
SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors
Chen, Hongge, Chen, Zhao, Meyer, Gregory P., Park, Dennis, Vondrick, Carl, Shrivastava, Ashish, Chai, Yuning
We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors. In safety-critical applications like autonomous driving, discovering such novel challenging objects can offer insight into unknown vulnerabilities of 3D detectors. By representing objects with a signed distance function (SDF), we show that gradient error signals allow us to smoothly deform the shape or pose of a 3D object in order to confuse a downstream 3D detector. Importantly, the objects generated by SHIFT3D physically differ from the baseline object yet retain a semantically recognizable shape. Our approach provides interpretable failure modes for modern 3D object detectors, and can aid in preemptive discovery of potential safety risks within 3D perception systems before these risks become critical failures.
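The gradient-driven deformation can be pictured as an adversarial optimization loop over a latent shape code. The sketch below is a hedged approximation, not the paper's pipeline: `sdf_decoder`, `render_to_lidar`, and `detector` are placeholders for the differentiable components, and the quadratic penalty is one simple way to keep the deformed shape close to the plausible baseline.

```python
# Hedged sketch of adversarial shape deformation: perturb a latent shape code along
# the gradient that lowers detector confidence, with a regularizer keeping the
# deformed shape near the baseline. All callables are placeholders.
import torch

def deform_shape(latent, sdf_decoder, render_to_lidar, detector,
                 steps=50, lr=1e-2, reg=1.0):
    baseline = latent.detach().clone()
    latent = latent.detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        shape = sdf_decoder(latent)          # implicit SDF -> 3D shape
        points = render_to_lidar(shape)      # differentiable point-cloud rendering
        confidence = detector(points)        # downstream 3D detector score (scalar)
        # Lower detection confidence while staying near the baseline shape.
        loss = confidence + reg * (latent - baseline).pow(2).sum()
        loss.backward()
        optimizer.step()
    return latent.detach()
```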