Zhang, Zaiwei
VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision
Xu, Yi, Hu, Yuxin, Zhang, Zaiwei, Meyer, Gregory P., Mustikovela, Siva Karthik, Srinivasa, Siddhartha, Wolff, Eric M., Huang, Xin
Human drivers rely on commonsense reasoning to navigate diverse and dynamic real-world scenarios. Existing end-to-end (E2E) autonomous driving (AD) models are typically optimized to mimic driving patterns observed in data, without capturing the underlying reasoning processes. This limitation constrains their ability to handle challenging driving scenarios. To close this gap, we propose VLM-AD, a method that leverages vision-language models (VLMs) as teachers to enhance training by providing additional supervision that incorporates unstructured reasoning information and structured action labels. Such supervision enhances the model's ability to learn richer feature representations that capture the rationale behind driving patterns. Importantly, our method does not require a VLM during inference, making it practical for real-time deployment. When integrated with state-of-the-art methods, VLM-AD achieves significant improvements in planning accuracy and reductions in collision rates on the nuScenes dataset.
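A minimal sketch of the auxiliary-supervision idea described above, assuming a PyTorch-style setup: an E2E planner exposes intermediate features to extra heads that regress an embedding of the VLM's free-form reasoning text and classify its structured action labels, and those heads (and the VLM itself) are dropped at inference. All module names, dimensions, and loss weights below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of VLM-as-teacher auxiliary supervision (assumed structure).
import torch.nn as nn
import torch.nn.functional as F

class AuxSupervisedPlanner(nn.Module):
    def __init__(self, in_dim=512, feat_dim=256, text_emb_dim=768, num_actions=8, horizon=6):
        super().__init__()
        # Stand-in for an E2E AD backbone (camera/LiDAR encoders, BEV fusion, etc.).
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.plan_head = nn.Linear(feat_dim, 2 * horizon)      # future (x, y) waypoints
        # Auxiliary heads trained only against VLM-produced targets; unused at inference.
        self.reason_head = nn.Linear(feat_dim, text_emb_dim)   # regress embedding of reasoning text
        self.action_head = nn.Linear(feat_dim, num_actions)    # classify structured action labels

    def forward(self, scene_feat):
        f = self.backbone(scene_feat)
        return self.plan_head(f), self.reason_head(f), self.action_head(f)

def training_loss(model, scene_feat, gt_traj, vlm_text_emb, vlm_action, aux_w=0.5):
    plan, reason, action = model(scene_feat)
    loss = F.l1_loss(plan, gt_traj)                            # imitation term on expert trajectories
    loss = loss + aux_w * F.mse_loss(reason, vlm_text_emb)     # unstructured reasoning distillation
    loss = loss + aux_w * F.cross_entropy(action, vlm_action)  # structured action supervision
    return loss

# At inference only plan_head is used, so the VLM adds no runtime cost.
```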
VLMine: Long-Tail Data Mining with Vision Language Models
Ye, Mao, Meyer, Gregory P., Zhang, Zaiwei, Park, Dennis, Mustikovela, Siva Karthik, Chai, Yuning, Wolff, Eric M.
Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of unlabeled data. We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision-language model (VLM). Our approach uses a VLM to summarize the content of an image into a set of keywords, and we identify rare examples based on keyword frequency. We find that the VLM offers a distinct signal for identifying long-tail examples compared to conventional methods based on model uncertainty. We therefore propose a simple and general approach for integrating signals from multiple mining algorithms. We evaluate the proposed method on two diverse tasks: 2D image classification, in which inter-class variation is the primary source of data diversity, and 3D object detection, where intra-class variation is the main concern. Through the detection task, we further demonstrate that the knowledge extracted from 2D images is transferable to the 3D domain. Our experiments consistently show large improvements (between 10% and 50%) over the baseline techniques on several representative benchmarks: ImageNet-LT, Places-LT, and the Waymo Open Dataset.
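The mining recipe lends itself to a short sketch: prompt a VLM to summarize each unlabeled image into keywords, score images by how infrequent their keywords are across the corpus, and fuse that signal with other mining scores. The snippet below is a hedged illustration; the scoring function and the rank-averaging fusion are assumptions rather than the paper's exact formulas.

```python
# Illustrative keyword-frequency long-tail mining (assumed scoring and fusion rules).
from collections import Counter

def keyword_rarity_scores(image_keywords):
    """image_keywords: list of keyword sets, one per unlabeled image,
    e.g., produced by prompting a VLM to summarize each image."""
    counts = Counter(kw for kws in image_keywords for kw in kws)
    n = len(image_keywords)
    scores = []
    for kws in image_keywords:
        # An image is considered rare if its least frequent keyword is uncommon in the corpus.
        rarest = min((counts[kw] for kw in kws), default=n)
        scores.append(1.0 - rarest / n)
    return scores

def combine_signals(score_lists):
    """Rank-average several mining signals (e.g., keyword rarity and model
    uncertainty) so their scales need not be calibrated against each other."""
    n = len(score_lists[0])
    combined = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: scores[i])
        for rank, i in enumerate(order):
            combined[i] += rank / (n - 1) if n > 1 else 0.0
    return [c / len(score_lists) for c in combined]
```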
The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models
Wu, Chenwei, Li, Li Erran, Ermon, Stefano, Haffner, Patrick, Ge, Rong, Zhang, Zaiwei
Compositionality is a common property of many modalities, including natural language and images, but the compositional generalization of multi-modal models is not well understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on linguistic priors rather than on information in the image. We also propose a new metric that measures compositionality without relying on such linguistic priors.
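One way to make the role of linguistic priors concrete is to compare a full vision-language scorer against an image-blind, text-only scorer on the same image-text matching pairs: whatever the blind scorer already solves cannot be evidence of visual compositionality. The sketch below illustrates only that diagnostic idea; it is not the metric proposed in the paper, and both scoring functions are assumed interfaces.

```python
# Hedged probe for linguistic-prior reliance (not the paper's metric).
def prior_reliance(vlm_score, text_only_score, examples):
    """examples: list of (image, correct_caption, hard_negative_caption).
    vlm_score(image, caption) -> float; text_only_score(caption) -> float,
    i.e., a scorer that never sees the image (e.g., an LM plausibility score)."""
    vlm_correct = text_correct = 0
    for image, pos, neg in examples:
        vlm_correct += vlm_score(image, pos) > vlm_score(image, neg)
        text_correct += text_only_score(pos) > text_only_score(neg)
    n = len(examples)
    # If the blind scorer already solves most pairs, the benchmark largely
    # measures linguistic priors rather than image-text compositionality.
    return {"vlm_acc": vlm_correct / n, "text_only_acc": text_correct / n}
```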
ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators
Huang, Qixing, Huang, Xiangru, Sun, Bo, Zhang, Zaiwei, Jiang, Junfeng, Bajaj, Chandrajit
This paper introduces an unsupervised loss for training parametric deformable shape generators. The key idea is to enforce the preservation of local rigidity among the generated shapes. Our approach builds on an approximation of the as-rigid-as-possible (ARAP) deformation energy. We show how to develop the unsupervised loss via a spectral decomposition of the Hessian of the ARAP energy. Our loss cleanly decouples pose and shape variations through a robust norm, and it admits simple closed-form expressions. It is easy to train and can be plugged into standard generative models, e.g., variational auto-encoders (VAEs) and auto-decoders (ADs). Experimental results show that our approach considerably outperforms existing shape generation approaches on public benchmark datasets spanning shape categories such as humans, animals, and bones.
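For reference, the classical ARAP energy between a shape with vertices p and a deformed copy q is the rigidity residual in the first display below. The second display is a schematic reading of the abstract: since the energy vanishes to first order at q = p, a small latent perturbation δ yields a quadratic form in the generator Jacobian J and the ARAP Hessian H, to which a robust norm ρ is applied. This is an assumed rendering of the idea, not the paper's exact closed-form loss.

```latex
% Classical ARAP deformation energy (standard definition):
E_{\mathrm{ARAP}}(p, q) = \sum_{i} \min_{R_i \in SO(3)} \sum_{j \in \mathcal{N}(i)}
  w_{ij} \left\| (q_i - q_j) - R_i (p_i - p_j) \right\|^2 .

% Schematic second-order surrogate (assumed, not the paper's exact loss):
% for a generator g_\theta, latent code z, and small perturbation \delta,
E_{\mathrm{ARAP}}\!\left( g_\theta(z),\, g_\theta(z + \delta) \right)
  \approx \tfrac{1}{2}\, \delta^{\top} J_\theta(z)^{\top} H\!\left( g_\theta(z) \right) J_\theta(z)\, \delta ,
\qquad
\mathcal{L}_{\mathrm{reg}}(\theta)
  = \mathbb{E}_{z, \delta}\!\left[ \rho\!\left( \delta^{\top} J_\theta(z)^{\top} H J_\theta(z)\, \delta \right) \right] .
```

The spectral decomposition of H mentioned in the abstract is what would make such a quadratic form cheap to evaluate in closed form.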
Joint Learning of Neural Networks via Iterative Reweighted Least Squares
Zhang, Zaiwei, Huang, Xiangru, Huang, Qixing, Zhang, Xiao, Li, Yuan
In this paper, we introduce the problem of jointly learning feed-forward neural networks across a set of related but diverse datasets. Compared to learning a separate network from each dataset in isolation, joint learning enables us to extract correlated information across multiple datasets and significantly improve the quality of the learned networks. We formulate this problem as joint learning of multiple copies of the same network architecture while enforcing the network weights to be shared across these copies. Instead of hand-encoding which layers are shared, we solve an optimization problem to automatically determine how layers should be shared between each pair of datasets. Experimental results show that our approach outperforms baselines trained without joint learning and those using pretraining and fine-tuning. We demonstrate the effectiveness of our approach on three tasks: image classification, learning auto-encoders, and image generation.
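Read literally, the "iterative reweighted least squares" component suggests alternating between (i) training the per-dataset copies under a weighted pairwise layer-sharing penalty and (ii) refitting those weights from the current parameter differences, so that a robust sharing penalty is approximated by a sequence of least-squares problems. The sketch below is one such formulation under assumed penalty and update rules; it is illustrative rather than the paper's algorithm.

```python
# Hedged sketch of IRLS-style soft layer sharing across per-dataset networks.
import torch

def sharing_penalty(models, irls_weights, lam=1e-3):
    """Penalty lam * sum_{(a, b, layer)} w * ||theta_a - theta_b||^2.
    With IRLS weights, this quadratic term approximates a group-sparse penalty,
    pulling similar layers toward being shared while letting others diverge."""
    penalty = 0.0
    for (a, b, name), w in irls_weights.items():
        pa = dict(models[a].named_parameters())[name]
        pb = dict(models[b].named_parameters())[name]
        penalty = penalty + w * (pa - pb).pow(2).sum()
    return lam * penalty

@torch.no_grad()
def update_irls_weights(models, irls_weights, eps=1e-6):
    """IRLS step: w <- 1 / (||theta_a - theta_b|| + eps), recomputed periodically."""
    for a, b, name in list(irls_weights):
        pa = dict(models[a].named_parameters())[name]
        pb = dict(models[b].named_parameters())[name]
        irls_weights[(a, b, name)] = 1.0 / ((pa - pb).norm().item() + eps)
```

In practice, one would interleave gradient steps on the per-dataset task losses plus sharing_penalty with occasional calls to update_irls_weights, e.g., once per epoch.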