Wang, Zeyu
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
Yan, Kun, Ji, Lei, Wang, Zeyu, Wang, Yuntao, Duan, Nan, Ma, Shuai
In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we devise the Voila Perceiver modules to integrate gaze information into VLMs while preserving their pretrained knowledge. We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE test set, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion
Pronovost, Ethan, Ganesina, Meghana Reddy, Hendy, Noureldin, Wang, Zeyu, Morales, Andres, Wang, Kai, Roy, Nicholas
Automated creation of synthetic traffic scenarios is a key part of validating the safety of autonomous vehicles (AVs). In this paper, we propose Scenario Diffusion, a novel diffusion-based architecture for generating traffic scenarios that enables controllable scenario generation. We combine latent diffusion, object detection and trajectory regression to generate distributions of synthetic agent poses, orientations and trajectories simultaneously. To provide additional control over the generated scenario, this distribution is conditioned on a map and sets of tokens describing the desired scenario. We show that our approach has sufficient expressive capacity to model diverse traffic patterns and generalizes to different geographical regions.
Few-Shot Multi-Label Aspect Category Detection Utilizing Prototypical Network with Sentence-Level Weighting and Label Augmentation
Wang, Zeyu, Iwaihara, Mizuho
Multi-label aspect category detection aims to detect the multiple aspect categories occurring in a given sentence. Since aspect category detection often suffers from limited datasets and data sparsity, prototypical networks with attention mechanisms have been applied to few-shot aspect category detection. Nevertheless, most prototypical networks used so far compute prototypes by taking the mean of all instances in the support set, which ignores the variation between instances in multi-label aspect category detection. Also, several related works utilize label text information to enhance the attention mechanism; however, label text is often short and limited, and not specific enough to discern categories. In this paper, we first introduce support-set attention along with augmented label information to mitigate noise at the word level for each support-set instance. Moreover, we use a sentence-level attention mechanism that assigns a different weight to each instance in the support set, so that prototypes are computed by weighted averaging. Finally, the calculated prototypes are used in conjunction with query instances to compute query attention and thereby eliminate noise from the query set. Experimental results on the Yelp dataset show that our proposed method is effective and outperforms all baselines in four different scenarios.
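The weighted-averaging step described above can be sketched as follows. This is a minimal illustration only: the weights here are hypothetical inputs, standing in for the paper's learned sentence-level attention scores.

```python
import numpy as np

def weighted_prototype(support_embeddings, weights):
    """Compute a class prototype as the weighted average of support-set
    instance embeddings, instead of the plain mean."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize attention weights
    return weights @ np.asarray(support_embeddings, dtype=float)

# Three support instances in a 2-d embedding space; the second instance
# is judged most representative and receives the largest weight.
support = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
proto = weighted_prototype(support, [0.2, 0.6, 0.2])
```

With uniform weights this reduces to the standard prototypical-network mean; the attention weights let atypical support instances contribute less to the prototype.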
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation
Wang, Zeyu, Li, Dingwen, Luo, Chenxu, Xie, Cihang, Yang, Xiaodong
3D perception based on representations learned from multi-camera bird's-eye-view (BEV) images is trending, as cameras are cost-effective for mass production in the autonomous driving industry. However, there exists a distinct performance gap between multi-camera BEV and LiDAR-based 3D object detection. One key reason is that LiDAR captures accurate depth and other geometry measurements, while it is notoriously challenging to infer such 3D information from image input alone. In this work, we propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR-based teacher detector. We propose an effective balancing strategy to force the student to focus on learning the crucial features from the teacher, and generalize knowledge transfer to multi-scale layers with temporal fusion. We conduct extensive evaluations on multiple representative models of multi-camera BEV. Experiments reveal that our approach yields significant improvements over the student models, leading to state-of-the-art performance on the popular nuScenes benchmark.
Predicting Word Learning in Children from the Performance of Computer Vision Systems
Rane, Sunayana, Nencheva, Mira L., Wang, Zeyu, Lew-Williams, Casey, Russakovsky, Olga, Griffiths, Thomas L.
For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated with the performance of visual classification and captioning systems, over and above the expected effects of word frequency. The performance of the computer vision systems is correlated with human judgments of the concreteness of words, which are in turn a predictor of children's word learning, suggesting that these models are capturing the relationship between words and visual phenomena.
SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
Miao, Xupeng, Oliaro, Gabriele, Zhang, Zhihao, Cheng, Xinhao, Wang, Zeyu, Wong, Rae Ying Yee, Zhu, Alan, Yang, Lijie, Shi, Xiaoxiang, Shi, Chunan, Chen, Zhuoming, Arfeen, Daiyaan, Abhyankar, Reyna, Jia, Zhihao
The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them quickly and cheaply. This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference with speculative inference and token tree verification. A key insight behind SpecInfer is to combine various collectively boost-tuned small language models to jointly predict the LLM's outputs; the predictions are organized as a token tree, whose nodes each represent a candidate token sequence. The correctness of all candidate token sequences represented by a token tree is verified against the LLM in parallel using a novel tree-based parallel decoding mechanism. SpecInfer uses an LLM as a token tree verifier instead of an incremental decoder, which significantly reduces the end-to-end latency and computational requirement.

This approach is also called autoregressive decoding because each generated token is also used as input for generating future tokens. This dependency between tokens is crucial for many NLP tasks that require preserving the order and context of the generated tokens, such as text completion [53]. Existing LLM systems generally use an incremental decoding approach to serving a request where the system computes the activations for all prompt tokens in a single step and then iteratively decodes one new token using the input prompt and all previously generated tokens. This approach respects data dependencies between tokens, but achieves suboptimal runtime performance and limited GPU utilization, since the degree of parallelism within each request is greatly limited in the incremental phase. In addition, the attention mechanism of Transformer [46] requires accessing the keys and values of all previous tokens to compute the attention output of a new token.
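SpecInfer verifies entire token trees in parallel; the sequential, single-sequence sketch below illustrates only the underlying accept-the-longest-matching-prefix idea of speculative verification under greedy decoding (the tree organization and parallel verification mechanism are not reproduced here, and `target_next_token` is a hypothetical stand-in for a call into the target LLM).

```python
def verify_speculation(target_next_token, speculated_tokens):
    """Accept the longest prefix of the speculated sequence that matches
    what the target model would itself produce under greedy decoding.
    target_next_token(prefix) returns the model's next token for a prefix."""
    prefix, accepted = [], []
    for tok in speculated_tokens:
        if target_next_token(prefix) == tok:
            accepted.append(tok)
            prefix = prefix + [tok]
        else:
            break
    # On mismatch (or exhaustion), emit the target model's own next token,
    # so every verification round makes at least one token of progress.
    accepted.append(target_next_token(prefix))
    return accepted

# Toy target model: always emits the integer after the last token.
oracle = lambda prefix: (prefix[-1] + 1) if prefix else 0
result = verify_speculation(oracle, [0, 1, 7])  # third guess is wrong
```

Because all the `target_next_token` queries depend only on the speculated prefix, a real serving system can batch them into a single forward pass, which is where the latency savings come from.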
Subgraph Networks Based Contrastive Learning
Wang, Jinhuan, Shao, Jiafei, Wang, Zeyu, Yu, Shanqing, Xuan, Qi, Yang, Xiaoniu
Graph contrastive learning (GCL), as a self-supervised learning method, can address the problem of annotated data scarcity. It mines explicit features in unannotated graphs to generate favorable graph representations for downstream tasks. Most existing GCL methods focus on the design of graph augmentation strategies and mutual information estimation operations. Graph augmentation produces augmented views by graph perturbations. These views preserve a locally similar structure and exploit explicit features. However, these methods have not considered the interactions existing in subgraphs. To explore the impact of substructure interactions on graph representations, we propose a novel framework called subgraph network-based contrastive learning (SGNCL). SGNCL applies a subgraph network generation strategy to produce augmented views. This strategy converts the original graph into an Edge-to-Node mapping network with both topological and attribute features. The single-shot augmented view is a first-order subgraph network that mines the interactions between nodes, between nodes and edges, and between edges. In addition, we investigate the impact of second-order subgraph augmentation on mining graph structure interactions, and further propose a contrastive objective that fuses the first-order and second-order subgraph information. We compare SGNCL with classical and state-of-the-art graph contrastive learning methods on multiple benchmark datasets from different domains. Extensive experiments show that SGNCL achieves competitive or better performance (top three) on all datasets in unsupervised learning settings. Furthermore, SGNCL achieves the best average gain, 6.9% in transfer learning, over the best existing method. Finally, experiments also demonstrate that mining substructure interactions has positive implications for graph contrastive learning.
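The Edge-to-Node mapping described above is essentially a line-graph construction: each original edge becomes a node, and two new nodes are connected when the original edges share an endpoint. A minimal sketch of the topology-only part (attribute features, which SGNCL also carries over, are omitted):

```python
from itertools import combinations

def edge_to_node_network(edges):
    """Map each edge of the original graph to a node; connect two new
    nodes iff the corresponding original edges share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]
    new_edges = set()
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):  # the two edges share a vertex
            new_edges.add(tuple(sorted((e1, e2))))
    return sorted(new_edges)

# Path graph 0-1-2: its two edges meet at vertex 1, so the first-order
# subgraph network is a single edge between nodes (0,1) and (1,2).
sgn = edge_to_node_network([(0, 1), (1, 2)])
```

Applying the same construction to the result yields the second-order subgraph network that the contrastive objective fuses with the first-order view.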
A robust method for reliability updating with equality information using sequential adaptive importance sampling
Xiao, Xiong, Wang, Zeyu, Li, Quanwang
Reliability updating refers to a problem that integrates the Bayesian updating technique with structural reliability analysis and cannot be directly solved by structural reliability methods (SRMs) when it involves equality information. The state-of-the-art approaches transform equality information into inequality information by introducing an auxiliary standard normal parameter. These methods, however, suffer a loss of computational efficiency due to the difficulty of finding the maximum of the likelihood function, the large coefficient of variation (COV) associated with the posterior failure probability, and their inapplicability to dynamic updating problems where new information is constantly available. To overcome these limitations, this paper proposes an innovative method called RU-SAIS (reliability updating using sequential adaptive importance sampling), which combines elements of sequential importance sampling and K-means clustering to construct a series of importance sampling densities (ISDs) using Gaussian mixtures. The last ISD of the sequence is further adaptively modified through application of the cross-entropy method. The performance of RU-SAIS is demonstrated by three examples. Results show that RU-SAIS achieves a more accurate and robust estimator of the posterior failure probability than existing methods such as subset simulation.
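RU-SAIS builds on the basic importance-sampling estimator of a failure probability. A minimal single-ISD sketch of that building block is shown below (the sequential Gaussian-mixture construction, K-means step, and cross-entropy refinement are not reproduced; the limit-state function and densities are toy assumptions):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def is_failure_probability(g, samples, isd_pdf, nominal_pdf):
    """Importance-sampling estimate of P[g(X) <= 0] from ISD samples,
    reweighting each failure indicator by nominal_pdf / isd_pdf."""
    w = nominal_pdf(samples) / isd_pdf(samples)
    return float(np.mean((g(samples) <= 0.0) * w))

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, 200_000)  # ISD centered on the failure region
p = is_failure_probability(
    g=lambda x: 2.0 - x,           # failure when x >= 2
    samples=x,
    isd_pdf=lambda x: normal_pdf(x, 2.0, 1.0),
    nominal_pdf=lambda x: normal_pdf(x),
)
# Exact value here is 1 - Phi(2), about 0.0228
```

Centering the ISD on the failure region is what keeps the COV of the estimator small; the adaptive sequence of ISDs in RU-SAIS automates that centering when the failure region is not known in advance.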
On the Adversarial Robustness of Camera-based 3D Object Detection
Xie, Shaoyuan, Li, Zichao, Wang, Zeyu, Xie, Cihang
In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance with low computational cost. However, the robustness of these methods to adversarial attacks has not been thoroughly examined. In this study, we conduct the first comprehensive investigation of the robustness of leading camera-based 3D object detection methods under various adversarial conditions. Our experiments reveal five interesting findings: (a) the use of accurate depth estimation effectively improves robustness; (b) depth-estimation-free approaches do not show superior robustness; (c) bird's-eye-view-based representations exhibit greater robustness against localization attacks; (d) incorporating multi-frame benign inputs can effectively mitigate adversarial attacks; and (e) addressing long-tail problems can enhance robustness. We hope our work can provide guidance for the design of future camera-based object detection modules with improved adversarial robustness.
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Gururangan, Suchin, Card, Dallas, Dreier, Sarah K., Gade, Emily K., Wang, Leroy Z., Wang, Zeyu, Zettlemoyer, Luke, Smith, Noah A.
Language models increasingly rely on massive web dumps for diverse text data. However, these sources are rife with undesirable content. As such, resources like Wikipedia, books, and newswire often serve as anchors for automatically selecting web text most suitable for language modeling, a process typically referred to as quality filtering. Using a new dataset of U.S. high school newspaper articles -- written by students from across the country -- we investigate whose language is preferred by the quality filter used for GPT-3. We find that newspapers from larger schools, located in wealthier, educated, and urban ZIP codes are more likely to be classified as high quality. We then demonstrate that the filter's measurement of quality is unaligned with other sensible metrics, such as factuality or literary acclaim. We argue that privileging any corpus as high quality entails a language ideology, and more care is needed to construct training corpora for language models, with better transparency and justification for the inclusion or exclusion of various texts.