Wang, Zining
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Li, Yue, Tian, Meng, Lin, Zhenyu, Zhu, Jiangtong, Zhu, Dechang, Liu, Haiqiang, Wang, Zining, Zhang, Yueyi, Xiong, Zhiwei, Zhao, Xinhai
Existing benchmarks for Vision-Language Models (VLMs) in autonomous driving (AD) primarily assess interpretability through open-form visual question answering (QA) on coarse-grained tasks, which remains insufficient for assessing capabilities in complex driving scenarios. To this end, we introduce $\textbf{VLADBench}$, a challenging and fine-grained dataset of closed-form QAs that progress from static foundational knowledge and elements to advanced reasoning about dynamic on-road situations. $\textbf{VLADBench}$ spans 5 key domains: Traffic Knowledge Understanding, General Element Recognition, Traffic Graph Generation, Target Attribute Comprehension, and Ego Decision-Making and Planning. These domains are further divided into 11 secondary aspects and 29 tertiary tasks for granular evaluation. A thorough assessment of general and domain-specific (DS) VLMs on this benchmark reveals both their strengths and their critical limitations in AD contexts. To further exploit the cognitive and reasoning interactions among the 5 domains for AD understanding, we start from a small-scale VLM and train DS models on individual domain datasets (collected from 1.4M DS QAs across public sources). The experimental results demonstrate that the proposed benchmark provides a crucial step toward a more comprehensive assessment of VLMs in AD, paving the way for the development of more cognitively sophisticated and reasoning-capable AD systems.
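The three-level hierarchy (5 domains, 11 aspects, 29 tasks) lends itself to accuracy scoring at each tier over closed-form QAs. Below is a minimal illustrative sketch of such an evaluation loop; the domain name comes from the abstract, while the aspect/task names, the record fields, and the `model_predict` interface are hypothetical assumptions rather than the released VLADBench format.

```python
from collections import defaultdict

# Hypothetical closed-form QA record: each item carries its position in the
# domain -> aspect -> task hierarchy plus multiple-choice options. The domain
# name below is from the abstract; the aspect/task names are illustrative.
qa_items = [
    {
        "domain": "Traffic Knowledge Understanding",
        "aspect": "traffic_signs",        # assumed secondary aspect
        "task": "sign_meaning",           # assumed tertiary task
        "question": "What does the pictured sign indicate?",
        "options": ["A. Stop", "B. Yield", "C. No entry", "D. Speed limit"],
        "answer": "C",
    },
]

def evaluate(model_predict, items):
    """Score a VLM at every level of the three-tier hierarchy."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        pred = model_predict(item["question"], item["options"])  # e.g. "C"
        # Roll the result up to domain, aspect, and task granularity.
        for key in (item["domain"],
                    (item["domain"], item["aspect"]),
                    (item["domain"], item["aspect"], item["task"])):
            total[key] += 1
            correct[key] += int(pred == item["answer"])
    return {key: correct[key] / total[key] for key in total}

# A trivial stand-in predictor, just to show the interface:
scores = evaluate(lambda question, options: "C", qa_items)
```

Because every level shares the same counters, strengths and weaknesses can be read off at whichever granularity is most diagnostic, which is the point of a fine-grained benchmark.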
InstructOCR: Instruction Boosting Scene Text Spotting
Duan, Chen, Jiang, Qianyi, Fu, Pei, Chen, Jiamin, Li, Shengxi, Wang, Zining, Guo, Shan, Luo, Junfeng
In the field of scene text spotting, previous OCR methods primarily relied on image encoders and pre-trained text information but often overlooked the advantages of incorporating human language instructions. To address this gap, we propose InstructOCR, an instruction-based scene text spotting model that leverages human language instructions to enhance the understanding of text within images. Our framework employs both text and image encoders during training and inference, along with instructions meticulously designed around text attributes. This approach enables the model to interpret text more accurately and flexibly. Extensive experiments demonstrate the effectiveness of our model, and we achieve state-of-the-art results on widely used benchmarks. Furthermore, the proposed framework can be seamlessly applied to scene text VQA tasks: by leveraging instruction strategies during pre-training, performance on downstream VQA tasks improves significantly, with a 2.6% gain on the TextVQA dataset and a 2.1% gain on the ST-VQA dataset. These results provide insight into the benefits of incorporating human language instructions for OCR-related tasks.
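To make the idea of attribute-conditioned instructions concrete, here is a small hypothetical sketch of how such prompts might be templated from text attributes; the attribute set and wording are assumptions for illustration, not InstructOCR's released prompt design.

```python
# Templates keyed by text attribute; the attribute set and phrasing here are
# assumptions made for illustration, not the paper's actual prompts.
ATTRIBUTE_TEMPLATES = {
    "numeric":   "Spot all text instances that consist only of digits.",
    "uppercase": "Spot all text instances written in uppercase letters.",
    "length":    "Spot all text instances longer than {n} characters.",
}

def build_instruction(attribute: str, **kwargs) -> str:
    """Render an attribute-conditioned instruction for the text encoder."""
    return ATTRIBUTE_TEMPLATES[attribute].format(**kwargs)

# Pairing an image with such an instruction lets the text encoder condition
# the spotter on the attribute being asked about:
instruction = build_instruction("length", n=5)
# -> "Spot all text instances longer than 5 characters."
```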
A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms
Gong, Ruihao, Ding, Yifu, Wang, Zining, Lv, Chengtao, Zheng, Xingyu, Du, Jinyang, Qin, Haotong, Guo, Jinyang, Magno, Michele, Liu, Xianglong
The remarkable capabilities of large language models (LLMs) come with significant computational and memory demands, which raises considerable challenges when deploying these models in scenarios with limited resources or high concurrency. To address these challenges, low-bit quantization has emerged as a pivotal approach for enhancing the efficiency and deployability of LLMs. Low-bit quantization reduces the bit-width of tensors, which decreases the memory footprint and computational requirements of LLMs. By compressing weights, activations, and gradients with low-bit integer or binary representations, quantization can significantly accelerate inference and training and reduce storage requirements while maintaining acceptable accuracy. This efficiency is crucial for making advanced LLMs accessible on devices with constrained resources, thereby broadening their applicability. In this paper, we provide a comprehensive survey of low-bit quantization for LLMs, encompassing the fundamental concepts, system implementations, and algorithmic approaches related to low-bit LLMs. Compared with traditional models, LLMs, as the representative paradigm of foundation models, feature a vast number of parameters, which presents unique challenges for effective quantization. As depicted in Figure 1, Section 2 introduces the fundamentals of low-bit quantization for LLMs, including new low-bit data formats and quantization granularities specific to LLMs.
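As a concrete illustration of the basic operation the survey builds on, the following minimal NumPy sketch shows symmetric per-tensor int8 weight quantization; the finer granularities the survey covers (e.g., per-channel or per-group scales) refine this same idea. This is an assumption-level example, not code from the survey.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization of a weight matrix."""
    scale = np.abs(w).max() / 127.0                  # map max magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate float reconstruction from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)                          # 4x smaller than float32
rounding_error = np.abs(dequantize(q, scale) - w).mean()
```

Storing `q` instead of `w` cuts memory by 4x versus float32, and the mean reconstruction error stays bounded by half the quantization step, which is the accuracy/efficiency trade-off the survey examines at scale.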