Collaborating Authors: Shu, Changyong


Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection

arXiv.org Artificial Intelligence

Camera-based multi-view 3D detection has emerged as an attractive solution for autonomous driving due to its low cost and broad applicability. However, despite the strong performance of PETR-based methods on 3D perception benchmarks, their direct INT8 quantization for onboard deployment leads to drastic accuracy drops: up to 58.2% in mAP and 36.9% in NDS on the nuScenes dataset. In this work, we propose Q-PETR, a quantization-aware position embedding transformation that re-engineers key components of the PETR framework to reconcile the discrepancy between the dynamic ranges of positional encodings and image features, and to adapt the cross-attention mechanism for low-bit inference. By redesigning the positional encoding module and introducing an adaptive quantization strategy, Q-PETR preserves floating-point accuracy to within 1% under standard 8-bit per-tensor post-training quantization. Moreover, compared to its FP32 counterpart, Q-PETR achieves a two-fold speedup and a three-fold reduction in memory usage, offering a deployment-friendly solution for resource-constrained onboard devices. Extensive experiments across various PETR-series models validate the strong generalization and practical benefits of our approach.
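
To make the dynamic-range problem concrete, here is a minimal sketch (not the authors' code) of why a single per-tensor INT8 scale fails when positional encodings and image features share one quantizer; the tensor sizes and magnitudes are invented for illustration.

```python
import numpy as np

def quantize_int8_per_tensor(x):
    """Symmetric per-tensor INT8 quantization, as in standard PTQ."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Illustrative tensors: image features are well-behaved, while the
# positional encoding has a much wider dynamic range (the mismatch
# Q-PETR targets); the magnitudes here are made up.
img_feat = rng.normal(0.0, 1.0, 1024).astype(np.float32)
pos_emb = rng.normal(0.0, 12.0, 1024).astype(np.float32)

# If both tensors flow through one quantized operator, a single
# per-tensor scale must cover the union of their ranges, crushing
# the image features into a handful of integer levels.
shared_scale = max(np.abs(img_feat).max(), np.abs(pos_emb).max()) / 127.0
feat_shared = np.clip(np.round(img_feat / shared_scale), -128, 127) * shared_scale
err_shared = np.abs(feat_shared - img_feat).mean()

# A scale matched to the features' own range resolves them properly.
fq, fs = quantize_int8_per_tensor(img_feat)
err_own = np.abs(dequantize(fq, fs) - img_feat).mean()

print(f"feature error, shared scale: {err_shared:.4f}")
print(f"feature error, own scale:    {err_own:.4f}")
```

With these illustrative magnitudes the shared scale inflates the mean quantization error on the features by roughly an order of magnitude, which is the kind of mismatch the redesigned positional encoding module is meant to remove.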


MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization

arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs) have garnered widespread attention due to their ability to understand multimodal input. However, their large parameter sizes and substantial computational demands severely hinder their practical deployment. While quantization is an effective way to reduce model size and inference latency, its application to MLLMs remains underexplored. In this paper, we propose MQuant, a post-training quantization (PTQ) framework designed to tackle the unique challenges of MLLMs. Conventional quantization often struggles with MLLMs because of (a) high inference latency from large visual token counts, (b) distributional disparities between visual and textual tokens, and (c) extreme outliers introduced by Hadamard-based transformations. To address these issues, MQuant introduces Modality-Specific Static Quantization (MSQ), which assigns distinct static scales to visual and textual tokens; Attention-Invariant Flexible Switching (AIFS), which reorders tokens to preserve causal attention while eliminating expensive token-wise scale computations; and Rotation Magnitude Suppression (RMS), which mitigates weight outliers arising from online Hadamard rotations. On five mainstream MLLMs (including Qwen-VL, MiniCPM-V, and CogVLM2), MQuant under W4A8 achieves near-floating-point accuracy (<1% degradation) while reducing inference latency by up to 30%, significantly outperforming existing PTQ baselines. MQuant thus bridges the gap toward efficient and accurate MLLM inference on resource-constrained devices. Code will be released.
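
The following is a minimal sketch of the modality-specific static quantization idea described above; the function name, token layout, and scale values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def msq_quantize(tokens, is_visual, s_visual, s_text):
    """Quantize a packed (seq_len, dim) activation to INT8 using one
    pre-calibrated static scale per modality (the MSQ idea)."""
    scales = np.where(is_visual[:, None], s_visual, s_text)
    return np.clip(np.round(tokens / scales), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
# Illustrative packed sequence: 4 visual tokens followed by 3 text tokens.
tokens = rng.normal(size=(7, 16)).astype(np.float32)
is_visual = np.array([True] * 4 + [False] * 3)

# Static scales would come from offline calibration; values are made up.
q = msq_quantize(tokens, is_visual, s_visual=0.05, s_text=0.02)
```

Once visual tokens are grouped contiguously, which is what AIFS's attention-preserving reordering appears to provide, the per-token scale lookup above collapses into two static slices, so no token-wise scale computation is needed at inference time.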


GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting

arXiv.org Artificial Intelligence

3D occupancy perception is gaining increasing attention due to its ability to offer detailed and precise representations of the environment. Previous weakly supervised NeRF-based methods must trade efficiency against accuracy, with mIoU varying by 5-10 points depending on the sampling count along camera rays. Recently, real-time Gaussian splatting has gained widespread popularity in 3D reconstruction, and occupancy prediction can itself be viewed as a reconstruction task. We therefore propose GSRender, which naturally employs 3D Gaussian splatting for occupancy prediction, simplifying the sampling process. In addition, the limitations of 2D supervision result in duplicate predictions along the same camera ray; our Ray Compensation (RC) module mitigates this issue by compensating with features from adjacent frames. Finally, we redesign the loss to eliminate the influence of dynamic objects in adjacent frames. Extensive experiments demonstrate that our approach achieves state-of-the-art (SOTA) results in RayIoU (+6.0) while narrowing the gap with 3D-supervised methods. Our code will be released soon.
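
As a minimal sketch of the rendering step underlying this abstract, the snippet below implements standard front-to-back alpha compositing along one camera ray, as used when splatting Gaussians onto a pixel; the opacity profiles are contrived to show how two different depth placements can render the same 2D result, which is the ambiguity behind the duplicate predictions that Ray Compensation resolves with adjacent frames.

```python
import numpy as np

def composite_weights(alphas):
    """Front-to-back alpha compositing along one camera ray:
    w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    return alphas * transmittance

# Two contrived opacity profiles along the same ray, sorted by depth.
single_peak = np.array([0.00, 0.91, 0.00])  # mass at one depth
duplicated = np.array([0.50, 0.00, 0.82])   # mass duplicated at two depths

for a in (single_peak, duplicated):
    w = composite_weights(a)
    print(w.round(3), "rendered opacity:", round(float(w.sum()), 3))
```

Both profiles composite to the same rendered opacity (0.91) even though they place mass at different depths, so a purely 2D photometric loss cannot tell them apart; cross-frame evidence is one way to break the tie.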