 quantization layer


Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model

Cao, Bin, Zheng, Sipeng, Wang, Ye, Xia, Lujie, Wei, Qianshan, Jin, Qin, Liu, Jing, Lu, Zongqing

arXiv.org Artificial Intelligence

Human motion generation has emerged as a critical technology with transformative potential for real-world applications. However, existing vision-language-motion models (VLMMs) face significant limitations that hinder their practical deployment. We identify controllability as a main bottleneck, manifesting in five key aspects: inadequate response to diverse human commands, limited pose initialization capabilities, poor performance on long-term sequences, insufficient handling of unseen scenarios, and lack of fine-grained control over individual body parts. To overcome these limitations, we present Being-M0.5, the first real-time, controllable VLMM that achieves state-of-the-art performance across multiple motion generation tasks. Our approach is built upon HuMo100M, the largest and most comprehensive human motion dataset to date, comprising over 5 million self-collected motion sequences, 100 million multi-task instructional instances, and detailed part-level annotations that address a critical gap in existing datasets. We introduce a novel part-aware residual quantization technique for motion tokenization that enables precise, granular control over individual body parts during generation. Extensive experimental validation demonstrates Being-M0.5's superior performance across diverse motion benchmarks, while comprehensive efficiency analysis confirms its real-time capabilities. Our contributions include design insights and detailed computational analysis to guide future development of practical motion generators. We believe that HuMo100M and Being-M0.5 represent significant advances that will accelerate the adoption of motion generation technologies in real-world applications. The project page is available at https://beingbeyond.github.io/Being-M0.5.
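The abstract names a part-aware residual quantization technique but gives no details, so the following is only a minimal sketch of the general idea under stated assumptions: each body part has its own stack of codebooks, and each stage in a stack encodes the residual left by the previous stage. The function names, part names, and toy codebooks are hypothetical and are not taken from Being-M0.5.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec in squared L2 distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def residual_quantize(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    recon = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        recon = [r + c for r, c in zip(recon, codebook[i])]
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, recon


def quantize_parts(part_features, part_codebooks):
    """Tokenize each body part independently with its own codebook stack (assumed design)."""
    return {
        part: residual_quantize(vec, part_codebooks[part])
        for part, vec in part_features.items()
    }


# Hypothetical toy data: two parts, two residual stages each.
stack = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.25, -0.25]]]
tokens = quantize_parts(
    {"left_arm": [1.2, 0.8], "torso": [0.1, 0.0]},
    {"left_arm": stack, "torso": stack},
)
print(tokens["left_arm"])
```

Because every part yields its own token sequence in this sketch, a generator could in principle regenerate the tokens of one part while leaving the others fixed, which is one plausible route to the part-level control the abstract describes.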


DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding

Lee, Jooyoung, Jeong, Se Yoon, Kim, Munchurl

arXiv.org Artificial Intelligence

Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress images at various quality levels into a single bitstream, increasing the versatility of bitstream utilization and providing higher compression efficiency than simulcast compression. Research on neural network (NN)-based PIC is in its early stages, mainly focusing on applying varying quantization step sizes to the transformed latent representations in a hierarchical manner. These approaches are designed to compress only the progressively added information as the quality improves, exploiting the fact that a wider quantization interval for lower-quality compression contains multiple narrower sub-intervals for higher-quality compression. However, the existing methods are based on handcrafted quantization hierarchies, resulting in sub-optimal compression efficiency. In this paper, we propose an NN-based progressive coding method that is the first to use quantization step sizes learned for each quantization layer. We also incorporate selective compression, with which only the essential representation components are compressed for each quantization layer. We demonstrate that our method achieves significantly higher coding efficiency than the existing approaches, with decreased decoding time and reduced model size.
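The nested-interval idea described in the abstract can be illustrated with a small sketch. It uses fixed, hand-picked step sizes, so it corresponds to the handcrafted hierarchies the abstract describes as prior work, not to DeepHQ's learned step sizes. The function names and the step values are assumptions made for illustration, and each step is assumed to divide the previous one evenly.

```python
import math


def progressive_encode(x, steps):
    """Encode x as one symbol per quantization layer.

    steps: decreasing step sizes, each assumed to divide the previous one,
    so every coarse interval splits into whole finer sub-intervals.
    """
    symbols = []
    low = 0.0  # lower edge of the interval selected so far
    for step in steps:
        idx = math.floor((x - low) / step)
        symbols.append(idx)
        low += idx * step
    return symbols


def progressive_decode(symbols, steps):
    """Reconstruct from the first len(symbols) layers (at least one is required)."""
    low = 0.0
    for idx, step in zip(symbols, steps):
        low += idx * step
    # Midpoint of the finest interval decoded so far.
    return low + steps[len(symbols) - 1] / 2


steps = [4.0, 1.0, 0.25]  # hypothetical hand-picked hierarchy
symbols = progressive_encode(9.3, steps)
for n in range(1, len(symbols) + 1):
    print(n, progressive_decode(symbols[:n], steps))
```

Decoding a longer prefix of the symbol list narrows the interval, so the reconstruction error shrinks as more layers of the single bitstream are received.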


LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations

Khalil, Ahmed, Piechocki, Robert, Santos-Rodriguez, Raul

arXiv.org Artificial Intelligence

In this paper we introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations. Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization. The learnable lattice imposes a structure over all discrete embeddings, acting as a deterrent against codebook collapse, leading to high codebook utilization. Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and with a constant number of parameters (equal to the embedding dimension $D$), making it a very scalable approach. We demonstrate these results on the FFHQ-1024 dataset and include FashionMNIST and Celeb-A.
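As a rough sketch of lattice-based discretization, the example below snaps each embedding coordinate to the nearest multiple of a per-dimension spacing. It assumes an axis-aligned lattice with one spacing per dimension, which is one way to arrive at a parameter count equal to $D$; the abstract does not spell out the lattice structure, and the training procedure is omitted here. The name `lattice_quantize` and the numbers are hypothetical.

```python
def lattice_quantize(z, spacing):
    """Snap each coordinate of z to the nearest multiple of its spacing.

    spacing holds one value per embedding dimension (D values in total);
    in a learnable variant these would be trained rather than fixed.
    Note: Python's round() resolves exact .5 ties to the even integer.
    """
    if len(z) != len(spacing):
        raise ValueError("z and spacing must have the same length")
    return [round(v / s) * s for v, s in zip(z, spacing)]


spacing = [0.5, 1.0, 0.75]  # hypothetical values; D = 3 parameters
print(lattice_quantize([0.26, -1.1, 2.0], spacing))
```

No explicit codebook is stored in this sketch: the set of reachable points is implied by the spacings, so the parameter count does not grow with the number of discrete embeddings.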


CQNet: Complex Input Quantized Neural Network designed for Massive MIMO CSI Feedback

Ji, Sijie, Sun, Weiping, Li, Mo

arXiv.org Artificial Intelligence

The Massive Multiple Input Multiple Output (MIMO) system is a core technology of next-generation communication. With the growing complexity of CSI in massive MIMO systems, traditional compressive-sensing-based CSI feedback has become a bottleneck that limits practical use. Recently, numerous deep-learning-based CSI feedback approaches have demonstrated their efficiency and potential. However, the existing methods lack a reasonable interpretation of the deep learning model, and the accuracy of the model decreases significantly as the CSI compression rate increases. In this paper, starting from the intrinsic properties of the CSI data itself, we devise corresponding deep learning building blocks to compose a novel neural network, CQNet. Experimental results show that CQNet outperforms the state-of-the-art method with less computational overhead, achieving an average performance improvement of 8.07% in both outdoor and indoor scenarios. In addition, this paper investigates the reasons for the decrease in model accuracy at large compression rates and proposes a strategy of embedding a quantization layer to achieve effective compression, by which the original accuracy loss of 67.19% on average is reduced to 21.96% on average, and the compression rate is increased by 8 times on the original benchmark. The massive multiple-input multiple-output (MIMO) technology is considered one of the core technologies of the next-generation communication system, e.g., 5G. By equipping a large number of antennas, a base station (BS) can sufficiently utilize spatial diversity to improve channel capacity.


QuSecNets: Quantization-based Defense Mechanism for Securing Deep Neural Network against Adversarial Attacks

Ali, Hassan, Tariq, Hammad, Hanif, Muhammad Abdullah, Khalid, Faiq, Rehman, Semeen, Ahmed, Rehan, Shafique, Muhammad

arXiv.org Machine Learning

Deep Neural Networks (DNNs) have recently been shown to be vulnerable to adversarial attacks, in which the input examples are perturbed to fool these DNNs towards confidence reduction and (targeted or random) misclassification. In this paper, we demonstrate how an efficient quantization technique can be leveraged to increase the robustness of a given DNN against adversarial attacks. We present two quantization-based defense mechanisms, namely Constant Quantization (CQ) and Variable Quantization (VQ), applied at the input to increase the robustness of DNNs. In CQ, the intensity of the input pixel is quantized according to the number of quantization levels. In VQ, by contrast, the quantization levels are recursively updated during the training phase, thereby providing a stronger defense mechanism. We apply our techniques to Convolutional Neural Networks (CNNs, a particular type of DNN that is heavily used in vision-based applications) against adversarial attacks from the open-source Cleverhans library. Our experimental results show a 1%-5% increase in adversarial accuracy for MNIST and a 0%-2.4% increase in adversarial accuracy for CIFAR10.
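The input-quantization idea behind CQ can be sketched as follows. This assumes pixel intensities in [0, 1] and uniformly spaced levels; the abstract does not specify how the levels are placed, so the function `constant_quantize` and its behavior are illustrative assumptions and not the authors' implementation.

```python
def constant_quantize(pixels, levels):
    """Map each intensity in [0, 1] to the nearest of `levels` uniform values."""
    if levels < 2:
        raise ValueError("need at least two quantization levels")
    top = levels - 1
    # Clamp first so out-of-range (e.g. perturbed) inputs stay in [0, 1].
    return [round(min(max(p, 0.0), 1.0) * top) / top for p in pixels]


clean = [0.40, 0.90]
perturbed = [0.45, 0.86]  # hypothetical small adversarial shift
print(constant_quantize(clean, 4) == constant_quantize(perturbed, 4))
```

With only a few levels, a small perturbation often lands in the same quantization bin as the clean pixel, so the network sees an identical input. This is the intuition for the robustness gain, although a perturbation that crosses a bin boundary is not removed.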


Operations Guided Neural Networks for High Fidelity Data-To-Text Generation

Nie, Feng, Wang, Jinpeng, Yao, Jin-Ge, Pan, Rong, Lin, Chin-Yew

arXiv.org Artificial Intelligence

Recent neural models for data-to-text generation are mostly based on data-driven end-to-end training over encoder-decoder networks. Even though the generated texts are mostly fluent and informative, they often generate descriptions that are not consistent with the input structured data. This is a critical issue especially in domains that require inference or calculations over raw data. In this paper, we attempt to improve the fidelity of neural data-to-text generation by utilizing pre-executed symbolic operations. We propose a framework called Operation-guided Attention-based sequence-to-sequence network (OpAtt), with a specifically designed gating mechanism as well as a quantization module for operation results to utilize information from pre-executed operations. Experiments on two sports datasets show our proposed method clearly improves the fidelity of the generated texts to the input structured data.