AITopics | dynamic quantization

Collaborating Authors

dynamic quantization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

983591c3e9a0dc94a99134b3238bbe52-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 01:10:52 GMT

artificial intelligence, machine learning, quantization interval, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

Temporal Dynamic Quantization for Diffusion Models

Neural Information Processing SystemsFeb-16-2026, 01:10:48 GMT

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility.

artificial intelligence, diffusion model, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Supplementary Material for Temporal Dynamic Quantization for Diffusion Models 1 Introduction

Neural Information Processing SystemsOct-9-2025, 02:09:34 GMT

The following items are provided: A comparison between Dynamic Quantization and TDQ in Section 2. Ablation study on time step encoding in Section 3. Detailed TDQ Module architecture in Section 4. Comparison with multiple quantization interval directly on PTQ in Section 5. Integration of TDQ with various QA T schemes in Section 6. V arious experiments about robustness of the TDQ Module in Section 7. Detailed experimental results on the Output dynamics of the TDQ module in Section 8. Detailed experimental results on the Evolution of Activation Distribution in Section 9. V arious non-cherry-picked results of generated images in Section 10.

artificial intelligence, machine learning, quantization interval, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

Effective Quantization of Muon Optimizer States

Gupta, Aman, Celente, Rafael, Shivanna, Abhishek, Braithwaite, D. T., Dexter, Gregory, Tang, Shao, Udagawa, Hiroto, Silva, Daniel, Ramanath, Rohan, Keerthi, S. Sathiya

arXiv.org Artificial IntelligenceSep-30-2025

The Muon optimizer, based on matrix orthogonalization, has recently shown faster convergence and up to 2x computational efficiency over AdamW in LLM pretraining. Like AdamW, Muon is stateful, requiring storage of both model weights and accumulated gradients. While 8-bit AdamW variants mitigate this overhead using blockwise quantization, they are typically stable only under dynamic quantization - which improves stability on linear quantization for extreme values. In this paper, we introduce the 8-bit Muon optimizer using blockwise quantization, supporting both linear and dynamic schemes. We demonstrate that 8-bit Muon maintains stability under both, while delivering $\sim$74\% reduction in memory footprint compared to full-precision Muon. In extensive experiments, 8-bit Muon closely matches the performance of Muon while outperforming AdamW and 8-bit AdamW in pre-training a 1.6B model on 4B FineWeb tokens. It also shows competitive results when fine-tuning the Llama 3.2 3B model on post-training data. We also provide a theoretical perspective to help explain this robustness under quantization.

large language model, machine learning, quantization, (19 more...)

arXiv.org Artificial Intelligence

2509.23106

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)

Add feedback

ProfilingAgent: Profiling-Guided Agentic Reasoning for Adaptive Model Optimization

Jafari, Sadegh, Sarkar, Aishwarya, Bilwal, Mohiuddin, Jannesari, Ali

arXiv.org Artificial IntelligenceSep-9-2025

Foundation models face growing compute and memory bottlenecks, hindering deployment on resource-limited platforms. While compression techniques such as pruning and quantization are widely used, most rely on uniform heuristics that ignore architectural and runtime heterogeneity. Profiling tools expose per-layer latency, memory, and compute cost, yet are rarely integrated into automated pipelines. We propose ProfilingAgent, a profiling-guided, agentic approach that uses large language models (LLMs) to automate compression via structured pruning and post-training dynamic quantization. Our modular multi-agent system reasons over static metrics (MACs, parameter counts) and dynamic signals (latency, memory) to design architecture-specific strategies. Unlike heuristic baselines, ProfilingAgent tailors layer-wise decisions to bottlenecks. Experiments on ImageNet-1K, CIFAR-10, and CIFAR-100 with ResNet-101, ViT-B/16, Swin-B, and DeiT-B/16 show pruning maintains competitive or improved accuracy (about 1% drop on ImageNet-1K, +2% gains for ViT-B/16 on smaller datasets), while quantization achieves up to 74% memory savings with <0.5% accuracy loss. Our quantization also yields consistent inference speedups of up to 1.74 times faster. Comparative studies with GPT-4o and GPT-4-Turbo highlight the importance of LLM reasoning quality for iterative pruning. These results establish agentic systems as scalable solutions for profiling-guided model optimization.

large language model, machine learning, quantization, (18 more...)

arXiv.org Artificial Intelligence

2509.05584

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Selective Quantization Tuning for ONNX Models

Louloudakis, Nikolaos, Rajan, Ajitha

arXiv.org Artificial IntelligenceJul-17-2025

Quantization is a process that reduces the precision of deep neural network models to lower model size and computational demands, often at the cost of accuracy. However, fully quantized models may exhibit sub-optimal performance below acceptable levels and face deployment challenges on low-end hardware accelerators due to practical constraints. To address these issues, quantization can be selectively applied to only a subset of layers, but selecting which layers to exclude is non-trivial. To this direction, we propose TuneQn, a suite enabling selective quantization, deployment and execution of ONNX models across various CPU and GPU devices, combined with profiling and multi-objective optimization. TuneQn generates selectively quantized ONNX models, deploys them on different hardware, measures performance on metrics like accuracy and size, performs Pareto Front minimization to identify the best model candidate and visualizes the results. To demonstrate the effectiveness of TuneQn, we evaluated TuneQn on four ONNX models with two quantization settings across CPU and GPU devices. As a result, we demonstrated that our utility effectively performs selective quantization and tuning, selecting ONNX model candidates with up to a $54.14$% reduction in accuracy loss compared to the fully quantized model, and up to a $72.9$% model size reduction compared to the original model.

artificial intelligence, machine learning, quantization, (17 more...)

arXiv.org Artificial Intelligence

2507.12196

Country:

North America > United States (0.47)
South America > Brazil > Rio de Janeiro (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

A probabilistic framework for dynamic quantization

Santini, Gabriele, Paissan, Francesco, Farella, Elisabetta

arXiv.org Artificial IntelligenceMay-19-2025

We propose a probabilistic framework for dynamic quantization of neural networks that allows for a computationally efficient input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, enabling the adaptive adjustment of the quantization parameters on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Our method strikes the best performance and computational overhead tradeoff compared to standard quantization strategies.

artificial intelligence, machine learning, quantization, (20 more...)

arXiv.org Artificial Intelligence

2505.10689

Country: Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Hardware-Friendly Static Quantization Method for Video Diffusion Transformers

Yi, Sanghyun, Liu, Qingfeng, El-Khamy, Mostafa

arXiv.org Artificial IntelligenceFeb-20-2025

Diffusion Transformers for video generation have gained significant research interest since the impressive performance of SORA. Efficient deployment of such generative-AI models on GPUs has been demonstrated with dynamic quantization. However, resource-constrained devices cannot support dynamic quantization, and need static quantization of the models for their efficient deployment on AI processors. In this paper, we propose a novel method for the post-training quantization of OpenSora\cite{opensora}, a Video Diffusion Transformer, without relying on dynamic quantization techniques. Our approach employs static quantization, achieving video quality comparable to FP16 and dynamically quantized ViDiT-Q methods, as measured by CLIP, and VQA metrics. In particular, we utilize per-step calibration data to adequately provide a post-training statically quantized model for each time step, incorporating channel-wise quantization for weights and tensor-wise quantization for activations. By further applying the smooth-quantization technique, we can obtain high-quality video outputs with the statically quantized models. Extensive experimental results demonstrate that static quantization can be a viable alternative to dynamic quantization for video diffusion transformers, offering a more efficient approach without sacrificing performance.

quantization, static quantization, time step, (15 more...)

arXiv.org Artificial Intelligence

2502.15077

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Survey of Quantization Techniques for On-Device Vision-based Crack Detection

Zhang, Yuxuan, Martinez-Rau, Luciano Sebastian, Vu, Quynh Nguyen Phuong, Oelmann, Bengt, Bader, Sebastian

arXiv.org Artificial IntelligenceFeb-4-2025

Structural Health Monitoring (SHM) ensures the safety and longevity of infrastructure by enabling timely damage detection. Vision-based crack detection, combined with UAVs, addresses the limitations of traditional sensor-based SHM methods but requires the deployment of efficient deep learning models on resource-constrained devices. This study evaluates two lightweight convolutional neural network models, MobileNetV1x0.25 and MobileNetV2x0.5, across TensorFlow, PyTorch, and Open Neural Network Exchange platforms using three quantization techniques: dynamic quantization, post-training quantization (PTQ), and quantization-aware training (QAT). Results show that QAT consistently achieves near-floating-point accuracy, such as an F1-score of 0.8376 for MBNV2x0.5 with Torch-QAT, while maintaining efficient resource usage. PTQ significantly reduces memory and energy consumption but suffers from accuracy loss, particularly in TensorFlow. Dynamic quantization preserves accuracy but faces deployment challenges on PyTorch. By leveraging QAT, this work enables real-time, low-power crack detection on UAVs, enhancing safety, scalability, and cost-efficiency in SHM applications, while providing insights into balancing accuracy and efficiency across different platforms for autonomous inspections.

artificial intelligence, machine learning, quantization technique, (17 more...)

arXiv.org Artificial Intelligence

2502.02269

Country: Europe > Sweden (0.04)

Genre: Research Report > New Finding (0.89)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks

Francy, Samer, Singh, Raghubir

arXiv.org Artificial IntelligenceSep-2-2024

This work evaluates the compression techniques on ConvNeXt models in image classification tasks using the CIFAR-10 dataset. Structured pruning, unstructured pruning, and dynamic quantization methods are evaluated to reduce model size and computational complexity while maintaining accuracy. The experiments, conducted on cloud-based platforms and edge device, assess the performance of these techniques. Results show significant reductions in model size, with up to 75% reduction achieved using structured pruning techniques. Additionally, dynamic quantization achieves a reduction of up to 95% in the number of parameters. Fine-tuned models exhibit improved compression performance, indicating the benefits of pre-training in conjunction with compression techniques. Unstructured pruning methods reveal trends in accuracy and compression, with limited reductions in computational complexity. The combination of OTOV3 pruning and dynamic quantization further enhances compression performance, resulting 89.7% reduction in size, 95% reduction with number of parameters and MACs, and 3.8% increase with accuracy. The deployment of the final compressed model on edge device demonstrates high accuracy 92.5% and low inference time 20 ms, validating the effectiveness of compression techniques for real-world edge computing applications.

accuracy, pruning, quantization, (13 more...)

arXiv.org Artificial Intelligence

2409.02134

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Somerset > Bath (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area (0.94)
Information Technology > Security & Privacy (0.93)
Information Technology > Services (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback