Moro, Daniele
Custom Gradient Estimators are Straight-Through Estimators in Disguise
Schoenbauer, Matt, Moro, Daniele, Lew, Lukasz, Howard, Andrew
Quantization-aware training comes with a fundamental challenge: the derivatives of quantization functions such as rounding are zero almost everywhere and nonexistent elsewhere. Various differentiable approximations of quantization functions have been proposed to address this issue. In this paper, we prove that a large class of weight gradient estimators is approximately equivalent to the straight-through estimator (STE). Specifically, after swapping in the STE and adjusting both the weight initialization and the learning rate in SGD, the model trains in almost exactly the same way as it did with the original gradient estimator. Moreover, we show that for adaptive learning rate algorithms like Adam, the same result holds without any modification to the weight initialization or learning rate. These results reduce the burden of hyperparameter tuning for practitioners of QAT, who can now confidently choose the STE for gradient estimation and ignore more complex gradient estimators. We experimentally show that these results hold both for a small convolutional model trained on the MNIST dataset and for a ResNet50 model trained on ImageNet.
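For readers unfamiliar with the STE discussed in this abstract, the following is a minimal sketch (not the paper's code) of the straight-through estimator for rounding, written in JAX using the standard stop-gradient identity trick: the forward pass applies round(x), while the backward pass treats the quantizer as the identity.

import jax
import jax.numpy as jnp

def ste_round(x):
    # Forward: round(x). Backward: the identity's gradient (the STE),
    # because the stop_gradient term is constant under differentiation.
    return x + jax.lax.stop_gradient(jnp.round(x) - x)

# round() has zero derivative almost everywhere, yet the STE lets
# gradients flow: d/dx (ste_round(x))^2 evaluates to 2 * round(x).
grad_fn = jax.grad(lambda x: ste_round(x) ** 2)
print(grad_fn(1.3))  # -> 2.0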
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Neseem, Marina, McCullough, Conor, Hsin, Randy, Leichner, Chas, Li, Shan, Chong, In Suk, Howard, Andrew G., Lew, Lukasz, Reda, Sherief, Rautio, Ville-Mikko, Moro, Daniele
Low-precision quantization is recognized for its efficacy in neural network optimization. Our analysis reveals that non-quantized elementwise operations, which are prevalent in layers such as parameterized activation functions, batch normalization, and quantization scaling, dominate the inference cost of low-precision models. These non-quantized elementwise operations are commonly overlooked in SOTA efficiency metrics such as Arithmetic Computation Effort (ACE). In this paper, we propose ACEv2, an extended version of ACE that better aligns with the inference cost of quantized models and their energy consumption on ML hardware. Moreover, we introduce PikeLPN, a model that addresses these efficiency issues by applying quantization to both elementwise operations and multiply-accumulate operations. In particular, we present a novel quantization technique for batch normalization layers named QuantNorm, which allows the batch normalization parameters to be quantized without compromising model performance. Additionally, we propose Double Quantization, where the quantization scaling parameters are themselves quantized. Furthermore, we recognize and resolve the issue of distribution mismatch in separable convolution layers by introducing Distribution-Heterogeneous Quantization, which enables quantizing them to low precision. PikeLPN achieves Pareto optimality in the efficiency-accuracy trade-off, with up to 3X efficiency improvement compared to SOTA low-precision models.
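To make the idea of quantizing the scaling parameters concrete, here is a hypothetical sketch in JAX; the function name, bit-width, and the power-of-two restriction on the scale are illustrative assumptions, not the paper's implementation.

import jax.numpy as jnp

def quantize_weights(w, bits=4):
    # Per-output-channel symmetric scale for the weight tensor.
    scale = jnp.max(jnp.abs(w), axis=0, keepdims=True) / (2 ** (bits - 1) - 1)
    scale = jnp.maximum(scale, 1e-8)  # avoid log2(0) for all-zero channels
    # Illustrative "double quantization": the scale itself is snapped to a
    # power of two, so the rescale multiply becomes cheap on integer hardware.
    scale_q = 2.0 ** jnp.round(jnp.log2(scale))
    w_int = jnp.clip(jnp.round(w / scale_q),
                     -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return w_int, scale_q

# Usage: integer weights plus a quantized scale reconstruct the tensor.
w = jnp.array([[0.31, -0.12], [-0.45, 0.07]])
w_int, s = quantize_weights(w)
w_dequant = w_int * s  # approximate reconstruction used at inference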
Enabling Binary Neural Network Training on the Edge
Wang, Erwei, Davis, James J., Moro, Daniele, Zielinski, Piotr, Lim, Jia Jie, Coelho, Claudionor, Chatterjee, Satrajit, Cheung, Peter Y. K., Constantinides, George A.
The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the concurrent storage of high-precision activations for all layers, generally making learning on memory-constrained devices infeasible. In this article, we demonstrate that the backward propagation operations needed for binary neural network training are strongly robust to quantization, thereby making on-the-edge learning with modern models a practical proposition. We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions while inducing little to no accuracy loss versus Courbariaux & Bengio's standard approach. These decreases are primarily enabled through the retention of activations exclusively in binary format. Against the latter algorithm, our drop-in replacement sees memory requirement reductions of 3--5$\times$, while reaching similar test accuracy in comparable time, across a range of small-scale models trained to classify popular datasets. We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.78$\times$ memory reduction. Our work is open source and includes the Raspberry Pi-targeted prototype we used to verify our modeled memory decreases and capture the associated energy drops. Such savings allow unnecessary cloud offloading to be avoided, reducing latency, increasing energy efficiency, and safeguarding end-user privacy.
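As a rough illustration of how a binarized activation can be trained while storing only binary information, here is a JAX sketch of a sign binarizer with a clipped straight-through backward pass; it is an assumption-level example, not the authors' training strategy or code.

import jax
import jax.numpy as jnp

@jax.custom_vjp
def binarize(x):
    # Note: jnp.sign maps 0 to 0; real binarizers typically map 0 to +1.
    return jnp.sign(x)

def binarize_fwd(x):
    # Residual is a 1-bit mask (|x| <= 1) rather than the full
    # high-precision activation, echoing the binary-only retention above.
    return jnp.sign(x), jnp.abs(x) <= 1.0

def binarize_bwd(mask, g):
    # Clipped straight-through estimator: pass gradients where |x| <= 1.
    return (g * mask,)

binarize.defvjp(binarize_fwd, binarize_bwd)

# Example: gradients flow only through inputs with magnitude <= 1.
x = jnp.array([0.3, -1.7, 0.9])
print(jax.grad(lambda v: jnp.sum(binarize(v) * jnp.array([1.0, 2.0, 3.0])))(x))
# -> [1. 0. 3.]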