AITopics | deepzero

Collaborating Authors

deepzero

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

Chen, Aochuan, Zhang, Yimeng, Jia, Jinghan, Diffenderfer, James, Liu, Jiancheng, Parasyris, Konstantinos, Zhang, Yihua, Zhang, Zheng, Kailkhura, Bhavya, Liu, Sijia

arXiv.org Artificial IntelligenceFeb-3-2024

Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: Its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To our best knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinatewise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsityinduced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementations of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing DL with black box. Codes are available at https://github.com/OPTML-Group/DeepZero.

arxiv preprint arxiv, deepzero, optimization, (13 more...)

arXiv.org Artificial Intelligence

2310.02025

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Michigan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Energy (0.93)
Education (0.68)
Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Designing and Training of Lightweight Neural Networks on Edge Devices using Early Halting in Knowledge Distillation

Mishra, Rahul, Gupta, Hari Prabhat

arXiv.org Artificial IntelligenceSep-30-2022

Automated feature extraction capability and significant performance of Deep Neural Networks (DNN) make them suitable for Internet of Things (IoT) applications. However, deploying DNN on edge devices becomes prohibitive due to the colossal computation, energy, and storage requirements. This paper presents a novel approach for designing and training lightweight DNN using large-size DNN. The approach considers the available storage, processing speed, and maximum allowable processing time to execute the task on edge devices. We present a knowledge distillation based training procedure to train the lightweight DNN to achieve adequate accuracy. During the training of lightweight DNN, we introduce a novel early halting technique, which preserves network resources; thus, speedups the training procedure. Finally, we present the empirically and real-world evaluations to verify the effectiveness of the proposed approach under different constraints using various edge devices.

artificial intelligence, dnn, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2209.1556

Country: Asia > India (0.04)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Information Technology (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback