AITopics | Gross, Warren J.

Gross, Warren J.

Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models

Tayaranian, Mohammadreza, Mozafari, Seyyed Hasan, Meyer, Brett H., Clark, James J., Gross, Warren J.

arXiv.org Artificial Intelligence, Jul-11-2024

Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on a general corpus and then fine-tuned on downstream tasks. Previous work studied the effect of pruning the training set of the downstream tasks on the performance of the model on its evaluation set. In this work, we propose an automatic dataset pruning method for the training set of fine-tuning tasks. Our method is based on the model's success rate in correctly classifying each training data point. Unlike previous work, which relies on user feedback to determine subset size, our method automatically extracts training subsets that are adapted for each pair of model and fine-tuning task. Our method provides multiple subsets for use in dataset pruning that navigate the trade-off between subset size and evaluation accuracy. Our largest subset, which we also refer to as the winning ticket subset, is on average 3× smaller than the original training set of the fine-tuning task. Our experiments on 5 downstream tasks and 2 language models show that, on average, fine-tuning on the winning ticket subsets results in a 0.1% increase in the evaluation performance of the model.

Transformer-based language models have shown state-of-the-art performance in various natural language understanding tasks (Liu et al., 2019; Raffel et al., 2020). These models are commonly used in a transfer learning setup in which they are first pre-trained on general textual data and then transferred by fine-tuning their parameters on the training set of each downstream task. The goal of fine-tuning is to maximise the model's performance on the evaluation set. However, different data points in the fine-tuning dataset contribute differently to achieving this goal (Katharopoulos & Fleuret, 2018).
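The success-rate criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how success rates are turned into subsets, so the thresholding rule, the choice to keep the less reliably classified examples, and the names `success_rates` and `subsets_by_threshold` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def success_rates(correct_per_epoch: Sequence[Sequence[bool]]) -> List[float]:
    """Fraction of epochs in which each training example was classified correctly.

    correct_per_epoch[e][i] is True if example i was correct in epoch e.
    """
    num_epochs = len(correct_per_epoch)
    num_examples = len(correct_per_epoch[0])
    return [
        sum(epoch[i] for epoch in correct_per_epoch) / num_epochs
        for i in range(num_examples)
    ]


def subsets_by_threshold(
    rates: Sequence[float], thresholds: Sequence[float]
) -> Dict[float, List[int]]:
    """Hypothetical selection rule (an assumption, not taken from the paper):
    for each threshold, keep the indices whose success rate is at or below it.
    """
    return {t: [i for i, r in enumerate(rates) if r <= t] for t in thresholds}


# Toy history: 3 epochs, 4 training examples (made-up values).
history = [
    [True, False, True, False],
    [True, False, True, True],
    [True, False, False, True],
]
rates = success_rates(history)
subsets = subsets_by_threshold(rates, thresholds=[0.5, 0.9])
print(subsets)
```

Each threshold gives a subset of a different size, which mirrors the idea of offering several subsets that trade size against accuracy.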

artificial intelligence, machine learning, natural language

arXiv: 2407.08887

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation

Therrien, Olivier, Amein, Marihan, Xiong, Zhuoran, Gross, Warren J., Meyer, Brett H.

arXiv.org Artificial Intelligence, Apr-21-2023

We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. It uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form 2.88 × 10^17 possible networks. To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.

artificial intelligence, machine learning, randla-net

arXiv: 2304.11207

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.94)

FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation

Xiong, Zhuoran, Amein, Marihan, Therrien, Olivier, Gross, Warren J., Meyer, Brett H.

arXiv.org Artificial Intelligence, Mar-28-2023

We present FMAS, a fast multi-objective neural architecture search framework for semantic segmentation. FMAS subsamples the structure and pre-trained parameters of DeepLabV3+, without fine-tuning, dramatically reducing training time during search. To further reduce candidate evaluation time, we use a subset of the validation dataset during the search. Only the final, Pareto non-dominated, candidates are ultimately fine-tuned using the complete training set. We evaluate FMAS by searching for models that effectively trade accuracy and computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive designs quickly, e.g., taking just 0.5 GPU days to discover a DeepLabV3+ variant that reduces FLOPs and parameters by 10% and 20% respectively, for less than 3% increased error. We also search on an edge device called GAP8 and use its latency as the metric. FMAS is capable of finding a 2.2× faster network with 7.61% MIoU loss.
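The notion of Pareto non-dominated candidates used in the abstract can be made concrete with a small sketch. It shows only the generic two-objective filter, not the FMAS search itself. The candidate names and the error and cost values are made up for illustration.

```python
from typing import List, Tuple

# (name, error, cost); lower is better for both objectives.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidates: error rate and compute cost in arbitrary units.
candidates = [
    ("a", 0.20, 10.0),
    ("b", 0.22, 8.0),
    ("c", 0.25, 9.0),   # dominated by "b"
    ("d", 0.20, 12.0),  # dominated by "a"
]
print([name for name, _, _ in pareto_front(candidates)])  # prints ['a', 'b']
```

This quadratic-time filter is enough for small candidate pools. A search that evaluates many candidates would likely use a sort-based method instead.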

artificial intelligence, gpu day, machine learning

arXiv: 2303.16322

Country:

North America > Canada > Quebec > Montreal (0.29)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Efficient Fine-Tuning of BERT Models on the Edge

Vucetic, Danilo, Tayaranian, Mohammadreza, Ziaeefard, Maryam, Clark, James J., Meyer, Brett H., Gross, Warren J.

arXiv.org Artificial Intelligence, May-3-2022

Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as seen with BERT and other natural language processing models, come increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And Reconfigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30%, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.
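The general idea of avoiding parameter updates by freezing can be sketched as follows. This is not FAR itself: the abstract does not say which parameters are frozen or how the network is reconfigured. The sketch assumes a single linear layer y = W x in which whole output rows of W are frozen, and the helper names `weight_grads` and `sgd_step` are hypothetical.

```python
from typing import Dict, List, Set


def weight_grads(
    dy: List[float], x: List[float], frozen_rows: Set[int]
) -> Dict[int, List[float]]:
    """Gradient of y = W x with respect to W, computed only for non-frozen rows.

    Row r of the gradient is dy[r] * x; frozen rows are skipped entirely,
    so no gradient is computed or stored for them.
    """
    return {
        r: [dy_r * x_c for x_c in x]
        for r, dy_r in enumerate(dy)
        if r not in frozen_rows
    }


def sgd_step(
    weights: List[List[float]], grads: Dict[int, List[float]], lr: float
) -> None:
    """In-place SGD update applied only to rows that have a gradient."""
    for r, g_row in grads.items():
        weights[r] = [w - lr * g for w, g in zip(weights[r], g_row)]


# Toy layer with two output rows; row 1 is frozen (an illustrative choice).
W = [[1.0, 2.0], [3.0, 4.0]]
grads = weight_grads(dy=[1.0, 1.0], x=[0.5, 2.0], frozen_rows={1})
sgd_step(W, grads, lr=0.5)
print(W)  # prints [[0.75, 1.0], [3.0, 4.0]]
```

Only the trainable row changes, and no gradient buffer is allocated for the frozen row.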

artificial intelligence, machine learning, natural language

doi: 10.1109/ISCAS48785.2022.9937567

arXiv: 2205.01541

Country:

North America > United States > Minnesota (0.14)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Learning Recurrent Binary/Ternary Weights

Ardakani, Arash, Ji, Zhengyun, Smithson, Sean C., Meyer, Brett H., Gross, Warren J.

arXiv.org Machine Learning, Sep-28-2018

Recurrent neural networks (RNNs) have shown excellent performance in processing sequence data. However, they are both complex and memory intensive due to their recursive nature. These limitations make RNNs difficult to embed on mobile devices requiring real-time processes with limited hardware resources. To address the above issues, we introduce a method that can learn binary and ternary weights during the training phase to facilitate hardware implementations of RNNs. As a result, this approach replaces all multiply-accumulate operations with simple accumulations, bringing significant benefits to custom hardware in terms of silicon area and power consumption. On the software side, we evaluate the performance (in terms of accuracy) of our method using long short-term memories (LSTMs) on various sequential models including sequence classification and language modeling. We demonstrate that our method achieves competitive results on the aforementioned tasks while using binary/ternary weights at runtime. On the hardware side, we present custom hardware for accelerating the recurrent computations of LSTMs with binary/ternary weights. Ultimately, we show that LSTMs with binary/ternary weights can achieve up to 12x memory saving and 10x inference speedup compared to the full-precision implementation on an ASIC platform.
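A small sketch shows why ternary weights remove the multiplications from a dot product. It covers only the inference-time arithmetic for weights in {-1, 0, +1}, not the training method that learns such weights. The function name `ternary_dot` and the input values are illustrative.

```python
from typing import Sequence


def ternary_dot(weights: Sequence[int], x: Sequence[float]) -> float:
    """Dot product with weights restricted to -1, 0, or +1.

    Each term is either added, subtracted, or skipped, so no multiplication
    is needed.
    """
    acc = 0.0
    for w, v in zip(weights, x):
        if w == 1:
            acc += v
        elif w == -1:
            acc -= v
        elif w != 0:
            raise ValueError(f"weight {w!r} is not ternary")
        # w == 0 contributes nothing to the accumulator
    return acc


weights = [1, 0, -1, 1]
inputs = [0.5, 2.0, 1.5, 3.0]
print(ternary_dot(weights, inputs))  # prints 2.0
```

The binary case is the same loop with weights limited to -1 and +1, so every input is either added or subtracted.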

deep learning, neural network, ternary weight

arXiv: 1809.11086

Country:

Europe > Italy (0.28)
North America > United States > New York (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
