AITopics | Maboud, Yassaman Ebrahimzadeh

Plotting

Maboud, Yassaman Ebrahimzadeh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Heterogeneous Acceleration Pipeline for Recommendation System Training

Adnan, Muhammad, Maboud, Yassaman Ebrahimzadeh, Mahajan, Divya, Nair, Prashant J.

arXiv.org Artificial IntelligenceApr-28-2024

Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode utilizes High Bandwidth Memory (HBM) across multiple GPUs for storing embedding tables. However, this approach is expensive and presents scaling concerns. This paper introduces Hotline, a heterogeneous acceleration pipeline that addresses these concerns. Hotline develops a data-aware and model-aware scheduling pipeline by leveraging the insight that only a few embedding entries are frequently accessed (popular). This approach utilizes CPU main memory for non-popular embeddings and GPUs' HBM for popular embeddings. To achieve this, Hotline accelerator fragments a mini-batch into popular and non-popular micro-batches. It gathers the necessary working parameters for non-popular micro-batches from the CPU, while GPUs execute popular micro-batches. The hardware accelerator dynamically coordinates the execution of popular embeddings on GPUs and non-popular embeddings from the CPU's main memory. Real-world datasets and models confirm Hotline's effectiveness, reducing average end-to-end training time by 2.2x compared to Intel-optimized CPU-GPU DLRM baseline.

artificial intelligence, hotline, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2204.05436

Country:

North America > Canada (0.28)
North America > United States > New York (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.69)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings

Maboud, Yassaman Ebrahimzadeh, Adnan, Muhammad, Mahajan, Divya, Nair, Prashant J.

arXiv.org Artificial IntelligenceMar-21-2024

Training recommendation models pose significant challenges regarding resource utilization and performance. Prior research has proposed an approach that categorizes embeddings into popular and non-popular classes to reduce the training time for recommendation models. We observe that, even among the popular embeddings, certain embeddings undergo rapid training and exhibit minimal subsequent variation, resulting in saturation. Consequently, updates to these embeddings lack any contribution to model quality. This paper presents Slipstream, a software framework that identifies stale embeddings on the fly and skips their updates to enhance performance. This capability enables Slipstream to achieve substantial speedup, optimize CPU-GPU bandwidth usage, and eliminate unnecessary memory access. SlipStream showcases training time reductions of 2x, 2.4x, 1.2x, and 1.175x across real-world datasets and configurations, compared to Baseline XDL, Intel-optimized DRLM, FAE, and Hotline, respectively.

artificial intelligence, machine learning, slipstream, (17 more...)

arXiv.org Artificial Intelligence

2404.0427

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
Information Technology > Hardware (0.93)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

High-Performance Training by Exploiting Hot-Embeddings in Recommendation Systems

Adnan, Muhammad, Maboud, Yassaman Ebrahimzadeh, Mahajan, Divya, Nair, Prashant J.

arXiv.org Artificial IntelligenceMar-2-2021

Recommendation models are commonly used learning models that suggest relevant items to a user for e-commerce and online advertisement-based applications. Current recommendation models include deep-learning-based (DLRM) and time-based sequence (TBSM) models. These models use massive embedding tables to store a numerical representation of item's and user's categorical variables (memory-bound) while also using neural networks to generate outputs (compute-bound). Due to these conflicting compute and memory requirements, the training process for recommendation models is divided across CPU and GPU for embedding and neural network executions, respectively. Such a training process naively assigns the same level of importance to each embedding entry. This paper observes that some training inputs and their accesses into the embedding tables are heavily skewed with certain entries being accessed up to 10000x more. This paper tries to leverage skewed embedded table accesses to efficiently use the GPU resources during training. To this end, this paper proposes a Frequently Accessed Embeddings (FAE) framework that exposes a dynamic knob to the software based on the GPU memory capacity and the input popularity index. This framework efficiently estimates and varies the size of the hot portions of the embedding tables within GPUs and reallocates the rest of the embeddings on the CPU. Overall, our framework speeds-up the training of the recommendation models on Kaggle, Terabyte, and Alibaba datasets by 2.34x as compared to a baseline that uses Intel-Xeon CPUs and Nvidia Tesla-V100 GPUs, while maintaining accuracy.

dataset, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

2103.00686

Country:

North America > United States > New York (0.14)
North America > Canada > British Columbia (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback