Ardalani, Newsha
Text Quality-Based Pruning for Efficient Training of Language Models
Sharma, Vasu, Padthe, Karthik, Ardalani, Newsha, Tirumala, Kushal, Howes, Russell, Xu, Hu, Huang, Po-Yao, Li, Shang-Wen, Aghajanyan, Armen, Ghosh, Gargi, Zettlemoyer, Luke
Language Models (LMs) have gained attention in recent years due to their impressive performance in various natural language processing (NLP) tasks (Zhang et al., 2022; Penedo et al., 2023; Touvron et al., 2023; Zhou et al., 2023; Liu et al., 2019). However, their training process often relies on computationally intensive procedures involving massive datasets and compute requirements, which hinders training large-scale LMs on noisy real-world or domain-specific datasets. Worse, several of these datasets are uncurated and may contain harmful content that the LM can pick up during training (Deshpande et al., 2023; Schramowski et al., 2022; Kuchnik et al., 2023). By leveraging a numerical text quality score, we demonstrate how it can be used to prune the original dataset, enabling the training of LMs using only a fraction of the data. Our approach aims to identify and eliminate low-quality text instances, thereby streamlining the training process and mitigating the burden of handling large-scale datasets. We also remove potentially harmful content from the data by ensuring that harmful content is rated poorly by our text quality score and can then be pruned. We observe an absolute improvement of 0.9% averaged over 14 downstream evaluation tasks for multiple LM models while using 40% less data and training faster.
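The pruning recipe this abstract describes reduces to scoring each document and dropping the lowest-scoring fraction before training. The sketch below is a minimal illustration under our own assumptions, not the paper's implementation: quality_score is a hypothetical stand-in for the paper's model-agnostic quality metric, and keep_fraction=0.6 mirrors the 40% data reduction mentioned above.

    # Minimal sketch of quality-score-based corpus pruning (illustrative only).
    def quality_score(text: str) -> float:
        # Hypothetical placeholder metric: penalize empty, very short, or highly
        # repetitive documents; the paper's actual scorer is model-agnostic and richer.
        tokens = text.split()
        if not tokens:
            return 0.0
        uniqueness = len(set(tokens)) / len(tokens)
        length_factor = min(len(tokens), 512) / 512
        return uniqueness * length_factor

    def prune_corpus(docs: list[str], keep_fraction: float = 0.6) -> list[str]:
        # Keep the top-scoring fraction of documents (0.6 => train on 40% less data).
        ranked = sorted(docs, key=quality_score, reverse=True)
        return ranked[: int(len(ranked) * keep_fraction)]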
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data
Yang, Yu, Singh, Aaditya K., Elhoushi, Mostafa, Mahmoud, Anas, Tirumala, Kushal, Gloeckle, Fabian, Rozière, Baptiste, Wu, Carole-Jean, Morcos, Ari S., Ardalani, Newsha
Code datasets, often collected from diverse and uncontrolled sources such as GitHub, potentially suffer from quality issues, thereby affecting the performance and training efficiency of Large Language Models (LLMs) optimized for code generation. Previous studies demonstrated the benefit of using embedding spaces for data pruning, but they mainly focused on duplicate removal or increasing variety, and mostly in other modalities such as images. Our work focuses on using embeddings to identify and remove "low-quality" code data. First, we explore features of "low-quality" code in embedding space, through the use of synthetic corruptions. Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset. We demonstrate the benefits of this synthetic corruption informed pruning (SCIP) approach on the well-established HumanEval and MBPP benchmarks, outperforming existing embedding-based methods. Importantly, we achieve up to a 3% performance improvement over no pruning, thereby showing the promise of insights from synthetic corruptions for data pruning.
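One way to read the embedding-space idea, sketched below under our own simplifying assumptions (it is not the paper's exact SCIP metric): embed clean code, synthetically corrupted code, and candidate files with any code encoder, project candidates onto the clean-to-corrupted direction, and prune those that sit closest to the corrupted side.

    import numpy as np

    def corruption_scores(clean_emb: np.ndarray, corrupted_emb: np.ndarray,
                          candidate_emb: np.ndarray) -> np.ndarray:
        # Direction pointing from the clean centroid toward the corrupted centroid.
        direction = corrupted_emb.mean(axis=0) - clean_emb.mean(axis=0)
        direction /= np.linalg.norm(direction) + 1e-12
        # Higher projection = embedding looks more like synthetically corrupted code.
        return (candidate_emb - clean_emb.mean(axis=0)) @ direction

    def prune_low_quality(files: list, candidate_emb: np.ndarray,
                          clean_emb: np.ndarray, corrupted_emb: np.ndarray,
                          prune_fraction: float = 0.1) -> list:
        scores = corruption_scores(clean_emb, corrupted_emb, candidate_emb)
        keep = np.argsort(scores)[: int(len(files) * (1 - prune_fraction))]
        return [files[i] for i in keep]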
Data Acquisition: A New Frontier in Data-centric AI
Chen, Lingjiao, Acun, Bilge, Ardalani, Newsha, Sun, Yifan, Kang, Feiyang, Lyu, Hanrui, Kwon, Yongchan, Jia, Ruoxi, Wu, Carole-Jean, Zaharia, Matei, Zou, James
Datasets, the cornerstone of modern machine learning (ML) systems, have been increasingly sold and purchased for different ML pipelines [2]. Several data marketplaces have emerged to serve different stages of building ML-enhanced data applications. For example, NASDAQ Data Link [3] offers financial datasets cleaned and structured for model training, Amazon AWS Data Exchange [4] focuses on generic tabular datasets, and Databricks Marketplace [5] integrates raw datasets and ML pipelines to deliver insights. The data-as-a-service market was valued at more than $30 billion and is expected to double in the next five years [6]. While data marketplaces are expanding rapidly, data acquisition for ML unfortunately remains challenging, partially due to its ad-hoc nature: based on discussions with real-world users, data acquirers often need to first negotiate varying contracts with different data providers, then purchase multiple datasets in different formats, and finally filter out unnecessary data from the purchased datasets.
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
Hsia, Samuel, Golden, Alicia, Acun, Bilge, Ardalani, Newsha, DeVito, Zachary, Wei, Gu-Yeon, Brooks, David, Wu, Carole-Jean
Training and deploying large machine learning (ML) models is time-consuming and requires significant distributed computing infrastructure. Based on real-world large model training on datacenter-scale infrastructures, we show that 14-32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency, in this work, we develop an agile performance modeling framework to guide parallelization and hardware-software co-design strategies. Using a suite of real-world large ML models on state-of-the-art GPU training hardware, we demonstrate 2.24x and 5.27x throughput improvement potential for pre-training and inference scenarios, respectively.
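A back-of-the-envelope version of such a performance model, with illustrative numbers of our own rather than the paper's measurements, estimates step time as compute time plus the communication that cannot be overlapped; the exposed-communication share then bounds the achievable speedup.

    def step_time_s(compute_flops: float, peak_flops: float, mfu: float,
                    comm_bytes: float, link_bw_bytes_per_s: float,
                    overlap_fraction: float) -> float:
        # Roofline-style sketch: compute time plus non-overlapped (exposed) communication.
        t_compute = compute_flops / (peak_flops * mfu)
        t_comm = comm_bytes / link_bw_bytes_per_s
        return t_compute + t_comm * (1.0 - overlap_fraction)

    # Example: if 25% of a step is exposed communication, fully hiding it bounds
    # the per-step speedup at 1 / 0.75 ~= 1.33x.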
DataPerf: Benchmarks for Data-Centric AI Development
Mazumder, Mark, Banbury, Colby, Yao, Xiaozhe, Karlaš, Bojan, Rojas, William Gaviria, Diamos, Sudnya, Diamos, Greg, He, Lynn, Parrish, Alicia, Kirk, Hannah Rose, Quaye, Jessica, Rastogi, Charvi, Kiela, Douwe, Jurado, David, Kanter, David, Mosquera, Rafael, Ciro, Juan, Aroyo, Lora, Acun, Bilge, Chen, Lingjiao, Raje, Mehul Smriti, Bartolo, Max, Eyuboglu, Sabri, Ghorbani, Amirata, Goodman, Emmett, Inel, Oana, Kane, Tariq, Kirkpatrick, Christine R., Kuo, Tzu-Sheng, Mueller, Jonas, Thrush, Tristan, Vanschoren, Joaquin, Warren, Margaret, Williams, Adina, Yeung, Serena, Ardalani, Newsha, Paritosh, Praveen, Bat-Leah, Lilith, Zhang, Ce, Zou, James, Wu, Carole-Jean, Coleman, Cody, Ng, Andrew, Mattson, Peter, Reddi, Vijay Janapa
Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Huang, Haiyang, Ardalani, Newsha, Sun, Anna, Ke, Liu, Lee, Hsien-Hsin S., Sridhar, Anjali, Bhosale, Shruti, Wu, Carole-Jean, Lee, Benjamin
Mixture-of-Experts (MoE) models have gained popularity in achieving state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large size and complex communication pattern. In this work, we provide a characterization of two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT) and identify their sources of inefficiencies at deployment. We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing. We show that dynamic gating improves maximum throughput by 6.21-11.23$\times$ for LM, 5.75-10.98$\times$ for MT Encoder and 2.58-5.71$\times$ for MT Decoder. It also reduces memory usage by up to 1.36$\times$ for LM and up to 1.1$\times$ for MT. We further propose Expert Buffering, a new caching mechanism that only keeps hot, active experts in GPU memory while buffering the rest in CPU memory. This reduces static memory allocation by up to 1.47$\times$. We finally propose a load balancing methodology that provides additional scalability to the workload.
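The expert-buffering idea can be pictured as a small cache of GPU-resident experts with the rest parked in CPU memory; the schematic below is our own LRU sketch with generic load/evict hooks, not the paper's implementation.

    from collections import OrderedDict

    class ExpertBuffer:
        """Keep at most `capacity` experts on the GPU; evict cold experts to CPU memory."""

        def __init__(self, capacity, load_to_gpu, evict_to_cpu):
            self.capacity = capacity
            self.load_to_gpu = load_to_gpu    # hypothetical hook, e.g. move expert to GPU
            self.evict_to_cpu = evict_to_cpu  # hypothetical hook, e.g. move expert to CPU
            self.resident = OrderedDict()     # expert_id -> GPU-resident expert

        def get(self, expert_id):
            if expert_id in self.resident:
                self.resident.move_to_end(expert_id)  # mark as most recently used
                return self.resident[expert_id]
            if len(self.resident) >= self.capacity:
                victim_id, victim = self.resident.popitem(last=False)  # evict LRU expert
                self.evict_to_cpu(victim_id, victim)
            expert = self.load_to_gpu(expert_id)
            self.resident[expert_id] = expert
            return expert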
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation
Hsia, Samuel, Gupta, Udit, Acun, Bilge, Ardalani, Newsha, Zhong, Pan, Wei, Gu-Yeon, Brooks, David, Wu, Carole-Jean
Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandwidth requirements but also limits the scope of compatible system solutions. This paper challenges the assumption of fixed embedding representations by showing how synergies between embedding representations and hardware platforms can lead to improvements in both algorithmic- and system performance. Based on our characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher quality embeddings at the cost of increased memory and compute requirements. To address the system performance challenges of the hybrid representation, we propose MP-Rec -- a co-design technique that exploits heterogeneity and dynamic selection of embedding representations and underlying hardware platforms. On real system hardware, we demonstrate how matching custom accelerators, i.e., GPUs, TPUs, and IPUs, with compatible embedding representations can lead to 16.65x performance speedup. Additionally, in query-serving scenarios, MP-Rec achieves 2.49x and 3.76x higher correct prediction throughput and 0.19% and 0.22% better model quality on a CPU-GPU system for the Kaggle and Terabyte datasets, respectively.
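The dynamic-selection idea can be sketched as a dispatcher that, per query batch, picks an (embedding representation, hardware) path meeting the tail-latency target while maximizing model quality; the latency and quality numbers below are placeholders we made up, not MP-Rec's measured profile or policy.

    # Hypothetical per-path profile; a real system would measure these offline.
    PATHS = {
        ("table", "gpu"):          {"latency_ms_per_query": 0.020, "quality": 0.790},
        ("hybrid", "gpu"):         {"latency_ms_per_query": 0.045, "quality": 0.796},
        ("hybrid", "accelerator"): {"latency_ms_per_query": 0.028, "quality": 0.796},
    }

    def select_path(batch_size: int, latency_target_ms: float):
        # Highest-quality path whose estimated batch latency meets the target;
        # fall back to the fastest path if none qualifies.
        feasible = {p: v for p, v in PATHS.items()
                    if v["latency_ms_per_query"] * batch_size <= latency_target_ms}
        if not feasible:
            return min(PATHS, key=lambda p: PATHS[p]["latency_ms_per_query"])
        return max(feasible, key=lambda p: feasible[p]["quality"])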
Sustainable AI: Environmental Implications, Challenges and Opportunities
Wu, Carole-Jean, Raghavendra, Ramya, Gupta, Udit, Acun, Bilge, Ardalani, Newsha, Maeng, Kiwan, Chang, Gloria, Behram, Fiona Aga, Huang, James, Bai, Charles, Gschwind, Michael, Gupta, Anurag, Ott, Myle, Melnikov, Anastasia, Candido, Salvatore, Brooks, David, Chauhan, Geeta, Lee, Benjamin, Lee, Hsien-Hsin S., Akyildiz, Bugra, Balandat, Maximilian, Spisak, Joe, Jain, Ravi, Rabbat, Mike, Hazelwood, Kim
This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis for what and how hardware-software design and at-scale optimization can help reduce the overall carbon footprint of AI. Based on the industry experience and lessons learned, we share the key challenges and chart out important development directions across the many dimensions of AI. We hope the key messages and insights presented in this paper can inspire the community to advance the field of AI in an environmentally-responsible manner.
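To first order, the footprint the paper analyzes decomposes into operational carbon (energy drawn by the workload, scaled by facility overhead and grid carbon intensity) plus an amortized share of embodied manufacturing carbon; the sketch below uses that decomposition with made-up inputs, not figures from the paper.

    def workload_carbon_kgco2e(energy_kwh: float, pue: float,
                               grid_kgco2e_per_kwh: float,
                               hw_embodied_kgco2e: float,
                               hw_lifetime_share: float) -> float:
        # Operational carbon plus the workload's amortized share of embodied carbon.
        operational = energy_kwh * pue * grid_kgco2e_per_kwh
        embodied = hw_embodied_kgco2e * hw_lifetime_share
        return operational + embodied

    # Made-up example: 10 MWh at PUE 1.1 on a 0.4 kgCO2e/kWh grid, using 1% of the
    # lifetime of hardware whose manufacture emitted 50 tCO2e -> 4,900 kgCO2e.
    print(workload_carbon_kgco2e(10_000, 1.1, 0.4, 50_000, 0.01))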
Deep Learning Scaling is Predictable, Empirically
Hestness, Joel, Narang, Sharan, Ardalani, Newsha, Diamos, Gregory, Jun, Heewoo, Kianinejad, Hassan, Patwary, Md. Mostofa Ali, Yang, Yang, Zhou, Yanqi
Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art. This paper presents a large scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, resulting in power-law exponents---the "steepness" of the learning curve---yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications on deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
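The reported relationship can be checked with an ordinary log-log fit, error(m) ~= alpha * m**beta, where m is the training-set size and beta is the learning-curve "steepness"; the data points below are synthetic, purely to show the fitting procedure.

    import numpy as np

    def fit_power_law(train_set_sizes, errors):
        # Fit error ~= alpha * m**beta by linear regression in log-log space.
        beta, log_alpha = np.polyfit(np.log(train_set_sizes), np.log(errors), deg=1)
        return np.exp(log_alpha), beta

    # Synthetic learning curve: error shrinking as a power law in data size.
    m = np.array([1e6, 4e6, 1.6e7, 6.4e7])
    err = 2.0 * m ** -0.12
    alpha, beta = fit_power_law(m, err)
    print(alpha, beta)  # recovers approximately 2.0 and -0.12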