AITopics | Kanter, David

Collaborating Authors

Kanter, David

Information about AI from the News, Publications, and Conferences


MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI

Tschand, Arya, Rajan, Arun Tejusve Raghunath, Idgunji, Sachin, Ghosh, Anirban, Holleman, Jeremy, Kiraly, Csaba, Ambalkar, Pawan, Borkar, Ritika, Chukka, Ramesh, Cockrell, Trevor, Curtis, Oliver, Fursin, Grigori, Hodak, Miro, Kassa, Hiwot, Lokhmotov, Anton, Miskovic, Dejan, Pan, Yuechao, Manmathan, Manu Prasad, Raymond, Liz, St. John, Tom, Suresh, Arjun, Taubitz, Rowan, Zhan, Sean, Wasson, Scott, Kanter, David, Reddi, Vijay Janapa

arXiv.org Artificial Intelligence, Oct-15-2024

Rapid adoption of machine learning (ML) technologies has led to a surge in power consumption across diverse systems, from tiny IoT devices to massive datacenter clusters. Benchmarking the energy efficiency of these systems is crucial for optimization, but presents novel challenges due to the variety of hardware platforms, workload characteristics, and system-level interactions. This paper introduces MLPerf Power, a comprehensive benchmarking methodology with capabilities to evaluate the energy efficiency of ML systems at power levels ranging from microwatts to megawatts. Developed by a consortium of industry professionals from more than 20 organizations, MLPerf Power establishes rules and best practices to ensure comparability across diverse architectures. We use representative workloads from the MLPerf benchmark suite to collect 1,841 reproducible measurements from 60 systems across the entire range of ML deployment scales. Our analysis reveals trade-offs between performance, complexity, and energy efficiency across this wide range of systems, providing actionable insights for designing optimized ML solutions from the smallest edge devices to the largest cloud infrastructures. This work emphasizes the importance of energy efficiency as a key metric in the evaluation and comparison of ML systems, laying the foundation for future research in this critical area. We discuss the implications for developing sustainable AI solutions and standardizing energy efficiency benchmarking for ML systems.
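The abstract treats energy efficiency as a headline metric. As a rough illustration only, the sketch below turns sampled power readings into energy (joules) with the trapezoidal rule and reports work completed per joule. The function names, the sampling setup, and the choice of samples per joule as the metric are assumptions for illustration; MLPerf Power's actual measurement and scoring rules are defined by the benchmark itself and are not reproduced here.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) using the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive power readings.
        total += 0.5 * (power_w[i - 1] + power_w[i]) * dt
    return total


def samples_per_joule(num_samples: int, timestamps_s: Sequence[float],
                      power_w: Sequence[float]) -> float:
    """Hypothetical efficiency metric: completed samples divided by energy consumed."""
    return num_samples / energy_joules(timestamps_s, power_w)


# Hypothetical readings: a device drawing a constant 2 W for 10 s while finishing 500 inferences.
ts = [0.0, 5.0, 10.0]
pw = [2.0, 2.0, 2.0]
print(energy_joules(ts, pw))           # 20.0
print(samples_per_joule(500, ts, pw))  # 25.0
```

The arithmetic is identical at any scale, so comparability between microwatt and megawatt systems depends mostly on how power is measured (sampling rate, which components are metered), which is what the benchmark's rules and best practices aim to standardize.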

large language model, machine learning, natural language

arXiv:2410.12032

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Law (0.93)
Information Technology > Services (0.48)
Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)


DataPerf: Benchmarks for Data-Centric AI Development

Mazumder, Mark, Banbury, Colby, Yao, Xiaozhe, Karlaš, Bojan, Rojas, William Gaviria, Diamos, Sudnya, Diamos, Greg, He, Lynn, Parrish, Alicia, Kirk, Hannah Rose, Quaye, Jessica, Rastogi, Charvi, Kiela, Douwe, Jurado, David, Kanter, David, Mosquera, Rafael, Ciro, Juan, Aroyo, Lora, Acun, Bilge, Chen, Lingjiao, Raje, Mehul Smriti, Bartolo, Max, Eyuboglu, Sabri, Ghorbani, Amirata, Goodman, Emmett, Inel, Oana, Kane, Tariq, Kirkpatrick, Christine R., Kuo, Tzu-Sheng, Mueller, Jonas, Thrush, Tristan, Vanschoren, Joaquin, Warren, Margaret, Williams, Adina, Yeung, Serena, Ardalani, Newsha, Paritosh, Praveen, Bat-Leah, Lilith, Zhang, Ce, Zou, James, Wu, Carole-Jean, Coleman, Cody, Ng, Andrew, Mattson, Peter, Reddi, Vijay Janapa

arXiv.org Artificial Intelligence, Oct-13-2023

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.

benchmark, machine learning, natural language

arXiv:2207.10062

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Speech Wikimedia: A 77 Language Multilingual Speech Dataset

Gómez, Rafael Mosquera, Eusse, Julián, Ciro, Juan, Galvez, Daniel, Hileman, Ryan, Bollacker, Kurt, Kanter, David

arXiv.org Artificial Intelligence, Aug-29-2023

The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages. Each audio file has one or more transcriptions in different languages, making this dataset suitable for training speech recognition, speech translation, and machine translation models.
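One way to see why multiple transcriptions per file make the dataset usable for three tasks: a transcription in the spoken language yields a speech recognition pair, one in another language yields a speech translation pair, and any two transcriptions of the same clip yield a machine translation pair. The sketch below assumes a hypothetical record layout (`Recording`, with a known `spoken_lang` and a language-to-text map); it is not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Recording:
    """Hypothetical record: one audio file plus its transcriptions keyed by language code."""
    audio_path: str
    spoken_lang: str
    transcripts: dict = field(default_factory=dict)  # e.g. {"en": "...", "es": "..."}


def derive_pairs(rec: Recording):
    """Split one recording into ASR, speech-translation, and MT training examples."""
    asr, speech_translation, machine_translation = [], [], []
    for lang, text in rec.transcripts.items():
        if lang == rec.spoken_lang:
            asr.append((rec.audio_path, text))  # audio -> same-language text
        else:
            speech_translation.append((rec.audio_path, lang, text))  # audio -> other-language text
    # Any two transcriptions of the same clip form a text-to-text translation pair.
    for (src_lang, src), (tgt_lang, tgt) in combinations(sorted(rec.transcripts.items()), 2):
        machine_translation.append((src_lang, src, tgt_lang, tgt))
    return asr, speech_translation, machine_translation


rec = Recording("clip.ogg", "en", {"en": "hello world", "es": "hola mundo"})
asr, st, mt = derive_pairs(rec)
print(asr)  # [('clip.ogg', 'hello world')]
print(st)   # [('clip.ogg', 'es', 'hola mundo')]
print(mt)   # [('en', 'hello world', 'es', 'hola mundo')]
```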

artificial intelligence, language multilingual speech dataset, natural language

arXiv:2308.15710

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.53)


MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation

Karargyris, Alexandros, Umeton, Renato, Sheller, Micah J., Aristizabal, Alejandro, George, Johnu, Bala, Srini, Beutel, Daniel J., Bittorf, Victor, Chaudhari, Akshay, Chowdhury, Alexander, Coleman, Cody, Desinghu, Bala, Diamos, Gregory, Dutta, Debo, Feddema, Diane, Fursin, Grigori, Guo, Junyi, Huang, Xinyuan, Kanter, David, Kashyap, Satyananda, Lane, Nicholas, Mallick, Indranil, Mascagni, Pietro, Mehta, Virendra, Natarajan, Vivek, Nikolov, Nikola, Padoy, Nicolas, Pekhimenko, Gennady, Reddi, Vijay Janapa, Reina, G Anthony, Ribalta, Pablo, Rosenthal, Jacob, Singh, Abhishek, Thiagarajan, Jayaraman J., Wuest, Anna, Xenochristou, Maria, Xu, Daguang, Yadav, Poonam, Rosenthal, Michael, Loda, Massimo, Johnson, Jason M., Mattson, Peter

arXiv.org Artificial Intelligence, Dec-28-2021

Medical AI has tremendous potential to advance healthcare by supporting the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving provider and patient experience. We argue that unlocking this potential requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data. To meet this need, we are building MedPerf, an open framework for benchmarking machine learning in the medical domain. MedPerf will enable federated evaluation in which models are securely distributed to different facilities for evaluation, thereby empowering healthcare organizations to assess and verify the performance of AI models in an efficient and human-supervised process, while prioritizing privacy. We describe the current challenges healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status, and our roadmap. We call for researchers and organizations to join us in creating the MedPerf open benchmarking platform.
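The federated evaluation idea in this abstract can be sketched as follows: the model travels to each facility, is scored there on data that never leaves the site, and only summary counts are sent back and pooled. All names below (`Facility`, `federated_evaluate`, the accuracy metric, the toy threshold model) are hypothetical and are not MedPerf's API; a real deployment would add model packaging, security controls, and the human supervision step the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "model" here is any function mapping a feature vector to a predicted class label.
Model = Callable[[List[float]], int]


@dataclass
class Facility:
    """Hypothetical site holding private data; only aggregate counts leave evaluate()."""
    name: str
    features: List[List[float]]
    labels: List[int]

    def evaluate(self, model: Model) -> dict:
        correct = sum(1 for x, y in zip(self.features, self.labels) if model(x) == y)
        return {"facility": self.name, "n": len(self.labels), "correct": correct}


def federated_evaluate(model: Model, facilities: List[Facility]) -> dict:
    """Send the model to every facility and pool the returned counts."""
    reports = [site.evaluate(model) for site in facilities]
    total_n = sum(r["n"] for r in reports)
    total_correct = sum(r["correct"] for r in reports)
    return {"per_facility": reports, "pooled_accuracy": total_correct / total_n}


def threshold_model(x: List[float]) -> int:
    # Toy classifier used only for the example.
    return int(x[0] > 0.5)


sites = [
    Facility("hospital_a", [[0.9], [0.1]], [1, 0]),
    Facility("hospital_b", [[0.7], [0.2]], [0, 0]),
]
result = federated_evaluate(threshold_model, sites)
print(result["pooled_accuracy"])  # 0.75
```

Returning per-facility counts alongside the pooled figure shows how performance varies across heterogeneous sites, which a single centrally computed score would hide.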

federated evaluation, medical artificial intelligence, open benchmarking platform

doi: 10.1038/s42256-023-00652-2

arXiv:2110.01406

Genre: Research Report (0.40)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence (1.00)


The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

Galvez, Daniel, Diamos, Greg, Ciro, Juan, Cerón, Juan Felipe, Achorn, Keith, Gopi, Anjali, Kanter, David, Lam, Maximilian, Mazumder, Mark, Reddi, Vijay Janapa

arXiv.org Machine Learning, Nov-17-2021

The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data is collected via searching the Internet for appropriately licensed audio data with existing transcriptions. We describe our data collection methodology and release our data collection system under the Apache 2.0 license. We show that a model trained on this dataset achieves a 9.98% word error rate on Librispeech's test-clean test set. Finally, we discuss the legal and ethical issues surrounding the creation of a sizable machine learning corpus and plans for continued maintenance of the project under MLCommons's sponsorship.
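For context on the 9.98% figure: word error rate is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model's output, divided by the number of reference words. A minimal sketch follows; it splits on whitespace and applies no text normalization, whereas published results usually normalize case and punctuation first, so treat it as illustrative rather than the paper's scoring script.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors over total reference words across (reference, hypothesis) pairs."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_words


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 reference words.
print(corpus_wer([("the cat sat on the mat", "the cat sit on mat")]))
```

Pooling errors and reference words over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.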

artificial intelligence, machine learning, speech recognition

arXiv:2111.09344

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


MLPerf Training Benchmark

Mattson, Peter, Cheng, Christine, Coleman, Cody, Diamos, Greg, Micikevicius, Paulius, Patterson, David, Tang, Hanlin, Wei, Gu-Yeon, Bailis, Peter, Bittorf, Victor, Brooks, David, Chen, Dehao, Dutta, Debojyoti, Gupta, Udit, Hazelwood, Kim, Hock, Andrew, Huang, Xinyuan, Jia, Bill, Kang, Daniel, Kanter, David, Kumar, Naveen, Liao, Jeffery, Narayanan, Deepak, Oguntebi, Tayo, Pekhimenko, Gennady, Pentecost, Lillian, Reddi, Vijay Janapa, Robie, Taylor, St. John, Tom, Wu, Carole-Jean, Xu, Lingjie, Young, Cliff, Zaharia, Matei

arXiv.org Machine Learning, Oct-2-2019

Machine learning is experiencing an explosion of software and hardware solutions, and needs industry-standard performance benchmarks to drive design and enable competitive evaluation. However, machine learning training presents a number of unique challenges to benchmarking that do not exist in other domains: (1) some optimizations that improve training throughput actually increase time to solution, (2) training is stochastic and time to solution has high variance, and (3) the software and hardware systems are so diverse that they cannot be fairly benchmarked with the same binary, code, or even hyperparameters. We present MLPerf, a machine learning benchmark that overcomes these challenges. We quantitatively evaluate the efficacy of MLPerf in driving community progress on performance and scalability across two rounds of results from multiple vendors.
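The first two challenges can be made concrete with a little arithmetic. Below, `estimated_train_time_s` shows how a configuration with higher throughput can still take longer if it needs more epochs to converge, `time_to_solution` returns the first time a run reaches a quality target, and `olympic_mean` averages several runs after dropping the fastest and slowest to damp run-to-run variance. All function names and numbers are hypothetical; MLPerf Training aggregates repeated runs in a broadly similar way, but its exact run counts and scoring rules are set by its rules document and are not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple


def estimated_train_time_s(epochs: float, samples_per_epoch: int, throughput_sps: float) -> float:
    """Wall-clock time if the system sustains throughput_sps samples per second throughout."""
    return epochs * samples_per_epoch / throughput_sps


def time_to_solution(eval_log: Sequence[Tuple[float, float]], target: float) -> Optional[float]:
    """First elapsed time (s) at which evaluated quality meets the target, or None if never."""
    for elapsed_s, quality in eval_log:
        if quality >= target:
            return elapsed_s
    return None


def olympic_mean(times_s: List[float]) -> float:
    """Mean after discarding the single fastest and single slowest run."""
    if len(times_s) < 3:
        raise ValueError("need at least three runs")
    trimmed = sorted(times_s)[1:-1]
    return sum(trimmed) / len(trimmed)


# Hypothetical: a larger batch raises throughput by 50% but needs 70 epochs instead of 40.
baseline = estimated_train_time_s(40, 1_000_000, 1_000)   # 40000.0 s
big_batch = estimated_train_time_s(70, 1_000_000, 1_500)  # ~46666.7 s, slower to solution
print(baseline, big_batch)

run_log = [(100.0, 0.70), (200.0, 0.749), (300.0, 0.751)]
print(time_to_solution(run_log, target=0.75))              # 300.0
print(olympic_mean([310.0, 295.0, 330.0, 305.0, 400.0]))  # 315.0
```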

benchmark, deep learning, it software

arXiv:1910.01500

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.68)
Information Technology > Software (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
