Lu, Yung-Hsiang
Detecting Music Performance Errors with Transformers
Chou, Benjamin Shiue-Hal, Jajal, Purvish, Eliopoulos, Nicholas John, Nadolsky, Tim, Yang, Cheng-Yun, Ravi, Nikita, Davis, James C., Yun, Kristen Yeon-Ji, Lu, Yung-Hsiang
Beginner musicians often struggle to identify specific errors in their performances, such as playing incorrect notes or rhythms. Existing tools for music error detection have two limitations: (1) existing approaches rely on automatic alignment and are therefore prone to errors caused by small deviations between alignment targets; (2) there is a lack of sufficient data to train music error detection models, resulting in over-reliance on heuristics. To address (1), we propose a novel transformer model, Polytune, that takes audio inputs and outputs annotated music scores. This model can be trained end-to-end to implicitly align and compare performance audio with music scores through latent space representations. To address (2), we present a novel data generation technique capable of creating large-scale synthetic music error datasets. Our approach achieves a 64.1% average Error Detection F1 score, improving upon prior work by 40 percentage points across 14 instruments. Additionally, unlike existing transcription methods repurposed for music error detection, our model can handle multiple instruments. Our source code and datasets are available at https://github.com/ben2002chou/Polytune.
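The end-to-end formulation described above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the Polytune implementation: it projects performance-audio spectrogram frames and score tokens into a shared latent space, lets a transformer encoder attend across both (so alignment happens implicitly rather than as a separate, brittle step), and predicts an error label per score position. All names and shapes here (ErrorDetector, n_labels, the label set) are hypothetical.

```python
# Minimal sketch of an end-to-end audio-vs-score error detector.
# Hypothetical illustration; not the authors' implementation.
import torch
import torch.nn as nn

class ErrorDetector(nn.Module):
    def __init__(self, n_mels=128, score_vocab=512, d_model=256,
                 n_labels=3):  # assumed labels: correct / wrong note / wrong rhythm
        super().__init__()
        self.audio_proj = nn.Linear(n_mels, d_model)          # spectrogram frames -> latent
        self.score_emb = nn.Embedding(score_vocab, d_model)   # score tokens -> latent
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.head = nn.Linear(d_model, n_labels)              # per-token error label

    def forward(self, mel, score_tokens):
        # Concatenate both modalities into one sequence so self-attention
        # aligns audio and score implicitly, with no explicit alignment step.
        x = torch.cat([self.audio_proj(mel), self.score_emb(score_tokens)], dim=1)
        z = self.encoder(x)
        # Classify only the score positions: one error label per written note.
        return self.head(z[:, mel.size(1):])

model = ErrorDetector()
mel = torch.randn(2, 400, 128)            # batch of 2 spectrograms, 400 frames each
score = torch.randint(0, 512, (2, 100))   # 100 score tokens each
logits = model(mel, score)                # -> shape (2, 100, 3)
```

Training such a model requires performance/score pairs with per-note error annotations, which is the gap the paper's synthetic data generation technique addresses.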
An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry
Jiang, Wenxin, Synovic, Nicholas, Hyatt, Matt, Schorlemmer, Taylor R., Sethi, Rohan, Lu, Yung-Hsiang, Thiruvathukal, George K., Davis, James C.
Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior works have studied reuse practices for traditional software packages to guide software engineers towards better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of PTM reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful attributes for model reuse, including provenance, reproducibility, and portability. Three challenges for PTM reuse are missing attributes, discrepancies between claimed and actual performance, and model risks. We substantiate these identified challenges with systematic measurements in the Hugging Face ecosystem. Our work informs future directions for optimizing deep learning ecosystems, such as automatically measuring useful attributes and detecting potential attacks, and we envision future research on infrastructure and standardization for model registries.
An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors
Banna, Vishnu, Chinnakotla, Akhil, Yan, Zhengxin, Vegesana, Anirudh, Vivek, Naveen, Krishnappa, Kruthi, Jiang, Wenxin, Lu, Yung-Hsiang, Thiruvathukal, George K., Davis, James C.
Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.
Low-Power Computer Vision: Status, Challenges, Opportunities
Alyamkin, Sergei, Ardi, Matthew, Berg, Alexander C., Brighton, Achille, Chen, Bo, Chen, Yiran, Cheng, Hsin-Pai, Fan, Zichen, Feng, Chen, Fu, Bo, Gauen, Kent, Goel, Abhinav, Goncharenko, Alexander, Guo, Xuyang, Ha, Soonhoi, Howard, Andrew, Hu, Xiao, Huang, Yuanjun, Kang, Donghyun, Kim, Jaeyoun, Ko, Jong Gook, Kondratyev, Alexander, Lee, Junhyeok, Lee, Seungjae, Lee, Suwoong, Li, Zichao, Liang, Zhiyu, Liu, Juzheng, Liu, Xin, Lu, Yang, Lu, Yung-Hsiang, Malik, Deeptanshu, Nguyen, Hong Hanh, Park, Eunbyung, Repin, Denis, Shen, Liang, Sheng, Tao, Sun, Fei, Svitov, David, Thiruvathukal, George K., Zhang, Baiwu, Zhang, Jingchi, Zhang, Xiaopeng, Zhuo, Shaojie
Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions, and some of these systems have limited energy (such as unmanned aerial vehicles, also called drones, and mobile robots). These systems rely on batteries, and energy efficiency is critical. This article serves two main purposes: (1) examine the state of the art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions; this article summarizes the 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision.
Low-Power Image Recognition Challenge
Lu, Yung-Hsiang (Purdue University) | Berg, Alexander C. (University of North Carolina at Chapel Hill) | Chen, Yiran (Duke University)
Energy is limited in mobile systems, so for on-device image recognition to become a viable opportunity, energy usage must be conservative. The Low-Power Image Recognition Challenge (LPIRC) is the only competition integrating image recognition with low power. LPIRC has been held annually since 2015 as an on-site competition. To encourage innovation, LPIRC has no restriction on hardware or software platforms: the only requirement is that a solution be able to use HTTP to communicate with the referee system to retrieve images and report answers. Each team has 10 minutes to recognize the objects in 5,000 (year 2015) or 20,000 (years 2016 and 2017) images.
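The HTTP-only interface described above can be sketched as a small client loop. The code below is hypothetical: the endpoint paths (/image/<id>, /result) and payload fields are assumptions for illustration, not the actual referee API, and recognize() stands in for a team's own low-power detector.

```python
# Hypothetical LPIRC-style client: fetch images over HTTP, report answers.
# Endpoint paths and payload fields are assumptions, not the real referee API.
import requests

REFEREE = "http://referee.example:8080"  # placeholder referee address

def recognize(image_bytes):
    """Stand-in for a team's low-power object recognizer."""
    return [{"label": "person", "score": 0.9, "bbox": [10, 20, 50, 80]}]

def run(num_images):
    for image_id in range(1, num_images + 1):
        resp = requests.get(f"{REFEREE}/image/{image_id}")  # retrieve one image
        resp.raise_for_status()
        detections = recognize(resp.content)
        # Report answers back to the referee system for scoring.
        requests.post(f"{REFEREE}/result",
                      json={"image_id": image_id, "detections": detections})

if __name__ == "__main__":
    run(5000)  # e.g., the 2015 image count, within the 10-minute window
```

Because the protocol is plain HTTP, teams are free to implement the client on any hardware or software stack, which is the point of the competition's platform-agnostic design.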