AITopics | Li, Cheng

Plotting

Li, Cheng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Li, Conglong, Yao, Zhewei, Wu, Xiaoxia, Zhang, Minjia, Holmes, Connor, Li, Cheng, He, Yuxiong

arXiv.org Artificial IntelligenceFeb-7-2023

Recent advances on deep learning models come at the price of formidable training cost. The increasing model size is one of the root causes, but another less-emphasized fact is that data scale is actually increasing at a similar speed as model scale, and the training cost is proportional to both of them. Compared to the rapidly evolving model architecture, how to efficiently use the training data (especially for the expensive foundation model pretraining) is both less explored and difficult to realize due to the lack of a convenient framework that focus on data efficiency capabilities. To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, we propose and combine two novel data efficiency techniques: efficient data sampling via a general curriculum learning library, and efficient data routing via a novel random layerwise token dropping technique. DeepSpeed Data Efficiency also takes extensibility, flexibility and composability into consideration, so that users can easily utilize the framework to compose multiple techniques and apply customized strategies. By applying our solution to GPT-3 1.3B and BERT-large language model pretraining, we can achieve similar model quality with up to 2x less data and 2x less time, or achieve better model quality under similar amount of data and time.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2212.03597

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

Lin, Zhiqi, Miao, Youshan, Liu, Guodong, Shi, Xiaoxiang, Zhang, Quanlu, Yang, Fan, Maleki, Saeed, Zhu, Yi, Cao, Xu, Li, Cheng, Yang, Mao, Zhang, Lintao, Zhou, Lidong

arXiv.org Artificial IntelligenceJan-21-2023

With the growing model size, deep neural networks (DNN) are increasingly trained over massive GPU accelerators, which demands a proper parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them to GPUs for execution. Due to the large search space, the contemporary parallelization plan generators often rely on empirical rules that couple transformation and scheduling, and fall short in exploring more flexible schedules that yield better memory usage and compute efficiency. This tension can be exacerbated by the emerging models with increasing complexity in their structure and model size. SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans. It formulates the plan design and generation into three sequential phases explicitly: model transformation, space-time scheduling, and data dependency preserving. Such a principled approach decouples multiple seemingly intertwined factors and enables the composition of highly flexible parallelization plans. As a result, SuperScaler can not only generate empirical parallelization plans, but also construct new plans that achieve up to 3.5X speedup compared to state-of-the-art solutions like DeepSpeed, Megatron and Alpa, for emerging DNN models like Swin-Transformer and AlphaFold2, as well as well-optimized models like GPT-3.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2301.08984

Country: Europe (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Yao, Zhewei, Wu, Xiaoxia, Li, Conglong, Holmes, Connor, Zhang, Minjia, Li, Cheng, He, Yuxiong

arXiv.org Artificial IntelligenceNov-17-2022

Large-scale transformer models have become the de-facto architectures for various machine learning applications, e.g., CV and NLP. However, those large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. Particularly, random-LTD achieves considerable speedups and comparable accuracy as the standard training baseline. Compared to other token dropping methods, random-LTD does not require (1) any importance score-based metrics, (2) any special token treatment (e.g., [CLS]), and (3) many layers in full sequence length training except the first and the last layers. Besides, a new LayerToken learning rate schedule is proposed for pretraining problems that resolve the heavy tuning requirement for our proposed training mechanism. Finally, we demonstrate that random-LTD can be applied to broader applications, including GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results show that random-LTD can save about 33.3% theoretical compute cost and 25.6% wall-clock training time while achieving similar zero-shot evaluations on GPT-31.3B as compared to baseline.

arxiv preprint arxiv, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2211.11586

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

Uncertainty-Aware Multi-Parametric Magnetic Resonance Image Information Fusion for 3D Object Segmentation

Li, Cheng, Osman, Yousuf Babiker M., Huang, Weijian, Xue, Zhenzhen, Han, Hua, Zheng, Hairong, Wang, Shanshan

arXiv.org Artificial IntelligenceNov-16-2022

Multi-parametric magnetic resonance (MR) imaging is an indispensable tool in the clinic. Consequently, automatic volume-of-interest segmentation based on multi-parametric MR imaging is crucial for computer-aided disease diagnosis, treatment planning, and prognosis monitoring. Despite the extensive studies conducted in deep learning-based medical image analysis, further investigations are still required to effectively exploit the information provided by different imaging parameters. How to fuse the information is a key question in this field. Here, we propose an uncertainty-aware multi-parametric MR image feature fusion method to fully exploit the information for enhanced 3D image segmentation. Uncertainties in the independent predictions of individual modalities are utilized to guide the fusion of multi-modal image features. Extensive experiments on two datasets, one for brain tissue segmentation and the other for abdominal multi-organ segmentation, have been conducted, and our proposed method achieves better segmentation performance when compared to existing models.

artificial intelligence, machine learning, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2211.08783

Country: Asia > China > Guangdong Province (0.16)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area (0.51)
Health & Medicine > Diagnostic Medicine > Imaging (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.73)

Add feedback

PARCEL: Physics-based unsupervised contrastive representation learning for parallel MR imaging

Wang, Shanshan, Wu, Ruoyou, Li, Cheng, Zou, Juan, Zheng, Hairong

arXiv.org Artificial IntelligenceFeb-3-2022

With the successful application of deep learning in magnetic resonance imaging, parallel imaging techniques based on neural networks have attracted wide attentions. However, without high-quality fully sampled datasets for training, the performance of these methods tends to be limited. To address this issue, this paper proposes a physics based unsupervised contrastive representation learning (PARCEL) method to speed up parallel MR imaging. Specifically, PARCEL has three key ingredients to achieve direct deep learning from the undersampled k-space data. Namely, a parallel framework has been developed by learning two branches of model-based networks unrolled with the conjugate gradient algorithm; Augmented undersampled k-space data randomly drawn from the obtained k-space data are used to help the parallel network to capture the detailed information. A specially designed co-training loss is designed to guide the two networks to capture the inherent features and representations of the-to-be-reconstructed MR image. The proposed method has been evaluated on in vivo datasets and compared to five state-of-the-art methods, whose results show PARCEL is able to learn useful representations for more accurate MR reconstructions without the reliance on the fully-sampled datasets.

artificial intelligence, diagnostic medicine, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2202.01494

Country:

Asia > China > Guangdong Province (0.14)
North America > United States > Utah (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Classification Trees for Imbalanced and Sparse Data: Surface-to-Volume Regularization

Zhu, Yichen, Li, Cheng, Dunson, David B.

arXiv.org Machine LearningJun-14-2021

Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

2004.12293

Country:

Asia > Singapore (0.28)
North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Supervised Learning in the Presence of Noise: Application in ICD-10 Code Classification

Kim, Youngwoo, Li, Cheng, Ye, Bingyang, Tahmasebi, Amir, Aslam, Javed

arXiv.org Artificial IntelligenceMar-13-2021

ICD coding is the international standard for capturing and reporting health conditions and diagnosis for revenue cycle management in healthcare. Manually assigning ICD codes is prone to human error due to the large code vocabulary and the similarities between codes. Since machine learning based approaches require ground truth training data, the inconsistency among human coders is manifested as noise in labeling, which makes the training and evaluation of ICD classifiers difficult in presence of such noise. This paper investigates the characteristics of such noise in manually-assigned ICD-10 codes and furthermore, proposes a method to train robust ICD-10 classifiers in the presence of labeling noise. Our research concluded that the nature of such noise is systematic. Most of the existing methods for handling label noise assume that the noise is completely random and independent of features or labels, which is not the case for ICD data. Therefore, we develop a new method for training robust classifiers in the presence of systematic noise. We first identify ICD-10 codes that human coders tend to misuse or confuse, based on the codes' locations in the ICD-10 hierarchy, the types of the codes, and baseline classifier's prediction behaviors; we then develop a novel training strategy that accounts for such noise. We compared our method with the baseline that does not handle label noise and the baseline methods that assume random noise, and demonstrated that our proposed method outperforms all baselines when evaluated on expert validated labels.

deep learning, neural network, noise, (21 more...)

arXiv.org Artificial Intelligence

2103.07808

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Health Care Providers & Services (0.50)
Health & Medicine > Consumer Health (0.48)
Health & Medicine > Therapeutic Area (0.48)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.64)

Add feedback

Robustness Testing of Language Understanding in Dialog Systems

Liu, Jiexi, Takanobu, Ryuichi, Wen, Jiaxin, Wan, Dazhen, Nie, Weiran, Li, Hongyan, Li, Cheng, Peng, Wei, Huang, Minlie

arXiv.org Artificial IntelligenceDec-30-2020

Most language understanding models in dialog systems are trained on a small amount of annotated training data, and evaluated in a small set from the same distribution. However, these models can lead to system failure or undesirable outputs when being exposed to natural perturbation in practice. In this paper, we conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models, and introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation. We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems. Four data augmentation approaches covering the three aspects are assembled in LAUG, which reveals critical robustness issues in state-of-the-art models. The augmented dataset through LAUG can be used to facilitate future research on the robustness testing of language understanding in dialog systems.

deep learning, neural network, robustness, (16 more...)

arXiv.org Artificial Intelligence

2012.15262

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Complex KBQA System using Multiple Reasoning Paths

Qin, Kechen, Wang, Yu, Li, Cheng, Gunaratna, Kalpa, Jin, Hongxia, Pavlu, Virgil, Aslam, Javed A.

arXiv.org Machine LearningMay-21-2020

Multi-hop knowledge based question answering (KBQA) is a complex task for natural language understanding. Many KBQA approaches have been proposed in recent years, and most of them are trained based on labeled reasoning path. This hinders the system's performance as many correct reasoning paths are not labeled as ground truth, and thus they cannot be learned. In this paper, we introduce an end-to-end KBQA system which can leverage multiple reasoning paths' information and only requires labeled answer as supervision. We conduct experiments on several benchmark datasets containing both single-hop simple questions as well as muti-hop complex questions, including WebQuestionSP (WQSP), ComplexWebQuestion-1.1 (CWQ), and PathQuestion-Large (PQL), and demonstrate strong performance.

artificial intelligence, inductive learning, reasoning path, (19 more...)

arXiv.org Machine Learning

2005.1097

Country:

Europe (0.93)
North America > United States > California (0.14)
North America > United States > Texas (0.14)
(2 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback

DLSpec: A Deep Learning Task Exchange Specification

Dakkak, Abdul, Li, Cheng, Xiong, Jinjun, Hwu, Wen-Mei

arXiv.org Artificial IntelligenceFeb-25-2020

Deep Learning (DL) innovations are being introduced at a rapid pace. However, the current lack of standard specification of DL tasks makes sharing, running, reproducing, and comparing these innovations difficult. To address this problem, we propose DLSpec, a model-, dataset-, software-, and hardware-agnostic DL specification that captures the different aspects of DL tasks. DLSpec has been tested by specifying and running hundreds of DL tasks.

deep learning, dl task, neural network, (20 more...)

arXiv.org Artificial Intelligence

2002.11262

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback