 Peng, Yifan


ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

arXiv.org Artificial Intelligence

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit, necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST) -- each task is supported with a wide variety of approaches, differentiating ESPnet-ST-v2 from other open-source spoken language translation toolkits. This toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.
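
A minimal sketch of the hybrid CTC/attention objective named above, assuming the usual interpolation of a CTC loss on encoder outputs with a cross-entropy loss on decoder outputs; function names, tensor shapes, and the `ctc_weight` value are illustrative, not ESPnet-ST-v2's actual API:

    import torch
    import torch.nn as nn

    ctc_criterion = nn.CTCLoss(blank=0, zero_infinity=True)
    att_criterion = nn.CrossEntropyLoss(ignore_index=-1)

    def hybrid_ctc_attention_loss(enc_logits, enc_lens, dec_logits,
                                  targets, target_lens, ctc_weight=0.3):
        # CTC branch: (time, batch, vocab) log-probabilities over encoder frames.
        log_probs = enc_logits.log_softmax(dim=-1).transpose(0, 1)
        loss_ctc = ctc_criterion(log_probs, targets, enc_lens, target_lens)
        # Attention branch: token-level cross entropy on decoder outputs
        # (simplified: real recipes add sos/eos handling and label smoothing).
        loss_att = att_criterion(dec_logits.flatten(0, 1), targets.flatten())
        return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att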


An empirical study of using radiology reports and images to improve ICU mortality prediction

arXiv.org Artificial Intelligence

Background: The predictive Intensive Care Unit (ICU) scoring system plays an important role in ICU management because it predicts important outcomes, especially mortality. Many scoring systems have been developed and used in the ICU. These scoring systems are primarily based on the structured clinical data in the electronic health record (EHR), which may suffer from the loss of important clinical information contained in narratives and images. Methods: In this work, we build a deep learning-based survival prediction model with multi-modality data to predict ICU mortality. Four sets of features are investigated: (1) physiological measurements of Simplified Acute Physiology Score (SAPS) II, (2) common thorax diseases pre-defined by radiologists, (3) BERT-based text representations, and (4) chest X-ray image features. We use the Medical Information Mart for Intensive Care IV (MIMIC-IV) dataset to evaluate the proposed model. Results: Our model achieves an average C-index of 0.7829 (95% confidence interval, 0.7620-0.8038), which substantially exceeds that of the baseline with SAPS-II features (0.7470 (0.7263-0.7676)). Ablation studies further demonstrate the contributions of pre-defined labels (2.00%), text features (2.44%), and image features (2.82%).
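
For reference, the C-index reported above measures the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed outcomes. A minimal NumPy sketch of the standard definition (variable names are illustrative; this is not the paper's evaluation code):

    import numpy as np

    def c_index(times, events, risks):
        """times: observed times; events: 1 = death observed, 0 = censored;
        risks: higher score = higher predicted risk of death."""
        concordant, comparable = 0.0, 0
        for i in range(len(times)):
            for j in range(len(times)):
                # A pair is comparable if patient i died before j's observed time.
                if events[i] == 1 and times[i] < times[j]:
                    comparable += 1
                    if risks[i] > risks[j]:
                        concordant += 1.0
                    elif risks[i] == risks[j]:
                        concordant += 0.5
        return concordant / comparable

    print(c_index(np.array([2, 5, 7]), np.array([1, 1, 0]),
                  np.array([0.9, 0.4, 0.1])))   # perfectly concordant -> 1.0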


Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

arXiv.org Artificial Intelligence

Self-supervised learning (SSL) has led to great strides in speech processing. However, the resources needed to train these models have become prohibitively large as they continue to scale. Currently, only a few groups with substantial resources are capable of creating SSL models, which harms reproducibility. In this work, we optimize HuBERT SSL to fit within academic constraints. We reproduce HuBERT independently of the original implementation, with no performance loss. Our code and training optimizations make SSL feasible with only 8 GPUs, instead of the 32 used in the original work. We also explore a semi-supervised route, using an ASR model to skip the first pre-training iteration. Within one iteration of pre-training, our models improve over HuBERT on several tasks. Furthermore, our HuBERT Large variant requires only 8 GPUs, achieving performance similar to the original model trained on 128. As our contribution to the community, all models, configurations, and code are made open-source in ESPnet.
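
For context, HuBERT's first pre-training iteration derives its masked-prediction targets by clustering frame-level MFCC features with k-means and using the cluster IDs as pseudo-labels. A rough sketch with standard libraries; the 39-dim features and 100 clusters follow the original recipe, while the example audio file is illustrative:

    import librosa
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    wav, sr = librosa.load(librosa.example("libri1"), sr=16000)
    # 39-dim MFCCs (13 coefficients + deltas + delta-deltas), as in the
    # original first-iteration recipe.
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)
    feats = np.concatenate(
        [mfcc, librosa.feature.delta(mfcc), librosa.feature.delta(mfcc, order=2)],
        axis=0).T                                   # (frames, 39)

    kmeans = MiniBatchKMeans(n_clusters=100, batch_size=1024).fit(feats)
    targets = kmeans.predict(feats)                 # one pseudo-label per frame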


Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses

arXiv.org Artificial Intelligence

A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating an interpretation of a radiology report from its findings, a system predicting only highly likely outcomes may be less useful, since such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis that goes beyond the most likely options. We introduce a new task, "less likely brainstorming," that asks a model to generate outputs that humans think are relevant but less likely to happen. We explore the task in two settings: a brain MRI interpretation generation setting and an everyday commonsense reasoning setting. We find that a baseline approach of training with less likely hypotheses as targets generates outputs that humans evaluate as either likely or irrelevant nearly half of the time; standard MLE training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating likely and less likely outputs according to humans. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that it improves our models' capability of generating less likely outputs.
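
One simple way to make the likely/less-likely distinction concrete during training is a margin objective on sequence log-likelihoods. The sketch below is an illustrative stand-in under that assumption, not the paper's actual contrastive loss:

    import torch.nn.functional as F

    def likely_vs_less_likely_margin(logits_likely, tgt_likely,
                                     logits_less, tgt_less, margin=1.0):
        def seq_logp(logits, tgt):
            # Sum of per-token log-probabilities of the target sequence.
            logp = F.log_softmax(logits, dim=-1)
            return logp.gather(-1, tgt.unsqueeze(-1)).squeeze(-1).sum(-1)
        lp_likely = seq_logp(logits_likely, tgt_likely)    # (batch,)
        lp_less = seq_logp(logits_less, tgt_less)          # (batch,)
        # When conditioned on the "less likely" control code, push the less
        # likely continuation to outscore the likely one by at least `margin`.
        return F.relu(margin + lp_likely - lp_less).mean()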


DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

arXiv.org Artificial Intelligence

Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost of these models hinder their deployment. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model. However, the student architecture usually needs to be manually designed and remains fixed during training, which requires prior knowledge and can lead to suboptimal performance. Inspired by the recent success of task-specific structured pruning, we propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning. Experiments on SUPERB show that DPHuBERT outperforms pure distillation methods in almost all tasks. Moreover, DPHuBERT requires little training time and performs well with limited training data, making it suitable for resource-constrained applications. Our method can also be applied to various speech SSL models. Our code and models will be made publicly available.
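
The shape of such a joint objective can be sketched as a layer-wise distillation term plus a sparsity penalty on learnable pruning gates. The L1 penalty below is a plain stand-in for L0-style regularization, and all names are illustrative rather than DPHuBERT's implementation:

    import torch.nn.functional as F

    def joint_distill_prune_loss(student_hiddens, teacher_hiddens, gates,
                                 sparsity_weight=0.1):
        # Pull each student layer toward the matching teacher layer.
        distill = sum(F.mse_loss(s, t)
                      for s, t in zip(student_hiddens, teacher_hiddens))
        # Drive structured-pruning gates toward zero so whole components
        # (heads, channels, layers) can eventually be removed.
        sparsity = sum(g.abs().sum() for g in gates)
        return distill + sparsity_weight * sparsity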


SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies

arXiv.org Artificial Intelligence

Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of the NLP models to a cohort of patients prescribed opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and Conclusion: We developed a corpus of 629 cancer patients' notes with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH. The Bidirectional Encoder Representations from Transformers (BERT) model achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH concept extraction, and 0.9617 and 0.9626 for linking attributes to SDoH concepts. Fine-tuning the NLP models using new annotations from opioid use patients improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The extraction rates among the 19 categories of SDoH varied greatly: 10 SDoH could be extracted from >70% of cancer patients, while 9 SDoH had a low extraction rate (<70% of cancer patients). The SODA package with pre-trained transformer models is publicly available at https://github.com/uf-hobiinformatics-lab/SDoH_SODA.
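
Concept extraction of this kind is typically framed as transformer token classification. A hedged sketch with the Hugging Face transformers library, using a generic public NER checkpoint as a placeholder (the SODA repository above hosts the actual clinical models):

    from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                              pipeline)

    model_name = "dslim/bert-base-NER"   # placeholder public NER model
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    ner = pipeline("token-classification", model=model, tokenizer=tok,
                   aggregation_strategy="simple")

    note = ("Patient lives alone, reports daily tobacco use and prior "
            "employment as a welder.")
    for span in ner(note):
        print(span["entity_group"], span["word"], round(span["score"], 3))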


A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

arXiv.org Artificial Intelligence

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer on the LibriSpeech ASR benchmark, making it promising for more general speech applications. This work compares E-Branchformer and Conformer through extensive experiments using different types of end-to-end sequence-to-sequence models. Results demonstrate that E-Branchformer achieves comparable or better performance than Conformer in almost all evaluation sets across 15 ASR, 2 ST, and 3 SLU benchmarks, while being more stable during training. We will release our training configurations and pre-trained models for reproducibility, which can benefit the speech community.
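
For intuition, E-Branchformer's core idea is two parallel branches per layer, one global (self-attention) and one local (convolutional), merged back into the model dimension. The sketch below strips out cgMLP gating, macaron feed-forwards, and the depthwise-conv merge, so it illustrates the branch-and-merge pattern rather than the published architecture:

    import torch
    import torch.nn as nn

    class TwoBranchLayer(nn.Module):
        def __init__(self, d_model=256, n_heads=4, kernel=31):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.conv = nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2,
                                  groups=d_model)   # depthwise, local context
            self.merge = nn.Linear(2 * d_model, d_model)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x):             # x: (batch, time, d_model)
            g, _ = self.attn(x, x, x)     # global branch
            l = self.conv(x.transpose(1, 2)).transpose(1, 2)   # local branch
            return self.norm(x + self.merge(torch.cat([g, l], dim=-1)))

    y = TwoBranchLayer()(torch.randn(2, 50, 256))   # smoke test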


The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge

arXiv.org Artificial Intelligence

This paper describes our system for the low-resource domain adaptation track (Track 3) of the Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal Processing Grand Challenge 2023. In this track, we adopt a pipeline approach of ASR and NLU. For ASR, we fine-tune Whisper for each domain with upsampling. For NLU, we fine-tune BART on all the Track 3 data and then on the low-resource domain data. We apply masked LM (MLM)-based data augmentation, where some of the input tokens and the corresponding target labels are replaced using an MLM. We also apply a retrieval-based approach, where the model input is augmented with similar training samples. As a result, we achieved exact match (EM) accuracies of 63.3/75.0 (average: 69.15) on the reminder/weather domains and won 1st place at the challenge.
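
A minimal sketch of the MLM-based augmentation idea: mask a fraction of input tokens and let a masked LM propose replacements. The paper also updates the corresponding target labels; that bookkeeping is omitted here, and the masking rate and model choice are illustrative:

    import random
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    def augment(text, mask_rate=0.15):
        ids = tok(text, return_tensors="pt")["input_ids"][0]
        # Choose non-special positions (skip [CLS]/[SEP]) to mask.
        candidates = list(range(1, len(ids) - 1))
        picked = random.sample(candidates,
                               max(1, int(len(candidates) * mask_rate)))
        masked = ids.clone()
        masked[picked] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0]
        for i in picked:
            ids[i] = logits[i].argmax()   # MLM's top replacement token
        return tok.decode(ids[1:-1])

    print(augment("set a reminder to water the plants tomorrow morning"))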


A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

arXiv.org Artificial Intelligence

Recently, there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), such as semantic parsing. In this paper, we describe our spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal Processing Grand Challenge 2023. We experiment with both end-to-end and pipeline systems for this task. Strong automatic speech recognition (ASR) models like Whisper and pretrained language models (LMs) like BART are utilized inside our SLU framework to boost performance. We also investigate output-level combination of the various models, reaching an exact match accuracy of 80.8, which won 1st place at the challenge.
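
A hedged sketch of output-level combination as a majority vote over the systems' predicted parses, with ties broken by a fixed system priority; this illustrates the general idea rather than the paper's exact combination rule:

    from collections import Counter

    def combine(parses):
        """parses: list of parse strings, ordered by system priority."""
        counts = Counter(parses)
        best = max(counts.values())
        # The first (highest-priority) system among the most-voted parses
        # wins ties.
        return next(p for p in parses if counts[p] == best)

    systems = [
        "[IN:GET_WEATHER [SL:LOCATION boston ] ]",   # pipeline system
        "[IN:GET_WEATHER [SL:LOCATION boston ] ]",   # E2E system
        "[IN:GET_WEATHER [SL:DATE_TIME today ] ]",   # another variant
    ]
    print(combine(systems))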


Generative Modeling via Hierarchical Tensor Sketching

arXiv.org Artificial Intelligence

We propose a hierarchical tensor-network approach for approximating a high-dimensional probability density from its empirical distribution. The method leverages randomized singular value decomposition (SVD) techniques and involves solving linear equations for the tensor cores of the network. The complexity of the resulting algorithm scales linearly in the dimension of the high-dimensional density. An analysis of the estimation error, together with several numerical experiments, demonstrates the effectiveness of the method.
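
The randomized SVD primitive underlying this approach (in the style of Halko et al.) can be sketched in a few lines of NumPy; the rank and oversampling values are illustrative:

    import numpy as np

    def randomized_svd(A, rank, oversample=10):
        n = A.shape[1]
        omega = np.random.randn(n, rank + oversample)   # random test matrix
        Q, _ = np.linalg.qr(A @ omega)        # orthonormal basis for the range
        U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

    A = np.random.randn(500, 20) @ np.random.randn(20, 300)   # exactly rank 20
    U, s, Vt = randomized_svd(A, rank=20)
    print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # ~1e-15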