AITopics | Chung, Ho-Lam

Collaborating Authors

Chung, Ho-Lam

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents

Rosset, Corby, Chung, Ho-Lam, Qin, Guanghui, Chau, Ethan C., Feng, Zhuo, Awadallah, Ahmed, Neville, Jennifer, Rao, Nikhil

arXiv.org Artificial IntelligenceFeb-27-2024

Existing question answering (QA) datasets are no longer challenging to most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study ``known unknowns'' with clear indications of both what information is missing, and how to find it to answer the question. Hence, good performance on these benchmarks provides a false sense of security. A yet unmet need of the NLP community is a bank of non-factoid, multi-perspective questions involving a great deal of unclear information needs, i.e. ``unknown uknowns''. We claim we can find such questions in search engine logs, which is surprising because most question-intent queries are indeed factoid. We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, ``decompositional'' and multi-perspective. We show that users spend a lot of ``effort'' on these questions in terms of signals like clicks and session length, and that they are also challenging for GPT-4. We also show that ``slow thinking'' answering techniques, like decomposition into sub-questions shows benefit over answering directly. We release $\sim$ 100k Researchy Questions, along with the Clueweb22 URLs that were clicked.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2402.17896

Country:

Asia (0.67)
Europe (0.67)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Towards General-Purpose Text-Instruction-Guided Voice Conversion

Kuan, Chun-Yi, Li, Chen An, Hsu, Tsu-Yuan, Lin, Tse-Yang, Chung, Ho-Lam, Chang, Kai-Wei, Chang, Shuo-yiin, Lee, Hung-yi

arXiv.org Artificial IntelligenceJan-16-2024

This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to determine the attributes of the converted speech, our model adds versatility and specificity to voice conversion. The proposed VC model is a neural codec language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech. It utilizes text instructions as style prompts to modify the prosody and emotional information of the given speech. In contrast to previous approaches, which often rely on employing separate encoders like prosody and content encoders to handle different aspects of the source speech, our model handles various information of speech in an end-to-end manner. Experiments have demonstrated the impressive capabilities of our model in comprehending instructions and delivering reasonable results.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.14324

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Add feedback

GSQA: An End-to-End Model for Generative Spoken Question Answering

Shih, Min-Han, Chung, Ho-Lam, Pai, Yu-Chi, Hsu, Ming-Hao, Lin, Guan-Ting, Li, Shang-Wen, Lee, Hung-yi

arXiv.org Artificial IntelligenceDec-25-2023

In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning. The challenge in training our GSQA model lies in the absence of a spoken abstractive QA dataset. We propose using text models for initialization and leveraging the extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, the GSQA model has only been fine-tuned on the spoken extractive QA dataset. Despite not having seen any spoken abstractive QA data, it can still closely match the performance of the cascade model. In conclusion, our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA. Our code is available at https://voidful.github.io/GSQA

artificial intelligence, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

2312.09781

Country:

Europe (0.68)
North America > United States > Texas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Shi, Jiatong, Berrebbi, Dan, Chen, William, Chung, Ho-Lam, Hu, En-Pei, Huang, Wei Ping, Chang, Xuankai, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Watanabe, Shinji

arXiv.org Artificial IntelligenceAug-11-2023

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research.

artificial intelligence, machine learning, proc, (13 more...)

arXiv.org Artificial Intelligence

2305.10615

Country: North America (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)

Add feedback

T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

Hsu, Chan-Jan, Chung, Ho-Lam, Lee, Hung-yi, Tsao, Yu

arXiv.org Artificial IntelligenceNov-1-2022

In Spoken language understanding (SLU), a natural solution is concatenating pre-trained speech models (e.g. HuBERT) and pretrained language models (PLM, e.g. T5). Most previous works use pretrained language models with subword-based tokenization. However, the granularity of input units affects the alignment of speech model outputs and language model inputs, and PLM with character-based tokenization is underexplored. In this work, we conduct extensive studies on how PLMs with different tokenization strategies affect spoken language understanding task including spoken question answering (SQA) and speech translation (ST). We further extend the idea to create T5lephone(pronounced as telephone), a variant of T5 that is pretrained using phonemicized text. We initialize T5lephone with existing PLMs to pretrain it using relatively lightweight computational resources. We reached state-of-the-art on NMSQA, and the T5lephone model exceeds T5 with other types of units on end-to-end SQA and ST.

artificial intelligence, natural language, sequence, (17 more...)

arXiv.org Artificial Intelligence

2211.00586

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Improving Controllability of Educational Question Generation by Keyword Provision

Chan, Ying-Hong, Chung, Ho-Lam, Fan, Yao-Chung

arXiv.org Artificial IntelligenceDec-2-2021

Question Generation (QG) receives increasing research attention in NLP community. One motivation for QG is that QG significantly facilitates the preparation of educational reading practice and assessments. While the significant advancement of QG techniques was reported, current QG results are not ideal for educational reading practice assessment in terms of \textit{controllability} and \textit{question difficulty}. This paper reports our results toward the two issues. First, we report a state-of-the-art exam-like QG model by advancing the current best model from 11.96 to 20.19 (in terms of BLEU 4 score). Second, we propose to investigate a variant of QG setting by allowing users to provide keywords for guiding QG direction. We also present a simple but effective model toward the QG controllability task. Experiments are also performed and the results demonstrate the feasibility and potentials of improving QG diversity and controllability by the proposed keyword provision QG model.

arxiv preprint arxiv, machine learning, question answering, (16 more...)

arXiv.org Artificial Intelligence

2112.01012

Country: North America > United States (0.30)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.87)

Add feedback

A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies

Chung, Ho-Lam, Chan, Ying-Hong, Fan, Yao-Chung

arXiv.org Artificial IntelligenceOct-11-2020

In this paper, we investigate the following two limitations for the existing distractor generation (DG) methods. First, the quality of the existing DG methods are still far from practical use. There is still room for DG quality improvement. Second, the existing DG designs are mainly for single distractor generation. However, for practical MCQ preparation, multiple distractors are desired. Aiming at these goals, in this paper, we present a new distractor generation scheme with multi-tasking and negative answer training strategies for effectively generating \textit{multiple} distractors. The experimental results show that (1) our model advances the state-of-the-art result from 28.65 to 39.81 (BLEU 1 score) and (2) the generated multiple distractors are diverse and show strong distracting power for multiple choice question.

deep learning, distractor, neural network, (21 more...)

arXiv.org Artificial Intelligence

2010.05384

Country: Asia (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback