
Collaborating Authors

Huang, Chieh-Yang


Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023

arXiv.org Artificial Intelligence

Since the SciCap dataset's launch in 2021, the research community has made significant progress in generating captions for scientific figures in scholarly articles. In 2023, the first SciCap Challenge took place, inviting global teams to use an expanded SciCap dataset to develop models for captioning diverse figure types across various academic fields. At the same time, text generation models advanced quickly, with many powerful pre-trained large multimodal models (LMMs) emerging that showed impressive capabilities in various vision-and-language tasks. This paper presents an overview of the first SciCap Challenge and details the performance of various models on its data, capturing a snapshot of the field's state. We found that professional editors overwhelmingly preferred figure captions generated by GPT-4V over those from all other models and even the original captions written by authors. Following this key finding, we conducted detailed analyses to answer this question: Have advanced LMMs solved the task of generating captions for scientific figures?


Using Contextually Aligned Online Reviews to Measure LLMs' Performance Disparities Across Language Varieties

arXiv.org Artificial Intelligence

A language can have different varieties. These varieties can affect the performance of natural language processing (NLP) models, including large language models (LLMs), which are often trained on data from widely spoken varieties. This paper introduces a novel and cost-effective approach to benchmark model performance across language varieties. We argue that international online review platforms, such as Booking.com, can serve as effective data sources for constructing datasets that capture comments in different language varieties from similar real-world scenarios, like reviews for the same hotel with the same rating using the same language (e.g., Mandarin Chinese) but different language varieties (e.g., Taiwan Mandarin, Mainland Mandarin). To prove this concept, we constructed a contextually aligned dataset comprising reviews in Taiwan Mandarin and Mainland Mandarin and tested six LLMs in a sentiment analysis task. Our results show that LLMs consistently underperform in Taiwan Mandarin.
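To make the benchmarking recipe concrete, here is a minimal sketch of how a contextually aligned dataset could be scored per language variety; the record format and the classify() stub are hypothetical illustrations, not the paper's released code.

```python
# Minimal sketch: compare an LLM's sentiment accuracy across two
# varieties of the same language, using reviews aligned on hotel,
# rating, and language. Data format and classify() are assumptions.
from collections import defaultdict

aligned_reviews = [
    {
        "hotel": "H001",
        "rating": 2,
        "reviews": {
            "taiwan_mandarin": "...",    # review text in Taiwan Mandarin
            "mainland_mandarin": "...",  # review text in Mainland Mandarin
        },
    },
    # ... more records aligned on hotel, rating, and language
]

def classify(text: str) -> int:
    """Hypothetical stub: send `text` to an LLM and parse a 1-5 rating."""
    raise NotImplementedError

def accuracy_by_variety(records) -> dict:
    """Accuracy per variety on contextually aligned inputs."""
    correct, total = defaultdict(int), defaultdict(int)
    for rec in records:
        for variety, text in rec["reviews"].items():
            total[variety] += 1
            if classify(text) == rec["rating"]:
                correct[variety] += 1
    return {v: correct[v] / total[v] for v in total}
```

Because every record holds reviews for the same hotel, rating, and language, any accuracy gap between varieties can be attributed to the variety itself rather than to differing review contexts.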


Multi-LLM Collaborative Caption Generation in Scientific Documents

arXiv.org Artificial Intelligence

Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as either an image-to-text or text summarization problem. This limitation hinders the generation of high-quality captions that fully capture the necessary details. Moreover, existing data sourced from arXiv papers contain low-quality captions, posing significant challenges for training large language models (LLMs). In this paper, we introduce a framework called Multi-LLM Collaborative Figure Caption Generation (MLBCAP) to address these challenges by leveraging specialized LLMs for distinct sub-tasks. Our approach unfolds in three key modules: (Quality Assessment) We utilize multimodal LLMs to assess the quality of training data, enabling the filtration of low-quality captions. (Diverse Caption Generation) We then employ a strategy of fine-tuning/prompting multiple LLMs on the captioning task to generate candidate captions. (Judgment) Lastly, we prompt a prominent LLM to select the highest quality caption from the candidates, followed by refining any remaining inaccuracies. Human evaluations demonstrate that informative captions produced by our approach rank better than human-written captions, highlighting its effectiveness. Our code is available at https://github.com/teamreboott/MLBCAP
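As a rough illustration of the three-module pipeline described above, the sketch below wires the stages together; the function and method names are hypothetical stand-ins, not the interfaces in the linked repository.

```python
# Schematic sketch of an MLBCAP-style three-stage flow; all model
# calls are hypothetical stubs rather than the authors' implementation.

def assess_quality(figure, caption, judge_llm) -> float:
    """Stage 1 (Quality Assessment): a multimodal LLM scores a training
    caption so that low-quality examples can be filtered out."""
    ...

def generate_candidates(figure, context, caption_llms) -> list:
    """Stage 2 (Diverse Caption Generation): several fine-tuned or
    prompted LLMs each propose a candidate caption."""
    return [llm(figure, context) for llm in caption_llms]

def judge_and_refine(candidates, judge_llm) -> str:
    """Stage 3 (Judgment): a strong LLM selects the best candidate,
    then edits any remaining inaccuracies."""
    best = judge_llm.select(candidates)
    return judge_llm.refine(best)
```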


Generating Educational Materials with Different Levels of Readability using LLMs

arXiv.org Artificial Intelligence

We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting significantly improves performance in readability manipulation and information preservation. LLaMA-2 70B performs better in achieving the desired difficulty range, while GPT-3.5 maintains original meaning. However, manual inspection highlights concerns such as misinformation introduction and inconsistent edit distribution.
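As a small illustration of two features commonly used in such readability assessments, namely sentence length and word frequency, the sketch below computes both for a passage; the wordfreq dependency and the feature choices here are assumptions for illustration, not the paper's evaluation code.

```python
# Minimal sketch: compute average sentence length and average word
# frequency (Zipf scale) as crude readability signals.
# Assumes the third-party `wordfreq` package is installed.
import re

from wordfreq import zipf_frequency

def readability_features(text: str) -> dict:
    """Two simple signals: longer sentences and rarer words
    generally indicate harder text."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_length = len(words) / max(len(sentences), 1)
    # Higher Zipf frequency means more common (easier) vocabulary.
    avg_word_zipf = sum(
        zipf_frequency(w.lower(), "en") for w in words
    ) / max(len(words), 1)
    return {
        "avg_sentence_length": avg_sentence_length,
        "avg_word_zipf": avg_word_zipf,
    }

print(readability_features("The cat sat. It watched the garrulous magpie."))
```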


SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

arXiv.org Artificial Intelligence

Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions to aid caption composition. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality across multiple critical aspects, such as helpfulness, OCR mention, key takeaways, and visual properties reference. Users can directly edit captions in SciCapenter, resubmit for revised evaluations, and iteratively refine them. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing. Participants' feedback further offers valuable design insights for future systems aiming to enhance caption writing.


ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing

arXiv.org Artificial Intelligence

Despite a surge of XAI methods, users still struggle to obtain the AI explanations they need. Previous research suggests chatbots as dynamic solutions, but the effective design of conversational XAI agents for practical human needs remains under-explored. This paper focuses on Conversational XAI for AI-assisted scientific writing tasks. Drawing from human linguistic theories and formative studies, we identify four design rationales: "multifaceted", "controllability", "mixed-initiative", and "context-aware drill-down". We incorporate them into an interactive prototype, ConvXAI, which facilitates heterogeneous AI explanations for scientific writing through dialogue. In two studies with 21 users, ConvXAI outperforms a GUI-based baseline in improving human-perceived understanding and writing quality. The paper further discusses practical human usage patterns in interacting with ConvXAI for scientific co-writing.


GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions

arXiv.org Artificial Intelligence

There is growing interest in systems that generate captions for scientific figures. However, assessing these systems' output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score each caption (1-6) based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students' rankings.
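A minimal sketch of this reference-free evaluation setup follows; the prompt wording is paraphrased rather than the paper's exact prompt, and score_with_llm() is a hypothetical stub for the API call.

```python
# Sketch: score captions 1-6 with an LLM given figure-mentioning
# paragraphs, then check agreement with human rankings via Kendall tau.
from scipy.stats import kendalltau

PROMPT = (
    "You are given paragraphs from a paper that mention a figure, plus a "
    "candidate caption for that figure. On a scale of 1-6, how helpful is "
    "the caption for a reader trying to understand the figure? "
    "Answer with a single integer.\n\n"
    "Paragraphs: {paragraphs}\n\nCaption: {caption}"
)

def score_with_llm(paragraphs: str, caption: str) -> int:
    """Hypothetical stub: send the filled-in PROMPT to an LLM
    and parse the integer score from its reply."""
    raise NotImplementedError

def agreement(llm_scores: list, human_rankings: list) -> float:
    """Kendall correlation between LLM scores and human rankings."""
    tau, _p_value = kendalltau(llm_scores, human_rankings)
    return tau
```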


Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization

arXiv.org Artificial Intelligence

Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be more effectively tackled as a text summarization task in scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs (e.g., "Figure 3 shows...") into figure captions. Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations. We further conducted an in-depth investigation focused on two key challenges: (i) the common presence of low-quality author-written captions and (ii) the lack of clear standards for good captions. Our code and data are available at: https://github.com/Crowd-AI-Lab/Generating-Figure-Captions-as-a-Text-Summarization-Task.
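To make the summarization framing concrete, here is a minimal inference sketch with a generic pre-trained PEGASUS checkpoint; the checkpoint name is an assumption, and the authors' fine-tuned weights and training setup live in the linked repository.

```python
# Sketch: treat figure-mentioning paragraphs as the input document and
# generate a caption as its "summary" with a pre-trained PEGASUS model.
# "google/pegasus-arxiv" is an assumed base checkpoint, not the
# paper's fine-tuned model.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-arxiv"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Paragraphs that reference the target figure (placeholder text).
paragraphs = "Figure 3 shows the accuracy of each model as training data grows."

inputs = tokenizer(paragraphs, truncation=True, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```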


Good Data, Large Data, or No Data? Comparing Three Approaches in Developing Research Aspect Classifiers for Biomedical Papers

arXiv.org Artificial Intelligence

The rapid growth of scientific publications, particularly during the COVID-19 pandemic, emphasizes the need for tools that help researchers efficiently comprehend the latest advancements. One essential part of understanding scientific literature is research aspect classification, which categorizes sentences in abstracts into Background, Purpose, Method, and Finding. In this study, we investigate the impact of different datasets on model performance for the crowd-annotated CODA-19 research aspect classification task. Specifically, we explore the potential benefits of using the large, automatically curated PubMed 200K RCT dataset and evaluate the effectiveness of large language models (LLMs), such as LLaMA, GPT-3, ChatGPT, and GPT-4. Our results indicate that using the PubMed 200K RCT dataset does not improve performance for the CODA-19 task. We also observe that while GPT-4 performs well, it does not outperform the SciBERT model fine-tuned on the CODA-19 dataset, emphasizing the importance of a dedicated, task-aligned dataset for the target task. Our code is available at https://github.com/Crowd-AI-Lab/CODA-19-exp.
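For concreteness, here is a minimal sketch of the fine-tuned-SciBERT baseline mentioned above; the label list follows this entry, while the classification head below is untrained and the training loop is omitted (the full setup is in the linked repository).

```python
# Sketch: SciBERT as a sentence classifier for research aspects.
# The classification head is randomly initialized here and would need
# fine-tuning on CODA-19 before its predictions mean anything.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["Background", "Purpose", "Method", "Finding"]

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=len(LABELS)
)

sentence = "We evaluated the model on 500 held-out abstracts."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[logits.argmax(dim=-1).item()])
```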


What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions

arXiv.org Artificial Intelligence

The proliferation of automated conversational systems, such as chatbots, spoken-dialogue systems, and smart speakers, has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-defined questions rather than to support users in exploring complex, ill-defined questions. In this paper, we aim to push the boundaries of conversational systems by examining the types of nebulous, open-ended questions that can best be answered through conversation. We first sampled 500 questions from one million open-ended requests posted on AskReddit and then recruited online crowd workers to answer eight inquiries about these questions. We also performed open coding to categorize the questions into 27 different domains. We found that the issues people believe require conversation to resolve satisfactorily are highly social and personal. Our work provides insights into how future research could be geared to align with users' needs.