Huang, Yun
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Team, M-A-P, Du, Xinrun, Yao, Yifan, Ma, Kaijing, Wang, Bingli, Zheng, Tianyu, Zhu, Kang, Liu, Minghao, Liang, Yiming, Jin, Xiaolong, Wei, Zhenlin, Zheng, Chujie, Deng, Kaixin, Jia, Shian, Jiang, Sichao, Liao, Yiyan, Li, Rui, Li, Qinrui, Li, Sirun, Li, Yizhi, Li, Yunwen, Ma, Dehua, Ni, Yuansheng, Que, Haoran, Wang, Qiyao, Wen, Zhoufutu, Wu, Siwei, Xing, Tianshun, Xu, Ming, Yang, Zhenzhu, Wang, Zekun Moore, Zhou, Junting, Bai, Yuelin, Bu, Xingyuan, Cai, Chenglin, Chen, Liang, Chen, Yifan, Cheng, Chengtuo, Cheng, Tianhao, Ding, Keyi, Huang, Siming, Huang, Yun, Li, Yaoru, Li, Yizhe, Li, Zhaoqun, Liang, Tianhao, Lin, Chengdong, Lin, Hongquan, Ma, Yinghao, Pang, Tianyang, Peng, Zhongyuan, Peng, Zifan, Qi, Qige, Qiu, Shi, Qu, Xingwei, Quan, Shanghaoran, Tan, Yizhou, Wang, Zili, Wang, Chenqing, Wang, Hao, Wang, Yiya, Wang, Yubo, Xu, Jiajun, Yang, Kexin, Yuan, Ruibin, Yue, Yuanhao, Zhan, Tianyang, Zhang, Chun, Zhang, Jinyang, Zhang, Xiyue, Zhang, Xingjian, Zhang, Yue, Zhao, Yongchi, Zheng, Xiangyu, Zhong, Chenghua, Gao, Yang, Li, Zhoujun, Liu, Dayiheng, Liu, Qian, Liu, Tianyu, Ni, Shiwen, Peng, Junran, Qin, Yujia, Su, Wenbo, Wang, Guoyin, Wang, Shi, Yang, Jian, Yang, Min, Cao, Meng, Yue, Xiang, Zhang, Zhaoxiang, Zhou, Wangchunshu, Liu, Jiaheng, Lin, Qunshu, Huang, Wenhao, Zhang, Ge
Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
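The abstract leaves the filtering criteria at a high level; the sketch below illustrates one way a Human-LLM collaborative filter of this kind could be organized, with LLM answer statistics deciding which candidate questions go back to expert annotators. The thresholds, the `Question` fields, and the `responses` format are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    answer: str                                     # gold option label, e.g. "C"
    responses: dict = field(default_factory=dict)   # model name -> predicted label

def triage(question, trivial_threshold=0.9, ambiguous_threshold=0.2):
    """Flag a candidate question as trivial, ambiguous, or worth keeping.

    A question is treated as trivial when nearly every model answers it
    correctly, and as potentially ambiguous (or mislabeled) when almost
    none do; flagged items are sent back to expert annotators.
    """
    preds = list(question.responses.values())
    if not preds:
        return "needs_responses"
    accuracy = sum(p == question.answer for p in preds) / len(preds)
    if accuracy >= trivial_threshold:
        return "trivial"        # too easy: discard or rewrite
    if accuracy <= ambiguous_threshold:
        return "ambiguous"      # possibly unclear or mislabeled: expert review
    return "keep"

def filter_round(questions):
    """One iteration of the collaborative loop over a batch of candidates."""
    kept, escalated = [], []
    for q in questions:
        label = triage(q)
        if label == "keep":
            kept.append(q)
        else:
            escalated.append((label, q))   # routed to human experts
    return kept, escalated
```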
An Integrated Platform for Studying Learning with Intelligent Tutoring Systems: CTAT+TutorShop
Aleven, Vincent, Borchers, Conrad, Huang, Yun, Nagashima, Tomohiro, McLaren, Bruce, Carvalho, Paulo, Popescu, Octav, Sewall, Jonathan, Koedinger, Kenneth
Intelligent tutoring systems (ITSs) are effective in helping students learn; further research could make them even more effective. Particularly desirable is research into how students learn with these systems, how these systems best support student learning, and which learning sciences principles are key in ITSs. CTAT+TutorShop provides a full-stack, integrated platform that facilitates a complete research lifecycle with ITSs, which includes using ITS data to discover learner challenges, to identify opportunities for system improvements, and to conduct experimental studies. The platform includes authoring tools that support and accelerate the development of ITSs and provide automatic data logging in a format compatible with DataShop, an independent site that supports the analysis of ed tech log data to study student learning. Among the many technology platforms that exist to support learning sciences research, CTAT+TutorShop may be the only one that offers researchers the possibility to author elements of ITSs, or whole ITSs, as part of designing studies. The platform has been used to develop and conduct an estimated 147 research studies, which have run in a wide variety of laboratory and real-world educational settings, including K-12 and higher education, and have addressed a wide range of research questions. This paper presents five case studies of research conducted on the CTAT+TutorShop platform and summarizes what has been accomplished and what is possible for future researchers. We reflect on the distinctive elements of this platform that have made it so effective in facilitating a wide range of ITS research.
VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity
Li, Yerong, Liu, Yiren, Huang, Yun
Scenario-based training has been widely adopted in many public service sectors. Recent advancements in Large Language Models (LLMs) have shown promise in simulating diverse personas to create these training scenarios. However, little is known about how LLMs can be developed to simulate victims for scenario-based training purposes. In this paper, we introduce VicSim (victim simulator), a novel model that addresses three key dimensions of user simulation: informational faithfulness, emotional dynamics, and language style (e.g., grammar usage). We pioneer the integration of scenario-based victim modeling with a GAN-based training workflow and key-information-based prompting, aiming to enhance the realism of simulated victims. Our adversarial training approach teaches the discriminator to recognize grammar and emotional cues as reliable indicators of synthetic content. According to evaluations by human raters, the VicSim model outperforms GPT-4 in terms of human-likeness.
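The adversarial signal is described only at a high level above; the toy sketch below shows the discriminator side alone, using hand-crafted grammar and emotion cues and a logistic-regression stand-in rather than the paper's actual model. The feature set, the word lists, and the scikit-learn classifier are all assumptions for illustration.

```python
import re
from sklearn.linear_model import LogisticRegression

# Toy grammar/emotion cues; stand-ins for the richer features a real
# discriminator would use to separate human from simulated victim turns.
FILLERS = {"um", "uh", "like", "literally"}
EMOTION_WORDS = {"scared", "afraid", "panicking", "help", "please", "hurt"}

def featurize(utterance):
    tokens = re.findall(r"[a-z']+", utterance.lower())
    n = max(len(tokens), 1)
    return [
        sum(t in FILLERS for t in tokens) / n,         # disfluency rate
        sum(t in EMOTION_WORDS for t in tokens) / n,   # emotional-intensity proxy
        utterance.count("!") / max(len(utterance), 1), # exclamation density
        sum(1 for t in tokens if len(t) <= 3) / n,     # crude short-word rate
    ]

def train_discriminator(human_utts, synthetic_utts):
    """Fit a classifier that scores how 'synthetic' an utterance looks.

    In an adversarial workflow, this score would be fed back to the
    victim simulator as pressure to reduce tell-tale grammar and
    emotion artifacts.
    """
    X = [featurize(u) for u in human_utts + synthetic_utts]
    y = [0] * len(human_utts) + [1] * len(synthetic_utts)
    return LogisticRegression().fit(X, y)

# disc = train_discriminator(human_examples, simulator_outputs)
# p_synthetic = disc.predict_proba([featurize("Um, please, I'm scared!")])[0, 1]
```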
Improving Emotional Support Delivery in Text-Based Community Safety Reporting Using Large Language Models
Liu, Yiren, Li, Yerong, Mayfield, Ryan, Huang, Yun
Emotional support is a crucial aspect of communication between community members and police dispatchers during incident reporting. However, there is a lack of understanding about how emotional support is delivered through text-based systems, especially in various non-emergency contexts. In this study, we analyzed two years of chat logs comprising 57,114 messages across 8,239 incidents from 130 higher education institutions. Our empirical findings revealed significant variations in the emotional support provided by dispatchers, influenced by incident type and service time, as well as a noticeable decline in support over time across multiple organizations. To improve the consistency and quality of emotional support, we developed and implemented a fine-tuned Large Language Model (LLM), named dispatcherLLM. We evaluated dispatcherLLM by comparing its generated responses to those of human dispatchers and other off-the-shelf models using real chat messages. Additionally, we conducted a human evaluation to assess the perceived effectiveness of the support provided by dispatcherLLM. This study not only contributes new empirical understanding of emotional support in text-based dispatch systems but also demonstrates the significant potential of generative AI in improving service delivery.
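As a rough illustration of what fine-tuning on dispatch chat logs involves, the sketch below converts incident conversations into supervised examples in a generic JSONL chat format. The field names (`incident_type`, `messages`), the system prompt, and the output format are assumptions for illustration, not the dispatcherLLM recipe.

```python
import json

SYSTEM_PROMPT = (
    "You are a community-safety dispatcher. Acknowledge the reporter's "
    "feelings, provide reassurance, and collect the facts needed to respond."
)

def to_sft_examples(incident):
    """Turn one incident's chat log into chat-format fine-tuning records.

    `incident` is assumed to look like:
      {"incident_type": "theft",
       "messages": [{"role": "reporter" | "dispatcher", "text": "..."}]}
    Every dispatcher turn becomes a training target conditioned on the
    preceding conversation.
    """
    history = [{"role": "system", "content": SYSTEM_PROMPT}]
    examples = []
    for turn in incident["messages"]:
        if turn["role"] == "dispatcher":
            target = {"role": "assistant", "content": turn["text"]}
            examples.append({"messages": history + [target]})
            history = history + [target]
        else:
            history = history + [{"role": "user", "content": turn["text"]}]
    return examples

def write_jsonl(incidents, path="dispatcher_sft.jsonl"):
    """Write all incidents as one JSONL file ready for a fine-tuning job."""
    with open(path, "w") as f:
        for incident in incidents:
            for ex in to_sft_examples(incident):
                f.write(json.dumps(ex) + "\n")
```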
From Experts to the Public: Governing Multimodal Language Models in Politically Sensitive Video Analysis
Sharma, Tanusree, Potter, Yujin, Kilhoffer, Zachary, Huang, Yun, Song, Dawn, Wang, Yang
This paper examines the governance of multimodal large language models (MM-LLMs) through individual and collective deliberation, focusing on analyses of politically sensitive videos. We conducted a two-step study: first, interviews with 10 journalists established a baseline understanding of expert video interpretation; second, 114 individuals from the general public engaged in deliberation using Inclusive.AI, a platform that facilitates democratic decision-making through decentralized autonomous organization (DAO) mechanisms. Our findings show that while experts emphasized emotion and narrative, the general public prioritized factual clarity, objectivity about the situation, and emotional neutrality. Additionally, we explored the impact of different governance mechanisms (quadratic vs. weighted ranking voting, and equal vs. 20-80 power distributions) on users' decision-making about how AI should behave. Specifically, quadratic voting enhanced perceptions of liberal democracy and political equality, and participants who were more optimistic about AI perceived the voting process as having a higher level of participatory democracy. Our results suggest the potential of applying DAO mechanisms to help democratize AI governance.
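For readers unfamiliar with the mechanisms being compared: under quadratic voting, casting v votes costs v^2 credits, so the votes a participant can express grow only with the square root of their credits, which dampens concentrated power. A small worked example follows, with the credit figures invented purely for illustration.

```python
import math

def max_votes(credits):
    """Quadratic voting: v votes cost v**2 credits, so v = floor(sqrt(credits))."""
    return math.isqrt(credits)

def total_voice(credit_distribution):
    """Total expressible votes across participants for a given credit split."""
    return sum(max_votes(c) for c in credit_distribution)

# Equal vs. 20-80 power distribution over 10 participants and 1000 credits:
equal = [100] * 10              # everyone holds 100 credits
skewed = [400, 400] + [25] * 8  # 20% of participants hold 80% of the credits

print(total_voice(equal))   # 10 * 10 = 100 votes
print(total_voice(skewed))  # 2 * 20 + 8 * 5 = 80 votes: concentration buys less voice
```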
Reference-based Metrics Disprove Themselves in Question Generation
Nguyen, Bang, Yu, Mengxia, Huang, Yun, Jiang, Meng
Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric was expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones and achieves state-of-the-art alignment with human judgment.
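The abstract names the criteria but not the scoring procedure; the sketch below shows one plausible shape for a reference-free, LLM-judged multi-criteria metric. The rubric wording, the 1-5 scale, and the `llm` callable (any function mapping a prompt string to a reply string) are assumptions rather than the paper's implementation.

```python
import re

CRITERIA = {
    "naturalness": "Does the question read like fluent, natural English?",
    "answerability": "Can the question be answered from the given context alone?",
    "complexity": "Does answering require non-trivial reasoning over the context?",
}

def score_question(context, question, llm):
    """Reference-free QG evaluation: average LLM-judged scores over criteria.

    `llm` is any callable taking a prompt string and returning a string;
    each criterion is rated on a 1-5 scale and the mean is reported.
    """
    scores = {}
    for name, rubric in CRITERIA.items():
        prompt = (
            f"Context:\n{context}\n\nQuestion:\n{question}\n\n"
            f"Criterion ({name}): {rubric}\n"
            "Answer with a single integer from 1 (worst) to 5 (best)."
        )
        reply = llm(prompt)
        match = re.search(r"[1-5]", reply)
        scores[name] = int(match.group()) if match else 1   # conservative fallback
    scores["overall"] = sum(scores.values()) / len(CRITERIA)
    return scores
```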
Social Life Simulation for Non-Cognitive Skills Learning
Yan, Zihan, Xiang, Yaohong, Huang, Yun
Non-cognitive skills are crucial for personal and social life well-being, and such skill development can be supported by narrative-based (e.g., storytelling) technologies. While generative AI enables interactive and role-playing storytelling, little is known about how users engage with and perceive the use of AI in social life simulation for non-cognitive skills learning. To this end, we introduced SimuLife++, an interactive platform enabled by a large language model (LLM). The system allows users to act as protagonists, creating stories with one or multiple AI-based characters in diverse social scenarios. In particular, we expanded the Human-AI interaction to a Human-AI-AI collaboration by including a sage agent, who acts as a bystander to provide users with more insightful perspectives on their choices and conversations. Through a within-subject user study, we found that the inclusion of the sage agent significantly enhanced narrative immersion, according to the narrative transportation scale, leading to more messages, particularly in group chats. Participants' interactions with the sage agent were also associated with significantly higher scores in their perceived motivation, self-perceptions, and resilience and coping, indicating positive impacts on non-cognitive skills reflection. Participants' interview results further explained the sage agent's aid in decision-making, solving ethical dilemmas, and problem-solving; on the other hand, they suggested improvements in user control and balanced responses from multiple characters. We provide design implications on the application of generative AI in narrative solutions for non-cognitive skill development in broader social contexts.
Synergizing Human-AI Agency: A Guide of 23 Heuristics for Service Co-Creation with LLM-Based Agents
Zheng, Qingxiao, Xu, Zhongwei, Choudhry, Abhinav, Chen, Yuting, Li, Yongming, Huang, Yun
This empirical study serves as a primer for interested service providers to determine if and how Large Language Model (LLM) technology will be integrated for their practitioners and the broader community. We investigate the mutual learning journey of non-AI experts and AI through CoAGent, a service co-creation tool with LLM-based agents. Engaging in a three-stage participatory design process, we work with 23 domain experts from public libraries across the U.S., uncovering their fundamental challenges of integrating AI into human workflows. Our findings provide 23 actionable "heuristics for service co-creation with AI," highlighting the nuanced shared responsibilities between humans and AI. We further exemplify 9 foundational agency aspects for AI, emphasizing essentials like ownership, fair treatment, and freedom of expression. Our innovative approach enriches the participatory design model by incorporating AI as crucial stakeholders and utilizing AI-AI interaction to identify blind spots. Collectively, these insights pave the way for synergistic and ethical human-AI co-creation in service contexts, preparing for workforce ecosystems where AI coexists.
The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills
Zheng, Qingxiao, Huang, Yun
This study explores the impact of AI-generated digital self-clones on improving online presentation skills. We carried out a mixed-design experiment involving 44 international students, comparing self-recorded videos (control) with self-clone videos (AI group) for English presentation practice. The AI videos utilized voice cloning, face swapping, lip-sync, and body-language simulation to refine participants' original presentations in terms of repetition, filler words, and pronunciation. Machine-rated scores indicated enhancements in speech performance for both groups. Though the groups did not differ significantly, the AI group exhibited a heightened depth of reflection, self-compassion, and a meaningful transition from a corrective to an enhancive approach to self-critique. Within the AI group, congruence between self-perception and AI self-clones resulted in diminished speech anxiety and increased enjoyment. Our findings recommend the ethical employment of digital self-clones to enhance the emotional and cognitive facets of skill development.