Wei, Jing
Generating Educational Materials with Different Levels of Readability using LLMs
Huang, Chieh-Yang, Wei, Jing, Huang, Ting-Hao 'Kenneth'
We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting significantly improves performance in readability manipulation and information preservation. LLaMA-2 70B performs better in achieving the desired difficulty range, while GPT-3.5 maintains original meaning. However, manual inspection highlights concerns such as misinformation introduction and inconsistent edit distribution.
... iterative editing to ensure that the revised texts meet the desired difficulty criteria. This readability assessment is based on various linguistic features, with sentence length and word frequency identified as key factors in previous studies [11]. Although this process appears straightforward, accurately adjusting these elements to achieve the target reading difficulty is challenging. This task becomes even more complex for young learners, where factors such as decodability [19], information load [15], and other elements ...
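The text above names sentence length and word frequency as key readability factors. As a rough illustration only, the sketch below computes two simple proxies for them. The function names, the regex-based sentence splitting, and the caller-supplied `common_words` set are all assumptions made for this example; they are not the features or tools used in the paper.

```python
import re


def mean_sentence_length(text):
    """Average number of whitespace-separated words per sentence.

    Sentences are split naively on ., ! and ? (an assumption; a real
    pipeline would likely use a proper sentence tokenizer).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    lengths = [len(s.split()) for s in sentences]
    return sum(lengths) / len(lengths)


def rare_word_ratio(text, common_words):
    """Fraction of words that are NOT in a given set of common words.

    `common_words` is a hypothetical high-frequency word list supplied
    by the caller; it stands in for a real word-frequency resource.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    rare = [w for w in words if w not in common_words]
    return len(rare) / len(words)


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast."
    common = {"the", "cat", "sat", "dog", "ran"}
    print(mean_sentence_length(sample))
    print(rare_word_ratio(sample, common))
```

Longer sentences and a higher share of uncommon words would both push a text toward a harder reading level, which is why an editing loop might track numbers like these between revisions.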
Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Weiwei, Liu, Wenxuan, Hu, Wei, Jing, Lanxin, Lei, Lingping, Gao, Yong, Liu
Autonomous vehicles trained through Multi-Agent Reinforcement Learning (MARL) have shown impressive results in many driving scenarios. However, the performance of these trained policies can degrade when they face diverse driving styles and personalities, particularly in highly interactive situations. This is because conventional MARL algorithms usually assume fully cooperative behavior among all agents and focus on maximizing team rewards during training. To address this issue, we introduce the Personality Modeling Network (PeMN), which includes a cooperation value function and personality parameters to model the varied interactions in highly interactive scenarios. The PeMN also enables the training of a background traffic flow with diverse behaviors, thereby improving the performance and generalization of the ego vehicle. Our extensive experimental studies, which incorporate different personality parameters in highly interactive driving scenarios, demonstrate that the personality parameters effectively model diverse driving styles and that policies trained with PeMN generalize better than those trained with traditional MARL methods.
Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss
Huang, Zhuoran, Berry, Michael P., Chwyl, Christina, Hsieh, Gary, Wei, Jing, Forman, Evan M.
Automated coaching messages for weight control can save time and costs, but their repetitive, generic nature may limit their effectiveness compared to human coaching. Large language model (LLM) based artificial intelligence (AI) chatbots, like ChatGPT, could offer more personalized and novel messages to address repetition with their data-processing abilities. While LLM AI shows promise for encouraging healthier lifestyles, studies have yet to examine the feasibility and acceptability of LLM-based behavioral weight loss (BWL) coaching. Eighty-seven adults in a weight-loss trial rated the helpfulness of ten coaching messages (five human-written, five ChatGPT-generated) on a 5-point Likert scale and provided open-ended feedback to justify their ratings. Participants also identified which messages they believed were AI-generated. The evaluation occurred in two phases: messages in Phase 1 were perceived as impersonal and negative, prompting revisions for Phase 2 messages. In Phase 1, AI-generated messages were rated less helpful than human-written ones, with 66% receiving a helpfulness rating of 3 or higher. However, in Phase 2, the AI messages matched the human-written ones in helpfulness, with 82% receiving a rating of 3 or higher. Additionally, 50% were misidentified as human-written, suggesting AI's sophistication in mimicking human-generated content. A thematic analysis of open-ended feedback revealed that participants appreciated AI's empathy and personalized suggestions but found them more formulaic, less authentic, and too data-focused. This study reveals the preliminary feasibility and acceptability of LLM AIs, like ChatGPT, in crafting potentially effective weight control coaching messages. Our findings also underscore areas for future enhancement.
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data
Wei, Jing, Kim, Sungdong, Jung, Hyunhoon, Kim, Young-Ho
Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts. Yet, it is unclear how to design prompts that power chatbots to carry on naturalistic conversations while pursuing a given goal, such as collecting self-report data from users. We explore which design factors of prompts can help steer chatbots to talk naturally and collect data reliably. To this end, we formulated four prompt designs with different structures and personas. Through an online study (N = 48) where participants conversed with chatbots driven by different prompt designs, we assessed how prompt designs and conversation topics affected the conversation flows and users' perceptions of chatbots. Our chatbots covered 79% of the desired information slots during conversations, and the prompt designs and topics significantly influenced the conversation flows and the data collection performance. We discuss the opportunities and challenges of building chatbots with LLMs.
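The abstract above reports the share of desired information slots that the chatbots covered. A minimal sketch of how such a slot-coverage rate could be computed is given below. The slot names, the dictionary layout, and both function names are hypothetical, chosen only for illustration; the paper's own coding procedure is not described here.

```python
def slot_coverage(desired_slots, collected):
    """Fraction of desired slots with a non-empty collected value.

    `collected` maps slot name -> value; None or "" counts as missing.
    """
    desired = list(desired_slots)
    if not desired:
        return 0.0
    covered = [s for s in desired if collected.get(s) not in (None, "")]
    return len(covered) / len(desired)


def overall_coverage(conversations):
    """Pooled coverage over (desired_slots, collected) pairs.

    Slots are pooled across conversations, so longer slot lists weigh
    more than shorter ones (an assumption of this sketch).
    """
    total = 0
    filled = 0
    for desired_slots, collected in conversations:
        desired = list(desired_slots)
        total += len(desired)
        filled += sum(
            1 for s in desired if collected.get(s) not in (None, "")
        )
    return filled / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical slots for a food-intake conversation.
    desired = ["meal", "time", "amount", "mood"]
    collected = {"meal": "pasta", "time": "noon", "amount": None}
    print(slot_coverage(desired, collected))
```

Pooling across conversations is one of several reasonable choices; averaging the per-conversation rates instead would weight every conversation equally.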