conversation length
The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions
Bouleimen, Azza, De Marzo, Giordano, Kim, Taehee, Pagan, Nicol`o, Metzler, Hannah, Giordano, Silvia, Garcia, David
Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39\% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56\% of the time, barely better than random chance. Our study demonstrates that LLMs can generate social media conversations sufficiently realistic to deceive humans when reading them, highlighting both a promising potential for social simulation and a warning message about the potential misuse of LLMs to generate new inauthentic social media content.
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Austria > Vienna (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Media > News (0.96)
- Health & Medicine > Therapeutic Area (0.69)
- Government > Regional Government (0.68)
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Zhang, Lichao, Yu, Jia, Zhang, Shuai, Li, Long, Zhong, Yangyang, Liang, Guanbao, Yan, Yuming, Ma, Qing, Weng, Fangsheng, Pan, Fayu, Li, Jing, Xu, Renjun, Lan, Zhenzhong
Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.
- Europe > United Kingdom (0.04)
- Asia > Malaysia (0.04)
- Asia > Japan > Shikoku > Ehime Prefecture > Matsuyama (0.04)
Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT
Xia, Chunqiu Steven, Zhang, Lingming
Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using the original buggy code and then queries the LLM to generate patches. While the LLM-based APR tools are able to achieve state-of-the-art results, it still follows the classic Generate and Validate repair paradigm of first generating lots of patches and then validating each one afterwards. This not only leads to many repeated patches that are incorrect but also miss the crucial information in test failures as well as in plausible patches. To address these limitations, we propose ChatRepair, the first fully automated conversation-driven APR approach that interleaves patch generation with instant feedback to perform APR in a conversational style. ChatRepair first feeds the LLM with relevant test failure information to start with, and then learns from both failures and successes of earlier patching attempts of the same bug for more powerful APR. For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch. In this way, we can avoid making the same mistakes. For earlier patches that passed all the tests, we further ask the LLM to generate alternative variations of the original plausible patches. In this way, we can further build on and learn from earlier successes to generate more plausible patches to increase the chance of having correct patches. While our approach is general, we implement ChatRepair using state-of-the-art dialogue-based LLM -- ChatGPT. By calculating the cost of accessing ChatGPT, we can fix 162 out of 337 bugs for \$0.42 each!
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- (10 more...)
Rewarding Chatbots for Real-World Engagement with Millions of Users
Irvine, Robert, Boubert, Douglas, Raina, Vyas, Liusie, Adian, Zhu, Ziyi, Mudupalli, Vineet, Korshuk, Aliaksei, Liu, Zongyi, Cremer, Fritz, Assassi, Valentin, Beauchamp, Christie-Carol, Lu, Xiaoding, Rialan, Thomas, Beauchamp, William
The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
Modeling Performance in Open-Domain Dialogue with PARADISE
Walker, Marilyn, Harmon, Colin, Graupera, James, Harrison, Davan, Whittaker, Steve
There has recently been an explosion of work on spoken dialogue systems, along with an increased interest in open-domain systems that engage in casual conversations on popular topics such as movies, books and music. These systems aim to socially engage, entertain, and even empathize with their users. Since the achievement of such social goals is hard to measure, recent research has used dialogue length or human ratings as evaluation metrics, and developed methods for automatically calculating novel metrics, such as coherence, consistency, relevance and engagement. Here we develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users, while competing as a finalist in the Alexa Prize. We use both user ratings and dialogue length as metrics for dialogue quality, and experiment with predicting these metrics using automatic features that are both system dependent and independent. Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time and evaluate its performance. Our best model for predicting user ratings gets an R$^2$ of .136 with a DistilBert model, and the best model for predicting length with system independent features gets an R$^2$ of .865, suggesting that conversation length may be a more reliable measure for automatic training of dialogue systems.
- Leisure & Entertainment (0.93)
- Media > Film (0.46)
Otter Starter Guide
Select the conversation in which you want to tag speakers. Otter will automatically tag speakers who have previously been identified. For new speakers, please teach Otter their voice by identifying them in the conversation. Select the unknown speaker icon to start identifying the speaker. Otter will list recent speakers for you to choose.
Sounding Board: A User-Centric and Content-Driven Social Chatbot
Fang, Hao, Cheng, Hao, Sap, Maarten, Clark, Elizabeth, Holtzman, Ari, Choi, Yejin, Smith, Noah A., Ostendorf, Mari
We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa Prize. The system architecture consists of several components including spoken language processing, dialogue management, language generation, and content management, with emphasis on user-centric and content-driven design. We also share insights gained from large-scale online logs based on 160,000 conversations with real-world users.
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)