User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal

Liu, Yuhan, Zhang, Michael J. Q., Choi, Eunsol

arXiv.org Artificial Intelligence

Once language models (LMs) are deployed, they can interact with users over the long term, ideally evolving based on their feedback. Asking for direct user feedback can be disruptive; thus, we study harvesting implicit user feedback from user-LM interaction logs. We study two user-LM interaction datasets (WildChat and LMSYS). First, we analyze user feedback in these conversation logs, providing insights into when and why such feedback occurs. Second, we study harvesting learning signals from such implicit user feedback. Specifically, we study whether incorporating the content of the user feedback (e.g., the user wanted clarification), in addition to its polarity, can improve model performance. We observe mixed results: incorporating feedback content helps on short, human-designed questions (MTBench) but not on longer and more complex questions (WildBench). Together, we provide an in-depth study of implicit user feedback, showing its potential and limitations.
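The abstract does not spell out how the feedback signal is extracted, but a minimal sketch of mining polarity plus content from a follow-up user turn could look like the following. The cue lexicons, labels, and the mine_feedback function are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: mining implicit feedback from a user's follow-up turn.
# The cue phrases and content labels below are assumptions for illustration.
from dataclasses import dataclass

NEGATIVE_CUES = ("that's wrong", "not what i asked", "incorrect", "doesn't work")
CLARIFICATION_CUES = ("what do you mean", "can you explain", "i meant", "to clarify")
POSITIVE_CUES = ("thanks", "perfect", "great, that works")


@dataclass
class Feedback:
    polarity: str  # "positive", "negative", or "none"
    content: str   # coarse tag for what the user wanted, e.g. "wanted_clarification"


def mine_feedback(followup_turn: str) -> Feedback:
    """Label a follow-up user turn with feedback polarity and a coarse content tag."""
    text = followup_turn.lower()
    if any(cue in text for cue in CLARIFICATION_CUES):
        return Feedback(polarity="negative", content="wanted_clarification")
    if any(cue in text for cue in NEGATIVE_CUES):
        return Feedback(polarity="negative", content="factual_or_execution_error")
    if any(cue in text for cue in POSITIVE_CUES):
        return Feedback(polarity="positive", content="satisfied")
    return Feedback(polarity="none", content="no_explicit_feedback")
```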


Navigating Rifts in Human-LLM Grounding: Study and Benchmark

Shaikh, Omar, Mozannar, Hussein, Bansal, Gagan, Fourney, Adam, Horvitz, Eric

arXiv.org Artificial Intelligence

Language models excel at following instructions but often struggle with the collaborative aspects of conversation that humans naturally employ. This limitation in grounding -- the process by which conversation participants establish mutual understanding -- can lead to outcomes ranging from frustrated users to serious consequences in high-stakes scenarios. To systematically study grounding challenges in human-LLM interactions, we analyze logs from three human-assistant datasets: WildChat, MultiWOZ, and Bing Chat. We develop a taxonomy of grounding acts and build models to annotate and forecast grounding behavior. Our findings reveal significant differences between human-human and human-LLM grounding: LLMs were three times less likely to initiate clarification and sixteen times less likely to provide follow-up requests than humans. Additionally, early grounding failures predicted later interaction breakdowns. Building on these insights, we introduce RIFTS, a benchmark derived from publicly available LLM interaction data containing situations where LLMs fail to initiate grounding. Current frontier models perform poorly on RIFTS, highlighting the need to reconsider how we train and prompt LLMs for human interaction. To this end, we develop a preliminary intervention that mitigates grounding failures.
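As a rough illustration of what annotating grounding acts and flagging early grounding failures might involve, here is a minimal heuristic sketch. The GroundingAct labels, cue phrases, and window size are assumptions for illustration and do not reflect the paper's released taxonomy or trained models.

```python
# Hypothetical sketch of tagging assistant turns with grounding acts and
# flagging conversations that show no early grounding behavior.
from enum import Enum


class GroundingAct(Enum):
    CLARIFICATION = "clarification"      # assistant asks what the user meant
    FOLLOW_UP = "follow_up"              # assistant checks whether the answer helped
    ACKNOWLEDGEMENT = "acknowledgement"  # assistant confirms understanding
    NONE = "none"


def annotate_turn(assistant_turn: str) -> GroundingAct:
    """Assign a coarse grounding act to one assistant turn using keyword cues."""
    text = assistant_turn.lower()
    if "?" in text and any(cue in text for cue in ("do you mean", "could you clarify")):
        return GroundingAct.CLARIFICATION
    if any(cue in text for cue in ("does that help", "let me know if")):
        return GroundingAct.FOLLOW_UP
    if any(cue in text for cue in ("got it", "understood")):
        return GroundingAct.ACKNOWLEDGEMENT
    return GroundingAct.NONE


def lacks_early_grounding(assistant_turns: list[str], window: int = 2) -> bool:
    """Flag conversations whose first few assistant turns contain no grounding act."""
    acts = [annotate_turn(t) for t in assistant_turns[:window]]
    return all(act is GroundingAct.NONE for act in acts)
```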


Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning

Wang, Shuhe, Wang, Guoyin, Wang, Yizhong, Li, Jiwei, Hovy, Eduard, Guo, Chen

arXiv.org Artificial Intelligence

Packing, initially utilized in the pre-training phase, is an optimization technique designed to maximize hardware resource efficiency by combining different training sequences to fit the model's maximum input length. Although it has demonstrated effectiveness during pre-training, there remains a lack of comprehensive analysis for the supervised fine-tuning (SFT) stage on the following points: (1) whether packing can effectively enhance training efficiency while maintaining performance, (2) the suitable size of the model and dataset for fine-tuning with the packing method, and (3) whether packing unrelated or related training samples might cause the model to either excessively disregard or over-rely on the context. In this paper, we perform extensive comparisons between SFT methods using padding and packing, covering SFT datasets ranging from 69K to 1.2M examples and models ranging from 8B to 70B parameters. This provides the first comprehensive analysis of the advantages and limitations of packing versus padding, as well as practical considerations for implementing packing in various training scenarios. Our analysis covers various benchmarks, including knowledge, reasoning, and coding, as well as GPT-based evaluations, time efficiency, and other fine-tuning parameters. We also open-source our code for fine-tuning and evaluation and provide checkpoints fine-tuned on datasets of different sizes, aiming to advance future research on packing methods. Code is available at: https://github.com/ShuheWang1998/Packing-Analysis?tab=readme-ov-file.
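The abstract describes packing as concatenating training sequences up to the model's maximum input length. A minimal greedy sketch of that idea is below, assuming samples are already tokenized and ignoring the attention-mask and loss-mask handling a real SFT pipeline needs; function and variable names are illustrative, not the paper's released code.

```python
def pack_sequences(tokenized_samples: list[list[int]], max_len: int) -> list[list[int]]:
    """Greedily concatenate tokenized samples into packed sequences of length <= max_len."""
    packed: list[list[int]] = []
    current: list[int] = []
    for sample in tokenized_samples:
        sample = sample[:max_len]  # truncate overlong samples
        if current and len(current) + len(sample) > max_len:
            packed.append(current)  # close the current packed sequence
            current = []
        current = current + sample
    if current:
        packed.append(current)
    return packed


# With max_len=8, samples of length 5, 3, and 4 pack into two sequences
# (lengths 8 and 4) instead of three padded sequences of length 8 each.
print([len(seq) for seq in pack_sequences([[1] * 5, [2] * 3, [3] * 4], max_len=8)])
```

By contrast, padding would allocate a full max_len row per sample and fill the remainder with pad tokens, which is what makes packing attractive for hardware efficiency.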


WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Deng, Yuntian, Zhao, Wenting, Hessel, Jack, Ren, Xiang, Cardie, Claire, Choi, Yejin

arXiv.org Artificial Intelligence

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria. To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds. We demonstrate WildVis' utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns. WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.
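As a sketch of the precompute-then-cache pattern the abstract mentions (embedding precomputation and compression plus caching for responsive search), here is a minimal brute-force version. The stand-in encoder, the random placeholder embeddings, and all names are assumptions; WildVis's actual index and storage layer are more involved.

```python
# Hypothetical sketch: precomputed, half-precision conversation embeddings
# queried with brute-force cosine similarity behind a small per-query cache.
from functools import lru_cache

import numpy as np

# Stand-in for embeddings precomputed offline and stored at half precision;
# in practice these would be encoder outputs loaded from disk.
rng = np.random.default_rng(0)
EMBEDDINGS = rng.standard_normal((100_000, 384)).astype(np.float16)
EMBEDDINGS_NORM = EMBEDDINGS / np.linalg.norm(EMBEDDINGS, axis=1, keepdims=True)


def embed_query(text: str) -> np.ndarray:
    # Illustrative stand-in; a real deployment would reuse the same encoder
    # that produced EMBEDDINGS.
    q_rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return q_rng.standard_normal(EMBEDDINGS.shape[1])


@lru_cache(maxsize=1024)
def nearest_conversations(query: str, k: int = 10) -> tuple[int, ...]:
    """Return indices of the k most similar conversations, caching repeated queries."""
    q = embed_query(query)
    q = q / np.linalg.norm(q)
    scores = EMBEDDINGS_NORM @ q  # cosine similarity against all rows
    return tuple(int(i) for i in np.argsort(-scores)[:k])
```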


Breaking News: Case Studies of Generative AI's Use in Journalism

Brigham, Natalie Grace, Gao, Chongjiu, Kohno, Tadayoshi, Roesner, Franziska, Mireshghallah, Niloofar

arXiv.org Artificial Intelligence

Journalists are among the many users of large language models (LLMs). To better understand journalist-AI interactions, we conduct a study of LLM usage by two news agencies: we browse the WildChat dataset, identify candidate interactions, and verify them by matching them to articles published online. Our analysis uncovers instances where journalists provide the LLM with sensitive material, such as confidential correspondence with sources or articles from other agencies, as stimuli, prompt it to generate articles, and publish these machine-generated articles with limited intervention (median output-publication ROUGE-L of 0.62). Based on our findings, we call for further research into what constitutes responsible use of AI, and for the establishment of clear guidelines and best practices for using LLMs in a journalistic context.
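For concreteness, here is a minimal sketch of the kind of output-to-publication comparison the ROUGE-L figure implies, using a plain longest-common-subsequence F1 over whitespace tokens. The paper's actual matching pipeline (tokenization, preprocessing, scorer implementation) is not specified here, and the function names are illustrative.

```python
# Minimal ROUGE-L F1 between an LLM output and the article as published.
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if tok_a == tok_b else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(llm_output: str, published_article: str) -> float:
    """ROUGE-L F1 over whitespace tokens; 1.0 means the texts share a full common subsequence."""
    out, pub = llm_output.split(), published_article.split()
    if not out or not pub:
        return 0.0
    lcs = lcs_length(out, pub)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(out), lcs / len(pub)
    return 2 * precision * recall / (precision + recall)
```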