Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources

arXiv.org Artificial Intelligence

Optimizing language models for use in conversational agents requires large quantities of example dialogues. Increasingly, these dialogues are synthetically generated using powerful large language models (LLMs), especially in domains where authentic human data is difficult to obtain. One such domain is human resources (HR). In this context, we compare two LLM-based dialogue generation methods for generating HR job interviews and assess whether one method produces higher-quality dialogues that are harder to distinguish from genuine human discourse. The first method uses a single prompt to generate the complete interview dialogue. The second method uses two agents that converse with each other. To evaluate dialogue quality under each method, we ask a judge LLM to determine whether AI was used to generate an interview, using pairwise interview comparisons. We demonstrate that, despite a sixfold increase in token cost, interviews generated with the dual-prompt method achieve a win rate up to ten times higher than those generated with the single-prompt method. This difference remains consistent regardless of whether GPT-4o or Llama 3.3 70B is used for interview generation or for judging quality.
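The contrast between the two generation methods, and the pairwise win rate used to compare them, can be sketched as follows. This is a minimal sketch under stated assumptions: the agent callables stand in for separately prompted LLM calls, and the function names `dual_prompt_interview` and `win_rate`, as well as the win-rate definition used here (an interview "wins" a comparison when the judge does not pick it as the AI-generated one), are hypothetical and not taken from the paper. In the single-prompt method, by contrast, one call would return the whole transcript at once.

```python
from typing import Callable, List, Tuple

# Hypothetical agent type: receives the transcript so far and returns the next utterance.
# In practice each agent would wrap its own system prompt and an LLM call.
Transcript = List[Tuple[str, str]]
Agent = Callable[[Transcript], str]


def dual_prompt_interview(interviewer: Agent, candidate: Agent, turns: int) -> Transcript:
    """Alternate two independently prompted agents; each sees the full transcript so far."""
    transcript: Transcript = []
    for _ in range(turns):
        transcript.append(("Interviewer", interviewer(transcript)))
        transcript.append(("Candidate", candidate(transcript)))
    return transcript


def win_rate(judge_picks_as_ai: List[str], method: str) -> float:
    """Share of pairwise comparisons in which `method`'s interview was NOT picked as AI-generated.

    Each entry names the method whose interview the judge flagged in one comparison
    (an assumed encoding for illustration).
    """
    if not judge_picks_as_ai:
        raise ValueError("at least one comparison is required")
    wins = sum(1 for picked in judge_picks_as_ai if picked != method)
    return wins / len(judge_picks_as_ai)
```

With stub agents, `dual_prompt_interview(lambda t: f"Q{len(t)//2 + 1}", lambda t: "A:" + t[-1][1], turns=2)` yields the turns Q1, A:Q1, Q2, A:Q2. Because every turn is a separate model call that rereads the growing transcript, this structure plausibly accounts for much of the higher token cost the abstract reports for the dual-prompt method. The judge picks passed to `win_rate` are toy values, not data from the paper.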


Why Grimes No Longer Believes That Art Is Dead

TIME - Tech

A couple of years ago, Grimes thought art might be dying. She worried that TikTok was overwhelming attention spans, that transgressive artists were becoming more sanitized, and that gimmicky NFTs like the Bored Ape Yacht Club (digital cartoon monkeys that were selling for millions of dollars) were warping value systems. "I just went through this whole big 'art isn't worth anything' internal existential crisis," the Canadian singer-songwriter says. "But I've come out the other end thinking, actually, maybe it's the main thing that matters. In the last year, I feel like things became way more about artists again." The rise of AI, Grimes believes, has played a role in that shift, perhaps paradoxically. Earlier this month, Grimes was honored at the TIME100 AI Impact Awards in Dubai for her role in shaping the present and future of the technology. While many other artists are terrified of AI and its potential to replace them, Grimes has embraced the technology, even releasing an AI tool that allows people to sing in her voice. Grimes' penchant for seriously engaging with what others fear or distrust makes her one of pop culture's most singular, and at times divisive, figures. But Grimes wears her contrarianism as a badge of honor and doesn't hesitate to offer insights and perspectives on a variety of issues. "I'm so canceled that I basically have nothing left to lose," she says. She argues that hyper-partisan hysteria has consumed social media, and wishes people would have more measured, nuanced conversations, even with people they disagree with. "A lot of people think I'm one way or the other, but my whole vibe is just like, I just want people to think well," she says.


Generative AI, online platforms and compensation for content: the need for a new framework

AIHub

The emergence of generative artificial intelligence has put the issue of compensation for content producers back on the table. Generative AI offers undeniable benefits but raises familiar fears tied to disruptive technologies. Legal battles are already emerging worldwide, with intellectual property owners and AI developers clashing over rights. Alongside these legal and ethical concerns lies the economic question: how should revenues generated by AI be fairly distributed? Individual contributions to AI-generated outputs are often too complex to quantify, making it difficult to apply the principle of proportional remuneration, which holds that payment for an individual work is tied to the revenue it generates.
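The principle of proportional remuneration, and why generative AI strains it, can be written out as a short formula. The notation below is an illustrative assumption, not taken from the article: $p_i$ is the payment for work $i$, $\rho$ an assumed royalty share, $r_i$ the revenue work $i$ generates, $R$ a generative model's total revenue, and $w_i$ a hypothetical attribution weight.

```latex
% Proportional remuneration (illustrative notation, not from the article)
p_i = \rho \, r_i, \qquad 0 < \rho \le 1
% For a generative model with total revenue R, applying the same principle
% would require attribution weights w_i \ge 0 with \sum_i w_i \le 1, so that r_i = w_i R:
p_i = \rho \, w_i \, R
```

In this framing, the difficulty the article describes sits entirely in the $w_i$: when a single AI output draws on many works, individual contributions are too complex to quantify, so the weights, and therefore the payments, cannot be computed directly.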


LitLinker: Supporting the Ideation of Interdisciplinary Contexts with Large Language Models for Teaching Literature in Elementary Schools

arXiv.org Artificial Intelligence

Teaching literature in interdisciplinary contexts (e.g., science, art) that connect to the reading materials has become popular in elementary schools. However, constructing such contexts is challenging, as it requires teachers to explore substantial amounts of interdisciplinary content and link it to the reading materials. In this paper, we develop LitLinker via an iterative design process involving 13 teachers to facilitate the ideation of interdisciplinary contexts for teaching literature. Powered by a large language model (LLM), LitLinker can recommend interdisciplinary topics and contextualize them with the literary elements (e.g., paragraphs, viewpoints) in the reading materials. A within-subjects study (N=16) shows that, compared to an LLM chatbot, LitLinker can improve the integration depth of different subjects and reduce workload in this ideation task. Expert interviews (N=9) also demonstrate LitLinker's usefulness for supporting the ideation of interdisciplinary contexts for teaching literature. We conclude with concerns and design considerations for supporting interdisciplinary teaching with LLMs.


Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews

arXiv.org Artificial Intelligence

Which large language model (LLM) is better? Every evaluation tells a story, but what do users really think about current LLMs? This paper presents CLUE, an LLM-powered interviewer that conducts in-the-moment user experience interviews right after users interact with LLMs, and automatically gathers insights about user opinions from massive interview logs. We conduct a study with thousands of users to understand their opinions of mainstream LLMs, recruiting users to first chat with a target LLM and then be interviewed by CLUE. Our experiments demonstrate that CLUE captures interesting user opinions, for example, polarized views on the displayed reasoning process of DeepSeek-R1 and demands for information freshness and multimodality. Our collected chat-and-interview logs will be released.


Eeyore: Realistic Depression Simulation via Supervised and Preference Optimization

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have previously been explored for mental healthcare training and therapy client simulation, but they still fall short of authentically capturing diverse client traits and psychological conditions. We introduce Eeyore, an 8B model optimized for realistic depression simulation through a structured alignment framework that incorporates expert input at every stage. First, we systematically curate real-world depression-related conversations, extracting depressive traits to guide data filtering and psychological profile construction, and use this dataset to instruction-tune Eeyore for profile adherence. Next, to further enhance realism, Eeyore undergoes iterative preference optimization, first leveraging model-generated preferences and then calibrating with a small set of expert-annotated preferences. Throughout the entire pipeline, we actively collaborate with domain experts, developing interactive interfaces to validate trait extraction and iteratively refine structured psychological profiles for clinically meaningful role-play customization. Despite its smaller size, Eeyore outperforms GPT-4o with state-of-the-art (SOTA) prompting strategies in both linguistic authenticity and profile adherence.


Making Sense of AI Limitations: How Individual Perceptions Shape Organizational Readiness for AI Adoption

arXiv.org Artificial Intelligence

This study investigates how individuals' perceptions of artificial intelligence (AI) limitations influence organizational readiness for AI adoption. Through semi-structured interviews with seven AI implementation experts, analyzed using the Gioia methodology, the research reveals that organizational readiness emerges through dynamic interactions between individual sensemaking, social learning, and formal integration processes. The findings demonstrate that hands-on experience with AI limitations leads to more realistic expectations and increased trust, particularly when supported by peer networks and champion systems. Organizations that successfully translate these individual and collective insights into formal governance structures achieve more sustainable AI adoption. The study advances theory by showing how organizational readiness for AI adoption evolves through continuous cycles of individual understanding, social learning, and organizational adaptation. These insights suggest that organizations should approach AI adoption not as a one-time implementation but as an ongoing strategic learning process that balances innovation with practical constraints. The research contributes to organizational readiness theory and practice by illuminating how micro-level perceptions and experiences shape macro-level adoption outcomes.


Program Merge: What's Deep Learning Got to Do with It?

Communications of the ACM

If you regularly work with open-source code or produce software for a large organization, you are already familiar with many of the challenges posed by collaborative programming at scale. Some of the most vexing of these tend to surface as a consequence of the many independent alterations inevitably made to code, which, unsurprisingly, can lead to updates that do not synchronize. Difficult merges are nothing new, of course, but the scale of the problem has gotten much worse. This is what led a group of researchers at Microsoft Research (MSR) to take on complicated merges as a grand program-repair challenge, one they believed might be addressed at least in part by machine learning (ML). To understand the thinking that led to this effort and then follow where it led, ACM Queue asked Erik Meijer and Terry Coatta to speak with three of the leading figures in the MSR research effort, called DeepMerge. Meijer was long a member of MSR, but at the time of this discussion was director of engineering at Meta. Coatta is the chief technology officer of Marine Learning Systems. Shuvendu Lahiri and Christian Bird, two of the researchers who helped drive this effort, represent MSR, as does Alexey Svyatkovskiy, who was with Microsoft DevDiv (Development Division) at the time.

Terry Coatta: What inspired you to focus on merge conflicts in the first place? And what made you think you'd be able to gain some advantage by applying AI techniques?

Christian Bird: Back in the winter of 2020, some of us started talking about ways in which we might be able to use machine learning to improve the state of software engineering.


Charlotte Bunne on developing AI-based diagnostic tools

AIHub

Charlotte Bunne, head of EPFL's Artificial Intelligence in Molecular Medicine Group, is developing AI algorithms to better understand the incredibly complex and high-dimensional data that represent the hundreds of tissue layers and protein markers in an individual cell. EPFL magazine Dimensions spoke to Charlotte Bunne about her work at the cutting edge of AI in medicine and biology.

Could you describe the focus of your research?

We are developing diagnostic tools for clinics that are driven by AI technologies. This includes forecasting the best treatment that a patient should receive, trying to understand the state of disease that a patient is in, and deciphering important biomarkers or potential drug targets that we should investigate further.


Natural Language Generation

arXiv.org Artificial Intelligence

This book provides a broad overview of Natural Language Generation (NLG), including technology, user requirements, evaluation, and real-world applications. The focus is on concepts and insights that will hopefully remain relevant for many years, not on the latest LLM innovations. It draws on decades of work by the author and others on NLG. The book has the following chapters: Introduction to NLG; Rule-Based NLG; Machine Learning and Neural NLG; Requirements; Evaluation; Safety, Maintenance, and Testing; and Applications. All chapters include examples and anecdotes from the author's personal experiences, and end with a Further Reading section. The book should be especially useful to people working on applied NLG, including NLG researchers, people in other fields who want to use NLG, and commercial developers. It will not, however, be useful to people who want to understand the latest LLM technology. There is a companion site with more information at https://ehudreiter.com/book/