We're Entering Uncharted Territory for Math

The Atlantic - Technology

Terence Tao, a mathematics professor at UCLA, is a real-life superintelligence. The "Mozart of Math," as he is sometimes called, is widely considered the world's greatest living mathematician. He has won numerous awards, including the equivalent of a Nobel Prize for mathematics, for his advances and proofs. Right now, AI is nowhere close to his level. But technology companies are trying to get it there.


Engadget Podcast: Why the Windows 11 2024 update is all about Copilot AI

Engadget

This week, Microsoft started rolling out the Windows 11 2024 update, but it quickly became clear that the company was far more eager to unveil new features for its Copilot AI and Copilot AI PCs. In this episode, Devindra and Cherlynn chat about Microsoft's current AI priorities, and what it means for people with older PCs. Also, we discuss the death of HoloLens and Microsoft giving up on AR as Meta, Apple and even Snap build for an augmented reality future. Listen below or subscribe on your podcast app of choice. If you've got suggestions or topics you'd like covered on the show, be sure to email us or drop a note in the comments! And be sure to check out our other podcast, Engadget News! Tech debt led to Sonos' disastrous app relaunch, will they be able to win users back? Google is making Gmail summaries more useful and adding a "happening soon" tab to your inbox - 41:11 Harvard students hack together facial recognition for Meta's smart glasses that instantly doxes strangers - 44:00 ...


Mixed-Session Conversation with Egocentric Memory

arXiv.org Artificial Intelligence

Recently introduced dialogue systems have demonstrated high usability. However, they still fall short of reflecting real-world conversation scenarios. Current dialogue systems cannot replicate dynamic, continuous, long-term interactions involving multiple partners. This shortfall arises because there have been limited efforts to account for both aspects of real-world dialogues: deeply layered interactions over long-term dialogue and widely expanded conversation networks involving multiple participants. To incorporate both aspects, we introduce Mixed-Session Conversation, a dialogue system designed to construct conversations with various partners in a multi-session dialogue setup. We propose a new dataset called MiSC to implement this system. The dialogue episodes of MiSC consist of 6 consecutive sessions, with four speakers (one main speaker and three partners) appearing in each episode. We also propose a new dialogue model with a novel memory management mechanism, called Egocentric Memory Enhanced Mixed-Session Conversation Agent (EMMA). EMMA collects and retains memories from the main speaker's perspective during conversations with partners, enabling seamless continuity in subsequent interactions. Extensive human evaluations validate that the dialogues in MiSC demonstrate a seamless conversational flow, even when conversation partners change in each session. EMMA trained with MiSC is also evaluated to maintain high memorability without contradiction throughout the entire conversation.


ACE: A LLM-based Negotiation Coaching System

arXiv.org Artificial Intelligence

The growing prominence of LLMs has led to an increase in the development of AI tutoring systems. These systems are crucial in providing underrepresented populations with improved access to valuable education. One important area of education that is unavailable to many learners is strategic bargaining related to negotiation. To address this, we develop an LLM-based Assistant for Coaching nEgotiation (ACE). ACE not only serves as a negotiation partner for users but also provides them with targeted feedback for improvement. To build our system, we collect a dataset of negotiation transcripts between MBA students. These transcripts come from trained negotiators and emulate realistic bargaining scenarios. We use the dataset, along with expert consultations, to design an annotation scheme for detecting negotiation mistakes. ACE employs this scheme to identify mistakes and provide targeted feedback to users. To test the effectiveness of ACE-generated feedback, we conducted a user experiment with two consecutive trials of negotiation and found that it improves negotiation performance significantly compared to a system that doesn't provide feedback and one that uses an alternative method of providing feedback.


MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

arXiv.org Artificial Intelligence

Artificial intelligence (AI) and large language models (LLMs) in healthcare require advanced clinical skills (CS), yet current benchmarks fail to evaluate these comprehensively. We introduce MedQA-CS, an AI-SCE framework inspired by medical education's Objective Structured Clinical Examinations (OSCEs), to address this gap. MedQA-CS evaluates LLMs through two instruction-following tasks, LLM-as-medical-student and LLM-as-CS-examiner, designed to reflect real clinical scenarios. Our contributions include developing MedQA-CS, a comprehensive evaluation framework with publicly available data and expert annotations, and providing the quantitative and qualitative assessment of LLMs as reliable judges in CS evaluation. Our experiments show that MedQA-CS is a more challenging benchmark for evaluating clinical skills than traditional multiple-choice QA benchmarks (e.g., MedQA). Combined with existing benchmarks, MedQA-CS enables a more comprehensive evaluation of LLMs' clinical capabilities for both open- and closed-source LLMs.


The Playwright in the Age of AI

The Atlantic - Technology

Ayad Akhtar's brilliant new play, McNeal, currently at the Lincoln Center Theater, is transfixing in part because it tracks without flinching the disintegration of a celebrated writer, and in part because Akhtar goes to a place that few writers have visited so effectively--the very near future, in which large language models threaten to undo our self-satisfied understanding of creativity, plagiarism, and originality. And also because Robert Downey Jr., performing onstage for the first time in more than 40 years, perfectly embodies the genius and brokenness of the title character. I've been in conversation for quite some time with Akhtar, whose play Disgraced won the Pulitzer Prize in 2013, about artificial generative intelligence and its impact on cognition and creation. He's one of the few writers I know whose position on AI can't be reduced to the (understandable) plea For God's sake, stop threatening my existence! In McNeal, he not only suggests that LLMs might be nondestructive utilities for human writers, but also deployed LLMs as he wrote (he's used many of them, ChatGPT, Claude, and Gemini included). To my chagrin and astonishment, they seem to have helped him make an even better play. As you will see in our conversation, he doesn't believe that this should be controversial. In early September, Akhtar, Downey, Bartlett Sher--the Tony Award winner who directed McNeal--and I met at Downey's home in New York for what turned out to be an amusing, occasionally frenetic, and sometimes even borderline profound discussion of the play, its origins, the flummoxing issues it raises, and, yes, Avengers: Age of Ultron. We were joined intermittently by Susan Downey, Robert's wife (and producing partner), and the person who believed that Akhtar's play would tempt her husband to return to the stage. 
The conversation that follows is a condensed and edited version of our sprawling discussion, but I think it captures something about art and AI, and it certainly captures the exceptional qualities of three people, writer, director, and actor, who are operating at the pinnacle of their trade, without fear--perhaps without enough fear--of what is inescapably coming.


RecSys Challenge 2024: Balancing Accuracy and Editorial Values in News Recommendations

arXiv.org Artificial Intelligence

The RecSys Challenge 2024 aims to advance news recommendation by addressing both the technical and normative challenges inherent in designing effective and responsible recommender systems for news publishing. This paper describes the challenge, including its objectives, problem setting, and the dataset provided by the Danish news publishers Ekstra Bladet and JP/Politikens Media Group ("Ekstra Bladet"). The challenge explores the unique aspects of news recommendation, such as modeling user preferences based on behavior, accounting for the influence of the news agenda on user interests, and managing the rapid decay of news items. Additionally, the challenge embraces normative complexities, investigating the effects of recommender systems on news flow and their alignment with editorial values. We summarize the challenge setup, dataset characteristics, and evaluation metrics. Finally, we announce the winners and highlight their contributions. The dataset is available at: https://recsys.eb.dk.


Knowledge Graph Embedding by Normalizing Flows

arXiv.org Artificial Intelligence

A key to knowledge graph embedding (KGE) is to choose a proper representation space, e.g., point-wise Euclidean space and complex vector space. In this paper, we propose a unified perspective of embedding and introduce uncertainty into KGE from the view of group theory. Our model can incorporate existing models (i.e., generality), ensure the computation is tractable (i.e., efficiency) and enjoy the expressive power of complex random variables (i.e., expressiveness). The core idea is that we embed entities/relations as elements of a symmetric group, i.e., permutations of a set. Permutations of different sets can reflect different properties of embedding, and the group operation of symmetric groups is easy to compute. Specifically, we show that the embedding of many existing models, point vectors, can be seen as elements of a symmetric group. To reflect uncertainty, we first embed entities/relations as permutations of a set of random variables. A permutation can transform a simple random variable into a complex random variable for greater expressiveness, called a normalizing flow. We then define scoring functions by measuring the similarity of two normalizing flows, namely NFE. We construct several instantiating models and prove that they are able to learn logical rules. Experimental results demonstrate the effectiveness of introducing uncertainty and our model. The code is available at https://github.com/changyi7231/NFE.
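
To make the phrase "the group operation of symmetric groups is easy to compute" concrete, here is a minimal sketch of composition and inversion for permutations of a finite set. It is only an illustration of the group structure and is not taken from the NFE code; the tuple representation and the helper names `compose`, `inverse`, and `identity` are assumptions made for this example.

```python
from typing import Tuple

Perm = Tuple[int, ...]  # assumed representation: perm[i] is the image of i


def identity(n: int) -> Perm:
    """Identity permutation on {0, ..., n-1}."""
    return tuple(range(n))


def compose(p: Perm, q: Perm) -> Perm:
    """Group operation: apply q first, then p, so (p o q)(i) = p[q[i]]."""
    if len(p) != len(q):
        raise ValueError("permutations must act on the same set")
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p: Perm) -> Perm:
    """Inverse permutation: inverse(p)[p[i]] = i for every i."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


# Hypothetical toy elements standing in for an entity and a relation.
p = (1, 2, 0)
q = inverse(p)
print(compose(p, q) == identity(3))
```

Both operations take time linear in the size of the set, which is one sense in which the group operation is cheap.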


'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants

arXiv.org Artificial Intelligence

The recent excitement around generative models has sparked a wave of proposals suggesting the replacement of human participation and labor in research and development--e.g., through surveys, experiments, and interviews--with synthetic research data generated by large language models (LLMs). We conducted interviews with 19 qualitative researchers to understand their perspectives on this paradigm shift. Initially skeptical, researchers were surprised to see similar narratives emerge in the LLM-generated data when using the interview probe. However, over several conversational turns, they went on to identify fundamental limitations, such as how LLMs foreclose participants' consent and agency, produce responses lacking in palpability and contextual depth, and risk delegitimizing qualitative research methods. We argue that the use of LLMs as proxies for participants enacts the surrogate effect, raising ethical and epistemological concerns that extend beyond the technical limitations of current models to the core of whether LLMs fit within qualitative ways of knowing.


Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development

arXiv.org Artificial Intelligence

This paper aims to answer one central question: to what extent can open-source generative text models be used in a workflow to approximate thematic analysis in social science research? To answer this question, we present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow, which uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis. To establish validity of the method, we present three case studies applying the GATOS workflow, leveraging these models and techniques to inductively create codebooks similar to traditional procedures using thematic analysis. Specifically, we investigate the extent to which a workflow comprising open-source models and tools can inductively produce codebooks that approach the known space of themes and sub-themes. To address the challenge of gleaning insights from these texts, we combine open-source generative text models, retrieval-augmented generation, and prompt engineering to identify codes and themes in large volumes of text, i.e., generate a qualitative codebook. The process mimics an inductive coding process that researchers might use in traditional thematic analysis by reading text one unit of analysis at a time, considering existing codes already in the codebook, and then deciding whether or not to generate a new code based on whether the extant codebook provides adequate thematic coverage. We demonstrate this workflow using three synthetic datasets from hypothetical organizational research settings: a study of teammate feedback in teamwork settings, a study of organizational cultures of ethical behavior, and a study of employee perspectives about returning to their offices after the pandemic. We show that the GATOS workflow is able to identify themes in the text that were used to generate the original synthetic datasets.
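
The inductive coding loop described above (read one unit of text, check the existing codebook, add a code only when coverage is inadequate) can be sketched as follows. This is a simplified illustration under stated assumptions, not the GATOS implementation: the function names are hypothetical, and the two toy callables stand in for the generative model and retrieval steps that the abstract mentions.

```python
from typing import Callable, List


def build_codebook(
    units: List[str],
    is_covered: Callable[[str, List[str]], bool],
    propose_code: Callable[[str], str],
) -> List[str]:
    """Read units one at a time; add a new code only when the
    existing codebook does not already cover the unit."""
    codebook: List[str] = []
    for unit in units:
        if not is_covered(unit, codebook):
            codebook.append(propose_code(unit))
    return codebook


# Toy stand-ins (assumptions): a real workflow would call a generative
# text model here instead of matching on keywords.
def toy_is_covered(unit: str, codebook: List[str]) -> bool:
    words = set(unit.lower().split())
    return any(code in words for code in codebook)


def toy_propose_code(unit: str) -> str:
    return unit.split()[0].lower()


units = ["Feedback was timely", "feedback felt vague", "Meetings ran long"]
print(build_codebook(units, toy_is_covered, toy_propose_code))
```

Passing the coverage check and the code proposal in as callables keeps the loop itself independent of whichever model or retrieval method is used.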