PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have accelerated the development of conversational agents capable of generating human-like responses. Since psychiatric assessments typically involve complex conversational interactions between psychiatrists and patients, there is growing interest in developing LLM-based psychiatric assessment conversational agents (PACAs) that aim to simulate the role of psychiatrists in clinical evaluations. However, standardized methods for benchmarking the clinical appropriateness of PACAs' interactions with patients remain underexplored. Here, we propose PSYCHE, a novel framework designed to enable the 1) clinically relevant, 2) ethically safe, 3) cost-efficient, and 4) quantitative evaluation of PACAs. This is achieved by simulating psychiatric patients based on a multi-faceted psychiatric construct that defines the simulated patients' profiles, histories, and behaviors, which PACAs are expected to assess. We validate the effectiveness of PSYCHE through a study with 10 board-certified psychiatrists, supported by an in-depth analysis of the simulated patient utterances.


Proactive Conversational Agents with Inner Thoughts

arXiv.org Artificial Intelligence

One of the long-standing aspirations in conversational AI is to allow AI systems to autonomously take the initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.


Why 2025 will be the year Arm dominates PCs

PCWorld

Qualcomm's 2024 debut of new Arm processors for Windows laptops was arguably the most important PC hardware announcement since the introduction of Intel's 486 processors in 1989. Just as that CPU line heralded an age of Intel-driven x86 dominance, Qualcomm's Snapdragon X Elite chips have now taken us into a new era of competition. But 2024 was only the preview. Qualcomm's Snapdragon debut was limited, targeting a specific subset of premium, thin-and-light Windows laptops that don't require discrete graphics. I spoke with two expert analysts in the hardware space for insights on how Arm PCs will continue to grow going forward.


Towards Effective Discrimination Testing for Generative AI

arXiv.org Artificial Intelligence

Generative AI (GenAI) models present new challenges in regulating against discriminatory behavior. In this paper, we argue that GenAI fairness research still has not met these challenges; instead, a significant gap remains between existing bias assessment methods and regulatory goals. This leads to ineffective regulation that can allow deployment of reportedly fair, yet actually discriminatory, GenAI systems. Towards remedying this problem, we connect the legal and technical literature around GenAI bias evaluation and identify areas of misalignment. Through four case studies, we demonstrate how this misalignment between fairness testing techniques and regulatory goals can result in discriminatory outcomes in real-world deployments, especially in adaptive or complex environments. We offer practical recommendations for improving discrimination testing to better align with regulatory goals and enhance the reliability of fairness assessments in future deployments.


LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

arXiv.org Artificial Intelligence

We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions where the LLM interviewer actively provides feedback on responses and poses follow-up questions to the evaluated LLM. At the start of the interview, the LLM interviewer dynamically modifies datasets to generate initial questions, mitigating data contamination. We apply the LLM-as-an-Interviewer framework to evaluate six models on the MATH and DepthQA tasks. Our results show that the framework effectively provides insights into LLM performance, including the quality of initial responses, adaptability to feedback, and ability to address follow-up queries like clarification or additional knowledge requests. The framework also addresses key limitations of conventional methods like LLM-as-a-Judge, including verbosity bias and inconsistency across runs. Finally, we propose the Interview Report, which aggregates insights from the interview process, providing examples and a comprehensive analysis of the LLM's strengths and weaknesses. This report offers a detailed snapshot of the model's real-world applicability. The code for our framework is publicly available at https://github.com/interview-eval/.


Random Matrix Theory for Stochastic Gradient Descent

arXiv.org Artificial Intelligence

Machine learning (ML) and artificial intelligence (AI) can provide powerful tools for the scientific community, as demonstrated by the recent Nobel Prize in Chemistry. Conversely, insights from traditional physics theories also contribute to a deeper understanding of the mechanism of learning. Ref. [1] contains a broad overview of the successful cross-fertilisation between ML and the physical sciences, covering a number of domains. One way to mitigate possible scepticism with regard to using ML as a "black box" is by unveiling the dynamics of training (or learning) and explaining how the relevant information is engraved in the model during the training stage. To further develop this programme, we study here the dynamics of first-order stochastic gradient descent as applied to weight matrices, reporting and expanding on the work presented in Ref. [2]. When training ML models, weight matrices are commonly updated by one of the variants of the stochastic gradient descent algorithm. The dynamics can then be decomposed into a drift and a fluctuating term, and such a system can be described by a discrete Langevin equation. The dynamics of stochastic matrix updates is richer than the dynamics for vector or scalar quantities, as captured by Dyson Brownian motion and random matrix theory (RMT), with the appearance of universal features for the eigenvalues [3-9]. Earlier descriptions of the statistical properties of weight matrices in terms of RMT can be found in e.g.


'Godfather of AI' shortens odds that new technology will wipe out human race over the next 30 years

Daily Mail - Science & tech

The British-Canadian computer scientist dubbed the 'Godfather of AI' has shortened the odds of artificial intelligence (AI) wiping out humans over the next 30 years, warning the technology could one day 'take control'. Professor Geoffrey Hinton said we need to be 'very careful' and 'very thoughtful' about the development of AI which he says is 'potentially very dangerous'. He had previously said there was a 10 per cent chance of the technology causing the extinction of the human race - but now predicts that figure to be '10 per cent to 20 per cent', because of the rapid pace at which AI is developing. Speaking on BBC Radio 4's Today programme, Professor Hinton said: 'You see, we've never had to deal with things more intelligent than ourselves before.' He continued: 'And how many examples do you know of a more intelligent thing being controlled by a less intelligent thing?


'Godfather of AI' shortens odds of the technology wiping out humanity over next 30 years

The Guardian

The British-Canadian computer scientist often touted as a "godfather" of artificial intelligence has shortened the odds of AI wiping out humanity over the next three decades, warning the pace of change in the technology is "much faster" than expected. Prof Geoffrey Hinton, who this year was awarded the Nobel prize in physics for his work in AI, said there was a "10% to 20%" chance that AI would lead to human extinction within the next three decades. Previously Hinton had said there was a 10% chance of the technology triggering a catastrophic outcome for humanity. Asked on BBC Radio 4's Today programme if he had changed his analysis of a potential AI apocalypse and the one in 10 chance of it happening, he said: "Not really, 10% to 20%." Hinton's estimate prompted Today's guest editor, the former chancellor Sajid Javid, to say "you're going up", to which Hinton replied: "If anything. You see, we've never had to deal with things more intelligent than ourselves before."


Revealed: The best inventions of 2024 - from Tesla's futuristic Robotaxi to Huawei's tri-fold smartphone

Daily Mail - Science & tech

From the steam engine in 1712 to the first ever iPhone in 2007, each year sees the birth of ever more incredible inventions. And after a year of mind-boggling tech, it's clear that 2024 has been no exception to the rule. The last 12 months have seen brilliant minds from around the world creating some mind-blowing and potentially world-changing breakthroughs. With 2024 almost at its end, MailOnline has taken a look back at some of this year's coolest gadgets and most exciting innovations. From an AI for designing proteins to a real-life pair of Wallace and Gromit's 'techno trousers', these inventions are a glimpse of how we all might be living in the future. And when it comes to big breakthroughs, this year has been a resounding success for billionaire Elon Musk.


AIhub interview highlights 2024

AIHub

Over the course of 2024, we had the pleasure of finding out more about a whole range of AI topics from researchers around the world. Here, we highlight some of our favourite interviews from the past 12 months. Please note: we have not included our interviews with AAAI/ACM SIGAI Doctoral Consortium participants – these are highlighted in this dedicated collection. Christopher Chandler tells us about model checking and how it is used in the context of autonomous robotic systems, specifically looking at creating multi-step plans for a differential-drive wheeled robot so that it can avoid immediate danger. Bo Li and colleagues won an outstanding datasets and benchmark track award at NeurIPS 2023 for their work DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.