Generative AI
Are Generative AI Agents Effective Personalized Financial Advisors?
Takayanagi, Takehiro, Izumi, Kiyoshi, Sanz-Cruzado, Javier, McCreadie, Richard, Ounis, Iadh
Large language model-based agents are becoming increasingly popular as a low-cost mechanism to provide personalized, conversational advice, and have demonstrated impressive capabilities in relatively simple scenarios, such as movie recommendations. But how do these agents perform in complex high-stakes domains, where domain expertise is essential and mistakes carry substantial risk? This paper investigates the effectiveness of LLM-advisors in the finance domain, focusing on three distinct challenges: (1) eliciting user preferences when users themselves may be unsure of their needs, (2) providing personalized guidance for diverse investment preferences, and (3) leveraging advisor personality to build relationships and foster trust. Via a lab-based user study with 64 participants, we show that LLM-advisors often match human advisor performance when eliciting preferences, although they can struggle to resolve conflicting user needs. When providing personalized advice, the LLM was able to positively influence user behavior, but demonstrated clear failure modes. Our results show that accurate preference elicitation is key; otherwise, the LLM-advisor has little impact, or can even direct the investor toward unsuitable assets. More worryingly, users appear insensitive to the quality of advice being given, or worse, user satisfaction and advice quality can have an inverse relationship. Indeed, users reported a preference for, and increased satisfaction and emotional trust with, LLMs adopting an extroverted persona, even though those agents provided worse advice.
OpenAI is apparently making a social network
It looks like OpenAI is building its own X-like social media network, according to a report by The Verge. We don't have many specifics, but we do know there's an internal prototype that adds a social feed to ChatGPT's image generation tool. It remains unclear if OpenAI will launch this social network as a standalone app or if it will be integrated within the ChatGPT app, which is what the prototype indicates. The report does suggest that OpenAI CEO Sam Altman has been asking for feedback about the social network from people outside of the company. One potential reason for this step is that the app would allow OpenAI to gather real-time data from users to train its AI models.
Towards Interpretable Deep Generative Models via Causal Representation Learning
Moran, Gemma E., Aragam, Bryon
Recent developments in generative artificial intelligence (AI) rely on machine learning techniques such as deep learning and generative modeling to achieve state-of-the-art performance across wide-ranging domains. These methods' surprising performance is due in part to their ability to learn implicit "representations" of complex, multi-modal data. Unfortunately, deep neural networks are notoriously black boxes that obscure these representations, making them difficult to interpret or analyze. To resolve these difficulties, one approach is to build new interpretable neural network models from the ground up. This is the goal of the emerging field of causal representation learning (CRL) that uses causality as a vector for building flexible, interpretable, and transferable generative AI. CRL can be seen as a culmination of three intrinsically statistical problems: (i) latent variable models such as factor analysis; (ii) causal graphical models with latent variables; and (iii) nonparametric statistics and deep learning. This paper reviews recent progress in CRL from a statistical perspective, focusing on connections to classical models and statistical and causal identifiability results. This review also highlights key application areas, implementation strategies, and open statistical questions in CRL.
Confirmation Bias in Generative AI Chatbots: Mechanisms, Risks, Mitigation Strategies, and Future Research Directions
Drawing on cognitive psychology and computational linguistics, this article examines how confirmation bias--commonly understood as the tendency to seek information that aligns with existing beliefs--can be replicated and amplified by the design and functioning of large language models. The article analyzes the mechanisms by which confirmation bias may manifest in chatbot interactions, assesses the ethical and practical risks associated with such bias, and proposes a range of mitigation strategies. These include technical interventions, interface redesign, and policy measures aimed at promoting balanced AI-generated discourse. The article concludes by outlining future research directions, emphasizing the need for interdisciplinary collaboration and empirical evaluation to better understand and address confirmation bias in generative AI systems. Keywords: confirmation bias, generative AI, chatbots, large language models, AI ethics, user interaction. 1. Introduction. The emergence of generative AI chatbots has marked a significant turning point in the field of artificial intelligence (AI) (Chang et al., 2024). These systems, underpinned by large-scale language models, have demonstrated a remarkable capacity for producing coherent, contextually relevant, and often creative responses to human queries (Wang et al., 2024).
Performance of Large Language Models in Supporting Medical Diagnosis and Treatment
Sousa, Diogo, Barbosa, Guilherme, Rocha, Catarina, Oliveira, Dulce
The integration of Large Language Models (LLMs) into healthcare holds significant potential to enhance diagnostic accuracy and support medical treatment planning. These AI-driven systems can analyze vast datasets, assisting clinicians in identifying diseases, recommending treatments, and predicting patient outcomes. This study evaluates the performance of a range of contemporary LLMs, including both open-source and closed-source models, on the 2024 Portuguese National Exam for medical specialty access (PNA), a standardized medical knowledge assessment. Our results highlight considerable variation in accuracy and cost-effectiveness, with several models demonstrating performance exceeding human benchmarks for medical students on this specific task. We identify leading models based on a combined score of accuracy and cost, discuss the implications of reasoning methodologies like Chain-of-Thought, and underscore the potential for LLMs to function as valuable complementary tools aiding medical professionals in complex clinical decision-making.
"All Roads Lead to ChatGPT": How Generative AI is Eroding Social Interactions and Student Learning Communities
Hou, Irene, Man, Owen, Hamilton, Kate, Muthusekaran, Srishty, Johnykutty, Jeffin, Zadeh, Leili, MacNeil, Stephen
The widespread adoption of generative AI is already impacting learning and help-seeking. While the benefits of generative AI are well-understood, recent studies have also raised concerns about increased potential for cheating and negative impacts on students' metacognition and critical thinking. However, the potential impacts on social interactions, peer learning, and classroom dynamics are not yet well understood. To investigate these aspects, we conducted 17 semi-structured interviews with undergraduate computing students across seven R1 universities in North America. Our findings suggest that help-seeking requests are now often mediated by generative AI. For example, students often redirected questions from their peers to generative AI instead of providing assistance themselves, undermining peer interaction. Students also reported feeling increasingly isolated and demotivated as the social support systems they rely on begin to break down. These findings are concerning given the important role that social interactions play in students' learning and sense of belonging.
Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability
Wang, Haotian, Zhao, Han, Chen, Shuaiting, Tian, Xiaoyu, Zhao, Sitong, Ji, Yunjie, Peng, Yiping, Li, Xiangang
A significant trend in enhancing the capabilities of these models is test-time scaling (Yang et al., 2025; Wu et al., 2025), where increasing computational resources allocated during inference leads to notable performance improvements. Models such as OpenAI's o1 series (OpenAI, 2024) and DeepSeek-R1 (DeepSeek-AI, 2025) have demonstrated the effectiveness of this approach across various tasks and benchmarks (Lightman et al., 2023; Huang et al., 2024). The capability of these models to achieve superior results by allocating additional computational resources during inference indicates an important shift in optimizing performance for LLMs. Specifically, dedicating more computation to the answer-generation process, rather than solely relying on scaling training data and model parameters, can lead to significant improvements, particularly in tasks that require complex reasoning (Snell et al., 2024). The success of test-time scaling thus emphasizes the crucial role of computation during the answer-generation phase.
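One simple way to picture spending more computation at answer-generation time is to sample several candidate answers and keep the most frequent one. The sketch below is illustrative only and is not the method of the paper above or of any model it cites: `noisy_solver` is a hypothetical stand-in for a single sampled model answer (assumed correct 60% of the time), and `majority_vote` and `accuracy` are names introduced here for the example.

```python
import random
from collections import Counter
from typing import Callable

Sampler = Callable[[str, random.Random], str]


def noisy_solver(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled answer (assumption: 60% correct)."""
    if rng.random() < 0.6:
        return "42"  # the "correct" answer in this toy setup
    return rng.choice(["41", "43", "44"])  # scattered wrong answers


def majority_vote(question: str, sampler: Sampler, n_samples: int,
                  rng: random.Random) -> str:
    """Draw n_samples answers and return the most common one."""
    answers = [sampler(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is the correct one."""
    rng = random.Random(seed)
    hits = sum(
        majority_vote("toy question", noisy_solver, n_samples, rng) == "42"
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"samples per question = {n:2d}, accuracy = {accuracy(n):.3f}")
```

In this toy setting, accuracy rises with the number of samples per question because wrong answers are spread over several values while the correct one is concentrated. Real models need not behave this way, and the gain depends on how errors are distributed.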
Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries
Saha, Koustuv, Jain, Yoshee, De Choudhury, Munmun
The ubiquity and widespread use of digital and online technologies have transformed mental health support, with online mental health communities (OMHCs) providing safe spaces for peer support. More recently, generative AI and large language models (LLMs) have introduced new possibilities for scalable, around-the-clock mental health assistance that could potentially augment and supplement the capabilities of OMHCs. Although genAI shows promise in delivering immediate and personalized responses, its effectiveness in replicating the nuanced, experience-based support of human peers remains an open question. In this study, we harnessed 24,114 posts and 138,758 online community (OC) responses from 55 OMHCs on Reddit. We prompted several state-of-the-art LLMs (GPT-4-Turbo, Llama-3, and Mistral-7B) with these posts, and compared their (AI) responses to human-written (OC) responses based on a variety of linguistic measures across psycholinguistics and lexico-semantics. Our findings revealed that AI responses are more verbose, readable, and analytically structured, but lack the linguistic diversity and personal narratives inherent in human-human interactions. Through a qualitative examination, we found validation as well as complementary insights into the nature of AI responses, such as their neutrality of stance and the absence of back-and-forth clarification seeking. We discuss the ethical and practical implications of integrating generative AI into OMHCs, advocating for frameworks that balance AI's scalability and timeliness with the irreplaceable authenticity, social interactiveness, and expertise of human connections that form the ethos of online support communities.
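To make the kind of comparison described above concrete, the sketch below computes three rough proxies for a single response: word count (verbosity), type-token ratio (lexical diversity), and the rate of first-person singular pronouns (a crude signal of personal narrative). These are assumptions chosen for illustration, not the measures used in the study, and `linguistic_profile` and `FIRST_PERSON` are hypothetical names.

```python
import re
from typing import Dict, List

# Assumed proxy for personal narrative: first-person singular pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def linguistic_profile(text: str) -> Dict[str, float]:
    """Return simple verbosity, diversity, and first-person proxies."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {"word_count": 0, "type_token_ratio": 0.0, "first_person_rate": 0.0}
    return {
        "word_count": n,
        "type_token_ratio": len(set(tokens)) / n,
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
    }


if __name__ == "__main__":
    human = "I felt this too. I got through it."
    ai = "It is common to feel this way. Consider reaching out to a professional."
    print("human:", linguistic_profile(human))
    print("ai:   ", linguistic_profile(ai))
```

Type-token ratio is sensitive to text length, so a fair comparison between long and short responses would need a length-normalized variant.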
Generative AI in Collaborative Academic Report Writing: Advantages, Disadvantages, and Ethical Considerations
Sadeghpour, Mahshid, Arakala, Arathi, Rao, Asha
The availability and abundance of GenAI tools to administer tasks traditionally managed by people have raised concerns, particularly within the education and academic sectors, as some students may rely heavily on these tools to complete the assignments designed to enable learning. This article focuses on informing students about the significance of investing their time during their studies in developing essential life-long learning skills using their own critical thinking, rather than depending on AI models that are susceptible to misinformation, hallucination, and bias. As we transition to an AI-centric era, it is important to educate students on how these models work, their pitfalls, and the ethical concerns associated with feeding data to such tools. Keywords: GenAI in Academic Writing, GenAI's Ethics, GenAI's Privacy Concerns. 1 Introduction. Writing academic reports and papers has been instrumental in assisting students and researchers in shaping their ideas, organising their methods, and practicing their communication skills, particularly when this process is combined with receiving constant feedback from experts. With the launch of OpenAI's first publicly available Large Language Model, namely ChatGPT (GPT-3.5), a significant concern arose within the academic and research community about the reliability of the academic and research output. Evidence suggests that as individuals began discovering the availability and efficiency of Generative Artificial Intelligence tools in late 2022, there was a significant surge in retracted research articles, resulting in more than 10,000 retracted papers [1]. The over-reliance of individuals on various Generative Artificial Intelligence (Gen AI) tools for completing tasks that require a human's critical thinking has raised concerns.
Generative AI in Live Operations: Evidence of Productivity Gains in Cybersecurity and Endpoint Management
Bono, James, Grana, Justin, Karakolios, Kleanthis, Ramakrishna, Pruthvi Hanumanthapura, Srivastava, Ankit
We measure the association between generative AI (GAI) tool adoption and four metrics spanning security operations, information protection, and endpoint management: 1) number of security alerts per incident, 2) probability of security incident reopenings, 3) time to classify a data loss prevention alert, and 4) time to resolve device policy conflicts. We find that GAI is associated with robust and statistically and practically significant improvements in the four metrics. Although unobserved confounders inhibit causal identification, these results are among the first to use observational data from live operations to investigate the relationship between GAI adoption and security operations, data loss prevention, and device policy management.