Goto

Collaborating Authors

 Generative AI


Inside OpenAI's empire: A conversation with Karen Hao

MIT Technology Review

These are our subscriber-only events where you get to listen in to conversations between editors and reporters. Now, I'm delighted to say we've got an absolute cracker of an event today. I'm very happy to have our prodigal daughter, Karen Hao, a fabulous AI journalist, here with us to talk about her new book. Hello, Karen, how are you doing? Thank you so much for having me back, Niall.


News Source Citing Patterns in AI Search Systems

arXiv.org Artificial Intelligence

AI-powered search systems are emerging as new information gatekeepers, fundamentally transforming how users access news and information. Despite their growing influence, the citation patterns of these systems remain poorly understood. We address this gap by analyzing data from the AI Search Arena, a head-to-head evaluation platform for AI search systems. The dataset comprises over 24,000 conversations and 65,000 responses from models across three major providers: OpenAI, Perplexity, and Google. Among the over 366,000 citations embedded in these responses, 9% reference news sources. We find that while models from different providers cite distinct news sources, they exhibit shared patterns in citation behavior. News citations concentrate heavily among a small number of outlets and display a pronounced liberal bias, though low-credibility sources are rarely cited. User preference analysis reveals that neither the political leaning nor the quality of cited news sources significantly influences user satisfaction. These findings reveal significant challenges in current AI search systems and have important implications for their design and governance.


Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools

arXiv.org Artificial Intelligence

Frontier Large language models (LLMs) like ChatGPT and Gemini can decipher cryptic compiler errors for novice programmers, but their computational scale, cost, and tendency to over-assist make them problematic for widespread pedagogical adoption. This work demonstrates that smaller, specialised language models, enhanced via Supervised Fine-Tuning (SFT), present a more viable alternative for educational tools. We utilise a new dataset of 40,000 C compiler error explanations, derived from real introductory programming (CS1/2) student-generated programming errors, which we used to fine-tune three open-source models: Qwen3-4B, Llama-3.1-8B, and Qwen3-32B. We performed a dual evaluation, combining expert human reviews with a large-scale automated analysis of 8,000 responses using a validated LLM-as-judge ensemble. Our results show that SFT significantly boosts the pedagogical quality of smaller models, achieving performance comparable to much larger models. We analyse the trade-offs between model size and quality, confirming that fine-tuning compact, efficient models on high-quality, domain-specific data is a potent strategy for creating specialised models to drive educational tools. We provide a replicable methodology to foster broader access to generative AI capabilities in educational contexts.


The GenAI Generation: Student Views of Awareness, Preparedness, and Concern

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (GenAI) is revolutionizing education and workforce development, profoundly shaping how students learn, engage, and prepare for their future. Outpacing the development of uniform policies and structures, GenAI has heralded a unique era and given rise to the GenAI Generation. We define the GenAI Generation as a cohort of students whose education has been increasingly shaped by the opportunities and challenges GenAI presents during its widespread adoption within society. This study examines students' perceptions of GenAI through a concise survey with optional open-ended questions, focusing on their awareness, preparedness, and concerns. Notably, readiness appears increasingly tied to exposure to GenAI through one's coursework. Students with greater curricular exposure to GenAI tend to feel more prepared, while those without it more often express vulnerability and uncertainty, highlighting a new and growing divide in readiness that goes beyond traditional disciplinary boundaries. Evaluation of more than 250 responses, with over 40% providing detailed qualitative feedback, reveals a core dual sentiment: while most students express enthusiasm for GenAI, an even greater proportion voice a spectrum of concerns about ethics, job displacement, and the adequacy of educational structures given the highly transformative technology. These findings offer critical insights into how students view the potential and pitfalls of GenAI for future career impacts. The challenge ahead involves implementing associated recommendations for educational institutions, moving beyond the baseline of access toward more informed guidance on the use of these tools, while preserving critical thinking, ethical reasoning, and adaptive learning.


Evaluation of Habitat Robotics using Large Language Models

arXiv.org Artificial Intelligence

This paper focuses on evaluating the effectiveness of Large Language Models at solving embodied robotic tasks using the Meta PARTNER benchmark. Meta PARTNR provides simplified environments and robotic interactions within randomized indoor kitchen scenes. Each randomized kitchen scene is given a task where two robotic agents cooperatively work together to solve the task. We evaluated multiple frontier models on Meta PARTNER environments. Our results indicate that reasoning models like OpenAI o3-mini outperform non-reasoning models like OpenAI GPT-4o and Llama 3 when operating in PARTNR's robotic embodied environments. o3-mini displayed outperform across centralized, decentralized, full observability, and partial observability configurations. This provides a promising avenue of research for embodied robotic development.


NeoBabel: A Multilingual Open Tower for Visual Generation

arXiv.org Artificial Intelligence

Text-to-image generation advancements have been predominantly English-centric, creating barriers for non-English speakers and perpetuating digital inequities. While existing systems rely on translation pipelines, these introduce semantic drift, computational overhead, and cultural misalignment. We introduce NeoBabel, a novel multilingual image generation framework that sets a new Pareto frontier in performance, efficiency and inclusivity, supporting six languages: English, Chinese, Dutch, French, Hindi, and Persian. The model is trained using a combination of large-scale multilingual pretraining and high-resolution instruction tuning. To evaluate its capabilities, we expand two English-only benchmarks to multilingual equivalents: m-GenEval and m-DPG. NeoBabel achieves state-of-the-art multilingual performance while retaining strong English capability, scoring 0.75 on m-GenEval and 0.68 on m-DPG. Notably, it performs on par with leading models on English tasks while outperforming them by +0.11 and +0.09 on multilingual benchmarks, even though these models are built on multilingual base LLMs. This demonstrates the effectiveness of our targeted alignment training for preserving and extending crosslingual generalization. We further introduce two new metrics to rigorously assess multilingual alignment and robustness to code-mixed prompts. Notably, NeoBabel matches or exceeds English-only models while being 2-4x smaller. We release an open toolkit, including all code, model checkpoints, a curated dataset of 124M multilingual text-image pairs, and standardized multilingual evaluation protocols, to advance inclusive AI research. Our work demonstrates that multilingual capability is not a trade-off but a catalyst for improved robustness, efficiency, and cultural fidelity in generative AI.


LLMs are Introvert

arXiv.org Artificial Intelligence

The exponential growth of social media and generative AI has transformed information dissemination, fostering connectivity but also accelerating the spread of misinformation. Understanding information propagation dynamics and developing effective control strategies is essential to mitigate harmful content. Traditional models, such as SIR, provide basic insights but inadequately capture the complexities of online interactions. Advanced methods, including attention mechanisms and graph neural networks, enhance accuracy but typically overlook user psychology and behavioral dynamics. Large language models (LLMs), with their human-like reasoning, offer new potential for simulating psychological aspects of information spread. We introduce an LLM-based simulation environment capturing agents' evolving attitudes, emotions, and responses. Initial experiments, however, revealed significant gaps between LLM-generated behaviors and authentic human dynamics, especially in stance detection and psychological realism. A detailed evaluation through Social Information Processing Theory identified major discrepancies in goal-setting and feedback evaluation, stemming from the lack of emotional processing in standard LLM training. To address these issues, we propose the Social Information Processing-based Chain of Thought (SIP-CoT) mechanism enhanced by emotion-guided memory. This method improves the interpretation of social cues, personalization of goals, and evaluation of feedback. Experimental results confirm that SIP-CoT-enhanced LLM agents more effectively process social information, demonstrating behaviors, attitudes, and emotions closer to real human interactions. In summary, this research highlights critical limitations in current LLM-based propagation simulations and demonstrates how integrating SIP-CoT and emotional memory significantly enhances the social intelligence and realism of LLM agents.


Prompt Migration: Stabilizing GenAI Applications with Evolving Large Language Models

arXiv.org Artificial Intelligence

Generative AI is transforming business applications by enabling natural language interfaces and intelligent automation. However, the underlying large language models (LLMs) are evolving rapidly and so prompting them consistently is a challenge. This leads to inconsistent and unpredictable application behavior, undermining the reliability that businesses require for mission-critical workflows. In this paper, we introduce the concept of prompt migration as a systematic approach to stabilizing GenAI applications amid changing LLMs. Using the Tursio enterprise search application as a case study, we analyze the impact of successive GPT model upgrades, detail our migration framework including prompt redesign and a migration testbed, and demonstrate how these techniques restore application consistency. Our results show that structured prompt migration can fully recover the application reliability that was lost due to model drift. We conclude with practical lessons learned, emphasizing the need for prompt lifecycle management and robust testing to ensure dependable GenAI-powered business applications.


The Ethical Implications of AI in Creative Industries: A Focus on AI-Generated Art

arXiv.org Artificial Intelligence

As Artificial Intelligence (AI) continues to grow daily, more exciting (and somewhat controversial) technology emerges every other day. As we see the advancements in AI, we see more and more people becoming skeptical of it. This paper explores the complications and confusion around the ethics of generative AI art. We delve deep into the ethical side of AI, specifically generative art. We step back from the excitement and observe the impossible conundrums that this impressive technology produces. Covering environmental consequences, celebrity representation, intellectual property, deep fakes, and artist displacement. Our research found that generative AI art is responsible for increased carbon emissions, spreading misinformation, copyright infringement, unlawful depiction, and job displacement. In light of this, we propose multiple possible solutions for these problems. We address each situation's history, cause, and consequences and offer different viewpoints. At the root of it all, though, the central theme is that generative AI Art needs to be correctly legislated and regulated.


Integrating Generative AI in BIM Education: Insights from Classroom Implementation

arXiv.org Artificial Intelligence

This study evaluates the implementation of a Generative AI-powered rule checking workflow within a graduate-level Building Information Modeling (BIM) course at a U.S. university. Over two semesters, 55 students participated in a classroom-based pilot exploring the use of GenAI for BIM compliance tasks, an area with limited prior research. The instructional design included lectures on prompt engineering and AI-driven rule checking, followed by an assignment where students used a large language model (LLM) to identify code violations in designs using Autodesk Revit. Surveys and interviews were conducted to assess student workload, learning effectiveness, and overall experience, using the NASA-TLX scale and regression analysis. Findings indicate students generally achieved learning objectives but faced challenges such as difficulties debugging AI-generated code and inconsistent tool performance, probably due to their limited prompt engineering experience. These issues increased cognitive and emotional strain, especially among students with minimal programming backgrounds. Despite these challenges, students expressed strong interest in future GenAI applications, particularly with clear instructional support.