Generative AI
syftr: Pareto-Optimal Generative AI
Conway, Alexander, Dey, Debadeepta, Hackmann, Stefan, Hausknecht, Matthew, Schmidt, Michael, Steadman, Mark, Volynets, Nick
Retrieval-Augmented Generation (RAG) pipelines are central to applying large language models (LLMs) to proprietary or dynamic data. However, building effective RAG flows is complex, requiring careful selection among vector databases, embedding models, text splitters, retrievers, and synthesizing LLMs. The challenge deepens with the rise of agentic paradigms. Modules like verifiers, rewriters, and rerankers, each with intricate hyperparameter dependencies, have to be carefully tuned. Balancing tradeoffs between latency, accuracy, and cost becomes increasingly difficult in performance-sensitive applications. We introduce syftr, a framework that performs efficient multi-objective search over a broad space of agentic and non-agentic RAG configurations. Using Bayesian Optimization, syftr discovers Pareto-optimal flows that jointly optimize task accuracy and cost. A novel early-stopping mechanism further improves efficiency by pruning clearly suboptimal candidates. Across multiple RAG benchmarks, syftr finds flows that are on average approximately 9 times cheaper while preserving most of the accuracy of the most accurate flows on the Pareto frontier. Furthermore, syftr's ability to design and optimize allows integrating new modules, making it even easier and faster to realize high-performing generative AI pipelines.
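To make the notion of a Pareto-optimal flow concrete, the following is a minimal sketch of selecting the non-dominated candidates from a set of already-evaluated flows. It is not syftr's implementation: the `Flow` type, the `pareto_front` function, and the accuracy/cost numbers are all hypothetical, and a brute-force dominance filter stands in for the Bayesian Optimization search that the abstract describes.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    name: str        # hypothetical label for a RAG configuration
    accuracy: float  # task accuracy, higher is better
    cost: float      # cost per query in arbitrary units, lower is better


def dominates(a: Flow, b: Flow) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(flows: List[Flow]) -> List[Flow]:
    """Return the flows that no other flow dominates, sorted by increasing cost."""
    front = [f for f in flows if not any(dominates(g, f) for g in flows)]
    return sorted(front, key=lambda f: f.cost)


# Made-up evaluation results, for illustration only.
flows = [
    Flow("A", 0.90, 9.0),
    Flow("B", 0.85, 1.0),
    Flow("C", 0.80, 2.0),  # dominated by B: less accurate and more expensive
    Flow("D", 0.60, 0.2),
]
front = pareto_front(flows)
print([f.name for f in front])
```

In this toy data, flow B is nine times cheaper than flow A while giving up a small amount of accuracy, which is the kind of tradeoff a Pareto frontier exposes.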
Fusion Intelligence for Digital Twinning AI Data Centers: A Synergistic GenAI-PhyAI Approach
Wang, Ruihang, Li, Minghao, Cao, Zhiwei, Jia, Jimin, Guan, Kyle, Wen, Yonggang
The explosion in artificial intelligence (AI) applications is pushing the development of AI-dedicated data centers (AIDCs), creating management challenges that traditional methods and standalone AI solutions struggle to address. While digital twins are beneficial for AI-based design validation and operational optimization, current AI methods for their creation face limitations. Specifically, physical AI (PhyAI) aims to capture the underlying physical laws, which demands extensive, case-specific customization, and generative AI (GenAI) can produce inaccurate or hallucinated results. We propose Fusion Intelligence, a novel framework synergizing GenAI's automation with PhyAI's domain grounding. In this dual-agent collaboration, GenAI interprets natural language prompts to generate tokenized AIDC digital twins. Subsequently, PhyAI optimizes these generated twins by enforcing physical constraints and assimilating real-time data. Case studies demonstrate the advantages of our framework in automating the creation and validation of AIDC digital twins. These twins deliver predictive analytics to support power usage effectiveness (PUE) optimization in the design stage. With operational data collected, the digital twin accuracy is further improved compared with pure physics-based models developed by human experts. Fusion Intelligence offers a promising pathway to accelerate digital transformation. It enables more reliable and efficient AI-driven digital transformation for a broad range of mission-critical infrastructures.
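For readers unfamiliar with the metric, power usage effectiveness is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.0 would mean no overhead for cooling or power distribution. The sketch below only illustrates that ratio; the function name `pue` and the sample figures are hypothetical and are not taken from the paper's case studies.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


# Made-up figures: 1500 kWh drawn by the whole facility, 1000 kWh by IT equipment.
example_pue = pue(1500.0, 1000.0)
print(example_pue)
```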
Recalibrating the Compass: Integrating Large Language Models into Classical Research Methods
This paper examines how large language models (LLMs) are transforming core quantitative methods in communication research in particular, and in the social sciences more broadly, namely content analysis, survey research, and experimental studies. Rather than replacing classical approaches, LLMs introduce new possibilities for coding and interpreting text, simulating dynamic respondents, and generating personalized and interactive stimuli. Drawing on recent interdisciplinary work, the paper highlights both the potential and limitations of LLMs as research tools, including issues of validity, bias, and interpretability. To situate these developments theoretically, the paper revisits Lasswell's foundational framework -- "Who says what, in which channel, to whom, with what effect?" -- and demonstrates how LLMs reconfigure message studies, audience analysis, and effects research by enabling interpretive variation, audience trajectory modeling, and counterfactual experimentation. Revisiting the metaphor of the methodological compass, the paper argues that classical research logics remain essential as the field integrates LLMs and generative AI. By treating LLMs not only as technical instruments but also as epistemic and cultural tools, the paper calls for thoughtful, rigorous, and imaginative use of LLMs in future communication and social science research.
Architectures of Error: A Philosophical Inquiry into AI and Human Code Generation
With the rise of generative AI (GenAI), Large Language Models are increasingly employed for code generation, becoming active co-authors alongside human programmers. Focusing specifically on this application domain, this paper articulates distinct "Architectures of Error" to ground an epistemic distinction between human and machine code generation. Examined through their shared vulnerability to error, this distinction reveals fundamentally different causal origins: human-cognitive versus artificial-stochastic. To develop this framework and substantiate the distinction, the analysis draws critically upon Dennett's mechanistic functionalism and Rescher's methodological pragmatism. I argue that a systematic differentiation of these error profiles raises critical philosophical questions concerning semantic coherence, security robustness, epistemic limits, and control mechanisms in human-AI collaborative software development. The paper also utilizes Floridi's levels of abstraction to provide a nuanced understanding of how these error dimensions interact and may evolve with technological advancements. This analysis aims to offer philosophers a structured framework for understanding GenAI's unique epistemological challenges, shaped by these architectural foundations, while also providing software engineers a basis for more critically informed engagement.
An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection
Pofcher, Ignacio Mariano Andreozzi, Ellul, Joshua
Generative AI techniques have been proposed for various aspects of coding, for tasks ranging from coding assistants [1] to optimisation [2] and vulnerability detection [3], for which promising results are being reported. Indeed, in many cases traditional types of code verification (be it at compile/development time [4] or runtime [5]) often outperform generative AI-based techniques, yet such tools are often rigid and less flexible compared to how generative AI techniques can be used. Given potential future advancements of generative AI techniques, and given the flexible interface with which tools can interact with generative AI tools, it is useful to evaluate 'how good are generative AI techniques at undertaking such tasks?' Indeed, extensive work in the domain has already been proposed surrounding this question, of which an extensive amount of literature has focused on the state-of-the-art large language models. Whilst it may be reasonable to make use of commercially/publicly available LLMs that are operated by a service provider, they raise issues of privacy and confidentiality, since some entities may prefer not to disclose certain intellectual property to such providers (e.g.
Digital Overconsumption and Waste: A Closer Look at the Impacts of Generative AI
Generative Artificial Intelligence (AI) systems currently contribute negatively to the production of digital waste, via the associated energy consumption and the related CO2 emissions. At this moment, a discussion is urgently needed on the replication of harmful consumer behavior, such as overconsumption, in the digital space. We outline our previous work on the climate implications of commercially available generative AI systems and the sentiment of generative AI users when confronted with AI-related climate research. We expand on this work via a discussion of digital overconsumption and waste, other related societal impacts, and a possible solution pathway.
AI-Driven Climate Policy Scenario Generation for Sub-Saharan Africa
Badekale, Rafiu Adekoya, Akinfaderin, Adewale
Climate policy scenario generation and evaluation have traditionally relied on integrated assessment models (IAMs) and expert-driven qualitative analysis. These methods enable stakeholders, such as policymakers and researchers, to anticipate impacts, plan governance strategies, and develop mitigation measures. However, traditional methods are often time-intensive, reliant on simple extrapolations of past trends, and limited in capturing the complex and interconnected nature of energy and climate issues. With the advent of artificial intelligence (AI), particularly generative AI models trained on vast datasets, these limitations can be addressed, ensuring robustness even under limited data conditions. In this work, we explore a novel method that employs generative AI, specifically large language models (LLMs), to simulate climate policy scenarios for Sub-Saharan Africa. These scenarios focus on energy transition themes derived from the historical United Nations Climate Change Conference (COP) documents. By leveraging generative models, the project aims to create plausible and diverse policy scenarios that align with regional climate goals and energy challenges. Given limited access to human evaluators, automated techniques were employed for scenario evaluation. We generated policy scenarios using the llama3.2-3B model. Of the 34 generated responses, 30 (88%) passed expert validation, accurately reflecting the intended impacts provided in the corresponding prompts. We compared these validated responses against assessments from a human climate expert and two additional LLMs (gemma2-2B and mistral-7B). Our structured, embedding-based evaluation framework shows that generative AI can effectively generate scenarios that are coherent, relevant, plausible, and diverse. This approach offers a transformative tool for climate policy planning in data-constrained regions.
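The abstract does not specify how the embedding-based evaluation is computed, so the following is only a rough sketch of one common approach, under stated assumptions: relevance is taken as the cosine similarity between a prompt embedding and a scenario embedding, and diversity as the mean pairwise cosine distance among scenario embeddings. The function names and the tiny hand-written vectors are hypothetical; a real pipeline would obtain vectors from an embedding model.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def diversity(embeddings: List[Sequence[float]]) -> float:
    """Mean pairwise cosine distance (1 - similarity); assumed metric, not the paper's."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy 2-dimensional "embeddings", for illustration only.
prompt_vec = [1.0, 0.0]
scenario_vecs = [[1.0, 0.0], [0.0, 1.0]]

relevance_scores = [cosine_similarity(prompt_vec, s) for s in scenario_vecs]
spread = diversity(scenario_vecs)

# The validation rate reported in the abstract: 30 of 34 responses.
validation_rate = 30 / 34
print(relevance_scores, spread, round(validation_rate * 100))
```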
From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
Kursuncu, Ugur, Padhi, Trilok, Sinha, Gaurav, Erol, Abdulkadir, Mandivarapu, Jaya Krishna, Larrison, Christopher R.
The critical shortage of mental health services due to workforce limitations and logistical barriers, especially in underserved areas designated by the Health Resources & Services Administration (HRSA), highlights the urgent need for accessible and scalable solutions. Traditional services often fail to address the diverse needs of individuals experiencing anxiety, prompting many, especially younger populations, to seek alternative emotional and psychological support online. While digital platforms offer immediate access, unregulated online interactions, including those with generative AI, may disseminate misleading information or inappropriate advice, potentially exacerbating anxiety symptoms (Tobias & Ito, 2021). Despite the great potential of generative AI to supplement mental health services, its deployment poses potentially significant risks. Unlike clinical practitioners, LLMs are not inherently equipped to manage emotionally complex or vulnerable conversations, which are critical to therapeutic relationships that create positive clinical outcomes (Rogers, 1957; Wampold, 2015).
Human-Centered AI Communication in Co-Creativity: An Initial Framework and Insights
Effective communication between AI and humans is essential for successful human-AI co-creation. However, many current co-creative AI systems lack effective communication, which limits their potential for collaboration. This paper presents the initial design of the Framework for AI Communication (FAICO) for co-creative AI, developed through a systematic review of 107 full-length papers. FAICO presents key aspects of AI communication and their impact on user experience, offering preliminary guidelines for designing human-centered AI communication. To improve the framework, we conducted a preliminary study with two focus groups involving skilled individuals in AI, HCI, and design. These sessions sought to understand participants' preferences for AI communication, gather their perceptions of the framework, collect feedback for refinement, and explore its use in co-creative domains like collaborative writing and design. Our findings reveal a preference for a human-AI feedback loop over linear communication and emphasize the importance of context in fostering mutual understanding. Based on these insights, we propose actionable strategies for applying FAICO in practice and future directions, marking the first step toward developing comprehensive guidelines for designing effective human-centered AI communication in co-creation.
Understanding Generative AI Capabilities in Everyday Image Editing Tasks
Taesiri, Mohammad Reza, Collins, Brandon, Bolton, Logan, Lai, Viet Dac, Dernoncourt, Franck, Bui, Trung, Nguyen, Anh Totti
Generative AI (GenAI) holds significant promise for automating everyday image editing tasks, especially following the recent release of GPT-4o on March 25, 2025. However, what subjects do people most often want edited? What kinds of editing actions do they want to perform (e.g., removing or stylizing the subject)? Do people prefer precise edits with predictable outcomes or highly creative ones? By understanding the characteristics of real-world requests and the corresponding edits made by freelance photo-editing wizards, can we draw lessons for improving AI-based editors and determine which types of requests can currently be handled successfully by AI editors? In this paper, we present a unique study addressing these questions by analyzing 83k requests from the past 12 years (2013-2025) on the Reddit community, which collected 305k PSR-wizard edits. According to human ratings, only about 33% of requests can be fulfilled by the best AI editors (including GPT-4o, Gemini-2.0-Flash, SeedEdit). Interestingly, AI editors perform worse on low-creativity requests that require precise editing than on more open-ended tasks. They often struggle to preserve the identity of people and animals, and frequently make non-requested touch-ups. On the other side of the table, VLM judges (e.g., o1) perform differently from human judges and may prefer AI edits more than human edits. Code and qualitative examples are available at: https://psrdataset.github.io