 Generative AI


HalluLens: LLM Hallucination Benchmark

arXiv.org Artificial Intelligence

Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination." These hallucinations undermine user trust and hinder the adoption of generative AI systems. Addressing hallucinations is essential for the advancement of LLMs. This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks, built upon a clear taxonomy of hallucination. A major challenge in benchmarking hallucinations is the lack of a unified framework due to inconsistent definitions and categorizations. We disentangle LLM hallucination from "factuality," proposing a clear taxonomy that distinguishes between extrinsic and intrinsic hallucinations, to promote consistency and facilitate research. Extrinsic hallucinations, where the generated content is not consistent with the training data, are increasingly important as LLMs evolve. Our benchmark includes dynamic test set generation to mitigate data leakage and ensure robustness against such leakage. We also analyze existing benchmarks, highlighting their limitations and saturation. The work aims to: (1) establish a clear taxonomy of hallucinations, (2) introduce new extrinsic hallucination tasks, with data that can be dynamically regenerated to prevent saturation by leakage, and (3) provide a comprehensive analysis of existing benchmarks, distinguishing them from factuality evaluations.


Auditing the Ethical Logic of Generative AI Models

arXiv.org Artificial Intelligence

As generative AI models become increasingly integrated into high-stakes domains, the need for robust methods to evaluate their ethical reasoning grows. This paper introduces a five-dimensional audit model -- assessing Analytic Quality, Breadth of Ethical Considerations, Depth of Explanation, Consistency, and Decisiveness -- to evaluate the ethical logic of leading large language models (LLMs). Drawing on traditions from applied ethics and higher-order thinking, we present a multi-battery prompt approach, including novel ethical dilemmas, to probe the models' reasoning across diverse contexts. We benchmark seven major LLMs, finding that while models generally converge on ethical decisions, they vary in explanatory rigor and moral prioritization. Chain-of-Thought prompting and reasoning-optimized models significantly enhance performance on our audit metrics. This study introduces a scalable methodology for ethical benchmarking of AI systems and highlights the potential for AI to complement human moral reasoning in complex decision-making contexts.


Symbolic Representation for Any-to-Any Generative Tasks

arXiv.org Artificial Intelligence

We propose a symbolic generative task description language and a corresponding inference engine capable of representing arbitrary multimodal tasks as structured symbolic flows. Unlike conventional generative models that rely on large-scale training and implicit neural representations to learn cross-modal mappings, often at high computational cost and with limited flexibility, our framework introduces an explicit symbolic representation comprising three core primitives: functions, parameters, and topological logic. Leveraging a pre-trained language model, our inference engine maps natural language instructions directly to symbolic workflows in a training-free manner. Our framework successfully performs over 12 diverse multimodal generative tasks, demonstrating strong performance and flexibility without the need for task-specific tuning. Experiments show that our method not only matches or outperforms existing state-of-the-art unified models in content quality, but also offers greater efficiency, editability, and interruptibility. We believe that symbolic task representations provide a cost-effective and extensible foundation for advancing the capabilities of generative AI.
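To make the three primitives named in the abstract (functions, parameters, and topological logic) concrete, here is a minimal sketch of a symbolic flow executed in dependency order. It is an illustration only: the node names, the toy string-building functions, and the `run_flow` helper are hypothetical and are not taken from the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for generative functions; each just builds a string.
FUNCTIONS = {
    "text_to_image": lambda prompt, style: f"image({prompt}, style={style})",
    "image_to_video": lambda image, frames: f"video({image}, frames={frames})",
}

# A symbolic flow: every node names a function, its parameters, and its inputs.
# "inputs" maps an argument name to the upstream node that supplies it,
# which is what encodes the topology of the workflow.
FLOW = {
    "img": {
        "fn": "text_to_image",
        "params": {"prompt": "a red kite", "style": "oil"},
        "inputs": {},
    },
    "vid": {
        "fn": "image_to_video",
        "params": {"frames": 8},
        "inputs": {"image": "img"},
    },
}


def run_flow(flow, functions):
    """Execute nodes in topological order and return every node's output."""
    # Map each node to the set of nodes it depends on.
    deps = {name: set(node["inputs"].values()) for name, node in flow.items()}
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        node = flow[name]
        kwargs = dict(node["params"])
        # Wire upstream outputs into this node's arguments.
        kwargs.update({arg: outputs[src] for arg, src in node["inputs"].items()})
        outputs[name] = functions[node["fn"]](**kwargs)
    return outputs


results = run_flow(FLOW, FUNCTIONS)
print(results["vid"])
```

Because the flow is plain data, a node's parameters can be edited or a node swapped out without touching the executor, which is one way to read the abstract's claim about editability.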


OpenAI Wants to Go For-Profit. Experts Say Regulators Should Step In

TIME - Tech

In the latest development in an ongoing struggle over OpenAI's future direction--and potentially the future of artificial intelligence itself--dozens of prominent figures are urging the Attorneys General of California and Delaware to block OpenAI's controversial plan to convert from its unique nonprofit-controlled structure to a for-profit company. In a letter made public April 23, signatories including "AI Godfather" Geoffrey Hinton, Harvard legal professor Lawrence Lessig, and several former OpenAI researchers argue the move represents a fundamental betrayal of OpenAI's founding mission. "The proposed restructuring would eliminate essential safeguards, effectively handing control of, and profits from, what could be the most powerful technology ever created to a for-profit entity with legal duties to prioritize shareholder returns," the letter's authors write. It lands as OpenAI faces immense pressure from the other side: failing to implement the restructure by the end of the year could cost the company $20 billion and hamstring future fundraising. OpenAI was founded in 2015 as a non-profit, with its stated mission being to ensure that artificial general intelligence (AGI) "benefits all of humanity" rather than advancing "the private gain of any person."


Driving business value by optimizing the cloud

MIT Technology Review

At the same time, hosted services like generative AI and tailored industry solutions can help companies quickly launch applications and grow the business. To get the most out of these services, companies are turning to cloud optimization--the process of selecting and allocating cloud resources to reduce costs while maximizing performance. But despite all the interest in the cloud, many workloads remain stranded on-premises, and many more are not optimized for efficiency and growth, greatly limiting the forward momentum. Companies are missing out on a virtuous cycle of mutually reinforcing results that comes from even more efficient use of the cloud. Organizations can enhance security, make critical workloads more resilient, protect the customer experience, boost revenues, and generate cost savings.


Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes

arXiv.org Artificial Intelligence

Joan Perez (1) and Giovanni Fusco (2). (1) Urban Geo Analytics, France; (2) Université Côte-Azur-CNRS-AMU-Avignon Université, ESPACE, France. April 2025. Email: jperez@urbangeoanalytics.com, ORCID: 0000-0003-3003-0895. Email: giovanni.fusco@univ-cotedazur.fr.
Abstract: Streetscapes are an essential component of urban space. Their assessment is presently either limited to morphometric properties of their mass skeleton or requires labor-intensive qualitative evaluations of visually perceived qualities. This paper introduces SAGAI: Streetscape Analysis with Generative Artificial Intelligence, a modular workflow for scoring street-level urban scenes using open-access data and vision-language models. SAGAI integrates OpenStreetMap geometries, Google Street View imagery, and a lightweight version of the LLaVA model to generate structured spatial indicators from images via customizable natural language prompts. The pipeline includes an automated mapping module that aggregates visual scores at both the point and street levels, enabling direct cartographic interpretation. It operates without task-specific training or proprietary software dependencies, supporting scalable and interpretable analysis of urban environments. Two exploratory case studies in Nice and Vienna illustrate SAGAI's capacity to produce geospatial outputs from vision-language inference. The initial results show strong performance for binary urban-rural scene classification, moderate precision in commercial feature detection, and less precise, but still informative, estimates of sidewalk width. Fully deployable by any user, SAGAI can be easily adapted to a wide range of urban research themes, such as walkability, safety, or urban design, through prompt modification alone.
Keywords: Vision-Language Models, Street View Imagery, Streetscape Analysis, Geospatial AI, zero-shot inference
1 Introduction: Assessing the qualities and functions of urban streetscapes is essential to understand walkability, safety, commercial vitality, and social life in cities [1, 2, 3]. However, traditional methods for observing and evaluating street-level conditions, such as field surveys, audits, and manual photo interpretation, remain time-consuming, labor-intensive, and difficult to scale beyond small pilot zones [2]. Geo-processing of vector models of the built environment allows the assessment of
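The abstract mentions a mapping module that aggregates visual scores at both the point and street levels. The sketch below shows one simple way such an aggregation could work. It is not SAGAI's implementation: the record fields, the sample values, and the choice of the mean as the street-level statistic are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical point-level scores, e.g. 1 = "commercial feature visible".
# Each sampled point is tagged with the street segment it belongs to.
point_scores = [
    {"point_id": 1, "street_id": "A", "score": 1},
    {"point_id": 2, "street_id": "A", "score": 1},
    {"point_id": 3, "street_id": "A", "score": 0},
    {"point_id": 4, "street_id": "A", "score": 0},
    {"point_id": 5, "street_id": "B", "score": 1},
]


def aggregate_by_street(points):
    """Group point scores by street and return the mean score per street."""
    grouped = defaultdict(list)
    for point in points:
        grouped[point["street_id"]].append(point["score"])
    return {street: mean(scores) for street, scores in grouped.items()}


street_scores = aggregate_by_street(point_scores)
print(street_scores)
```

The point-level records can be mapped directly, while the per-street dictionary could be joined back to street geometries by `street_id` for a street-level map.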


The Dance of Atoms-De Novo Protein Design with Diffusion Model

arXiv.org Artificial Intelligence

The de novo design of proteins refers to creating proteins with specific structures and functions that do not naturally exist. In recent years, the accumulation of high-quality protein structure and sequence data and technological advancements have paved the way for the successful application of generative artificial intelligence (AI) models in protein design. These models have surpassed traditional approaches that rely on fragments and bioinformatics. They have significantly enhanced the success rate of de novo protein design and reduced experimental costs, leading to breakthroughs in the field. Among various generative AI models, diffusion models have yielded the most promising results in protein design. In the past two to three years, more than ten protein design models based on diffusion models have emerged. Among them, the representative model, RFDiffusion, has demonstrated success rates in 25 protein design tasks that far exceed those of traditional methods and of other AI-based approaches such as RFjoint and hallucination. This review will systematically examine the application of diffusion models in generating protein backbones and sequences. We will explore the strengths and limitations of different models, summarize successful cases of protein design using diffusion models, and discuss future development directions.


FeedQUAC: Quick Unobtrusive AI-Generated Commentary

arXiv.org Artificial Intelligence

Design thrives on feedback. However, gathering constant feedback throughout the design process can be labor-intensive and disruptive. We explore how AI can bridge this gap by providing effortless, ambient feedback. We introduce FeedQUAC, a design companion that delivers real-time AI-generated commentary from a variety of perspectives through different personas. A design probe study with eight participants highlights how designers can leverage quick yet ambient AI feedback to enhance their creative workflows. Participants highlight benefits such as convenience, playfulness, confidence boost, and inspiration from this lightweight feedback agent, while suggesting additional features, like chat interaction and context curation. We discuss the role of AI feedback, its strengths and limitations, and how to integrate it into existing design workflows while balancing user involvement. Our findings also suggest that ambient interaction is a valuable consideration for both the design and evaluation of future creativity support systems.


Circinus: Efficient Query Planner for Compound ML Serving

arXiv.org Artificial Intelligence

The rise of compound AI serving -- integrating multiple operators in a pipeline that may span edge and cloud tiers -- enables end-user applications such as autonomous driving, generative AI-powered meeting companions, and immersive gaming. Achieving high service goodput -- i.e., meeting service level objectives (SLOs) for pipeline latency, accuracy, and costs -- requires effective planning of operator placement, configuration, and resource allocation across infrastructure tiers. However, the diverse SLO requirements, varying edge capabilities, and high query volumes create an enormous planning search space, rendering current solutions fundamentally limited for real-time serving and cost-efficient deployments. This paper presents Circinus, an SLO-aware query planner for large-scale compound AI workloads. Circinus introduces a novel decomposition of multi-query planning and multi-dimensional SLO objectives that preserves global decision quality. By exploiting plan similarities within and across queries, it significantly reduces search steps. It further improves per-step efficiency with a precision-aware plan profiler that incrementally profiles and strategically applies early stopping based on imprecise estimates of plan performance. At scale, Circinus selects query-plan combinations to maximize global SLO goodput. Evaluations in real-world settings show that Circinus improves service goodput by 3.2-5.0$\times$ and accelerates query planning by 4.2-5.8$\times$, achieving query response in seconds, while reducing deployment costs by 3.2-4.0$\times$ over state-of-the-art systems, even in their intended single-tier deployments.
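The abstract defines goodput in terms of meeting SLOs for latency, accuracy, and cost. As a rough illustration, the sketch below counts a query as "good" only if it satisfies all three bounds and reports the fraction of such queries. The class names, field names, thresholds, and the fraction-based definition are assumptions for this example, not the metric as implemented in Circinus.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    max_latency_ms: float
    min_accuracy: float
    max_cost: float


@dataclass
class QueryResult:
    latency_ms: float
    accuracy: float
    cost: float


def meets_slo(result, slo):
    """A query counts toward goodput only if every SLO dimension is satisfied."""
    return (
        result.latency_ms <= slo.max_latency_ms
        and result.accuracy >= slo.min_accuracy
        and result.cost <= slo.max_cost
    )


def goodput(results, slo):
    """Fraction of served queries that met the SLO (0.0 for an empty batch)."""
    if not results:
        return 0.0
    return sum(meets_slo(r, slo) for r in results) / len(results)


slo = SLO(max_latency_ms=100, min_accuracy=0.90, max_cost=2.0)
batch = [
    QueryResult(80, 0.92, 1.0),   # meets all three bounds
    QueryResult(120, 0.95, 1.0),  # too slow
    QueryResult(90, 0.85, 1.0),   # not accurate enough
    QueryResult(100, 0.90, 2.0),  # exactly on the bounds, still counts
]
print(goodput(batch, slo))  # 0.5
```

Treating the SLO as a conjunction means that improving one dimension (say, latency) does not raise goodput if it pushes accuracy or cost out of bounds, which is why the planning problem is multi-dimensional.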


QAOA-GPT: Efficient Generation of Adaptive and Regular Quantum Approximate Optimization Algorithm Circuits

arXiv.org Artificial Intelligence

Quantum computing has the potential to improve our ability to solve certain optimization problems that are computationally difficult for classical computers, by offering new algorithmic approaches that may provide speedups under specific conditions. In this work, we introduce QAOA-GPT, a generative framework that leverages Generative Pretrained Transformers (GPT) to directly synthesize quantum circuits for solving quadratic unconstrained binary optimization problems, and demonstrate it on the MaxCut problem on graphs. To diversify the training circuits and ensure their quality, we have generated a synthetic dataset using the adaptive QAOA approach, a method that incrementally builds and optimizes problem-specific circuits. The experiments conducted on a curated set of graph instances demonstrate that QAOA-GPT generates high-quality quantum circuits for new problem instances unseen in training and successfully parametrizes QAOA. Our results show that using QAOA-GPT to generate quantum circuits will significantly decrease both the computational overhead of classical QAOA and adaptive approaches that often use gradient evaluation to generate the circuit and the classical optimization of the circuit parameters. Our work shows that generative AI could be a promising avenue to generate compact quantum circuits in a scalable way. Quantum computing is a rapidly emerging technology with significant potential across various domains, including finance [1], chemical simulations [2], material science [3], combinatorial optimization [4], and machine learning [5], among others. Variational quantum-classical algorithms represent one of the most promising classes of quantum algorithms in different domains, showing potential for both fault-tolerant quantum computers and near-term noisy intermediate-scale quantum (NISQ) devices.
The Quantum Approximate Optimization Algorithm (QAOA) [6] and many of its subsequent versions and customizations [7] belong to this class and demonstrate great potential due to their problem/application flexibility and compatibility with various quantum architectures. The original QAOA framework employs a fixed ansatz structure, which can limit expressibility and hinder performance, particularly on near-term quantum devices where circuit depth is limited. This rigid design may not capture the problem-specific features needed for efficient optimization. Methods such as ADAPT-QAOA [8] address this challenge by iteratively constructing the ansatz in a problem-informed manner. At each step, ADAPT-QAOA selects operators from a predefined pool based on their gradient with respect to the cost function, incorporating only those that contribute most significantly to improving the objective.
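The gradient-based selection step described above can be sketched as a greedy loop. This is a simplified illustration rather than the ADAPT-QAOA algorithm itself: the operator labels, the gradient values, and the stopping tolerance are hypothetical, and the gradients are supplied as plain numbers instead of being measured on a quantum circuit.

```python
def select_operator(gradients, tol=1e-3):
    """Return the pool operator with the largest |gradient|, or None if all fall below tol."""
    name, grad = max(gradients.items(), key=lambda item: abs(item[1]))
    return name if abs(grad) >= tol else None


def build_ansatz(gradient_rounds, tol=1e-3):
    """Grow the ansatz one operator per round until no gradient is significant."""
    ansatz = []
    for gradients in gradient_rounds:
        choice = select_operator(gradients, tol)
        if choice is None:
            break  # no operator would meaningfully improve the objective
        ansatz.append(choice)
    return ansatz


# Hypothetical gradients of the cost function for each pool operator,
# re-evaluated after every round of ansatz growth.
rounds = [
    {"X0": 0.10, "Y0Z1": -0.42, "X1": 0.05},
    {"X0": 0.30, "Y0Z1": 0.02, "X1": -0.11},
    {"X0": 0.0004, "Y0Z1": -0.0002, "X1": 0.0001},
]
print(build_ansatz(rounds))  # ['Y0Z1', 'X0']
```

In a real adaptive procedure the gradients for each round depend on the circuit built so far and on re-optimized parameters; the fixed `rounds` list stands in for that feedback loop.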