Goto

Collaborating Authors

 Generative AI


The Human-AI Hybrid Delphi Model: A Structured Framework for Context-Rich, Expert Consensus in Complex Domains

arXiv.org Artificial Intelligence

Expert consensus plays a critical role in domains where evidence is complex, conflicting, or insufficient for direct prescription. Traditional methods, such as Delphi studies, consensus conferences, and systematic guideline synthesis, offer structure but face limitations including high panel burden, interpretive oversimplification, and suppression of conditional nuance. These challenges are now exacerbated by information overload, fragmentation of the evidence base, and increasing reliance on publicly available sources that lack expert filtering. This study introduces and evaluates a Human-AI Hybrid Delphi (HAH-Delphi) framework designed to augment expert consensus development by integrating a generative AI model (Gemini 2.5 Pro), small panels of senior human experts, and structured facilitation. The HAH-Delphi was tested in three phases: retrospective replication, prospective comparison, and applied deployment in two applied domains (endurance training and resistance and mixed cardio/strength training). The AI replicated 95% of published expert consensus conclusions in Phase I and showed 95% directional agreement with senior human experts in Phase II, though it lacked experiential and pragmatic nuance. In Phase III, compact panels of six senior experts achieved >90% consensus coverage and reached thematic saturation before the final participant. The AI provided consistent, literature-grounded scaffolding that supported divergence resolution and accelerated saturation. The HAH-Delphi framework offers a flexible, scalable approach for generating high-quality, context-sensitive consensus. Its successful application across health, coaching, and performance science confirms its methodological robustness and supports its use as a foundation for generating conditional, personalised guidance and published consensus frameworks at scale.


Deep Generative Models for Discrete Genotype Simulation

arXiv.org Artificial Intelligence

Deep generative models open new avenues for simulating realistic genomic data while preserving privacy and addressing data accessibility constraints. While previous studies have primarily focused on generating gene expression or haplotype data, this study explores generating genotype data in both unconditioned and phenotype-conditioned settings, which is inherently more challenging due to the discrete nature of genotype data. In this work, we developed and evaluated commonly used generative models, including Variational Autoencoders (VAEs), Diffusion Models, and Generative Adversarial Networks (GANs), and proposed adaptation tailored to discrete genotype data. We conducted extensive experiments on large-scale datasets, including all chromosomes from cow and multiple chromosomes from human. Model performance was assessed using a well-established set of metrics drawn from both deep learning and quantitative genetics literature. Our results show that these models can effectively capture genetic patterns and preserve genotype-phenotype association. Our findings provide a comprehensive comparison of these models and offer practical guidelines for future research in genotype simulation. We have made our code publicly available at https://github.com/SihanXXX/DiscreteGenoGen.


Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

arXiv.org Artificial Intelligence

Generative artificial intelligence (AI) is rapidly transforming medical imaging by enabling capabilities such as data synthesis, image enhancement, modality translation, and spatiotemporal modeling. This review presents a comprehensive and forward-looking synthesis of recent advances in generative modeling including generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and emerging multimodal foundation architectures and evaluates their expanding roles across the clinical imaging continuum. We systematically examine how generative AI contributes to key stages of the imaging workflow, from acquisition and reconstruction to cross-modality synthesis, diagnostic support, and treatment planning. Emphasis is placed on both retrospective and prospective clinical scenarios, where generative models help address longstanding challenges such as data scarcity, standardization, and integration across modalities. To promote rigorous benchmarking and translational readiness, we propose a three-tiered evaluation framework encompassing pixel-level fidelity, feature-level realism, and task-level clinical relevance. We also identify critical obstacles to real-world deployment, including generalization under domain shift, hallucination risk, data privacy concerns, and regulatory hurdles. Finally, we explore the convergence of generative AI with large-scale foundation models, highlighting how this synergy may enable the next generation of scalable, reliable, and clinically integrated imaging systems. By charting technical progress and translational pathways, this review aims to guide future research and foster interdisciplinary collaboration at the intersection of AI, medicine, and biomedical engineering.


5G Core Fault Detection and Root Cause Analysis using Machine Learning and Generative AI

arXiv.org Artificial Intelligence

With the advent of 5G networks and technologies, ensuring the integrity and performance of packet core traffic is paramount. During network analysis, test files such as Packet Capture (PCAP) files and log files will contain errors if present in the system that must be resolved for better overall network performance, such as connectivity strength and handover quality. Current methods require numerous person-hours to sort out testing results and find the faults. This paper presents a novel AI/ML-driven Fault Analysis (FA) Engine designed to classify successful and faulty frames in PCAP files, specifically within the 5G packet core. The FA engine analyses network traffic using natural language processing techniques to identify anomalies and inefficiencies, significantly reducing the effort time required and increasing efficiency. The FA Engine also suggests steps to fix the issue using Generative AI via a Large Language Model (LLM) trained on several 5G packet core documents. The engine explains the details of the error from the domain perspective using documents such as the 3GPP standards and user documents regarding the internal conditions of the tests. Test results on the ML models show high classification accuracy on the test dataset when trained with 80-20 splits for the successful and failed PCAP files. Future scopes include extending the AI engine to incorporate 4G network traffic and other forms of network data, such as log text files and multimodal systems.


Estimating Worst-Case Frontier Risks of Open-Weight LLMs

arXiv.org Artificial Intelligence

In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity. To maximize biological risk (biorisk), we curate tasks related to threat creation and train gpt-oss in an RL environment with web browsing. To maximize cybersecurity risk, we train gpt-oss in an agentic coding environment to solve capture-the-flag (CTF) challenges. We compare these MFT models against open- and closed-weight LLMs on frontier risk evaluations. Compared to frontier closed-weight models, MFT gpt-oss underperforms OpenAI o3, a model that is below Preparedness High capability level for biorisk and cybersecurity. Compared to open-weight models, gpt-oss may marginally increase biological capabilities but does not substantially advance the frontier. Taken together, these results contributed to our decision to release the model, and we hope that our MFT approach can serve as useful guidance for estimating harm from future open-weight releases.


OpenAI Designed GPT-5 to Be Safer. It Still Outputs Gay Slurs

WIRED

OpenAI is trying to make its chatbot less annoying with the release of GPT-5. And I'm not talking about adjustments to its synthetic personality that many users have complained about. Before GPT-5, if the AI tool determined it couldn't answer your prompt because the request violated OpenAI's content guidelines, it would hit you with a curt, canned apology. Now, ChatGPT is adding more explanations. OpenAI's general model spec lays out what is and isn't allowed to be generated.


Elon Musk and Sam Altman's AI Feud Gets Nasty

TIME - Tech

A long-running feud between Elon Musk and Sam Altman spilled out into the open this week as the AI billionaire heavyweights publicly fought over their rival companies. The latest round in the battle between the X CEO and the CEO of OpenAI began when Musk claimed that Apple had been favoring Altman's AI app over his own in the Apple Store rankings. "Apple is behaving in a manner that makes it impossible for any AI company besides OpenAI to reach #1 in the App Store, which is an unequivocal antitrust violation," Musk said on X on Monday evening. "xAI will take immediate legal action," he added, referring to the AI company he leads. "Hey @Apple App Store, why do you refuse to put either X or Grok in your'Must Have' section when X is the #1 news app in the world and Grok is #5 among all apps?" he asked.


GPT-5's modest gains suggest AI progress is slowing down

New Scientist

GPT-5 is the latest version of OpenAI's large language model OpenAI has released its newest AI model, GPT-5, two years after rolling out GPT-4, whose success has driven ChatGPT towards world domination. But despite promises of a similar jump in capability, GPT-5 appears to show little improvement over other leading AI models, hinting that the industry may need a fresh approach to build more intelligent AI systems. OpenAI's own pronouncements hail GPT-5 as a "significant leap in intelligence" from the company's previous models, showing apparent improvements in programming, mathematics, writing, health information and visual understanding. It also promises less frequent hallucinations, which is when an AI presents false information as true. On an internal benchmark measuring "performance on complex, economically valuable knowledge work", OpenAI says GPTโ€‘5 is "comparable to or better than experts in roughly half the casesโ€ฆ across tasks spanning over 40 occupations including law, logistics, sales, and engineering."


Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams

arXiv.org Artificial Intelligence

The rapid emergence of large language models (LLMs) has raised urgent questions across the modern workforce about this new technology's strengths, weaknesses, and capabilities. For privacy professionals, the question is whether these AI systems can provide reliable support on regulatory compliance, privacy program management, and AI governance. In this study, we evaluate ten leading open and closed LLMs, including models from OpenAI, Anthropic, Google DeepMind, Meta, and DeepSeek, by benchmarking their performance on industry-standard certification exams: CIPP/US, CIPM, CIPT, and AIGP from the International Association of Privacy Professionals (IAPP). Each model was tested using official sample exams in a closed-book setting and compared to IAPP's passing thresholds. Our findings show that several frontier models such as Gemini 2.5 Pro and OpenAI's GPT-5 consistently achieve scores exceeding the standards for professional human certification - demonstrating substantial expertise in privacy law, technical controls, and AI governance. The results highlight both the strengths and domain-specific gaps of current LLMs and offer practical insights for privacy officers, compliance leads, and technologists assessing the readiness of AI tools for high-stakes data governance roles. This paper provides an overview for professionals navigating the intersection of AI advancement and regulatory risk and establishes a machine benchmark based on human-centric evaluations.


Opening Musical Creativity? Embedded Ideologies in Generative-AI Music Systems

arXiv.org Artificial Intelligence

AI systems for music generation are increasingly common and easy to use, granting people without any musical background the ability to create music. Because of this, generative-AI has been marketed and celebrated as a means of democratizing music making. However, inclusivity often functions as marketable rhetoric rather than a genuine guiding principle in these industry settings. In this paper, we look at four generative-AI music making systems available to the public as of mid-2025 (AIVA, Stable Audio, Suno, and Udio) and track how they are rhetoricized by their developers, and received by users. Our aim is to investigate ideologies that are driving the early-stage development and adoption of generative-AI in music making, with a particular focus on democratization. A combination of autoethnography and digital ethnography is used to examine patterns and incongruities in rhetoric when positioned against product functionality. The results are then collated to develop a nuanced, contextual discussion. The shared ideology we map between producers and consumers is individualist, globalist, techno-liberal, and ethically evasive. It is a 'total ideology' which obfuscates individual responsibility, and through which the nature of music and musical practice is transfigured to suit generative outcomes.