Generative AI
Generative Artificial Intelligence in Bioinformatics: A Systematic Review of Models, Applications, and Methodological Advances
Alvi, Riasad, Zaman, Sayeem Been, Karim, Wasimul, Abian, Arefin Ittesafun, Raiaan, Mohaimenul Azam Khan, Mukta, Saddam, Rashid, Md Rafi Ur, Islam, Md Rafiqul, Sebastian, Yakub, Azam, Sami
Generative artificial intelligence (GenAI) has become a transformative approach in bioinformatics that often enables advancements in genomics, proteomics, transcriptomics, structural biology, and drug discovery. To systematically identify and evaluate these growing developments, this review proposed six research questions (RQs), according to the preferred reporting items for systematic reviews and meta-analysis methods. The objective is to evaluate impactful GenAI strategies in methodological advancement, predictive performance, and specialization, and to identify promising approaches for advanced modeling, data-intensive discovery, and integrative biological analysis. RQ1 highlights diverse applications across multiple bioinformatics subfields (sequence analysis, molecular design, and integrative data modeling), which demonstrate superior performance over traditional methods through pattern recognition and output generation. RQ2 reveals that adapted specialized model architectures outperformed general-purpose models, an advantage attributed to targeted pretraining and context-aware strategies. RQ3 identifies significant benefits in the bioinformatics domains, focusing on molecular analysis and data integration, which improves accuracy and reduces errors in complex analysis. RQ4 indicates improvements in structural modeling, functional prediction, and synthetic data generation, validated by established benchmarks. RQ5 suggests the main constraints, such as the lack of scalability and biases in data that impact generalizability, and proposes future directions focused on robust evaluation and biologically grounded modeling. RQ6 examines that molecular datasets (such as UniProtKB and ProteinNet12), cellular datasets (such as CELLxGENE and GTEx) and textual resources (such as PubMedQA and OMIM) broadly support the training and generalization of GenAI models.
Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature
Dayarathne, Ranul, Ranaweera, Uvini, Ganegoda, Upeksha
Retrieval Augmented Generation (RAG) is emerging as a powerful technique to enhance the capabilities of Generative AI models by reducing hallucination. Thus, the increasing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing the performance of different LLMs in question-answering (QA) in diverse domains. This study compares the performance of four open-source LLMs, Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct and Orca-mini-v3-7b, and OpenAI's trending GPT-3.5 over QA tasks within the computer science literature leveraging RAG support. Evaluation metrics employed in the study include accuracy and precision for binary questions and ranking by a human expert, ranking by Google's AI model Gemini, alongside cosine similarity for long-answer questions. GPT-3.5, when paired with RAG, effectively answers binary and long-answer questions, reaffirming its status as an advanced LLM. Regarding open-source LLMs, Mistral AI's Mistral-7b-instruct paired with RAG surpasses the rest in answering both binary and long-answer questions. However, among the open-source LLMs, Orca-mini-v3-7b reports the shortest average latency in generating responses, whereas LLaMa2-7b-chat by Meta reports the highest average latency. This research underscores the fact that open-source LLMs, too, can go hand in hand with proprietary models like GPT-3.5 with better infrastructure.
Academics and Generative AI: Empirical and Epistemic Indicators of Policy-Practice Voids
As generative AI diffuses through academia, policy-practice divergence becomes consequential, creating demand for auditable indicators of alignment. This study prototypes a ten-item, indirect-elicitation instrument embedded in a structured interpretive framework to surface voids between institutional rules and practitioner AI use. The framework extracts empirical and epistemic signals from academics, yielding three filtered indicators of such voids: (1) AI-integrated assessment capacity (proxy) - within a three-signal screen (AI skill, perceived teaching benefit, detection confidence), the share who would fully allow AI in exams; (2) sector-level necessity (proxy) - among high output control users who still credit AI with high contribution, the proportion who judge AI capable of challenging established disciplines; and (3) ontological stance - among respondents who judge AI different in kind from prior tools, report practice change, and pass a metacognition gate, the split between material and immaterial views as an ontological map aligning procurement claims with evidence classes.
Evaluating Generative AI as an Educational Tool for Radiology Resident Report Drafting
Verdone, Antonio, Cardall, Aidan, Siddiqui, Fardeen, Nashawaty, Motaz, Rigau, Danielle, Kwon, Youngjoon, Yousef, Mira, Patel, Shalin, Kieturakis, Alex, Kim, Eric, Heacock, Laura, Reig, Beatriu, Shen, Yiqiu
Objective: Radiology residents require timely, personalized feedback to develop accurate image analysis and reporting skills. Increasing clinical workload often limits attendings' ability to provide guidance. This study evaluates a HIPAA-compliant GPT-4o system that delivers automated feedback on breast imaging reports drafted by residents in real clinical settings. Methods: We analyzed 5,000 resident-attending report pairs from routine practice at a multi-site U.S. health system. GPT-4o was prompted with clinical instructions to identify common errors and provide feedback. A reader study using 100 report pairs was conducted. Four attending radiologists and four residents independently reviewed each pair, determined whether predefined error types were present, and rated GPT-4o's feedback as helpful or not. Agreement between GPT and readers was assessed using percent match. Inter-reader reliability was measured with Krippendorff's alpha. Educational value was measured as the proportion of cases rated helpful. Results: Three common error types were identified: (1) omission or addition of key findings, (2) incorrect use or omission of technical descriptors, and (3) final assessment inconsistent with findings. GPT-4o showed strong agreement with attending consensus: 90.5%, 78.3%, and 90.4% across error types. Inter-reader reliability showed moderate variability (ฮฑ = 0.767, 0.595, 0.567), and replacing a human reader with GPT-4o did not significantly affect agreement (ฮ = -0.004 to 0.002). GPT's feedback was rated helpful in most cases: 89.8%, 83.0%, and 92.0%. Discussion: ChatGPT-4o can reliably identify key educational errors. It may serve as a scalable tool to support radiology education.
Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Dinkar, Tanvi, Jiang, Aiqi, Abercrombie, Gavin, Konstas, Ioannis
Social media has exacerbated the promotion of Western beauty norms, leading to negative self-image, particularly in women and girls, and causing harm such as body dysmorphia. Increasingly content on the internet has been artificially generated, leading to concerns that these norms are being exaggerated. The aim of this work is to study how generative AI models may encode 'beauty' and erase 'ugliness', and discuss the implications of this for society. To investigate these aims, we create two image generation pipelines: a text-to-image model and a text-to-language model-to image model. We develop a structured beauty taxonomy which we use to prompt three language models (LMs) and two text-to-image models to cumulatively generate 5984 images using our two pipelines. We then recruit women and non-binary social media users to evaluate 1200 of the images through a Likert-scale within-subjects study. Participants show high agreement in their ratings. Our results show that 86.5% of generated images depicted people with lighter skin tones, 22% contained explicit content despite Safe for Work (SFW) training, and 74% were rated as being in a younger age demographic. In particular, the images of non-binary individuals were rated as both younger and more hypersexualised, indicating troubling intersectional effects. Notably, prompts encoded with 'negative' or 'ugly' beauty traits (such as "a wide nose") consistently produced higher Not SFW (NSFW) ratings regardless of gender. This work sheds light on the pervasive demographic biases related to beauty standards present in generative AI models -- biases that are actively perpetuated by model developers, such as via negative prompting. We conclude by discussing the implications of this on society, which include pollution of the data streams and active erasure of features that do not fall inside the stereotype of what is considered beautiful by developers.
ForTIFAI: Fending Off Recursive Training Induced Failure for AI Model Collapse
Shabgahi, Soheil Zibakhsh, Aghazadeh, Pedram, Mirhoseini, Azalia, Koushanfar, Farinaz
The increasing reliance on generative AI models is rapidly increasing the volume of synthetic data, with some projections suggesting that most available new data for training could be machine-generated by 2030 Gartner, Inc. (2022). This shift to a mainly synthetic content presents a critical challenge: repeated training in synthetic data leads to a phenomenon known as model collapse, where model performance degrades over generations of training, eventually rendering the models ineffective. While the causes of model collapse are increasingly understood, effective mitigation strategies remain scarce. We address this challenge by leveraging a key insight: auto-regressive models tend to generate text sequences to which they assign high confidence (i.e., high log-likelihood). Based on this observation, we introduce the Truncated-Cross-Entropy (TCE) loss function. Our experiments demonstrate that models trained with TCE not only learn effectively but also exhibit significantly increased resilience, tolerating over 2.3 more synthetic data before the onset of collapse. In addition, we provide an open-source benchmark for collapse dynamics in mixed-data settings. Our results demonstrate that confidence-aware training objectives can substantially delay collapse onset, offering a practical and generalizable tool for model robustness under synthetic-data exposure. Generative models have become the foundation for modern AI applications in several modalities, including text, image, code, and audio. Large Language Models (LLMs) such as ChatGPT (OpenAI et al., 2024), LLaMA (Grattafiori et al., 2024) and Gemma (Team et al., 2025), as well as image generators DALL-E (Ramesh et al., 2021) and Imagen (Saharia et al., 2022), all rely on large datasets scraped from the Web. As these models are continuously updated to reflect recent knowledge and linguistic patterns, the need for ever larger and frequently refreshed training corpora has grown substantially. However, this demand is colliding with a shift in the data landscape: synthetic content is increasingly populating the Internet, contaminating the very datasets used for model training. This shift raises fundamental concerns.
"Accessibility people, you go work on that thing of yours over there": Addressing Disability Inclusion in AI Product Organizations
Moharana, Sanika, Bennett, Cynthia L., Buehler, Erin, Madaio, Michael, Tibdewal, Vinita, Kane, Shaun K.
The rapid emergence of generative AI has changed the way that technology is designed, constructed, maintained, and evaluated. Decisions made when creating AI-powered systems may impact some users disproportionately, such as people with disabilities. In this paper, we report on an interview study with 25 AI practitioners across multiple roles (engineering, research, UX, and responsible AI) about how their work processes and artifacts may impact end users with disabilities. We found that practitioners experienced friction when triaging problems at the intersection of responsible AI and accessibility practices, navigated contradictions between accessibility and responsible AI guidelines, identified gaps in data about users with disabilities, and gathered support for addressing the needs of disabled stakeholders by leveraging informal volunteer and community groups within their company. Based on these findings, we offer suggestions for new resources and process changes to better support people with disabilities as end users of AI.
SoftBank chases actual revenue with OpenAI in corporate Japan
SoftBank Group's Japanese mobile unit and OpenAI are set to launch AI services for local companies next year. SoftBank Group's Japanese mobile unit and OpenAI will launch AI services for local companies next year, seeking to realize real revenue in the face of growing concerns over sky-high valuations. SoftBank Corp. and Open AI are still fine-tuning the products the two companies are co-developing for Japanese enterprises, said Junichi Miyakawa, president of the country's third-largest mobile carrier. Miyakawa said he has seen a test version of the services, which once launched would "completely change" the speed in which business is done. One feature is voice recognition that would allow users to rely less on manual typing, he said.
Deep Generative Models for Enhanced Vitreous OCT Imaging
Sarrocco, Simone, Cattin, Philippe C., Maloca, Peter M., Friedrich, Paul, Valmaggia, Philippe
Purpose: To evaluate deep learning (DL) models for enhancing vitreous optical coherence tomography (OCT) image quality and reducing acquisition time. Methods: Conditional Denoising Diffusion Probabilistic Models (cDDPMs), Brownian Bridge Diffusion Models (BBDMs), U-Net, Pix2Pix, and Vector-Quantised Generative Adversarial Network (VQ-GAN) were used to generate high-quality spectral-domain (SD) vitreous OCT images. Inputs were SD ART10 images, and outputs were compared to pseudoART100 images obtained by averaging ten ART10 images per eye location. Model performance was assessed using image quality metrics and Visual Turing Tests, where ophthalmologists ranked generated images and evaluated anatomical fidelity. The best model's performance was further tested within the manually segmented vitreous on newly acquired data. Results: U-Net achieved the highest Peak Signal-to-Noise Ratio (PSNR: 30.230) and Structural Similarity Index Measure (SSIM: 0.820), followed by cDDPM. For Learned Perceptual Image Patch Similarity (LPIPS), Pix2Pix (0.697) and cDDPM (0.753) performed best. In the first Visual Turing Test, cDDPM ranked highest (3.07); in the second (best model only), cDDPM achieved a 32.9% fool rate and 85.7% anatomical preservation. On newly acquired data, cDDPM generated vitreous regions more similar in PSNR to the ART100 reference than true ART1 or ART10 B-scans and achieved higher PSNR on whole images when conditioned on ART1 than ART10. Conclusions: Results reveal discrepancies between quantitative metrics and clinical evaluation, highlighting the need for combined assessment. cDDPM showed strong potential for generating clinically meaningful vitreous OCT images while reducing acquisition time fourfold. Translational Relevance: cDDPMs show promise for clinical integration, supporting faster, higher-quality vitreous imaging. Dataset and code will be made publicly available.
A Non-Adversarial Approach to Idempotent Generative Modelling
Al-Jaff, Mohammed, Marchetti, Giovanni Luca, Welle, Michael C, Lundell, Jens, Gustafsson, Mats G., Henter, Gustav Eje, Azizpour, Hossein, Kragic, Danica
Idempotent Generative Networks (IGNs) are deep generative models that also function as local data manifold projectors, mapping arbitrary inputs back onto the manifold. They are trained to act as identity operators on the data and as idempotent operators off the data manifold. However, IGNs suffer from mode collapse, mode dropping, and training instability due to their objectives, which contain adversarial components and can cause the model to cover the data manifold only partially -- an issue shared with generative adversarial networks. We introduce Non-Adversarial Idempotent Generative Networks (NAIGNs) to address these issues. Our loss function combines reconstruction with the non-adversarial generative objective of Implicit Maximum Likelihood Estimation (IMLE). This improves on IGN's ability to restore corrupted data and generate new samples that closely match the data distribution. We moreover demonstrate that NAIGNs implicitly learn the distance field to the data manifold, as well as an energy-based model.