Generative AI
Protein Design with Dynamic Protein Vocabulary
Liu, Nuowei, Kuang, Jiahao, Liu, Yanting, Ji, Tao, Sun, Changzhi, Lan, Man, Wu, Yuanbin
Protein design is a fundamental challenge in biotechnology, aiming to design novel sequences with specific functions within the vast space of possible proteins. Recent advances in deep generative models have enabled function-based protein design from textual descriptions, yet these models struggle to produce structurally plausible proteins. Inspired by classical protein design methods that leverage natural protein structures, we explore whether incorporating fragments from natural proteins can enhance foldability in generative models. Our empirical results show that even random incorporation of fragments improves foldability. Building on this insight, we introduce ProDVa, a novel protein design approach that integrates a text encoder for functional descriptions, a protein language model for designing proteins, and a fragment encoder to dynamically retrieve protein fragments based on textual functional descriptions. Experimental results demonstrate that our approach effectively designs protein sequences that are both functionally aligned and structurally plausible. Compared to state-of-the-art models, ProDVa achieves comparable function alignment using less than 0.04% of the training data, while designing significantly more well-folded proteins, with the proportion of proteins having pLDDT above 70 increasing by 7.38% and those with PAE below 10 increasing by 9.6%.
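The foldability figures above count the share of designed proteins whose predicted structure clears fixed confidence thresholds (pLDDT above 70, PAE below 10). A minimal sketch of that count, assuming each design has already been scored by a structure predictor and summarized as one mean pLDDT and one mean PAE value; the `Design` class, the `foldability_rates` helper, and the example scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    # Hypothetical per-design summary scores from a structure predictor (assumption).
    plddt: float  # mean predicted LDDT on a 0-100 scale; higher means more confident
    pae: float    # mean predicted aligned error in angstroms; lower is better


def foldability_rates(designs, plddt_min=70.0, pae_max=10.0):
    """Return (fraction with pLDDT > plddt_min, fraction with PAE < pae_max)."""
    n = len(designs)
    if n == 0:
        raise ValueError("need at least one design")
    high_plddt = sum(d.plddt > plddt_min for d in designs)
    low_pae = sum(d.pae < pae_max for d in designs)
    return high_plddt / n, low_pae / n


# Made-up scores for four designs, for illustration only.
designs = [Design(82.1, 6.3), Design(65.4, 14.8), Design(71.0, 11.2), Design(74.0, 9.5)]
plddt_rate, pae_rate = foldability_rates(designs)
print(f"pLDDT > 70: {plddt_rate:.0%}, PAE < 10: {pae_rate:.0%}")
```

The thresholds are strict inequalities to match "above 70" and "below 10"; a design sitting exactly on a threshold is not counted.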
FedMMKT: Co-Enhancing a Server Text-to-Image Model and Client Task Models in Multi-Modal Federated Learning
He, Ningxin, Liu, Yang, Sun, Wei, Ye, Xiaozhou, Ouyang, Ye, Gao, Tiegang, Zhang, Zehui
Text-to-Image (T2I) models have demonstrated their versatility in a wide range of applications. However, adapting T2I models to specialized tasks is often limited by the availability of task-specific data due to privacy concerns. On the other hand, the rich multimodal data available from modern mobile systems and IoT infrastructures presents a great opportunity. T2I models such as GLIDE [1], DALL-E 2 [2], and Stable Diffusion [3] have seen rapid development across various application domains. Recent work in multimodal federated learning (FL) explores integrating diverse modalities from decentralized clients to train a global multimodal model [25]-[27].
An AI-Based Behavioral Health Safety Filter and Dataset for Identifying Mental Health Crises in Text-Based Conversations
Nelson, Benjamin W., Wong, Celeste, Silvestrini, Matthew T., Shin, Sooyoon, Robinson, Alanna, Lee, Jessica, Yang, Eric, Torous, John, Trister, Andrew
Large language models often mishandle psychiatric emergencies, offering harmful or inappropriate advice and enabling destructive behaviors. This study evaluated the Verily behavioral health safety filter (VBHSF) on two datasets: the Verily Mental Health Crisis Dataset, containing 1,800 simulated messages, and the NVIDIA Aegis AI Content Safety Dataset, subsetted to 794 mental health-related messages. Both datasets were clinician-labelled, and performance was evaluated against these clinician labels. Additionally, we carried out comparative performance analyses against two open-source content moderation guardrails: OpenAI Omni Moderation Latest and NVIDIA NeMo Guardrails. The VBHSF demonstrated well-balanced performance on the Verily Mental Health Crisis Dataset v1.0, achieving high sensitivity (0.990) and specificity (0.992) in detecting any mental health crisis. It achieved an F1-score of 0.939, with sensitivity ranging from 0.917 to 0.992 and specificity of at least 0.978 in identifying specific crisis categories. When evaluated on the NVIDIA Aegis AI Content Safety Dataset 2.0, the VBHSF remained highly sensitive (0.982) and accurate (0.921), albeit with reduced specificity (0.859). Compared with the NVIDIA NeMo and OpenAI Omni Moderation Latest guardrails, the VBHSF demonstrated superior performance across both datasets, achieving significantly higher sensitivity in all cases (all p < 0.001) and higher specificity relative to NVIDIA NeMo (p < 0.001), but not relative to OpenAI Omni Moderation Latest (p = 0.094). NVIDIA NeMo and OpenAI Omni Moderation Latest performed inconsistently across specific crisis types, with sensitivity for some categories falling below 0.10. Overall, the VBHSF demonstrated robust, generalizable performance that prioritizes sensitivity to minimize missed crises, a crucial feature for healthcare applications.
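The sensitivity, specificity, accuracy, and F1 figures reported for the VBHSF follow the standard confusion-matrix definitions. A minimal sketch, treating "crisis" as the positive class; the `crisis_metrics` helper and the counts below are hypothetical and are not taken from either dataset in the study.

```python
def crisis_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics with 'crisis' as the positive class."""
    return {
        # Share of true crises that the filter flagged (also called recall).
        "sensitivity": tp / (tp + fn),
        # Share of non-crisis messages that the filter correctly left unflagged.
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Harmonic mean of precision and sensitivity, written in count form.
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only.
m = crisis_metrics(tp=95, fp=20, tn=180, fn=5)
print({name: round(value, 3) for name, value in m.items()})
```

Sensitivity depends only on how crises are handled (`tp`, `fn`), while specificity depends only on non-crisis messages (`tn`, `fp`), so a filter can raise one at the expense of the other; the abstract describes the VBHSF as deliberately favoring sensitivity.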
LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens
Zebaze, Armel, Bawden, Rachel, Sagot, Benoît
Large reasoning models (LRMs) have opened new possibilities in problem-solving by devising a natural-language thought process before answering a query. While their capabilities are well known across mathematics and coding tasks, their impact on machine translation (MT) remains under-explored. In this work, we explore the benefits of generating intermediate tokens when performing MT across multiple language pairs with different levels of resourcedness and across multiple setups. We find that "thinking tokens" do not help LRMs perform MT better. This result generalizes to models fine-tuned to reason before translating using distilled chain of thought (CoT) inspired by human translators' practices. Specifically, fine-tuning a model with synthetic CoT explanations detailing how to translate step-by-step does not outperform standard input-output fine-tuning. Our findings underscore that the contribution of intermediate tokens during fine-tuning depends heavily on whether they contain translation attempts. More broadly, our results suggest that using a teacher to refine target translations or to expand parallel corpora is more impactful than distilling its CoT explanations into "thinking" MT models. Large Language Models (LLMs) are general-purpose problem solvers (Touvron et al., 2023; OpenAI et al., 2024; Dubey et al., 2024; Kimi Team et al., 2025). Their instruction-following capabilities help them carry out a wide variety of requests provided by users via text. Research on alignment, typically through Reinforcement Learning from Human Feedback (RLHF) (Askell et al., 2021; Bai et al., 2022; Ouyang et al., 2022; Rafailov et al., 2023; Lambert et al., 2025), has greatly contributed to improving the quality of LLMs' outputs. Recently, a new paradigm has emerged: training LLMs to "think" in natural language before answering a query.
OpenAI o1 and o3 (OpenAI, 2024), DeepSeek-R1 (DeepSeek-AI et al., 2025), Qwen3 (Yang et al., 2025), Claude 4 (Anthropic, 2025), and Gemini 2.5 (Gemini Team et al., 2025), inter alia, are instances of these Reasoning Models (RMs) or Thinking Models (TMs).
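The comparison above is between two kinds of fine-tuning targets for the same source sentence: the translation alone, or a step-by-step explanation followed by the translation. A minimal sketch of how such training pairs might be built; the prompt template, the `<think>` delimiters, and the helper names are hypothetical, since the abstract does not specify the paper's formats.

```python
def io_example(src: str, tgt: str, src_lang: str, tgt_lang: str) -> dict:
    """Standard input-output pair: the target is the translation alone."""
    # Hypothetical prompt template (assumption, not the paper's).
    prompt = f"Translate the following {src_lang} text into {tgt_lang}:\n{src}"
    return {"prompt": prompt, "target": tgt}


def cot_example(src: str, tgt: str, src_lang: str, tgt_lang: str,
                explanation: str) -> dict:
    """Same prompt, but the target prepends a step-by-step explanation."""
    example = io_example(src, tgt, src_lang, tgt_lang)
    # Hypothetical <think> delimiters marking the intermediate tokens.
    example["target"] = f"<think>\n{explanation}\n</think>\n{tgt}"
    return example


io = io_example("Le chat dort.", "The cat is sleeping.", "French", "English")
cot = cot_example(
    "Le chat dort.", "The cat is sleeping.", "French", "English",
    "'Le chat' means 'the cat'; 'dort' is the present tense of 'dormir', "
    "rendered naturally as 'is sleeping'.",
)
print(cot["target"])
```

Both variants share the prompt and end with the same reference translation; they differ only in the intermediate tokens the model is trained to emit, which is the variable the abstract reports as not improving MT quality unless those tokens contain translation attempts.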
AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents
Liang, Jenny T., Barik, Titus, Nichols, Jeffrey, Schoop, Eldon, Cheng, Ruijia
Interface agents powered by generative AI models (referred to as "agents") can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the affordances agent prototyping systems should offer by conducting a requirements elicitation study with 12 participants with varying experience with agents. We identify key activities in agent experience prototyping and the desired capabilities of agent prototyping systems. We instantiate those capabilities in the AgentBuilder design probe for agent prototyping. We conduct an in situ agent prototyping study with 14 participants using AgentBuilder to validate the design requirements and elicit insights on how developers prototype agents and what their needs are in this process.
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Alemohammad, Sina, Wang, Zhangyang, Baraniuk, Richard G.
Scaling generative AI models is bottlenecked by the scarcity of high-quality training data. The ease of synthesizing from a generative model suggests using (unverified) synthetic data to augment a limited corpus of real data for the purpose of fine-tuning in the hope of improving performance. Unfortunately, however, the resulting positive feedback loop leads to model autophagy disorder (MAD, aka model collapse) that results in a rapid degradation in sample quality and/or diversity. In this paper, we introduce Neon (for Negative Extrapolation frOm self-traiNing), a new learning method that turns the degradation from self-training into a powerful signal for self-improvement. Given a base model, Neon first fine-tunes it on its own self-synthesized data but then, counterintuitively, reverses its gradient updates to extrapolate away from the degraded weights. We prove that Neon works because typical inference samplers that favor high-probability regions create a predictable anti-alignment between the synthetic and real data population gradients, which negative extrapolation corrects to better align the model with the true data distribution. Neon is remarkably easy to implement via a simple post-hoc merge that requires no new real data, works effectively with as few as 1k synthetic samples, and typically uses less than 1% additional training compute. We demonstrate Neon's universality across a range of architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, and FFHQ). In particular, on ImageNet 256x256, Neon elevates the xAR-L model to a new state-of-the-art FID of 1.02 with only 0.36% additional training compute. Code is available at https://github.com/VITA-Group/Neon
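The abstract describes Neon as fine-tuning a base model on its own samples and then reversing that update through a post-hoc merge. A minimal sketch of such a negative-extrapolation merge, assuming the merge is a linear step of size `w` away from the self-trained weights, i.e. θ = θ_base − w(θ_self − θ_base); checkpoints are modeled as plain dicts of float lists (a real implementation would operate on tensors such as a PyTorch `state_dict`), and `neon_merge` and `w` are hypothetical names.

```python
Params = dict  # hypothetical checkpoint: parameter name -> flat list of floats


def neon_merge(base: Params, self_trained: Params, w: float) -> Params:
    """Negative extrapolation: theta = theta_base - w * (theta_self - theta_base).

    w > 0 steps opposite to the self-training update; w = 0 returns the base weights.
    """
    if base.keys() != self_trained.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: Params = {}
    for name, b in base.items():
        s = self_trained[name]
        if len(b) != len(s):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [bi - w * (si - bi) for bi, si in zip(b, s)]
    return merged


base = {"layer.weight": [1.0, 2.0]}
self_trained = {"layer.weight": [1.5, 1.0]}  # weights after fine-tuning on own samples
print(neon_merge(base, self_trained, w=0.5))  # {'layer.weight': [0.75, 2.5]}
```

Because the merge needs only the two checkpoints, it adds no training beyond the short self-training run, which is consistent with the abstract's claim of under 1% extra compute; the value of `w` would have to be tuned, as the abstract does not state it.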
ChatGPT will soon allow erotica for verified adults, says OpenAI boss
OpenAI plans to allow a wider range of content, including erotica, on its popular chatbot ChatGPT as part of its push to "treat adult users like adults", says its boss Sam Altman. In a post on X on Tuesday, Mr Altman said upcoming versions of the popular chatbot would enable it to behave in a more human-like way, "but only if you want it, not because we are usage maxxing". The move, reminiscent of Elon Musk's xAI recently introducing two sexually explicit chatbots to Grok, could help OpenAI attract more paying subscribers. It is also likely to intensify pressure on lawmakers to introduce tighter restrictions on chatbot companions. OpenAI did not respond to the BBC's requests for comment following Mr Altman's post.
OpenAI will allow verified adults to use ChatGPT to generate erotic content
The company launched a dedicated ChatGPT experience for under-18 users in September. The new version will allow users to customize the AI assistant's personality under what the firm calls its "treat adult users like adults" policy. OpenAI announced plans on Tuesday to relax restrictions on its ChatGPT chatbot, including allowing erotic content for verified adult users as part of what the company calls a "treat adult users like adults" principle. OpenAI's plan includes the release of an updated version of ChatGPT that will allow users to customize their AI assistant's personality, including options for more human-like responses, heavy emoji use, or friend-like behavior. The most significant change will come in December, when OpenAI plans to roll out more comprehensive age-gating that would permit erotic content for adults who have verified their ages.
'Sovereign AI' Has Become a New Front in the US-China Tech War
OpenAI has announced "AI sovereignty" partnerships with governments around the world, but can proprietary models compete with Beijing's open-source offerings? OpenAI has announced a number of projects this year with foreign governments to help build out what it has called their "sovereign AI" systems. The company says the deals, some of which are being coordinated with the US government, are part of a broader push to give national leaders more control over a technology that could reshape their economies. Over the past few months, sovereign AI has become something of a buzzword in both Washington and Silicon Valley. Proponents of the concept argue it's crucial that AI systems developed in democratic nations are able to proliferate globally, particularly as China races to deploy its own AI technology abroad.
Revisiting Trust in the Era of Generative AI: Factorial Structure and Latent Profiles
Sun, Haocan, Liu, Weizi, Wu, Di, Yu, Guoming, Yao, Mike
Trust is one of the most important factors shaping whether and how people adopt and rely on artificial intelligence (AI). Yet most existing studies measure trust in terms of functionality, focusing on whether a system is reliable, accurate, or easy to use, while giving less attention to the social and emotional dimensions that are increasingly relevant for today's generative AI (GenAI) systems. These systems do not just process information; they converse, respond, and collaborate with users, blurring the line between tool and partner. In this study, we introduce and validate the Human-AI Trust Scale (HAITS), a new measure designed to capture both the rational and relational aspects of trust in GenAI. Drawing on prior trust theories, qualitative interviews, and two waves of large-scale surveys in China and the United States, we used exploratory (n = 1,546) and confirmatory (n = 1,426) factor analyses to identify four key dimensions of trust: Affective Trust, Competence Trust, Benevolence & Integrity, and Perceived Risk. We then applied latent profile analysis to classify users into six distinct trust profiles, revealing meaningful differences in how affective-competence trust and trust-distrust frameworks coexist across individuals and cultures. Our findings offer a validated, culturally sensitive tool for measuring trust in GenAI and provide new insight into how trust evolves in human-AI interaction. By integrating instrumental and relational perspectives of trust, this work lays the foundation for more nuanced research and design of trustworthy AI systems.