Government
Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices
Tumay, Aysin, Sun, Sophia, Fereidooni, Sonia, Dumas, Aaron, Jortberg, Elise, Yu, Rose
We study the sequential decision-making problem for automated weaning of mechanical circulatory support (MCS) devices in cardiogenic shock patients. MCS devices are percutaneous micro-axial flow pumps that provide left ventricular unloading and forward blood flow, but current weaning strategies vary significantly across care teams and lack data-driven approaches. Offline reinforcement learning (RL) has proven to be successful in sequential decision-making tasks, but our setting presents challenges for training and evaluating traditional offline RL methods: prohibition of online patient interaction, highly uncertain circulatory dynamics due to concurrent treatments, and limited data availability. We developed an end-to-end machine learning framework with two key contributions (1) Clinically-aware OOD-regularized Model-based Policy Optimization (CORMPO), a density-regularized offline RL algorithm for out-of-distribution suppression that also incorporates clinically-informed reward shaping and (2) a Transformer-based probabilistic digital twin that models MCS circulatory dynamics for policy evaluation with rich physiological and clinical metrics. We prove that \textsf{CORMPO} achieves theoretical performance guarantees under mild assumptions. CORMPO attains a higher reward than the offline RL baselines by 28% and higher scores in clinical metrics by 82.6% on real and synthetic datasets. Our approach offers a principled framework for safe offline policy learning in high-stakes medical applications where domain expertise and safety constraints are essential.
Forecasting Thermospheric Density with Transformers for Multi-Satellite Orbit Management
Bรถs, Cedric, Bortotto, Alessandro, Ben-Larbi, Mohamed Khalil
Accurate thermospheric density prediction is crucial for reliable satellite operations in Low Earth Orbits, especially at high solar and geomagnetic activity. Physics-based models such as TIE-GCM offer high fidelity but are computationally expensive, while empirical models like NRLMSIS are efficient yet lack predictive power. This work presents a transformer-based model that forecasts densities up to three days ahead and is intended as a drop-in replacement for an empirical baseline. Unlike recent approaches, it avoids spatial reduction and complex input pipelines, operating directly on a compact input set. Validated on real-world data, the model improves key prediction metrics and shows potential to support mission planning.
Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts
Yan, Xinyuan, Liu, Shusen, Thopalli, Kowshik, Wang, Bei
Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in large language models (LLMs) through the sparse directions they learn. However, the sheer number of extracted directions makes comprehensive exploration intractable. While conventional embedding techniques such as UMAP can reveal global structure, they suffer from limitations including high-dimensional compression artifacts, overplotting, and misleading neighborhood distortions. In this work, we propose a focused exploration framework that prioritizes curated concepts and their corresponding SAE features over attempts to visualize all available features simultaneously. We present an interactive visualization system that combines topology-based visual encoding with dimensionality reduction to faithfully represent both local and global relationships among selected features. This hybrid approach enables users to investigate SAE behavior through targeted, interpretable subsets, facilitating deeper and more nuanced analysis of concept representation in latent space.
Advancing Ocean State Estimation with efficient and scalable AI
Xiang, Yanfei, Gao, Yuan, Wu, Hao, Zhang, Quan, Shu, Ruiqi, Zhou, Xiao, Wu, Xi, Huang, Xiaomeng
Accurate and efficient global ocean state estimation remains a grand challenge for Earth system science, hindered by the dual bottlenecks of computational scalability and degraded data fidelity in traditional data assimilation (DA) and deep learning (DL) approaches. Here we present an AI-driven Data Assimilation Framework for Ocean (ADAF-Ocean) that directly assimilates multi-source and multi-scale observations, ranging from sparse in-situ measurements to 4 km satellite swaths, without any interpolation or data thinning. Inspired by Neural Processes, ADAF-Ocean learns a continuous mapping from heterogeneous inputs to ocean states, preserving native data fidelity. Through AI-driven super-resolution, it reconstructs 0.25$^\circ$ mesoscale dynamics from coarse 1$^\circ$ fields, which ensures both efficiency and scalability, with just 3.7\% more parameters than the 1$^\circ$ configuration. When coupled with a DL forecasting system, ADAF-Ocean extends global forecast skill by up to 20 days compared to baselines without assimilation. This framework establishes a computationally viable and scientifically rigorous pathway toward real-time, high-resolution Earth system monitoring.
Towards a Humanized Social-Media Ecosystem: AI-Augmented HCI Design Patterns for Safety, Agency & Well-Being
Ameen, Mohd Ruhul, Islam, Akif
Social platforms connect billions of people, yet their engagement-first algorithms often work on users rather than with them, amplifying stress, misinformation, and a loss of control. We propose Human-Layer AI (HL-AI)--user-owned, explainable intermediaries that sit in the browser between platform logic and the interface. HL-AI gives people practical, moment-to-moment control without requiring platform cooperation. We contribute a working Chrome/Edge prototype implementing five representative pattern frameworks--Context-Aware Post Rewriter, Post Integrity Meter, Granular Feed Curator, Micro-Withdrawal Agent, and Recovery Mode--alongside a unifying mathematical formulation balancing user utility, autonomy costs, and risk thresholds. Evaluation spans technical accuracy, usability, and behavioral outcomes. The result is a suite of humane controls that help users rewrite before harm, read with integrity cues, tune feeds with intention, pause compulsive loops, and seek shelter during harassment, all while preserving agency through explanations and override options. This prototype offers a practical path to retrofit today's feeds with safety, agency, and well-being, inviting rigorous cross-cultural user evaluation.
When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins
Kaya, Yigitcan, Landerer, Anton, Pletinckx, Stijn, Zimmermann, Michelle, Kruegel, Christopher, Vigna, Giovanni
Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such as customer service chatbots, are widespread on the web, yet their security posture and exposure to such attacks remain poorly understood. These applications often rely on third-party chatbot plugins that act as intermediaries to commercial LLM APIs, offering non-expert website builders intuitive ways to customize chatbot behaviors. To bridge this gap, we present the first large-scale study of 17 third-party chatbot plugins used by over 10,000 public websites, uncovering previously unknown prompt injection risks in practice. First, 8 of these plugins (used by 8,000 websites) fail to enforce the integrity of the conversation history transmitted in network requests between the website visitor and the chatbot. This oversight amplifies the impact of direct prompt injection attacks by allowing adversaries to forge conversation histories (including fake system messages), boosting their ability to elicit unintended behavior (e.g., code generation) by 3 to 8x. Second, 15 plugins offer tools, such as web-scraping, to enrich the chatbot's context with website-specific content. However, these tools do not distinguish the website's trusted content (e.g., product descriptions) from untrusted, third-party content (e.g., customer reviews), introducing a risk of indirect prompt injection. Notably, we found that ~13% of e-commerce websites have already exposed their chatbots to third-party content. We systematically evaluate both vulnerabilities through controlled experiments grounded in real-world observations, focusing on factors such as system prompt design and the underlying LLM. Our findings show that many plugins adopt insecure practices that undermine the built-in LLM safeguards.
Lived Experience in Dialogue: Co-designing Personalization in Large Language Models to Support Youth Mental Well-being
Guan, Kathleen W., Giri, Sarthak, Amara, Mohammed, Jansen, Bernard J., Liscio, Enrico, Esherick, Milena, Owayyed, Mohammed Al, Ratkute, Ausrine, Sedrakyan, Gayane, de Reuver, Mark, Goncalves, Joao Fernando Ferreira, Figueroa, Caroline A.
We conducted three 90 - minute workshops at Talenthub Op Zuid, each with a different group of participants (total N=24, MAge =17.6, SD=1.2, see S upplement for additional details). In the first workshop, participants reviewed the prior 13 personas from Stage 1 and critiqued them for gaps in relevance. The scoping personas generated from survey and forum data gave youth stakeholders a concrete starting point for consulting as experts by experience in initial co - design activities. They challenged the realism of the scoping personas . Using fill - in - the - blank templates to guide but not restrict their persona creation (created by a youth member of the research team with design training, see Supplement), youth added contextual details to the project personas, such as daily routines, stressors, and digital habits, and brainstormed plausible backstories involving bullying, school difficulties, or parental conflict. The second workshop engaged a new participant group who expanded on previous outputs and addressed additional questions on living environment and emotional support needs, as this was suggested as relevant by youth from the prior workshop . Participants revised or created new personas b ased on their own or peers' experiences. In t he third workshop, a new group of participants again reviewed prior co - creation and outputs and further refined the personas .
VMDT: Decoding the Trustworthiness of Video Foundation Models
Potter, Yujin, Wang, Zhun, Crispino, Nicholas, Montgomery, Kyle, Xiong, Alexander, Chang, Ethan Y., Pinto, Francesco, Chen, Yuqi, Gupta, Rahul, Ziyadi, Morteza, Christodoulopoulos, Christos, Li, Bo, Wang, Chenguang, Song, Dawn
As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustworthiness benchmarks. We introduce VMDT (Video-Modal DecodingTrust), the first unified platform for evaluating text-to-video (T2V) and video-to-text (V2T) models across five key trustworthiness dimensions: safety, hallucination, fairness, privacy, and adversarial robustness. Through our extensive evaluation of 7 T2V models and 19 V2T models using VMDT, we uncover several significant insights. For instance, all open-source T2V models evaluated fail to recognize harmful queries and often generate harmful videos, while exhibiting higher levels of unfairness compared to image modality models. In V2T models, unfairness and privacy risks rise with scale, whereas hallucination and adversarial robustness improve -- though overall performance remains low. Uniquely, safety shows no correlation with model size, implying that factors other than scale govern current safety levels. Our findings highlight the urgent need for developing more robust and trustworthy video foundation models, and VMDT provides a systematic framework for measuring and tracking progress toward this goal. The code is available at https://sunblaze-ucb.github.io/VMDT-page/.
wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation
Hawks, Benjamin, Weitz, Jason, Demler, Dmitri, Tame-Narvaez, Karla, Plotnikov, Dennis, Rahimifar, Mohammad Mehdi, Rahali, Hamza Ezzaoui, Therrien, Audrey C., Sproule, Donovan, Khoda, Elham E, Smith, Keegan A., Marroquin, Russell, Di Guglielmo, Giuseppe, Tran, Nhan, Duarte, Javier, Loncar, Vladimir
As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as hardware synthesis, are becoming limiting factors in the rapid iteration of designs. To mitigate these emerging constraints, multiple efforts have been undertaken to develop an ML-based surrogate model that estimates resource usage of ML accelerator architectures. We introduce wa-hls4ml, a benchmark for ML accelerator resource and latency estimation, and its corresponding initial dataset of over 680,000 fully connected and convolutional neural networks, all synthesized using hls4ml and targeting Xilinx FPGAs. The benchmark evaluates the performance of resource and latency predictors against several common ML model architectures, primarily originating from scientific domains, as exemplar models, and the average performance across a subset of the dataset. Additionally, we introduce GNN- and transformer-based surrogate models that predict latency and resources for ML accelerators. We present the architecture and performance of the models and find that the models generally predict latency and resources for the 75% percentile within several percent of the synthesized resources on the synthetic test dataset.
Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations
Reuel, Anka, Ghosh, Avijit, Chim, Jenny, Tran, Andrew, Long, Yanan, Mickel, Jennifer, Gohar, Usman, Yadav, Srishti, Ammanamanchi, Pawan Sasanka, Allaham, Mowafak, Rahmani, Hossein A., Akhtar, Mubashara, Friedrich, Felix, Scholz, Robert, Riegler, Michael Alexander, Batzner, Jan, Habba, Eliya, Saxena, Arushi, Kornilova, Anastassia, Wei, Kevin, Soni, Prajna, Mathew, Yohan, Klyman, Kevin, Sania, Jeba, Sahoo, Subramanyam, Bruvik, Olivia Beyer, Sadeghi, Pouya, Goswami, Sujata, Wang, Angelina, Jernite, Yacine, Talat, Zeerak, Biderman, Stella, Kochenderfer, Mykel, Koyejo, Sanmi, Solaiman, Irene
Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor practices remain uneven across the AI ecosystem. To characterize this landscape, we conduct the first comprehensive analysis of both first-party and third-party social impact evaluation reporting across a wide range of model developers. Our study examines 186 first-party release reports and 183 post-release evaluation sources, and complements this quantitative analysis with interviews of model developers. We find a clear division of evaluation labor: first-party reporting is sparse, often superficial, and has declined over time in key areas such as environmental impact and bias, while third-party evaluators including academic researchers, nonprofits, and independent organizations provide broader and more rigorous coverage of bias, harmful content, and performance disparities. However, this complementarity has limits. Only model developers can authoritatively report on data provenance, content moderation labor, financial costs, and training infrastructure, yet interviews reveal that these disclosures are often deprioritized unless tied to product adoption or regulatory compliance. Our findings indicate that current evaluation practices leave major gaps in assessing AI's societal impacts, highlighting the urgent need for policies that promote developer transparency, strengthen independent evaluation ecosystems, and create shared infrastructure to aggregate and compare third-party evaluations in a consistent and accessible way.