Goto

Collaborating Authors

 Pomona



Meet the history-making Nasa astronauts headed for the Moon next year

BBC News

The commander of Nasa's next mission to the Moon said that he and his crew would see things that no human has ever seen. Reid Wiseman told a news conference that it was likely that his spacecraft would fly over large areas of the Moon that previous Apollo missions had never mapped. Yesterday, Nasa announced it hoped it would be able to launch the first crewed Moon mission in 50 years as early as February 2026 . Mission specialist Christina Koch explained that the astronauts would be able to study the lunar surface in exquisite detail for a full three hours. Believe it or not, human eyes are one of the best scientific instruments that we have, she said.



Scaffolding Recursive Divergence and Convergence in Story Ideation

arXiv.org Artificial Intelligence

Human creative ideation involves both exploration of diverse ideas (divergence) and selective synthesis of explored ideas into coherent combinations (convergence). While processes of divergence and convergence are often interleaved and nested, existing AI-powered creativity support tools (CSTs) lack support for sophisticated orchestration of divergence and convergence. We present Reverger, an AI-powered CST that helps users ideate variations of conceptual directions for modifying a story by scaffolding flexible iteration between divergence and convergence. For divergence, our tool enables recursive exploration of alternative high-level directions for modifying a specific part of the original story. For convergence, it allows users to collect explored high-level directions and synthesize them into concrete variations. Users can then iterate between divergence and convergence until they find a satisfactory outcome. A within-subject study revealed that Reverger permitted participants to explore more unexpected and diverse high-level directions than a comparable baseline. Reverger users also felt that they had more fine-grained control and discovered more effort-worthy outcomes.


A Systematic Review of Human-AI Co-Creativity

arXiv.org Artificial Intelligence

The co creativity community is making significant progress in developing more sophisticated and tailored systems to support and enhance human creativity. Design considerations from prior work can serve as a valuable and efficient foundation for future systems. To support this effort, we conducted a systematic literature review of 62 papers on co-creative systems. These papers cover a diverse range of applications, including visual arts, design, and writing, where the AI acts not just as a tool but as an active collaborator in the creative process. From this review, we identified several key dimensions relevant to system design: phase of the creative process, creative task, proactive behavior of the system, user control, system embodiment, and AI model type. Our findings suggest that systems offering high user control lead to greater satisfaction, trust, and a stronger sense of ownership over creative outcomes. Furthermore, proactive systems, when adaptive and context sensitive, can enhance collaboration. We also extracted 24 design considerations, highlighting the value of encouraging users to externalize their thoughts and of increasing the system's social presence and transparency to foster trust. Despite recent advancements, important gaps remain, such as limited support for early creative phases like problem clarification, and challenges related to user adaptation to AI systems.


Clinically Interpretable Mortality Prediction for ICU Patients with Diabetes and Atrial Fibrillation: A Machine Learning Approach

arXiv.org Artificial Intelligence

Background: Patients with both diabetes mellitus (DM) and atrial fibrillation (AF) face elevated mortality in intensive care units (ICUs), yet models targeting this high-risk group remain limited. Objective: To develop an interpretable machine learning (ML) model predicting 28-day mortality in ICU patients with concurrent DM and AF using early-phase clinical data. Methods: A retrospective cohort of 1,535 adult ICU patients with DM and AF was extracted from the MIMIC-IV database. Data preprocessing involved median/mode imputation, z-score normalization, and early temporal feature engineering. A two-step feature selection pipeline-univariate filtering (ANOVA F-test) and Random Forest-based multivariate ranking-yielded 19 interpretable features. Seven ML models were trained with stratified 5-fold cross-validation and SMOTE oversampling. Interpretability was assessed via ablation and Accumulated Local Effects (ALE) analysis. Results: Logistic regression achieved the best performance (AUROC: 0.825; 95% CI: 0.779-0.867), surpassing more complex models. Key predictors included RAS, age, bilirubin, and extubation. ALE plots showed intuitive, non-linear effects such as age-related risk acceleration and bilirubin thresholds. Conclusion: This interpretable ML model offers accurate risk prediction and clinical insights for early ICU triage in patients with DM and AF.


Enhancing Aviation Communication Transcription: Fine-Tuning Distil-Whisper with LoRA

arXiv.org Artificial Intelligence

Transcription of aviation communications has several applications, from assisting air traffic controllers in identifying the accuracy of read-back errors to search and rescue operations. Recent advances in artificial intelligence have provided unprecedented opportunities for improving aviation communication transcription tasks. OpenAI's Whisper is one of the leading automatic speech recognition models. However, fine-tuning Whisper for aviation communication transcription is not computationally efficient. Thus, this paper aims to use a Parameter-Efficient Fine-tuning method called Low-Rank Adaptation to fine-tune a more computationally efficient version of Whisper, distil-Whisper. To perform the fine-tuning, we used the Air Traffic Control Corpus dataset from the Linguistic Data Consortium, which contains approximately 70 hours of controller and pilot transmissions near three major airports in the US. The objective was to reduce the word error rate to enhance accuracy in the transcription of aviation communication. First, starting with an initial set of hyperparameters for LoRA (Alpha = 64 and Rank = 32), we performed a grid search. We applied a 5-fold cross-validation to find the best combination of distil-Whisper hyperparameters. Then, we fine-tuned the model for LoRA hyperparameters, achieving an impressive average word error rate of 3.86% across five folds. This result highlights the model's potential for use in the cockpit.


A Novel Multi-Task Teacher-Student Architecture with Self-Supervised Pretraining for 48-Hour Vasoactive-Inotropic Trend Analysis in Sepsis Mortality Prediction

arXiv.org Artificial Intelligence

Sepsis is a major cause of ICU mortality, where early recognition and effective interventions are essential for improving patient outcomes. However, the vasoactive-inotropic score (VIS) varies dynamically with a patient's hemodynamic status, complicated by irregular medication patterns, missing data, and confounders, making sepsis prediction challenging. To address this, we propose a novel Teacher-Student multitask framework with self-supervised VIS pretraining via a Masked Autoencoder (MAE). The teacher model performs mortality classification and severity-score regression, while the student distills robust time-series representations, enhancing adaptation to heterogeneous VIS data. Compared to LSTM-based methods, our approach achieves an AUROC of 0.82 on MIMIC-IV 3.0 (9,476 patients), outperforming the baseline (0.74). SHAP analysis revealed that SOFA score (0.147) had the greatest impact on ICU mortality, followed by LODS (0.033), single marital status (0.031), and Medicaid insurance (0.023), highlighting the role of sociodemographic factors. SAPSII (0.020) also contributed significantly. These findings suggest that both clinical and social factors should be considered in ICU decision-making. Our novel multitask and distillation strategies enable earlier identification of high-risk patients, improving prediction accuracy and disease management, offering new tools for ICU decision support.


Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols

arXiv.org Artificial Intelligence

We introduce Toyteller, an AI-powered storytelling system where users generate a mix of story text and visuals by directly manipulating character symbols like they are toy-playing. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space so that large language models and motion generation models can use it as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study identified that toy-playing helps express intentions difficult to verbalize. However, only motions could not express all user intentions, suggesting combining it with other modalities like language. We discuss the design space of toy-playing interactions and implications for technical HCI research on human-AI interaction.


CtrlNeRF: The Generative Neural Radiation Fields for the Controllable Synthesis of High-fidelity 3D-Aware Images

arXiv.org Artificial Intelligence

The neural radiance field (NERF) advocates learning the continuous representation of 3D geometry through a multilayer perceptron (MLP). By integrating this into a generative model, the generative neural radiance field (GRAF) is capable of producing images from random noise z without 3D supervision. In practice, the shape and appearance are modeled by z_s and z_a, respectively, to manipulate them separately during inference. However, it is challenging to represent multiple scenes using a solitary MLP and precisely control the generation of 3D geometry in terms of shape and appearance. In this paper, we introduce a controllable generative model (i.e. \textbf{CtrlNeRF}) that uses a single MLP network to represent multiple scenes with shared weights. Consequently, we manipulated the shape and appearance codes to realize the controllable generation of high-fidelity images with 3D consistency. Moreover, the model enables the synthesis of novel views that do not exist in the training sets via camera pose alteration and feature interpolation. Extensive experiments were conducted to demonstrate its superiority in 3D-aware image generation compared to its counterparts.