semanticscholar
- North America > United States (0.67)
- Europe > France (0.28)
- Asia > Middle East > Republic of Türkiye (0.14)
- (45 more...)
- Law (0.93)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Government > Military (0.67)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.51)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.42)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Dominican Republic (0.04)
- (11 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > France (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
SEA: Semantic Map Prediction for Active Exploration of Uncertain Areas
Ding, Hongyu, Liang, Xinyue, Fang, Yudong, Wu, You, Shi, Jieqi, Huo, Jing, Li, Wenbin, Wu, Jing, Lai, Yu-Kun, Gao, Yang
In this paper, we propose SEA, a novel approach for active robot exploration through semantic map prediction and a reinforcement learning-based hierarchical exploration policy. Unlike existing learning-based methods that rely on one-step waypoint prediction, our approach enhances the agent's long-term environmental understanding to facilitate more efficient exploration. We propose an iterative prediction-exploration framework that explicitly predicts the missing areas of the map based on current observations. The difference between the actual accumulated map and the predicted global map is then used to guide exploration. Additionally, we design a novel reward mechanism that leverages reinforcement learning to update the long-term exploration strategies, enabling us to construct an accurate semantic map within limited steps. Experimental results demonstrate that our method significantly outperforms state-of-the-art exploration strategies, achieving superior coverage of the global map within the same time constraints.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Europe > United Kingdom (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
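The core signal in the abstract above — rewarding exploration by the disagreement between the predicted global map and the actually accumulated map — can be sketched in a few lines. This is a toy illustration with hypothetical shapes and function names, not the authors' implementation:

```python
import numpy as np

def exploration_reward(predicted_map: np.ndarray, accumulated_map: np.ndarray,
                       observed_mask: np.ndarray) -> float:
    """Reward cells observed this step in proportion to how much the
    earlier map prediction disagreed with what was actually seen."""
    diff = np.abs(predicted_map - accumulated_map)
    # Only cells observed in the current step contribute to the reward.
    return float((diff * observed_mask).sum())

pred = np.array([[0.9, 0.1], [0.5, 0.5]])   # predicted occupancy probabilities
acc  = np.array([[1.0, 0.0], [1.0, 0.0]])   # accumulated (observed) occupancy
mask = np.array([[1, 1], [1, 0]])           # cells newly observed this step
r = exploration_reward(pred, acc, mask)     # large where prediction was uncertain/wrong
```

Cells where the prediction was confident and correct contribute little, so the policy is pushed toward regions the map predictor is unsure about.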
Fast Functionally Redundant Inverse Kinematics for Robotic Toolpath Optimisation in Manufacturing Tasks
Razjigaev, Andrew, Lohr, Hans, Vargas-Uscategui, Alejandro, King, Peter, Bandyopadhyay, Tirthankar
Industrial automation with six-axis robotic arms is critical for many manufacturing tasks, including welding and additive manufacturing applications; however, many of these operations are functionally redundant due to the symmetrical tool axis, which effectively makes the operation a five-axis task. Exploiting this redundancy is crucial for achieving the workspace and dexterity required for feasible, optimised toolpath planning. Inverse kinematics algorithms can solve this in a fast, reactive framework, but these techniques are underutilised compared with more computationally expensive offline planning methods. We propose a novel algorithm to solve functionally redundant inverse kinematics for robotic manipulation utilising a task space decomposition approach, the damped least-squares method and Halley's method to achieve fast and robust solutions with reduced joint motion. We evaluate our methodology in the case of toolpath optimisation in a cold spray coating application on a non-planar surface. The functionally redundant inverse kinematics algorithm can quickly solve motion plans that minimise joint motion, expanding the feasible operating space of the complex toolpath.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Oceania > Australia (0.04)
- North America > United States (0.04)
- Europe > Germany (0.04)
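The damped least-squares method named in the abstract is a standard ingredient worth seeing concretely. Below is a generic DLS iteration on a planar 2-link arm — a minimal sketch of the building block, not the authors' functionally redundant solver; link lengths and the damping factor are illustrative:

```python
import numpy as np

L1, L2 = 1.0, 1.0  # illustrative link lengths

def fk(q):
    """End-effector position of a 2-link planar arm."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0]+q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0]+q[1])])

def jacobian(q):
    s1, s12 = np.sin(q[0]), np.sin(q[0]+q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0]+q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def dls_step(q, target, damping=0.1):
    """One damped least-squares update: dq = J^T (J J^T + lambda^2 I)^-1 e."""
    e = target - fk(q)
    J = jacobian(q)
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), e)
    return q + dq

q = np.array([0.3, 0.5])
target = np.array([1.2, 0.8])       # reachable: |target| < L1 + L2
for _ in range(50):
    q = dls_step(q, target)
```

The damping term keeps the update well-conditioned near singularities at the cost of slightly slower convergence, which is why it suits the fast, reactive setting the paper targets.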
PrefGen: Multimodal Preference Learning for Preference-Conditioned Image Generation
Mo, Wenyi, Zhang, Tianyu, Bai, Yalong, Han, Ligong, Ba, Ying, Metaxas, Dimitris N.
Preference-conditioned image generation seeks to adapt generative models to individual users, producing outputs that reflect personal aesthetic choices beyond the given textual prompt. Despite recent progress, existing approaches either fail to capture nuanced user preferences or lack effective mechanisms to encode personalized visual signals. In this work, we propose a multimodal framework that leverages multimodal large language models (MLLMs) to extract rich user representations and inject them into diffusion-based image generation. We train the MLLM with a preference-oriented visual question answering task to capture fine-grained semantic cues. To isolate preference-relevant features, we introduce two complementary probing tasks: inter-user discrimination to distinguish between different users, and intra-user discrimination to separate liked from disliked content. To ensure compatibility with diffusion text encoders, we design a maximum mean discrepancy-based alignment loss that bridges the modality gap while preserving multimodal structure. The resulting embeddings are used to condition the generator, enabling faithful adherence to both prompts and user preferences. Extensive experiments demonstrate that our method substantially outperforms strong baselines in both image quality and preference alignment, highlighting the effectiveness of representation extraction and alignment for personalized generation.
- North America > United States > Washington > King County > Seattle (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
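The maximum mean discrepancy alignment loss mentioned in the abstract is a standard distribution-matching objective. A minimal sketch with an RBF kernel follows — the kernel choice, bandwidth, and embedding shapes are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between two sets of vectors."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd_loss(user_emb, text_emb, bandwidth=1.0):
    """Biased MMD^2 estimate: pulls the distribution of user embeddings
    toward the distribution of text-encoder embeddings."""
    k_xx = rbf_kernel(user_emb, user_emb, bandwidth).mean()
    k_yy = rbf_kernel(text_emb, text_emb, bandwidth).mean()
    k_xy = rbf_kernel(user_emb, text_emb, bandwidth).mean()
    return k_xx + k_yy - 2 * k_xy

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
loss_same = mmd_loss(x, x)        # identical sets -> exactly 0
loss_diff = mmd_loss(x, x + 5.0)  # shifted distribution -> positive
```

Because MMD compares distributions rather than individual pairs, it can close the modality gap while leaving the internal structure of the user embeddings intact, which matches the abstract's stated goal.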
Future You: Designing and Evaluating Multimodal AI-generated Digital Twins for Strengthening Future Self-Continuity
Albrecht, Constanze, Archiwaranguprok, Chayapatr, Poonsiriwong, Rachel, Chen, Awu, Yin, Peggy, Lertsutthiwong, Monchai, Winson, Kavin, Hershfield, Hal, Maes, Pattie, Pataranutaporn, Pat
What if users could meet their future selves today? AI-generated future selves simulate meaningful encounters with a digital twin decades in the future. As AI systems advance, combining cloned voices, age-progressed facial rendering, and autobiographical narratives, a central question emerges: Does the modality of these future selves alter their psychological and affective impact? How might a text-based chatbot, a voice-only system, or a photorealistic avatar shape present-day decisions and our feeling of connection to the future? We report a randomized controlled study (N=92) evaluating three modalities of AI-generated future selves (text, voice, avatar) against a neutral control condition. We also report a systematic model evaluation between Claude 4 and three other Large Language Models (LLMs), assessing Claude 4 across psychological and interaction dimensions and establishing conversational AI quality as a critical determinant of intervention effectiveness. All personalized modalities strengthened Future Self-Continuity (FSC), emotional well-being, and motivation compared to control, with avatar producing the largest vividness gains, yet with no significant differences between formats. Interaction quality metrics, particularly persuasiveness, realism, and user engagement, emerged as robust predictors of psychological and affective outcomes, indicating that how compelling the interaction feels matters more than the form it takes. Content analysis found thematic patterns: text emphasized career planning, while voice and avatar facilitated personal reflection. Claude 4 outperformed ChatGPT 3.5, Llama 4, and Qwen 3 in enhancing psychological, affective, and FSC outcomes.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Virginia (0.04)
- (5 more...)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.34)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.93)
- Information Technology > Security & Privacy (0.87)
- Health & Medicine > Consumer Health (0.68)
- Education > Educational Setting (0.67)
Simulating Life Paths with Digital Twins: AI-Generated Future Selves Influence Decision-Making and Expand Human Choice
Poonsiriwong, Rachel, Archiwaranguprok, Chayapatr, Albrecht, Constanze, Yin, Peggy, Powdthavee, Nattavudh, Hershfield, Hal, Lertsutthiwong, Monchai, Winson, Kavin, Pataranutaporn, Pat
Major life transitions demand high-stakes decisions, yet people often struggle to imagine how their future selves will live with the consequences. To support this limited capacity for mental time travel, we introduce AI-enabled digital twins that have "lived through" simulated life scenarios. Rather than predicting optimal outcomes, these simulations extend prospective cognition by making alternative futures vivid enough to support deliberation without assuming which path is best. We evaluate this idea in a randomized controlled study (N=192) using multimodal synthesis - facial age progression, voice cloning, and large language model dialogue - to create personalized avatars representing participants 30 years forward. Young adults 18 to 28 years old described pending binary decisions and were assigned to guided imagination or one of four avatar conditions: single-option, balanced dual-option, or expanded three-option with a system-generated novel alternative. Results showed asymmetric effects: single-sided avatars increased shifts toward the presented option, while balanced presentation produced movement toward both. Introducing a system-generated third option increased adoption of this new alternative compared to control, suggesting that AI-generated future selves can expand choice by surfacing paths that might otherwise go unnoticed. Participants rated evaluative reasoning and eudaimonic meaning-making as more important than emotional or visual vividness. Perceived persuasiveness and baseline agency predicted decision change. These findings advance understanding of AI-mediated episodic prospection and raise questions about autonomy in AI-augmented decisions.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New York (0.04)
- (5 more...)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Law (1.00)
- Education > Educational Setting > Higher Education (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.67)
Vector Quantization using Gaussian Variational Autoencoder
Xu, Tongda, Zheng, Wendi, He, Jiajun, Hernandez-Lobato, Jose Miguel, Wang, Yan, Zhang, Ya-Qin, Tang, Jie
Vector quantized variational autoencoder (VQ-VAE) is a discrete autoencoder that compresses images into discrete tokens. It is difficult to train due to discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE with a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and finds the closest noise to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic to train a Gaussian VAE for effective GQ, named target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge. Vector-quantized variational autoencoder (Van Den Oord et al., 2017) is an autoencoder that compresses images into discrete tokens. It is fundamental to autoregressive generative models (Esser et al., 2021; Chang et al., 2022; Yu et al., 2023; Sun et al., 2024b). However, VQ-VAE is difficult to train: the encoding process of VQ-VAE is not differentiable and challenges such as codebook collapse often emerge (Sønderby et al., 2017).
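The core GQ operation described in the abstract — a random Gaussian codebook with nearest-neighbour assignment of posterior means, no training — is simple enough to sketch directly. Codebook size and latent dimension here are illustrative; this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

codebook_size, dim = 1024, 8
# Random Gaussian noise as the codebook: no codebook training at all.
codebook = rng.standard_normal((codebook_size, dim))

def gaussian_quant(posterior_mean: np.ndarray) -> np.ndarray:
    """Return the index of the nearest codeword for each latent vector."""
    # (N, K) squared distances between latents and codewords
    d2 = ((posterior_mean[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

latents = rng.standard_normal((16, dim))  # stand-in for Gaussian VAE posterior means
tokens = gaussian_quant(latents)          # discrete tokens
recon = codebook[tokens]                  # dequantized latents
```

The abstract's theoretical claim maps onto this picture: once log(codebook_size) exceeds the bits-back coding rate of the Gaussian VAE, some random codeword lands close to each posterior mean with high probability, so the quantization error stays small.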
Flash Multi-Head Feed-Forward Network
Zhang, Minshen, Hu, Xiang, Li, Jianguo, Wu, Wei, Tu, Kewei
We explore the Multi-Head FFN (MH-FFN) as a replacement for the FFN in the Transformer architecture, motivated by the structural similarity between single-head attention and the FFN. While multi-head mechanisms enhance expressivity in attention, naively applying them to FFNs faces two challenges: memory consumption scaling with the head count, and an imbalanced ratio between the growing intermediate size and the fixed head dimension as models scale, which degrades scalability and expressive power. To address these challenges, we propose Flash Multi-Head FFN (FlashMHF), with two key innovations: an I/O-aware fused kernel computing outputs online in SRAM akin to FlashAttention, and a design using dynamically weighted parallel sub-networks to maintain a balanced ratio between intermediate and head dimensions. Validated on models from 128M to 1.3B parameters, FlashMHF consistently improves perplexity and downstream task accuracy over SwiGLU FFNs, while reducing peak memory usage by 3-5x and accelerating inference by up to 1.08x. Our work establishes the multi-head design as a superior architectural principle for FFNs, presenting FlashMHF as a powerful, efficient, and scalable alternative to FFNs in Transformers.
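The multi-head FFN idea can be made concrete with a naive reference forward pass: each head runs its own small FFN and head outputs are combined with input-dependent weights. This is the memory-hungry baseline the paper's fused kernel is designed to avoid; dimensions, the ReLU activation, and the softmax gating form are illustrative assumptions, not FlashMHF's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, d_head_ff = 32, 4, 64
W_in   = rng.standard_normal((n_heads, d_model, d_head_ff)) * 0.05
W_out  = rng.standard_normal((n_heads, d_head_ff, d_model)) * 0.05
W_gate = rng.standard_normal((d_model, n_heads)) * 0.05

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_ffn(x: np.ndarray) -> np.ndarray:
    """x: (T, d_model) -> (T, d_model), a dynamically weighted sum of head outputs."""
    heads = np.einsum('td,hdf->thf', x, W_in)       # per-head hidden states (T, H, F)
    heads = np.maximum(heads, 0.0)                  # ReLU nonlinearity
    outs  = np.einsum('thf,hfd->thd', heads, W_out) # per-head outputs (T, H, d_model)
    gates = softmax(x @ W_gate)                     # (T, H) input-dependent head weights
    return np.einsum('th,thd->td', gates, outs)

y = multi_head_ffn(rng.standard_normal((10, d_model)))
```

Note that the intermediate tensor `heads` materializes `T * n_heads * d_head_ff` activations, which is exactly the memory cost that scales with head count; computing head outputs online in SRAM, as the abstract describes, avoids ever holding that tensor in full.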