Embodied AI
Alexa Arena: A User-Centric Interactive Platform for Embodied AI
We introduce Alexa Arena, a user-centric simulation platform to facilitate research in building assistive conversational embodied agents. Alexa Arena features multi-room layouts and an abundance of interactable objects. With user-friendly graphics and control mechanisms, the platform supports the development of gamified robotic tasks readily accessible to general human users, allowing high-efficiency data collection and embodied-AI system evaluation. Along with the platform, we introduce a dialog-enabled task completion benchmark with online human evaluations.
Generations in Dialogue: Embodied AI, robotics, perception, and action with Professor Roberto Martín-Martín
Generations in Dialogue: Bridging Perspectives in AI is a podcast from AAAI featuring thought-provoking discussions between AI experts, practitioners, and enthusiasts from different age groups and backgrounds. Each episode delves into how generational experiences shape views on AI, exploring the challenges, opportunities, and ethical considerations that come with the advancement of this transformative technology. In the third episode of this new series from AAAI, host Ella Lan chats to Professor Roberto Martín-Martín about taking a screwdriver to his toys as a child, how his research focus has evolved over time, how different generations interact with technology, making robots for everyone, being inspired by colleagues, advice for early-career researchers, and how machines can enhance human capabilities. Roberto Martín-Martín is an Assistant Professor of Computer Science at the University of Texas at Austin, where his research integrates robotics, computer vision, and machine learning to build autonomous agents capable of perceiving, learning, and acting in the real world. He previously worked as an AI Researcher at Salesforce AI and as a Postdoctoral Scholar at the Stanford Vision and Learning Lab with Silvio Savarese and Fei-Fei Li, leading projects in visuomotor learning, mobile manipulation, and human-robot interaction.
- North America > United States > Texas > Travis County > Austin (0.25)
- Europe > Spain > Galicia > Madrid (0.05)
- Research Report > New Finding (0.36)
- Overview (0.36)
Image Quality Assessment for Embodied AI
Li, Chunyi, Xiao, Jiaohao, Zhang, Jianbo, Wen, Farong, Zhang, Zicheng, Tian, Yuan, Zhu, Xiangyang, Liu, Xiaohong, Cheng, Zhengxue, Lin, Weisi, Zhai, Guangtao
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various real-world distortions limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, no IQA method assesses the usability of an image in embodied tasks, namely, its perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic of IQA for Embodied AI. Specifically, we (1) construct a perception-cognition-decision-execution pipeline based on the Mertonian system and meta-cognitive theory, and define a comprehensive subjective score collection process; (2) establish the Embodied-IQA database, containing over 36k reference/distorted image pairs with more than 5M fine-grained annotations provided by Vision-Language Models, Vision-Language-Action models, and real-world robots; (3) train and validate mainstream IQA methods on Embodied-IQA, demonstrating the need for more accurate quality indicators for Embodied AI. We sincerely hope that this evaluation will promote the application of Embodied AI under complex real-world distortions. Project page: https://github.com/lcysyzxdxc/EmbodiedIQA
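The perception-cognition-decision-execution pipeline in the abstract above can be sketched as a minimal control flow: perceive a feature from the image, map it to a quality estimate, and gate execution on that estimate. Everything here (the variance feature, the normalization constant, the threshold) is an illustrative assumption, not the paper's actual method.

```python
# Hypothetical sketch of a perception-cognition-decision-execution pipeline
# that scores an image's usability for an embodied task. The feature, the
# normalization, and the threshold are all invented for illustration.

def perceive(image):
    # Perception stage stand-in: pixel variance as a crude sharpness feature.
    mean = sum(image) / len(image)
    return sum((p - mean) ** 2 for p in image) / len(image)

def cognize(feature):
    # Cognition stage stand-in: squash the raw feature into a [0, 1] score.
    return min(feature / 100.0, 1.0)

def decide(quality, threshold=0.5):
    # Decision stage: gate execution on the estimated perceptual quality.
    return "execute" if quality >= threshold else "request_new_view"

def run_pipeline(image):
    quality = cognize(perceive(image))
    return quality, decide(quality)

flat = [128] * 16       # a featureless image: unusable for the task
contrasty = [0, 255] * 8  # a high-contrast image: usable
```

The point of the sketch is the gating structure, not the feature itself: a real system would replace `perceive` and `cognize` with learned models producing the paper's fine-grained quality annotations.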
- Research Report (0.63)
- Workflow (0.46)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Europe > Poland (0.04)
- Information Technology (0.46)
- Leisure & Entertainment > Games (0.46)
PersONAL: Towards a Comprehensive Benchmark for Personalized Embodied Agents
Ziliotto, Filippo, Akkara, Jelin Raphael, Daniele, Alessandro, Ballan, Lamberto, Serafini, Luciano, Campari, Tommaso
Recent advances in Embodied AI have enabled agents to perform increasingly complex tasks and adapt to diverse environments. However, deploying such agents in realistic human-centered scenarios, such as domestic households, remains challenging, particularly due to the difficulty of modeling individual human preferences and behaviors. In this work, we introduce PersONAL (PERSonalized Object Navigation And Localization), a comprehensive benchmark designed to study personalization in Embodied AI. Agents must identify, retrieve, and navigate to objects associated with specific users, responding to natural-language queries such as "find Lily's backpack". PersONAL comprises over 2,000 high-quality episodes across 30+ photorealistic homes from the HM3D dataset. Each episode includes a natural-language scene description with explicit associations between objects and their owners, requiring agents to reason over user-specific semantics. The benchmark supports two evaluation modes: (1) active navigation in unseen environments, and (2) object grounding in previously mapped scenes. Experiments with state-of-the-art baselines reveal a substantial gap to human performance, highlighting the need for embodied agents capable of perceiving, reasoning over, and memorizing personalized information, paving the way towards real-world assistive robots.
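The object-owner association at the heart of PersONAL can be illustrated with a toy episode record and a keyword-matched query. The field names and the matching logic are assumptions for illustration only; the benchmark's actual episode format and grounding models are far richer.

```python
# Toy sketch of a PersONAL-style episode: a scene description plus explicit
# object-owner associations, queried with natural language. Field names and
# the naive keyword matcher are illustrative assumptions.

episode = {
    "scene": "hm3d_house_042",  # hypothetical scene identifier
    "description": "Lily's backpack is on the desk; Tom's mug is in the kitchen.",
    "objects": [
        {"name": "backpack", "owner": "Lily", "location": "desk"},
        {"name": "mug", "owner": "Tom", "location": "kitchen"},
    ],
}

def ground_query(query, episode):
    """Return the object whose owner and name both appear in the query."""
    q = query.lower()
    for obj in episode["objects"]:
        if obj["owner"].lower() in q and obj["name"].lower() in q:
            return obj
    return None
```

A query like `ground_query("find Lily's backpack", episode)` resolves to the desk object, while a mismatched owner-object pair ("find Tom's backpack") correctly resolves to nothing, which is exactly the user-specific reasoning the benchmark tests.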
Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI
Ni, Fei, Zhang, Min, Li, Pengyi, Yuan, Yifu, Zhang, Lingfeng, Liu, Yuecheng, Han, Peilong, Kou, Longxin, Ma, Shaojin, Qiao, Jinbin, Bravo, David Gamaliel Arcos, Wang, Yuening, Hu, Xiao, Zhang, Zhanguang, Yao, Xianze, Li, Yutong, Zhang, Zhao, Wen, Ying, Chen, Ying-Cong, Liang, Xiaodan, Lin, Liang, He, Bin, Bou-Ammar, Haitham, Wang, He, Xu, Huazhe, Deng, Jiankang, Luo, Shan, Jiang, Shuqiang, Pan, Wei, Gao, Yang, Zafeiriou, Stefanos, Peters, Jan, Zhuang, Yuzheng, Zhang, Yingxue, Zheng, Yan, Tang, Hongyao, Hao, Jianye
Embodied AI development significantly lags behind large foundation models due to three critical challenges: (1) lack of systematic understanding of core capabilities needed for Embodied AI, making research lack clear objectives; (2) absence of unified and standardized evaluation systems, rendering cross-benchmark evaluation infeasible; and (3) underdeveloped automated and scalable acquisition methods for embodied data, creating critical bottlenecks for model scaling. To address these obstacles, we present Embodied Arena, a comprehensive, unified, and evolving evaluation platform for Embodied AI. Our platform establishes a systematic embodied capability taxonomy spanning three levels (perception, reasoning, task execution), seven core capabilities, and 25 fine-grained dimensions, enabling unified evaluation with systematic research objectives. We introduce a standardized evaluation system built upon unified infrastructure supporting flexible integration of 22 diverse benchmarks across three domains (2D/3D Embodied Q&A, Navigation, Task Planning) and 30+ advanced models from 20+ worldwide institutes. Additionally, we develop a novel LLM-driven automated generation pipeline ensuring scalable embodied evaluation data with continuous evolution for diversity and comprehensiveness. Embodied Arena publishes three real-time leaderboards (Embodied Q&A, Navigation, Task Planning) with dual perspectives (benchmark view and capability view), providing comprehensive overviews of advanced model capabilities. In particular, we present nine findings summarized from the evaluation results on the leaderboards of Embodied Arena. This helps to establish clear research directions and pinpoint critical research problems, thereby driving forward progress in the field of Embodied AI.
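The dual benchmark/capability leaderboard views described above can be sketched as a small aggregation over a nested taxonomy: per-benchmark scores roll up through fine-grained capabilities to the three top levels. The capability and benchmark names and all scores below are placeholders, not Embodied Arena's actual taxonomy or results.

```python
# Sketch of a three-level capability taxonomy with a capability-view rollup
# over per-benchmark scores. Names and numbers are invented placeholders.

taxonomy = {
    "perception": ["object_recognition", "spatial_understanding"],
    "reasoning": ["embodied_qa", "planning_inference"],
    "task_execution": ["navigation", "manipulation", "task_planning"],
}

# Hypothetical per-benchmark scores, keyed by fine-grained capability.
benchmark_scores = {
    "object_recognition": [0.8, 0.9],
    "spatial_understanding": [0.6],
    "embodied_qa": [0.7, 0.5],
    "planning_inference": [0.4],
    "navigation": [0.5, 0.7],
    "manipulation": [0.3],
    "task_planning": [0.6],
}

def capability_view(level):
    """Average every benchmark score under one top-level capability."""
    scores = [s for cap in taxonomy[level] for s in benchmark_scores[cap]]
    return sum(scores) / len(scores)
```

The benchmark view is just `benchmark_scores` read directly; the capability view is the rollup, which is what lets one model be compared across benchmarks that probe the same underlying skill.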
Multi-Modal Multi-Task (M3T) Federated Foundation Models for Embodied AI: Potentials and Challenges for Edge Integration
Borazjani, Kasra, Abdisarabshali, Payam, Nadimi, Fardis, Khosravan, Naji, Liwang, Minghui, Wang, Xianbin, Hong, Yiguang, Hosseinalipour, Seyyedali
As embodied AI systems become increasingly multi-modal, personalized, and interactive, they must learn effectively from diverse sensory inputs, adapt continually to user preferences, and operate safely under resource and privacy constraints. These challenges expose a pressing need for machine learning models capable of swift, context-aware adaptation while balancing model generalization and personalization. Here, two methods emerge as suitable candidates, each offering parts of these capabilities: multi-modal multi-task foundation models (M3T-FMs) provide a pathway toward generalization across tasks and modalities, whereas federated learning (FL) offers the infrastructure for distributed, privacy-preserving model updates and user-level model personalization. However, when used in isolation, each of these approaches falls short of meeting the complex and diverse capability requirements of real-world embodied AI environments. In this vision paper, we introduce multi-modal multi-task federated foundation models (M3T-FFMs) for embodied AI, a new paradigm that unifies the strengths of M3T-FMs with the privacy-preserving distributed training nature of FL, enabling intelligent systems at the wireless edge. We collect critical deployment dimensions of M3T-FFMs in embodied AI ecosystems under a unified framework, which we name "EMBODY": Embodiment heterogeneity, Modality richness and imbalance, Bandwidth and compute constraints, On-device continual learning, Distributed control and autonomy, and Yielding safety, privacy, and personalization. For each, we identify concrete challenges and envision actionable research directions. We also present an evaluation framework for deploying M3T-FFMs in embodied AI systems, along with the associated trade-offs. Finally, we present a prototype implementation of M3T-FFMs and evaluate their energy and latency performance.
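The FL half of the M3T-FFM vision, privacy-preserving distributed updates, can be illustrated with the classic federated-averaging pattern: each client fine-tunes a shared parameter vector on private data and only the updated weights (never the data) reach the server, which averages them. The local update rule and the toy client data below are assumptions for illustration; real M3T-FFM training would federate foundation-model adapters across heterogeneous embodied devices.

```python
# Minimal federated-averaging sketch: clients adapt a shared weight vector
# locally on private data; the server averages the resulting models.
# The gradient-free "move toward the data mean" update is a toy stand-in.

def local_update(weights, client_data, lr=0.1):
    # Toy local step: nudge every weight toward the client's data mean.
    target = sum(client_data) / len(client_data)
    return [w + lr * (target - w) for w in weights]

def fed_avg(client_weights):
    # Server-side aggregation: coordinate-wise mean over client models.
    n = len(client_weights)
    return [sum(ws[i] for ws in client_weights) / n
            for i in range(len(client_weights[0]))]

global_weights = [0.0, 0.0]
clients = [[1.0, 1.0, 1.0], [3.0, 3.0]]  # two clients' private data
updates = [local_update(global_weights, data) for data in clients]
global_weights = fed_avg(updates)
```

Personalization, one of the "EMBODY" dimensions, would enter this sketch by letting each client keep some weights local instead of averaging everything.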
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Embodied AI in Social Spaces: Responsible and Adaptive Robots in Complex Settings -- UKAIRS 2025
Landowska, Aleksandra, Bergin, Aislinn D Gomez, Abioye, Ayodeji O., Deshmukh, Jayati, Bouadouki, Andriana, Wheadon, Maria, Georgara, Athina, Price, Dominic, Nguyen, Tuyen, Ao, Shuang, Singh, Lokesh, Long, Yi, Miele, Raffaele, Fischer, Joel E., Ramchurn, Sarvapali D.
This paper introduces and overviews a multidisciplinary project aimed at developing responsible and adaptive multi-human multi-robot (MHMR) systems for complex, dynamic settings. The project integrates co-design, ethical frameworks, and multimodal sensing to create AI-driven robots that are emotionally responsive, context-aware, and aligned with the needs of diverse users. We outline the project's vision, methodology, and early outcomes, demonstrating how embodied AI can support sustainable, ethical, and human-centred futures.
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.20)
- Europe > United Kingdom > England > Hampshire > Southampton (0.08)
- Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.06)
Multimodal Data Storage and Retrieval for Embodied AI: A Survey
Embodied AI (EAI) agents continuously interact with the physical world, generating vast, heterogeneous multimodal data streams that traditional management systems are ill-equipped to handle. In this survey, we first systematically evaluate five storage architectures (Graph Databases, Multi-Model Databases, Data Lakes, Vector Databases, and Time-Series Databases), focusing on their suitability for addressing EAI's core requirements, including physical grounding, low-latency access, and dynamic scalability. We then analyze five retrieval paradigms (Fusion Strategy-Based Retrieval, Representation Alignment-Based Retrieval, Graph-Structure-Based Retrieval, Generation Model-Based Retrieval, and Efficient Retrieval-Based Optimization), revealing a fundamental tension between achieving long-term semantic coherence and maintaining real-time responsiveness. Based on this comprehensive analysis, we identify key bottlenecks, spanning from the foundational Physical Grounding Gap to systemic challenges in cross-modal integration, dynamic adaptation, and open-world generalization. Finally, we outline a forward-looking research agenda encompassing physics-aware data models, adaptive storage-retrieval co-optimization, and standardized benchmarking, to guide future research toward principled data management solutions for EAI. Our survey is based on a comprehensive review of more than 180 related studies, providing a rigorous roadmap for designing the robust, high-performance data management frameworks essential for the next generation of autonomous embodied systems.
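One of the retrieval paradigms the survey analyzes, embedding-based retrieval as offered by vector databases, can be sketched in a few lines: observations are stored as embedding/payload pairs and fetched by cosine similarity. The two-dimensional embeddings and the records below are fabricated for illustration; a production EAI store would face exactly the latency and grounding tensions the survey describes.

```python
# Toy vector-store sketch: multimodal observations stored as embeddings and
# retrieved by cosine similarity. Embeddings and payloads are fabricated.
import math

store = []  # list of (embedding, payload) pairs

def add(embedding, payload):
    store.append((embedding, payload))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, k=1):
    # Rank stored observations by similarity to the query embedding.
    ranked = sorted(store, key=lambda item: cosine(item[0], query), reverse=True)
    return [payload for _, payload in ranked[:k]]

add([1.0, 0.0], {"modality": "rgb", "note": "kitchen view"})
add([0.0, 1.0], {"modality": "lidar", "note": "hallway scan"})
```

The linear scan in `retrieve` is the crux of the survey's tension: it preserves exact semantic ranking but scales poorly, which is why real systems trade exactness for approximate-nearest-neighbor indexes.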
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Information Technology (0.93)
- Education (0.67)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
Empowering Virtual Agents With Intelligent Systems
While embodied AI is commonly understood as general-purpose intelligence that empowers various forms of robotics [9], we believe that its scope extends significantly beyond robotic platforms alone. Embodied AI, as we define it, refers to intelligent systems capable of learning from and actively interacting with their environments, continuously adapting based on real-time sensor feedback and context-driven decision-making. Specifically, we define Environmental Embodied AI as an intelligent virtual agent capable of real-time perception, learning, and interaction with its surrounding environment through sensor inputs, enabling it to actuate environmental elements. Distinct from traditional embodied AI systems primarily associated with robotic platforms, Environmental Embodied AI specifically emphasizes non-robotic applications, employing virtual agents to directly influence physical or operational states within environments. These intelligent systems autonomously analyze environmental data, dynamically adapting their behavior to optimize outcomes and significantly reduce ecological footprints, inherently supporting environmentally sustainable practices.
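A non-robotic agent of the kind defined above reduces, at its simplest, to a sense-decide-actuate loop over environmental state. The thermostat-style controller below is an invented example (the setpoint, band, and commands are all assumptions) meant only to show the loop structure, not any system from the article.

```python
# Illustrative sense-decide-actuate loop for a non-robotic "environmental"
# agent: a virtual controller reading a sensor and issuing setpoint commands.
# All values and rules are invented for illustration.

def decide(temperature, setpoint=21.0, band=0.5):
    # Hysteresis rule: actuate only when the reading leaves the comfort band.
    if temperature > setpoint + band:
        return "cool"
    if temperature < setpoint - band:
        return "heat"
    return "hold"

def run(readings):
    # Map a stream of sensor readings to actuation commands.
    return [decide(t) for t in readings]

commands = run([20.0, 21.2, 23.0])
```

The "learning" and "adaptation" in the definition would replace the fixed hysteresis rule with a policy updated from feedback; the loop around it stays the same.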