physical action
Synergetic Empowerment: Wireless Communications Meets Embodied Intelligence
Liang, Hongtao, Diao, Yihe, Wu, YuHang, Zhou, Fuhui, Wu, Qihui
Wireless communication is evolving into an agent era, where large-scale agents with inherent embodied intelligence are not just users but active participants. Combining wireless communication with embodied intelligence can achieve a synergetic empowerment and greatly facilitate the development of agent communication. An overview of this synergetic empowerment is presented, framing it as a co-evolutionary process that transforms wireless communication from a simple utility into the digital nervous system of a collective intelligence, while simultaneously elevating isolated agents into a unified superorganism with emergent capabilities far exceeding individual contributions. Furthermore, critical open issues and future research directions are identified. Wireless communication is evolving into the agent era, marking a fundamental shift from connecting passive information endpoints to enabling massive-scale agent collaboration. Unlike traditional devices, these agents, such as autonomous vehicles, industrial robots, and advanced environmental sensors, possess inherent embodied intelligence, empowering them to actively perceive, reason, and physically interact with their surroundings [1]. The scale of this transformation is unprecedented. Projections for 2030 estimate that the number of connected IoT devices will reach 125 billion, while monthly global mobile traffic is expected to increase to over 5000 exabytes, representing an 80-fold increase from 2020 [2]. More critically, a growing share of these devices are embodied agents that require real-time coordination for complex collective tasks, marking a qualitative shift from isolated sensors to collaborative swarms. Diao and Q. Wu are with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210000, P .
Bridging Physical and Digital Worlds: Embodied Large AI for Future Wireless Systems
Wang, Xinquan, Zhu, Fenghao, Yang, Zhaohui, Huang, Chongwen, Chen, Xiaoming, Zhang, Zhaoyang, Muhaidat, Sami, Debbah, Mérouane
Large artificial intelligence (AI) models offer revolutionary potential for future wireless systems, promising unprecedented capabilities in network optimization and performance. However, current paradigms largely overlook crucial physical interactions. As a result, they rely primarily on offline datasets, which makes it difficult to handle real-time wireless dynamics and non-stationary environments. Furthermore, these models often lack the capability for active environmental probing. This paper proposes a fundamental paradigm shift towards wireless embodied large AI (WELAI), moving from passive observation to active embodiment. We first identify key challenges faced by existing models, then explore the design principles and system structure of WELAI. We also outline prospective applications in next-generation wireless. Finally, through an illustrative case study, we demonstrate the effectiveness of WELAI and point out promising research directions for realizing adaptive, robust, and autonomous wireless systems.
Metareasoning in uncertain environments: a meta-BAMDP framework
Godara, Prakhar, Aléman, Tilman Diego, Yu, Angela J.
In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome such as maximizing the value function of a Markov decision process (MDP). However, executing $P$ itself may bear some costs (time, energy, limited capacity, etc.), which need to be considered alongside the explicit utility obtained by making the choice in the underlying decision problem. Such costs need to be taken into account in order to accurately model human behavior, as well as to optimize AI planning, as all physical systems are bound to face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the meta problem's complexity, our solutions are necessarily approximate, but nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.
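To make the underlying (non-meta) problem concrete, the following is a minimal sketch of Bayes-adaptive planning on a two-armed Bernoulli bandit. It is not the paper's meta-BAMDP method: it assumes reasoning is free and simply computes the Bayes-optimal expected reward over a finite horizon by exact dynamic programming. The belief state is a pair of Beta posteriors; the uniform Beta(1, 1) prior, the horizon parameter, and the names `bayes_value` and `best_arm` are assumptions introduced for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_value(a1: int, b1: int, a2: int, b2: int, horizon: int) -> float:
    """Bayes-optimal expected total reward for the remaining `horizon` pulls.

    Arm i has posterior Beta(a_i, b_i) over its unknown success probability.
    Reasoning cost is ignored here (an assumption; the meta problem adds it).
    """
    if horizon == 0:
        return 0.0
    p1 = a1 / (a1 + b1)  # posterior mean of arm 1
    p2 = a2 / (a2 + b2)  # posterior mean of arm 2
    # Pulling an arm yields a reward and moves the belief to an updated posterior.
    q1 = (p1 * (1.0 + bayes_value(a1 + 1, b1, a2, b2, horizon - 1))
          + (1.0 - p1) * bayes_value(a1, b1 + 1, a2, b2, horizon - 1))
    q2 = (p2 * (1.0 + bayes_value(a1, b1, a2 + 1, b2, horizon - 1))
          + (1.0 - p2) * bayes_value(a1, b1, a2, b2 + 1, horizon - 1))
    return max(q1, q2)


def best_arm(a1: int, b1: int, a2: int, b2: int, horizon: int) -> int:
    """Return 0 or 1: the arm with the higher Bayes-adaptive action value."""
    p1 = a1 / (a1 + b1)
    p2 = a2 / (a2 + b2)
    q1 = (p1 * (1.0 + bayes_value(a1 + 1, b1, a2, b2, horizon - 1))
          + (1.0 - p1) * bayes_value(a1, b1 + 1, a2, b2, horizon - 1))
    q2 = (p2 * (1.0 + bayes_value(a1, b1, a2 + 1, b2, horizon - 1))
          + (1.0 - p2) * bayes_value(a1, b1, a2, b2 + 1, horizon - 1))
    return 0 if q1 >= q2 else 1
```

The number of reachable belief states grows polynomially with the horizon in this setting, so exact recursion is feasible only for short horizons; a metareasoning agent would additionally weigh how much of this computation is worth performing.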
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Huang, Yidong, Sansom, Jacob, Ma, Ziqiao, Gervits, Felix, Chai, Joyce
Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpected situations caused by environmental dynamics or task changes. To explore the capabilities and boundaries of FMs faced with the challenges above, we introduce DriVLMe, a video-language-model-based agent to facilitate natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate. We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue. While DriVLMe demonstrates competitive performance in both open-loop benchmarks and closed-loop human studies, we reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, challenges with multi-turn interactions, simplified language generation from robotic experiences, and difficulties in handling on-the-fly unexpected situations like environmental dynamics and task changes.
AI enabled RPM for Mental Health Facility
Shaik, Thanveer, Tao, Xiaohui, Higgins, Niall, Xie, Haoran, Gururajan, Raj, Zhou, Xujuan
Mental healthcare is a prominent part of the healthcare industry, with alarming concerns about patients' depression and stress leading to self-harm and to threats against fellow patients and medical staff. To provide a therapeutic environment for both patients and staff, aggressive or agitated patients need to be monitored remotely, with their vital signs and physical activities tracked continuously. Remote patient monitoring (RPM) using non-invasive technology could enable contactless monitoring of acutely ill patients in a mental health facility. Enabling the RPM system with AI unlocks a predictive environment in which patients' future vital signs can be forecast. This paper discusses an AI-enabled RPM system framework that uses a non-invasive digital technology, RFID, with its in-built NCS mechanism to retrieve the vital signs and physical actions of patients. Based on the retrieved time series data, the framework forecasts patients' vital signs for the upcoming 3 hours and classifies their physical actions into 10 labelled physical activities. The framework helps avoid unforeseen clinical disasters and supports precautionary measures with medical intervention at the right time. A case study of a middle-aged PTSD patient treated with the AI-enabled RPM system is demonstrated in this study.
Multiagent Online Planning with Nested Beliefs and Dialogue
Kominis, Filippos (Universitat Pompeu Fabra) | Geffner, Hector (Universitat Pompeu Fabra)
The problem of planning with partial observability in the presence of a single agent has been addressed as a contingent or POMDP problem. Since the task is computationally hard, on-line approaches have also been developed that just compute the action to do next rather than full policies. In this work, we address a similar problem but in a multiagent setting where agents share a common goal and plan with beliefs which are about the world and the possibly nested beliefs of other agents. For this, we extend the belief tracking formulation of Kominis and Geffner to the on-line setting where plans are supposed to work for the true hidden state as revealed by the observations, and develop an alternative translation into classical planning that is used within a plan-execute-observe-and-replan cycle. Planning is done from the perspective of the agents, and there is a single planning agent in each replanning episode that can change across episodes. We present empirical results and show that interesting agent dialogues arise in this setting where agents collaborate by requesting or volunteering information in a goal-directed manner.
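The plan-execute-observe-and-replan cycle mentioned above can be illustrated with a deliberately small, single-agent sketch. It does not reproduce the paper's multiagent setting, nested beliefs, or translation into classical planning. The corridor world, the hidden key position, and the name `replan_loop` are all hypothetical and introduced only for illustration: the agent keeps a belief (the set of cells where the key may be), plans toward the nearest candidate, executes one step, observes, and replans.

```python
def replan_loop(start, candidates, true_key, max_steps=50):
    """On-line replanning in a 1-D corridor with a hidden key position.

    `candidates` is the initial belief: cells where the key might be.
    `true_key` is the hidden state, revealed only by sensing the current cell.
    Returns the list of visited positions.
    """
    pos = start
    belief = set(candidates)
    trace = [pos]
    for _ in range(max_steps):
        # Observe: sensing the current cell reveals whether the key is here.
        if pos in belief:
            if pos == true_key:
                return trace  # goal reached in the true hidden state
            belief.discard(pos)  # observation rules this candidate out
        if not belief:
            break  # no consistent hypothesis remains
        # Plan: head for the nearest cell still consistent with the belief
        # (ties broken toward the smaller cell index).
        target = min(belief, key=lambda c: (abs(c - pos), c))
        # Execute only the first action of the plan, then loop and replan.
        pos += 1 if target > pos else -1
        trace.append(pos)
    return trace


if __name__ == "__main__":
    print(replan_loop(start=2, candidates={0, 5}, true_key=5))
```

Only the next action is computed at each step rather than a full policy, which is the point of the on-line approach; the plan is discarded and rebuilt whenever an observation changes the belief.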
Angry Birds as a Challenge for Artificial Intelligence
Renz, Jochen (The Australian National University) | Ge, XiaoYu (Australian National University) | Verma, Rohan (Australian National University) | Zhang, Peng (Australian National University)
The Angry Birds AI Competition (aibirds.org) has been held annually since 2012 in conjunction with some of the major AI conferences, most recently with IJCAI 2015. The goal of the competition is to build AI agents that can play new Angry Birds levels as well as or better than the best human players. Successful agents should be able to quickly analyze new levels and to predict physical consequences of possible actions in order to select actions that solve a given level with a high score. Agents have no access to the game's internal physics, but only receive screenshots of the live game. In this paper we describe why this problem is a challenge for AI, and why it is an important step towards building AI that can successfully interact with the real world. We also summarise some highlights of past competitions, including a new competition track we introduced recently.
Diamonds From the Rough: Improving Drawing, Painting, and Singing via Crowdsourcing
Gingold, Yotam (Rutgers University and Columbia University) | Vouga, Etienne (Columbia University) | Grinspun, Eitan (Columbia University) | Hirsh, Haym (Rutgers University)
It is well established that in certain domains, noisy inputs can be reliably combined to obtain a better answer than any individual. It is now possible to consider the crowdsourcing of physical actions, commonly used for creative expressions such as drawing, shading, and singing. We provide algorithms for converting low-quality input obtained from the physical actions of a crowd into high-quality output. The inputs take the form of line drawings, shaded images, and songs. We investigate single-individual crowds (multiple inputs from a single human) and multiple-individual crowds.
Belief Revision with Sensing and Fallible Actions
Delgrande, James (Simon Fraser University) | Levesque, Hector J. (University of Toronto)
An agent will generally have incomplete and possibly inaccurate knowledge about its environment. In addition, such an agent may receive erroneous information, perhaps in being misinformed about the truth of some formula. In this paper we present a general approach to reasoning about action and belief change in such a setting. An agent may carry out actions, but in some cases may inadvertently execute the wrong one (for example, pushing an unintended button). As well, an agent may sense whether a condition holds, and may revise its beliefs after being told that a formula is true. Our approach is based on an epistemic extension to basic action theories expressed in the situation calculus, augmented by a plausibility relation over situations. This plausibility relation can be thought of as characterising the agent's overall belief state; as such it keeps track of not just the formulas that the agent believes to hold, but also the plausibility of formulas that it does not believe to hold. The agent's belief state is updated by suitably modifying the plausibility relation following the execution of an action. We show that our account generalises previous approaches, and fully handles belief revision, sensing, and erroneous actions.
On the Progression of Knowledge in the Situation Calculus
Liu, Yongmei (Sun Yat-sen University) | Wen, Ximing (Sun Yat-sen University and Guangdong Institute of Public Administration)
In a seminal paper, Lin and Reiter introduced the notion of progression for basic action theories in the situation calculus. Earlier works by Moore, Scherl and Levesque extended the situation calculus to account for knowledge. In this paper, we study progression of knowledge in the situation calculus. We first adapt the concept of bisimulation from modal logic and extend Lin and Reiter's notion of progression to accommodate knowledge. We show that for physical actions, progression of knowledge reduces to forgetting predicates in first-order modal logic. We identify a class of first-order modal formulas for which forgetting an atom is definable in first-order modal logic. This class of formulas goes beyond formulas without quantifying-in. We also identify a simple case where forgetting a predicate reduces to forgetting a finite number of atoms. Thus we are able to show that for local-effect physical actions, when the initial KB is a formula in this class, progression of knowledge is definable in first-order modal logic. Finally, we extend our results to the multi-agent case.