Goto

Collaborating Authors

 gom


Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors

arXiv.org Artificial Intelligence

Object-level mapping [1, 2, 3, 4, 5, 6, 7, 8, 9] builds a 3D map of multiple object instances in a scene, which is critical for scene understanding [10] and has various applications in robotic manipulation [11], semantic navigation [12, 13] and long-term dynamic map maintenance [14]. It addresses two closely coupled tasks: 3D shape reconstruction [15, 16] and pose estimation [17]. Conventional methods [18, 19, 20] approach these tasks from a perspective of state estimation [21], solving an inverse problem where low-dimensional observations (RGB and Depth images) are used to recover high-dimensional unknown variables (3D poses and shapes) through a known observation process (e.g., projection, and differentiable rendering). However, these methods require dense observations (e.g., hundreds of views for NeRF [18]) to fully constrain the problem. In robotics or AR applications, obtaining such dense observations is challenging due to limitations in the robot's or user's observation angle and occlusions in clustered scenarios. Therefore, it is crucial to develop methods that can map from sparse (fewer than 10) or even single observations. Human vision can infer complete 3D objects from images despite occlusions by using prior knowledge of the objects, which represents the prior distributions of the shapes of specific categories, such as chairs, based on thousands of instances observed in daily life. We aim to introduce generative models [22] as providers of prior knowledge to constrain the 3D object mapping. Generative models have demonstrated impressive abilities to generate high-quality multi-modal data by learning distributions in large-scale datasets, including texts [23], images [24], videos [25], and 3D models [26, 27, 28, 29].


Transferable Reinforcement Learning via Generalized Occupancy Models

arXiv.org Artificial Intelligence

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new tasks to linear reward regression. Yet, policy improvement with successor features can be challenging. This work proposes a novel class of models, i.e., generalized occupancy models (GOMs), that learn a distribution of successor features from a stationary dataset, along with a policy that acts to realize different successor features. These models can quickly select the optimal action for arbitrary new tasks. By directly modeling long-term outcomes in the dataset, GOMs avoid compounding error while enabling rapid transfer across reward functions. We present a practical instantiation of GOMs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.


Motion Capture Benchmark of Real Industrial Tasks and Traditional Crafts for Human Movement Analysis

arXiv.org Artificial Intelligence

Human movement analysis is a key area of research in robotics, biomechanics, and data science. It encompasses tracking, posture estimation, and movement synthesis. While numerous methodologies have evolved over time, a systematic and quantitative evaluation of these approaches using verifiable ground truth data of three-dimensional human movement is still required to define the current state of the art. This paper presents seven datasets recorded using inertial-based motion capture. The datasets contain professional gestures carried out by industrial operators and skilled craftsmen performed in real conditions in-situ. The datasets were created with the intention of being used for research in human motion modeling, analysis, and generation. The protocols for data collection are described in detail, and a preliminary analysis of the collected data is provided as a benchmark. The Gesture Operational Model, a hybrid stochastic-biomechanical approach based on kinematic descriptors, is utilized to model the dynamics of the experts' movements and create mathematical representations of their motion trajectories for analysis and quantifying their body dexterity. The models allowed accurate the generation of human professional poses and an intuitive description of how body joints cooperate and change over time through the performance of the task.


Comparative Analysis of Frameworks for Knowledge-Intensive Intelligent Agents

AI Magazine

A recurring requirement for human-level artificial intelligence is the incorporation of vast amounts of knowledge into a software agent that can use the knowledge in an efficient and organized fashion. This article discusses representations and processes for agents and behavior models that integrate large, diverse knowledge stores, are long-lived, and exhibit high degrees of competence and flexibility while interacting with complex environments. There are many different approaches to building such agents, and understanding the important commonalities and differences between approaches is often difficult. We introduce a new approach to comparing frameworks based on the notions of commitment, reconsideration, and a categorization of representations and processes. We review four agent frameworks, concentrating on the major representations and processes each directly supports.


Comparative Analysis of Frameworks for Knowledge-Intensive Intelligent Agents

AI Magazine

A recurring requirement for human-level artificial intelligence is the incorporation of vast amounts of knowledge into a software agent that can use the knowledge in an efficient and organized fashion. This article discusses representations and processes for agents and behavior models that integrate large, diverse knowledge stores, are long-lived, and exhibit high degrees of competence and flexibility while interacting with complex environments. There are many different approaches to building such agents, and understanding the important commonalities and differences between approaches is often difficult. We introduce a new approach to comparing frameworks based on the notions of commitment, reconsideration, and a categorization of representations and processes. We review four agent frameworks, concentrating on the major representations and processes each directly supports. By organizing the approaches according to a common nomenclature, the analysis highlights points of similarity and difference and suggests directions for integrating and unifying disparate approaches and for incorporating research results from one framework into alternatives.