
Collaborating Authors

 Ahmetoglu, Alper


Symbolic Manipulation Planning with Discovered Object and Relational Predicates

arXiv.org Artificial Intelligence

Discovering the symbols and rules that can be used in long-horizon planning from a robot's unsupervised exploration of its environment and continuous sensorimotor experience is a challenging task. Previous studies proposed learning symbols from single- or paired-object interactions and planning with these symbols. In this work, we propose a system that learns rules over discovered object and relational symbols that encode an arbitrary number of objects and the relations between them, converts those rules to the Planning Domain Definition Language (PDDL), and generates plans that exploit the affordances of an arbitrary number of objects to achieve tasks. We validated our system with box-shaped objects of different sizes and showed that the system can develop symbolic knowledge of pick-up, carry, and place operations, taking into account object compounds in different configurations, such as boxes being carried together with the larger box on which they are placed. We also compared our method with state-of-the-art methods and showed that planning with operators defined over relational symbols gives better planning performance than the baselines.
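
The rule-to-PDDL conversion step can be illustrated with a short sketch. The rule format, the predicate names, and the pick-up example below are hypothetical placeholders, not the paper's actual representation:

def rule_to_pddl(name, params, preconds, effects):
    """Render one learned rule as a PDDL :action block."""
    par = " ".join("?" + v for v in params)
    fmt = lambda lits: " ".join(
        "(" + s + " " + " ".join("?" + a for a in args) + ")" for s, args in lits)
    return ("(:action " + name + "\n"
            "  :parameters (" + par + ")\n"
            "  :precondition (and " + fmt(preconds) + ")\n"
            "  :effect (and " + fmt(effects) + "))")

# Example: "pick up o1 if it is graspable and nothing is on top of it" (illustrative rule).
print(rule_to_pddl(
    "pick-up", ["o1"],
    preconds=[("graspable", ["o1"]), ("clear", ["o1"])],
    effects=[("holding", ["o1"])]))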


Developmental Scaffolding with Large Language Models

arXiv.org Artificial Intelligence

Exploration and self-observation are key mechanisms of infant sensorimotor development. These processes are further guided by parental scaffolding, accelerating skill and knowledge acquisition. In developmental robotics, this approach has often been adopted by having a human act as the source of scaffolding. In this study, we investigate whether Large Language Models (LLMs) can act as a scaffolding agent for a robotic system that aims to learn to predict the effects of its actions. To this end, an object manipulation setup is considered where one object can be picked and placed on top of, or in the vicinity of, another object. The adopted LLM is asked to guide the action selection process through algorithmically generated state descriptions and action selection alternatives in natural language. Simulation experiments with cubes in this setup show that LLM-guided (GPT-3.5-guided) learning yields significantly faster discovery of novel structures compared to random exploration. However, we observed that GPT-3.5 fails to effectively guide the robot in generating structures from objects with different affordances, such as cubes and spheres. Overall, we conclude that even without fine-tuning, LLMs may serve as a moderate scaffolding agent for improving robot learning; however, they still lack affordance understanding, which limits the applicability of current LLMs in robotic scaffolding tasks.
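
A minimal sketch of this kind of LLM-guided action selection is given below, assuming a simple stacked-cube state representation; the prompt wording and the query_llm placeholder are illustrative, not the authors' actual prompts or API calls:

def describe_state(stacks):
    """Turn a symbolic tabletop state into a natural-language description."""
    lines = []
    for stack in stacks:
        if len(stack) == 1:
            lines.append(f"{stack[0]} is alone on the table.")
        else:
            lines.append(" is on top of ".join(reversed(stack)) + ".")
    return " ".join(lines)

def build_prompt(stacks, actions):
    """List candidate pick-and-place actions and ask the LLM to choose one."""
    options = "\n".join(f"{i}: pick {a} and place it on {b}" for i, (a, b) in enumerate(actions))
    return (f"Current scene: {describe_state(stacks)}\n"
            f"Possible actions:\n{options}\n"
            "Reply with the index of the action most likely to produce a novel structure.")

state = [["red cube"], ["blue cube", "green cube"]]            # green cube sits on blue cube
candidates = [("red cube", "green cube"), ("green cube", "red cube")]
print(build_prompt(state, candidates))
# chosen = int(query_llm(build_prompt(state, candidates)))     # placeholder chat-completion call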


Discovering Predictive Relational Object Symbols with Symbolic Attentive Layers

arXiv.org Artificial Intelligence

In this paper, we propose and realize a new deep learning architecture for discovering symbolic representations of objects and their relations based on the self-supervised continuous interaction of a manipulator robot with multiple objects in a tabletop environment. The key feature of the model is that it can naturally handle a changing number of objects and explicitly map object-object relations into the symbolic domain. In the model, we employ a self-attention layer that computes discrete attention weights from object features, which are treated as relational symbols between objects. These relational symbols are then used to aggregate the learned object symbols and predict the effects of executed actions on each object. The result is a pipeline that allows the formation of object symbols and relational symbols from a dataset of object features, actions, and effects in an end-to-end manner. We compare the performance of our proposed architecture with state-of-the-art symbol discovery methods in a simulated tabletop environment where the robot needs to discover symbols related to the relative positions of objects to predict the observed effect successfully. Our experiments show that the proposed architecture performs better than other baselines in effect prediction while forming not only object symbols but also relational symbols. Furthermore, we analyze the learned symbols and relational patterns between objects to understand how the model interprets the environment. Our analysis shows that the learned symbols relate to the relative positions of objects, object types, and their horizontal alignment on the table, which reflect the regularities in the environment.
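
A minimal PyTorch sketch of this idea, with straight-through binarization and illustrative layer sizes (not the paper's exact architecture), is:

import torch
import torch.nn as nn

class STBinary(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):            # hard 0/1 values in the forward pass
        return (x > 0).float()
    @staticmethod
    def backward(ctx, g):           # straight-through gradient estimate
        return g

class RelationalSymbolNet(nn.Module):
    def __init__(self, feat_dim, sym_dim, act_dim, eff_dim):
        super().__init__()
        self.enc = nn.Linear(feat_dim, sym_dim)        # per-object symbol logits
        self.q = nn.Linear(feat_dim, sym_dim)
        self.k = nn.Linear(feat_dim, sym_dim)
        self.dec = nn.Linear(2 * sym_dim + act_dim, eff_dim)

    def forward(self, feats, action):                  # feats: (N, feat_dim)
        z = STBinary.apply(self.enc(feats))            # discrete object symbols
        rel = STBinary.apply(self.q(feats) @ self.k(feats).T)   # discrete relations, (N, N)
        agg = rel @ z / rel.shape[1]                   # aggregate related objects' symbols
        inp = torch.cat([z, agg, action.expand(len(feats), -1)], dim=-1)
        return self.dec(inp), z, rel                   # per-object effect prediction

net = RelationalSymbolNet(feat_dim=8, sym_dim=4, act_dim=3, eff_dim=2)
effects, obj_syms, rel_syms = net(torch.randn(5, 8), torch.randn(1, 3))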


DeepSym: Deep Symbol Generation and Rule Learning for Planning from Unsupervised Robot Interaction

Journal of Artificial Intelligence Research

Symbolic planning and reasoning are powerful tools for robots tackling complex tasks. However, the need to manually design the symbols restricts their applicability, especially for robots that are expected to act in open-ended environments. Therefore, symbol formation and rule extraction should be considered part of robot learning, which, when done properly, will offer scalability, flexibility, and robustness. Towards this goal, we propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning. Our robot interacts with objects using an initial action repertoire that is assumed to be acquired earlier and observes the effects it can create in the environment. To form action-grounded object, effect, and relational categories, we employ a binary bottleneck layer in a predictive, deep encoder-decoder network that takes the image of the scene and the action applied as input, and generates the resulting effects in the scene in pixel coordinates. After learning, the binary latent vector represents action-driven object categories based on the interaction experience of the robot. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, a decision tree is trained to reproduce its decoder function. Probabilistic rules are extracted from the decision paths of the tree and are represented in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the knowledge extracted from the sensorimotor experience of the robot. The deployment of the proposed approach for a simulated robotic manipulator enabled the discovery of discrete representations of object properties such as 'rollable' and 'insertable'. In turn, the use of these representations as symbols allowed the generation of effective plans for achieving goals, such as building towers of the desired height, demonstrating the effectiveness of the approach for multi-step object manipulation. Finally, we demonstrate that the system is not restricted to the robotics domain by assessing its applicability to the MNIST 8-puzzle domain, in which the learned symbols allow for the generation of plans that move the empty tile into any given position.
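
The tree-distillation step can be sketched as follows; the data are random stand-ins and the symbol, action, and effect dimensions are illustrative only:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
symbols = rng.integers(0, 2, size=(500, 4))           # binary codes from the bottleneck layer
actions = rng.integers(0, 3, size=(500, 1))           # discrete action ids
X = np.hstack([symbols, actions])
effects = (symbols[:, 0] & (actions[:, 0] == 1)).astype(int)   # stand-in effect labels

tree = DecisionTreeClassifier(max_depth=4).fit(X, effects)
print(export_text(tree, feature_names=[f"s{i}" for i in range(4)] + ["action"]))
# Each root-to-leaf path is a conjunction of symbol/action tests, i.e. a rule
# precondition, and the leaf's class distribution gives the probabilistic effect.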


Learning Multi-Object Symbols for Manipulation with Attentive Deep Effect Predictors

arXiv.org Artificial Intelligence

In this paper, we propose a concept learning architecture that enables a robot to build symbols through self-exploration by interacting with a varying number of objects. Our aim is to allow a robot to learn concepts without constraints such as a fixed number of interacted objects or pre-defined symbolic structures. As such, the sought architecture should be able to build symbols for entities such as single objects that can be grasped, object stacks that cannot be grasped together, or other composite dynamic structures. Towards this end, we propose a novel architecture, a self-attentive predictive encoder-decoder network with binary activation layers. We show the validity of the proposed network through a robotic manipulation setup involving a varying number of rigid objects. The continuous sensorimotor experience of the robot is used by the proposed network to form effect predictors and symbolic structures that describe the robot's interactions in a discrete way. We showed that the robot acquired reasoning capabilities to encode the interaction dynamics of a varying number of objects in different configurations using the discovered symbols. For example, the robot could reason that (possibly multiple) objects on top of another object would move together if the object below is moved by the robot. We also showed that the discovered symbols can be used for planning to reach goals by training a higher-level neural network that performs purely symbolic reasoning.
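
Planning in the discovered symbol space can be sketched as a search over symbolic states; predict_next below is a stand-in for the higher-level reasoning network, and the toy bit-toggling dynamics are hypothetical:

from collections import deque

def plan(start, goal, actions, predict_next, max_depth=6):
    """Return a list of actions turning `start` into `goal`, or None if not found."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for a in actions:
            nxt = predict_next(state, a)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    return None

# Toy stand-in dynamics: action i toggles bit i of a 3-bit symbolic state.
toggle = lambda s, a: tuple(b ^ (i == a) for i, b in enumerate(s))
print(plan((0, 0, 0), (1, 0, 1), actions=[0, 1, 2], predict_next=toggle))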


DeepSym: Deep Symbol Generation and Rule Learning from Unsupervised Continuous Robot Interaction for Planning

arXiv.org Artificial Intelligence

Autonomous discovery of discrete symbols and rules from continuous interaction experience is a crucial building block of robot AI, but remains a challenging problem. Solving it will overcome the limitations in scalability, flexibility, and robustness of manually designed symbols and rules, and will constitute a substantial advance towards autonomous robots that can learn and reason at abstract levels in open-ended environments. Towards this goal, we propose a novel and general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them that can be used in complex action planning. Our robot interacts with single and multiple objects using a given action repertoire and observes the effects created in the environment. In order to form action-grounded object, effect, and relational categories, we employ a binarized bottleneck layer in a predictive, deep encoder-decoder network that takes as input the image of the scene and the action applied, and generates the resulting object displacements in the scene (action effects) in pixel coordinates. The binary latent vector represents a learned, action-driven categorization of objects. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, we train a decision tree to reproduce its decoder function. From its branches we extract probabilistic rules and represent them in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the robot's sensorimotor experience. Our system is verified in a physics-based 3D simulation environment where a robot arm-hand system learned symbols that can be interpreted as 'rollable', 'insertable', and 'larger-than' from its push and stack actions, and generated effective plans to achieve goals such as building towers from given cubes, balls, and cups using off-the-shelf probabilistic planners.
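
A minimal sketch of the binarized-bottleneck predictor described above, with illustrative layer sizes rather than the paper's exact network, is:

import torch
import torch.nn as nn

def st_binarize(x):
    """Hard 0/1 forward values with a straight-through gradient estimate."""
    hard = (x > 0).float()
    return hard + x - x.detach()

class BinaryBottleneckAE(nn.Module):
    def __init__(self, n_actions=3, code_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 4, stride=2), nn.ReLU(),
                                 nn.Conv2d(16, 16, 4, stride=2), nn.ReLU(),
                                 nn.Flatten(), nn.LazyLinear(code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim + n_actions, 256), nn.ReLU(),
                                 nn.Linear(256, 64 * 64 * 2))    # (dx, dy) per pixel

    def forward(self, img, action_onehot):
        code = st_binarize(self.enc(img))                        # discrete scene/object code
        out = self.dec(torch.cat([code, action_onehot], dim=-1))
        return out.view(-1, 2, 64, 64), code                     # predicted displacements + code

model = BinaryBottleneckAE()
disp, code = model(torch.randn(1, 1, 64, 64), torch.eye(3)[:1])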