Tsetserukou, Dzmitry
DogSurf: Quadruped Robot Capable of GRU-based Surface Recognition for Blind Person Navigation
Bazhenov, Artem, Berman, Vladimir, Satsevich, Sergei, Shalopanova, Olga, Cabrera, Miguel Altamirano, Lykov, Artem, Tsetserukou, Dzmitry
This paper introduces DogSurf, a new approach to using quadruped robots to help visually impaired people navigate in the real world. The presented method allows the quadruped robot to detect slippery surfaces and to use audio and haptic feedback to inform the user when to stop. A state-of-the-art GRU-based neural network architecture with a mean accuracy of 99.925% was proposed for the task of multiclass surface classification for quadruped robots. A dataset was collected on a Unitree Go1 Edu robot. The dataset and code have been released to the public domain.
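The abstract does not detail the network itself, but a minimal sketch of the kind of GRU-based sequence classifier it describes might look like the following; the feature count, window length, and number of surface classes are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a GRU-based surface classifier, assuming
# per-timestep IMU/joint feature vectors from the robot's sensors.
import torch
import torch.nn as nn

class SurfaceGRUClassifier(nn.Module):
    def __init__(self, n_features: int = 12, hidden: int = 64, n_classes: int = 4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) window of sensor readings
        _, h_n = self.gru(x)          # h_n: (1, batch, hidden), final hidden state
        return self.head(h_n[-1])     # logits over surface classes

model = SurfaceGRUClassifier()
window = torch.randn(8, 50, 12)       # 8 mock windows of 50 timesteps
logits = model(window)                # (8, 4) class scores
```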
CognitiveOS: Large Multimodal Model based System to Endow Any Type of Robot with Generative AI
Lykov, Artem, Konenkov, Mikhail, Gbagbe, Koffivi Fidèle, Litvinov, Mikhail, Peter, Robinroy, Davletshin, Denis, Fedoseev, Aleksey, Kobzarev, Oleg, Alabbas, Ali, Alyounes, Oussama, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
In cognitive robotics, the scientific community has recognized the high generalization capability of large language models (LLMs) as a key to developing a robot that could perform new tasks based on generalized knowledge derived from familiar actions expressed in natural language. However, efforts to apply LLMs in robotics faced challenges, particularly in understanding and processing the external world. Previous attempts to convey the model's understanding of the world through text-only approaches [1], [20], [8] struggled with ambiguities and the assumption that objects remain static unless interacted with. The introduction of multi-modal transformer-based models such as GPT-4 [16] and Gemini [18], capable of processing images, opened up new possibilities for robotics [5], allowing robots to comprehend their environment and enhancing their 'Embodied Experience' [15]. Cognitive robots have been developed on various platforms, ranging from mobile manipulators [5], [3] to bio-inspired humanoid robots [21] and quadrupedal robots [6]. In the latter, cognitive abilities were developed using an 'Inner Monologue' approach [10], with improvements inspired by the 'AutoGen' concept [25]. The cognition of the robot is facilitated through internal communication between agent models, leveraging their strengths to provide different cognitive capabilities to the system.
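As a rough illustration of the "internal communication between agent models" idea, the sketch below passes a task description around a ring of specialized agents; the roles and the placeholder respond() call are assumptions for illustration, not the system's actual architecture.

```python
# Minimal, hypothetical sketch of agents exchanging messages until a plan
# is jointly refined; respond() stands in for a call to a large model.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str                       # e.g. "vision", "reasoner", "planner"
    log: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        self.log.append(message)    # placeholder for a real model query
        return f"[{self.role}] processed: {message}"

def inner_dialogue(agents, task: str, rounds: int = 3) -> str:
    message = task
    for _ in range(rounds):
        for agent in agents:        # each agent refines the shared message
            message = agent.respond(message)
    return message                  # final, jointly refined plan

plan = inner_dialogue([Agent("vision"), Agent("reasoner"), Agent("planner")],
                      "bring the bottle from the table")
```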
CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot
Lykov, Artem, Litvinov, Mikhail, Konenkov, Mikhail, Prochii, Rinat, Burtsev, Nikita, Abdulkarim, Ali Alridha, Bazhenov, Artem, Berman, Vladimir, Tsetserukou, Dzmitry
This paper introduces CognitiveDog, a pioneering development of a quadruped robot with a Large Multi-modal Model (LMM) that is capable of not only communicating with humans verbally but also physically interacting with the environment through object manipulation. The system was realized on a Unitree Go1 robot-dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of this system, the dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset
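A purely illustrative sketch of the final step such a system needs, dispatching model-generated action strings to robot skills, is shown below; the skill names and the plan format are hypothetical, not the paper's actual action vocabulary.

```python
# Hypothetical dispatch of LMM-generated action strings to robot skills;
# the skill set and "name(arg)" plan syntax are invented for illustration.
SKILLS = {
    "go_to": lambda target: print(f"navigating to {target} via Visual-SLAM"),
    "pick": lambda target: print(f"grasping {target} with the gripper"),
    "give": lambda target: print(f"handing over to {target}"),
}

def execute(plan: str) -> None:
    for step in plan.splitlines():              # e.g. "pick(bottle)"
        name, arg = step.rstrip(")").split("(")
        SKILLS[name.strip()](arg.strip())

execute("go_to(kitchen)\npick(bottle)\ngive(user)")
```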
LLM-MARS: Large Language Model for Behavior Tree Generation and NLP-enhanced Dialogue in Multi-Agent Robot Systems
Lykov, Artem, Dronova, Maria, Naglov, Nikolay, Litvinov, Mikhail, Satsevich, Sergei, Bazhenov, Artem, Berman, Vladimir, Shcherbak, Aleksei, Tsetserukou, Dzmitry
This paper introduces LLM-MARS, the first technology that utilizes Large Language Model based artificial intelligence for Multi-Agent Robot Systems. LLM-MARS enables dynamic dialogues between humans and robots, allowing the latter to generate behavior based on operator commands and provide informative answers to questions about their actions. LLM-MARS is built on a transformer-based Large Language Model, fine-tuned from the Falcon 7B model. We employ a multimodal approach using LoRA adapters for different tasks. The first LoRA adapter was developed by fine-tuning the base model on examples of Behavior Trees and their corresponding commands. The second LoRA adapter was developed by fine-tuning on question-answering examples. Practical trials on a multi-agent system of two robots within the Eurobot 2023 game rules demonstrate promising results. The robots achieve an average task execution accuracy of 79.28% on compound commands; for commands containing up to two tasks, accuracy exceeded 90%. Evaluation confirms that the system's answers to operators' questions exhibit high accuracy, relevance, and informativeness. LLM-MARS and similar multi-agent robotic systems hold significant potential to revolutionize logistics, enabling autonomous exploration missions and advancing Industry 5.0.
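As a hedged sketch of how task-specific LoRA adapters can be attached to a Falcon-7B base model with the PEFT library, consider the following; the hyperparameters are illustrative assumptions, not the paper's settings.

```python
# Sketch of attaching a LoRA adapter to Falcon-7B with PEFT; one adapter
# would be trained for behavior-tree generation and another for QA, then
# swapped in at runtime. Rank, alpha, and dropout below are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)

bt_model = get_peft_model(base, lora_cfg)  # adapter for Behavior Tree generation
bt_model.print_trainable_parameters()      # only the low-rank matrices train
```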
HaptiCharger: Robotic Charging of Electric Vehicles Based on Human Haptic Patterns
Alyounes, Oussama, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
The growing demand for electric vehicles requires the development of automated car charging methods. At the moment, the process of charging an electric car is completely manual; it requires physical effort to accomplish the task, which is not suitable for people with disabilities. Typically, research on automating the charging task has focused on detecting the position and orientation of the socket, which has achieved a relatively high accuracy of ±5 mm and ±10°. However, this accuracy is not enough to complete the charging process. In this work, we focus on designing a novel methodology for robust robotic plug-in and plug-out based on human haptics to overcome the error in the orientation of the socket. Participants were invited to perform the charging task, and their haptic patterns were identified by measuring the applied forces along with the movements of the charger. Eventually, an algorithm was developed based on the humans' best strategies to be applied to a robotic arm.
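The paper's algorithm is derived from human trials, so the following is only a conceptual sketch of a force-guided insertion loop in that spirit; the threshold, step sizes, and the wiggle heuristic are invented for illustration.

```python
# Toy force-guided plug-in loop: advance while measured resistance is low,
# back off and perturb orientation when it rises. All numbers are assumed.
import numpy as np

FORCE_LIMIT = 15.0   # N, assumed insertion resistance threshold
STEP = 0.5           # mm advanced per control cycle

def plug_in_step(depth_mm, tilt_deg, force_n, rng):
    if force_n < FORCE_LIMIT:
        return depth_mm + STEP, tilt_deg            # low resistance: advance
    # High resistance: retreat slightly and perturb the tilt angle,
    # mimicking the exploratory motions seen in the human trials.
    return depth_mm - 0.2, tilt_deg + rng.uniform(-0.5, 0.5)

rng = np.random.default_rng(0)
depth, tilt = 0.0, 0.0
for force in [2.0, 3.0, 18.0, 4.0]:                 # mock force readings (N)
    depth, tilt = plug_in_step(depth, tilt, force, rng)
```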
PolyMerge: A Novel Technique aimed at Dynamic HD Map Updates Leveraging Polylines
Sayed, Mohamed, Perminov, Stepan, Tsetserukou, Dzmitry
Currently, High-Definition (HD) maps are a prerequisite for the stable operation of autonomous vehicles. Such maps contain information about all static road objects that the vehicle must consider during navigation, such as road edges, road lanes, and crosswalks. To generate such an HD map, current approaches need to process pre-recorded environment data obtained from onboard sensors. However, recording such a dataset often requires a lot of time and effort. In addition, every time the actual road environment changes, a new dataset should be recorded to generate a relevant HD map. This paper presents a novel approach that allows the HD map to be continuously generated or updated using onboard sensor data. Since there is no need to pre-record a dataset, updating the HD map can run in parallel with the main autonomous vehicle navigation pipeline. The proposed approach utilizes the VectorMapNet framework to generate vector road object instances from a sensor data scan. The PolyMerge technique then merges new instances into previous ones, mitigating detection errors and thereby generating or updating the HD map. The performance of the algorithm was confirmed by comparison with ground truth on the nuScenes dataset. Experimental results showed that the mean error for different levels of environment complexity was comparable to the VectorMapNet single-instance error.
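A minimal sketch of the merging idea reads as follows; the endpoint-distance association rule and the threshold are assumptions about how such a technique could work, not PolyMerge's actual criterion.

```python
# Toy polyline merge: a newly detected polyline is appended to an existing
# map instance when their endpoints lie within an assumed distance threshold.
import numpy as np

MERGE_DIST = 1.0  # metres, assumed association threshold

def try_merge(map_line: np.ndarray, new_line: np.ndarray):
    """Each line is an (N, 2) array of points in map coordinates."""
    if np.linalg.norm(map_line[-1] - new_line[0]) < MERGE_DIST:
        return np.vstack([map_line, new_line[1:]])   # extend forward
    if np.linalg.norm(new_line[-1] - map_line[0]) < MERGE_DIST:
        return np.vstack([new_line, map_line[1:]])   # extend backward
    return None                                      # keep as a new instance

lane = np.array([[0.0, 0.0], [5.0, 0.1]])
detection = np.array([[5.2, 0.1], [10.0, 0.3]])
merged = try_merge(lane, detection)                  # (3, 2) merged polyline
```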
TeslaCharge: Smart Robotic Charger Driven by Impedance Control and Human Haptic Patterns
Alyounes, Oussama, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
The growing demand for electric vehicles requires the development of automated car charging methods. At the moment, the process of charging an electric car is completely manual; it requires physical effort to accomplish the task, which is not suitable for people with disabilities. Typically, research has focused on detecting the position and orientation of the socket, which has achieved a relatively high accuracy of $\pm 5 \: mm$ and $\pm 10^\circ$. However, this accuracy is not enough to complete the charging process. In this work, we focus on designing a novel methodology for robust robotic plug-in and plug-out based on human haptics to overcome the error in the position and orientation of the socket. Participants were invited to perform the charging task, and their haptic patterns were identified by measuring the applied forces along with the movement of the charger. Three controllers were designed based on impedance control to mimic the human patterns of charging an electric car. The recorded human data were used to calibrate the parameters of the impedance controllers: inertia $M_d$, damping $D_d$, and stiffness $K_d$. A robotic validation was performed, in which the designed controllers were applied to a UR10 robot. Using the proposed controllers and the human kinesthetic data, it was possible to successfully automate the operation of charging an electric car.
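The standard impedance relation behind such controllers, $M_d \ddot{e} + D_d \dot{e} + K_d e = F_{ext}$, can be integrated discretely as in the sketch below; the numeric parameter values are placeholders, not the calibrated ones from the human data.

```python
# Discrete-time 1-DoF impedance controller with the abstract's parameters
# M_d (inertia), D_d (damping), K_d (stiffness); values below are assumed.
M_d, D_d, K_d = 2.0, 40.0, 300.0     # kg, N*s/m, N/m
dt = 0.002                            # 500 Hz control loop (assumed)

x, v = 0.0, 0.0                       # deviation from reference and its rate
for f_ext in [0.0, 5.0, 5.0, 0.0]:    # mock external force readings (N)
    # M_d * a + D_d * v + K_d * x = f_ext  ->  solve for acceleration
    a = (f_ext - D_d * v - K_d * x) / M_d
    v += a * dt
    x += v * dt                       # commanded compliant displacement
```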
LocoNeRF: A NeRF-based Approach for Local Structure from Motion for Precise Localization
Nenashev, Artem, Kurenkov, Mikhail, Potapov, Andrei, Zhura, Iana, Katerishich, Maksim, Tsetserukou, Dzmitry
Visual localization is a critical task in mobile robotics, and researchers are continuously developing new approaches to enhance its efficiency. In this article, we propose a novel approach to improve the accuracy of visual localization using Structure from Motion (SfM) techniques. We highlight the limitations of global SfM, which suffers from high latency, and the challenges of local SfM, which requires large image databases for accurate reconstruction. To address these issues, we propose utilizing Neural Radiance Fields (NeRF), as opposed to image databases, to cut down on the storage space required. We suggest that sampling reference images around the prior query position can lead to further improvements. We evaluate the accuracy of our proposed method against ground truth obtained using LiDAR and Advanced Lidar Odometry and Mapping in Real-time (A-LOAM), and compare its storage usage against local SfM with COLMAP in the conducted experiments. Our proposed method achieves an accuracy of 0.068 m against the ground truth, which is slightly less accurate than the state-of-the-art COLMAP, at 0.022 m. However, the database required for COLMAP is 400 megabytes, whereas our NeRF model is only 160 megabytes. Finally, we perform an ablation study to assess the impact of using reference images from the NeRF reconstruction.
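A conceptual sketch of the suggested sampling step, rendering NeRF reference views around the prior query pose instead of storing an image database, might look like this; render_nerf() is a named placeholder, not a real API.

```python
# Sample candidate reference poses around the prior query pose; each would
# be rendered from the trained NeRF and fed into local SfM. Perturbation
# radius and count are assumptions.
import numpy as np

def render_nerf(pose: np.ndarray) -> np.ndarray:
    """Placeholder for rendering an image from the trained NeRF at `pose`."""
    raise NotImplementedError

def sample_reference_poses(prior_pose: np.ndarray, n: int = 8,
                           radius: float = 0.5) -> list:
    rng = np.random.default_rng(0)
    poses = []
    for _ in range(n):
        offset = rng.uniform(-radius, radius, size=3)  # metres around prior
        pose = prior_pose.copy()
        pose[:3, 3] += offset                          # perturb translation only
        poses.append(pose)
    return poses

refs = sample_reference_poses(np.eye(4))  # candidate views for local SfM
```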
DNFOMP: Dynamic Neural Field Optimal Motion Planner for Navigation of Autonomous Robots in Cluttered Environment
Katerishich, Maksim, Kurenkov, Mikhail, Karaf, Sausar, Nenashev, Artem, Tsetserukou, Dzmitry
Motion planning in dynamically changing environments is one of the most complex challenges in autonomous driving. Safety is a crucial requirement, along with driving comfort and speed limits. While classical sampling-based, lattice-based, and optimization-based planning methods can generate smooth and short paths, they often do not consider the dynamics of the environment. Some techniques do account for it, but they rely on updating the environment on the go rather than modeling its dynamics explicitly, which is not suitable for self-driving. To address this, we propose a novel method based on the Neural Field Optimal Motion Planner (NFOMP), which outperforms state-of-the-art approaches in terms of normalized curvature and the number of cusps. Our approach embeds previously known moving obstacles into the neural field collision model to account for the dynamics of the environment. We also introduce time profiling of the trajectory and non-linear velocity constraints by adding Lagrange multipliers to the trajectory loss function. We applied our method to solve the optimal motion planning problem in an urban environment using the BeamNG.tech driving simulator. An autonomous car drove the generated trajectories in three city scenarios while sharing the road with an obstacle vehicle. Our evaluation shows that the maximum instantaneous acceleration a passenger can experience is -7.5 m/s^2, and that 89.6% of the driving time is devoted to normal driving with accelerations below 3.5 m/s^2. The driving style is characterized by 46.0% and 31.4% of the driving time being devoted to the light rail transit style and the moderate driving style, respectively.
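A toy sketch of the augmented trajectory loss idea, a smoothness objective plus per-segment Lagrange multipliers enforcing a velocity limit, is given below; the horizon, weights, and limit are illustrative, not the paper's formulation.

```python
# Primal-dual optimization of a time-profiled trajectory: gradient descent
# on the waypoints, gradient ascent on the multipliers. All values assumed.
import torch

T = 50
times = torch.linspace(0.0, 5.0, T)
init = torch.stack([torch.linspace(0.0, 20.0, T), torch.zeros(T)], dim=1)
traj = init.clone().requires_grad_(True)          # waypoints (x, y)
lam = torch.zeros(T - 1, requires_grad=True)      # one multiplier per segment
V_MAX = 2.0                                       # m/s, assumed speed limit

def loss(traj, lam):
    dt = (times[1:] - times[:-1]).unsqueeze(1)
    vel = (traj[1:] - traj[:-1]) / dt
    speed = vel.norm(dim=1)
    smooth = ((traj[2:] - 2 * traj[1:-1] + traj[:-2]) ** 2).sum()
    return smooth + (lam * (speed - V_MAX)).sum()  # constraint: speed <= V_MAX

opt = torch.optim.Adam([traj], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    if lam.grad is not None:
        lam.grad = None
    loss(traj, lam).backward()
    opt.step()                                     # descend on waypoints
    with torch.no_grad():
        lam += 1e-2 * lam.grad                     # ascend on multipliers
        lam.clamp_(min=0.0)                        # multipliers stay non-negative
```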
NeuroSwarm: Multi-Agent Neural 3D Scene Reconstruction and Segmentation with UAV for Optimal Navigation of Quadruped Robot
Zhura, Iana, Davletshin, Denis, Mudalige, Nipun Dhananjaya Weerakkodi, Fedoseev, Aleksey, Peter, Robinroy, Tsetserukou, Dzmitry
Quadruped robots have the distinct ability to adapt their body and step height to navigate through cluttered environments. Nonetheless, for these robots to utilize their full potential in real-world scenarios, they require awareness of their environment and obstacle geometry. We propose a novel multi-agent robotic system that incorporates cutting-edge technologies. The proposed solution features a 3D neural reconstruction algorithm that enables navigation of a quadruped robot in both static and semi-static environments. The reconstructed areas of the environment are also segmented according to the quadruped robot's ability to traverse them. Moreover, we have developed an adaptive neural field optimal motion planner (ANFOMP) that considers both collision probability and obstacle height in 2D space. Our new navigation and mapping approach enables quadruped robots to adjust their height and behavior to navigate under arches and push through obstacles of smaller dimensions. The multi-agent mapping operation has proven to be highly accurate, with an obstacle reconstruction precision of 82%. Moreover, the quadruped robot can navigate with 3D obstacle information and the ANFOMP system, resulting in a 33.3% reduction in path length and a 70% reduction in navigation time.
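As an illustration of a planner cost that blends collision probability with obstacle height in the spirit of ANFOMP, consider the sketch below; the clearance value and weights are assumptions, not the system's parameters.

```python
# Per-cell traversal cost over a 2D grid combining collision risk and
# obstacle height; cells taller than the assumed clearance are impassable.
import numpy as np

ROBOT_CLEARANCE = 0.35   # m, assumed max obstacle height the dog can pass

def traversal_cost(p_collision: np.ndarray, obstacle_h: np.ndarray) -> np.ndarray:
    cost = 1.0 + 5.0 * p_collision               # base cost plus risk penalty
    cost += 2.0 * np.clip(obstacle_h / ROBOT_CLEARANCE, 0.0, 1.0)
    cost[obstacle_h > ROBOT_CLEARANCE] = np.inf  # too tall to crouch under or push
    return cost

grid_p = np.random.default_rng(1).uniform(0.0, 1.0, (10, 10))
grid_h = np.random.default_rng(2).uniform(0.0, 0.6, (10, 10))
costs = traversal_cost(grid_p, grid_h)           # feed into any grid planner
```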