AITopics | skoltech

VLH: Vision-Language-Haptics Foundation Model

Fuentes, Luis Francisco Moreno, Khan, Muhammad Haris, Cabrera, Miguel Altamirano, Serpiva, Valerii, Iarchuk, Dmitri, Mahmoud, Yara, Tokmurziyev, Issatay, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Aug-5-2025

We present VLH, a novel Visual-Language-Haptic Foundation Model that unifies perception, language, and tactile feedback in aerial robotics and virtual reality. Unlike prior work that treats haptics as a secondary, reactive channel, VLH synthesizes mid-air force and vibration cues as a direct consequence of contextual visual understanding and natural language commands. Our platform comprises an 8-inch quadcopter equipped with dual inverse five-bar linkage arrays for localized haptic actuation, an egocentric VR camera, and an exocentric top-down view. Visual inputs and language instructions are processed by a fine-tuned OpenVLA backbone - adapted via LoRA on a bespoke dataset of 450 multimodal scenarios - to output a 7-dimensional action vector (Vx, Vy, Vz, Hx, Hy, Hz, Hv). INT8 quantization and a high-performance server ensure real-time operation at 4-5 Hz. In human-robot interaction experiments (90 flights), VLH achieved a 56.7% success rate for target acquisition (mean reach time 21.3 s, pose error 0.24 m) and 100% accuracy in texture discrimination. Generalization tests yielded 70.0% (visual), 54.4% (motion), 40.0% (physical), and 35.0% (semantic) performance on novel tasks. These results demonstrate VLH's ability to co-evolve haptic feedback with perceptual reasoning and intent, advancing expressive, immersive human-robot interactions.
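As a rough illustration of the 7-dimensional output named above, the sketch below unpacks such a vector into its velocity and haptic parts. The class name `VLHAction` and the grouping into "velocity" and "haptics" are assumptions made for illustration; the abstract only lists the component names (Vx, Vy, Vz, Hx, Hy, Hz, Hv) and does not define their units or ranges.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VLHAction:
    """Hypothetical container for the seven components listed in the abstract."""
    vx: float
    vy: float
    vz: float
    hx: float
    hy: float
    hz: float
    hv: float

    @classmethod
    def from_vector(cls, vec: Sequence[float]) -> "VLHAction":
        # Reject anything that is not exactly seven numbers.
        if len(vec) != 7:
            raise ValueError(f"expected 7 components, got {len(vec)}")
        return cls(*(float(x) for x in vec))

    def velocity(self) -> Tuple[float, float, float]:
        # Assumed to be the drone's velocity command.
        return (self.vx, self.vy, self.vz)

    def haptics(self) -> Tuple[float, float, float, float]:
        # Assumed to be the haptic cue components.
        return (self.hx, self.hy, self.hz, self.hv)
```

Keeping the two groups behind separate accessors makes it harder to pass a haptic value to a flight controller by mistake.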

artificial intelligence, haptic feedback, human computer interaction, (13 more...)

2508.01361

Genre: Research Report (1.00)

Industry:

Information Technology (0.68)
Education (0.68)
Transportation > Air (0.47)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.91)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.68)

Quadrupedal Robot Skateboard Mounting via Reverse Curriculum Learning

Belov, Danil, Erkhov, Artem, Pestova, Elizaveta, Osokin, Ilya, Tsetserukou, Dzmitry, Osinenko, Pavel

arXiv.org Artificial Intelligence, May-13-2025

The aim of this work is to enable quadrupedal robots to mount skateboards using Reverse Curriculum Reinforcement Learning. Although prior work has demonstrated skateboarding for quadrupeds that are already positioned on the board, the initial mounting phase still poses a significant challenge. A goal-oriented methodology was adopted, beginning with the terminal phases of the task and progressively increasing the complexity of the problem definition to approximate the desired objective. The learning process was initiated with the skateboard rigidly fixed within the global coordinate frame and the robot positioned directly above it. Through gradual relaxation of these initial conditions, the learned policy demonstrated robustness to variations in skateboard position and orientation, ultimately exhibiting a successful transfer to scenarios involving a mobile skateboard. Legged robot locomotion has a number of advantages over the other motion types.
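The gradual relaxation of initial conditions described above can be sketched as a schedule that starts every episode at the goal configuration and widens the range of starting offsets once the policy succeeds often enough. This is a minimal sketch, not the authors' training code: the class name, the 0.1 m increment, the 1.0 m limit, and the 0.8 success threshold are all hypothetical values chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ReverseCurriculum:
    """Widens the initial-state range as the policy succeeds (illustrative only)."""
    max_offset_m: float = 0.0   # stage 0: robot starts directly above the board
    step_m: float = 0.1         # hypothetical widening increment
    limit_m: float = 1.0        # hypothetical final range
    promote_at: float = 0.8     # hypothetical success-rate threshold

    def sample_start_offset(self, rng: random.Random) -> float:
        """Sample the skateboard's initial offset from the robot for one episode."""
        return rng.uniform(-self.max_offset_m, self.max_offset_m)

    def update(self, success_rate: float) -> bool:
        """Relax the initial conditions if the current stage is mastered."""
        if success_rate >= self.promote_at and self.max_offset_m < self.limit_m:
            self.max_offset_m = min(self.limit_m, self.max_offset_m + self.step_m)
            return True
        return False
```

In a training loop, `update` would be called after each evaluation batch, and `sample_start_offset` would seed the simulator reset for the next episodes.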

artificial intelligence, robot, skateboard, (12 more...)

2505.06561

Country: Europe > Russia (0.16)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)

UAV-VLPA*: A Vision-Language-Path-Action System for Optimal Route Generation on a Large Scales

Sautenkov, Oleg, Akhmetkazy, Aibek, Yaqoot, Yasheerah, Mustafa, Muhammad Ahsan, Tadevosyan, Grik, Lykov, Artem, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Mar-4-2025

The UAV-VLPA* (Visual-Language-Planning-and-Action) system represents a cutting-edge advancement in aerial robotics, designed to enhance communication and operational efficiency for unmanned aerial vehicles (UAVs). By integrating advanced planning capabilities, the system addresses the Traveling Salesman Problem (TSP) to optimize flight paths, reducing the total trajectory length by 18.5% compared to traditional methods. Additionally, the incorporation of the A* algorithm enables robust obstacle avoidance, ensuring safe and efficient navigation in complex environments. The system leverages satellite imagery processing combined with the Visual Language Model (VLM) and GPT's natural language processing capabilities, allowing users to generate detailed flight plans through simple text commands. This seamless fusion of visual and linguistic analysis empowers precise decision-making and mission planning, making UAV-VLPA* a transformative tool for modern aerial operations. With its unmatched operational efficiency, navigational safety, and user-friendly functionality, UAV-VLPA* sets a new standard in autonomous aerial robotics, paving the way for future innovations in the field.
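For readers unfamiliar with the A* algorithm named above, here is a minimal sketch on an occupancy grid with 4-connected moves and a Manhattan-distance heuristic. The grid representation, the function name `astar`, and the unit step cost are assumptions for illustration; the abstract does not say how the system represents obstacles extracted from satellite imagery.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path on a grid where 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])

    def h(c: Cell) -> int:
        # Manhattan distance: admissible and consistent for unit-cost 4-connected moves.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_heap = [(h(start), 0, start)]          # entries are (f, g, cell)
    came_from: Dict[Cell, Cell] = {}
    best_g: Dict[Cell, int] = {start: 0}

    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            # Walk the parent links back to the start.
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if g > best_g[cur]:
            continue  # stale heap entry; a cheaper route to this cell was found
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = cur
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None  # goal unreachable
```

The function returns the list of cells from start to goal, or `None` when obstacles block every route.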

arxiv preprint arxiv, path planning, trajectory length, (14 more...)

2503.02454

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
Asia > Russia (0.06)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.67)

GazeGrasp: DNN-Driven Robotic Grasping with Wearable Eye-Gaze Interface

Tokmurziyev, Issatay, Cabrera, Miguel Altamirano, Moreno, Luis, Khan, Muhammad Haris, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Jan-14-2025

We present GazeGrasp, a gaze-based manipulation system enabling individuals with motor impairments to control collaborative robots using eye-gaze. The system employs an ESP32 CAM for eye tracking, MediaPipe for gaze detection, and YOLOv8 for object localization, integrated with a Universal Robot UR10 for manipulation tasks. After user-specific calibration, the system allows intuitive object selection with a magnetic snapping effect and robot control via eye gestures. Experimental evaluation involving 13 participants demonstrated that the magnetic snapping effect significantly reduced gaze alignment time, improving task efficiency by 31%. GazeGrasp provides a robust, hands-free interface for assistive robotics, enhancing accessibility and autonomy for users.
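One plausible reading of the magnetic snapping effect is that the gaze cursor jumps to the center of the nearest detected object once it comes within some radius. The sketch below implements that reading only; the function name `snap_gaze`, the 80-pixel default radius, and the bounding-box format are assumptions, and the abstract does not describe the authors' actual rule.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Point = Tuple[float, float]


def snap_gaze(gaze: Point, boxes: List[Box],
              radius_px: float = 80.0) -> Tuple[Point, Optional[int]]:
    """Return the (possibly snapped) point and the index of the chosen box, if any."""
    best_idx, best_dist = None, radius_px
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        d = math.hypot(gaze[0] - cx, gaze[1] - cy)
        if d <= best_dist:
            best_idx, best_dist = i, d
    if best_idx is None:
        return gaze, None  # nothing close enough: leave the cursor where it is
    x0, y0, x1, y1 = boxes[best_idx]
    return ((x0 + x1) / 2, (y0 + y1) / 2), best_idx
```

With this rule the user only needs to look near an object rather than exactly at it, which is consistent with the reported reduction in gaze alignment time.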

interface, manipulation, robot, (15 more...)

2501.07255

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
Asia > Russia (0.06)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

SafeSwarm: Decentralized Safe RL for the Swarm of Drones Landing in Dense Crowds

Tadevosyan, Grik, Osipenko, Maksim, Aschu, Demetros, Fedoseev, Aleksey, Serpiva, Valerii, Sautenkov, Oleg, Karaf, Sausar, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Jan-13-2025

This paper introduces a safe swarm of drones capable of performing landings in crowded environments robustly by relying on Reinforcement Learning techniques combined with Safe Learning. The developed system allows us to teach the swarm of drones with different dynamics to land on moving landing pads in an environment while avoiding collisions with obstacles and between agents. The safe barrier net algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves landing accuracy of 2.25 cm with a mean time of 17 s and collision-free landings, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for applications in environments where safety and precision are paramount.

artificial intelligence, machine learning, obstacle, (15 more...)

2501.07566

Country:

Europe > Russia (0.19)
Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation (0.69)
Aerospace & Defense (0.49)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

ViewVR: Visual Feedback Modes to Achieve Quality of VR-based Telemanipulation

Erkhov, A., Bazhenov, A., Satsevich, S., Belov, D., Khabibullin, F., Egorov, S., Gromakov, M., Cabrera, M. Altamirano, Tsetserukou, D.

arXiv.org Artificial Intelligence, Jan-13-2025

The paper focuses on an immersive teleoperation system that enhances the operator's ability to actively perceive the robot's surroundings. A consumer-grade HTC Vive VR system was used to synchronize the operator's hand and head movements with a UR3 robot and a custom-built robotic head with two degrees of freedom (2-DoF). The system's usability, manipulation efficiency, and intuitiveness of control were evaluated in comparison with static head camera positioning across three distinct tasks. Teleoperation plays a pivotal role in robotics by enabling efficient data collection for learning from demonstrations. The quality of collected data heavily depends on the operator's ability to intuitively control the system and receive adaptive visual feedback.

artificial intelligence, participant, teleoperation, (16 more...)

2501.07299

Country:

North America > United States (0.30)
Europe > Russia (0.18)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (0.55)
Information Technology > Hardware (0.55)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Shake-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Manipulations and Liquid Mixing

Khan, Muhammad Haris, Asfaw, Selamawit, Iarchuk, Dmitrii, Cabrera, Miguel Altamirano, Moreno, Luis, Tokmurziyev, Issatay, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Jan-12-2025

This paper introduces Shake-VLA, a Vision-Language-Action (VLA) model-based system designed to enable bimanual robotic manipulation for automated cocktail preparation. The system integrates a vision module for detecting ingredient bottles and reading labels, a speech-to-text module for interpreting user commands, and a language model to generate task-specific robotic instructions. Force Torque (FT) sensors are employed to precisely measure the quantity of liquid poured, ensuring accuracy in ingredient proportions during the mixing process. The system architecture includes a Retrieval-Augmented Generation (RAG) module for accessing and adapting recipes, an anomaly detection mechanism to address ingredient availability issues, and bimanual robotic arms for dexterous manipulation. Experimental evaluations demonstrated a high success rate across system components: the speech-to-text module achieved a 93% success rate in noisy environments, the vision module attained a 91% success rate in object and label detection in cluttered environments, the anomaly module identified 95% of discrepancies between detected ingredients and recipe requirements, and the system achieved an overall success rate of 100% in preparing cocktails, from recipe formulation to action generation.
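The force-based measurement of poured liquid mentioned above comes down to simple arithmetic: a change in measured weight divided by gravity gives mass, and mass divided by density gives volume. The sketch below assumes the sensor reads the vertical force of the held bottle before and after pouring and that the liquid density is known; the function name and the water default are hypothetical, and the abstract does not describe the authors' actual procedure.

```python
G = 9.81  # standard gravity in m/s^2


def poured_volume_ml(force_before_n: float, force_after_n: float,
                     density_kg_per_m3: float = 1000.0) -> float:
    """Estimate poured volume in millilitres from the drop in measured weight."""
    delta_n = force_before_n - force_after_n
    if delta_n < 0:
        raise ValueError("weight increased; readings are inconsistent with pouring")
    mass_kg = delta_n / G                        # F = m * g
    volume_m3 = mass_kg / density_kg_per_m3      # V = m / rho
    return volume_m3 * 1e6                       # 1 m^3 = 1e6 ml
```

For example, a drop of about 0.49 N with a water-like density corresponds to roughly 50 ml.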

data mining, large language model, machine learning, (22 more...)

2501.06919

Country: Europe > Russia (0.18)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(3 more...)

DogSurf: Quadruped Robot Capable of GRU-based Surface Recognition for Blind Person Navigation

Bazhenov, Artem, Berman, Vladimir, Satsevich, Sergei, Shalopanova, Olga, Cabrera, Miguel Altamirano, Lykov, Artem, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Feb-5-2024

This paper introduces DogSurf - a new approach of using quadruped robots to help visually impaired people navigate in the real world. The presented method allows the quadruped robot to detect slippery surfaces, and to use audio and haptic feedback to inform the user when to stop. A state-of-the-art GRU-based neural network architecture with mean accuracy of 99.925% was proposed for the task of multiclass surface classification for quadruped robots. A dataset was collected on a Unitree Go1 Edu robot. The dataset and code have been posted to the public domain.

dogsurf, robot, slippery surface, (13 more...)

doi: 10.1145/3610978.3640606

2402.03156

Country:

North America > United States > Colorado > Boulder County > Boulder (0.15)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
Asia > Russia (0.06)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot

Lykov, Artem, Litvinov, Mikhail, Konenkov, Mikhail, Prochii, Rinat, Burtsev, Nikita, Abdulkarim, Ali Alridha, Bazhenov, Artem, Berman, Vladimir, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Jan-17-2024

This paper introduces CognitiveDog, a pioneering development of a quadruped robot with a Large Multi-modal Model (LMM) that is capable of not only communicating with humans verbally but also physically interacting with the environment through object manipulation. The system was realized on a Unitree Go1 robot dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of this system, dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset

experiment, robot, skolkovo institute, (14 more...)

2401.09388

Country: North America > United States > Colorado > Boulder County > Boulder (0.06)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)

HyperDog: An Open-Source Quadruped Robot Platform Based on ROS2 and micro-ROS

Mudalige, Nipun Dhananjaya Weerakkodi, Zhura, Iana, Babataev, Ildar, Nazarova, Elena, Fedoseev, Aleksey, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Sep-19-2022

Nowadays, the design and development of legged quadruped robots is quite an active area of scientific research. In fact, legged robots have become popular due to their ability to adapt to harsh terrains and diverse environmental conditions in comparison to other mobile robots. With the higher demand for legged robot experiments, more researchers and engineers need an affordable and quick way to develop locomotion algorithms. In this paper, we present a new open-source quadruped robot platform, HyperDog, which features 12 RC servo motors, an onboard NVIDIA Jetson Nano computer, and an STM32F4 Discovery board. HyperDog is an open-source platform for quadruped robotic software development, which is based on Robot Operating System 2 (ROS2) and micro-ROS. Moreover, HyperDog is a quadrupedal robotic dog built entirely from 3D printed parts and carbon fiber, which makes the robot lightweight and strong. The idea of this work is to demonstrate an affordable and customizable way of robot development and provide researchers and engineers with a legged robot platform on which different algorithms can be tested and validated in simulation and in a real environment. The developed project with code is available on GitHub (https://github.com/NDHANA94/hyperdog_ros2).

artificial intelligence, hyperdog, robot, (15 more...)

2209.09171

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
Asia > Russia (0.05)

Genre: Research Report (0.50)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
