Cabrera, Miguel Altamirano
Evolution 6.0: Evolving Robotic Capabilities Through Generative Design
Khan, Muhammad Haris, Myshlyaev, Artyom, Lykov, Artem, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
We propose a new concept, Evolution 6.0, which represents the evolution of robotics driven by Generative AI. When a robot lacks the necessary tools to accomplish a task requested by a human, it autonomously designs the required instruments and learns how to use them to achieve the goal. Evolution 6.0 is an autonomous robotic system powered by Vision-Language Models (VLMs), Vision-Language Action (VLA) models, and Text-to-3D generative models for tool design and task execution. The system comprises two key modules: the Tool Generation Module, which fabricates task-specific tools from visual and textual data, and the Action Generation Module, which converts natural language instructions into robotic actions. It integrates QwenVLM for environmental understanding, OpenVLA for task execution, and Llama-Mesh for 3D tool generation. Evaluation results demonstrate a 90% success rate for tool generation with a 10-second inference time, while action generation achieves 83.5% in physical and visual generalization, 70% in motion generalization, and 37% in semantic generalization. Future improvements will focus on bimanual manipulation, expanded task capabilities, and enhanced environmental interpretation to improve real-world adaptability.
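A minimal sketch of the two-module flow described in the abstract, with hypothetical wrapper functions standing in for the QwenVLM, Llama-Mesh, and OpenVLA calls (the actual interfaces differ):

from dataclasses import dataclass

@dataclass
class ToolSpec:
    name: str
    mesh_path: str  # path to the generated 3D mesh for fabrication

def describe_scene(image_path: str, request: str) -> str:
    """Stand-in for the VLM call that checks whether a suitable tool exists."""
    return f"no suitable tool found for: {request}"

def generate_tool(scene_description: str) -> ToolSpec:
    """Stand-in for the Text-to-3D step (Llama-Mesh in the paper)."""
    return ToolSpec(name="custom_scraper", mesh_path="tool.obj")

def generate_actions(instruction: str, tool: ToolSpec) -> list[str]:
    """Stand-in for the VLA policy (OpenVLA in the paper)."""
    return [f"grasp {tool.name}", f"use {tool.name} to {instruction}"]

if __name__ == "__main__":
    request = "scrape the residue off the pan"
    scene = describe_scene("workspace.jpg", request)
    tool = generate_tool(scene)                # Tool Generation Module
    actions = generate_actions(request, tool)  # Action Generation Module
    print(actions)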
GazeGrasp: DNN-Driven Robotic Grasping with Wearable Eye-Gaze Interface
Tokmurziyev, Issatay, Cabrera, Miguel Altamirano, Moreno, Luis, Khan, Muhammad Haris, Tsetserukou, Dzmitry
We present GazeGrasp, a gaze-based manipulation system enabling individuals with motor impairments to control collaborative robots using eye-gaze. The system employs an ESP32 CAM for eye tracking, MediaPipe for gaze detection, and YOLOv8 for object localization, integrated with a Universal Robot UR10 for manipulation tasks. After user-specific calibration, the system allows intuitive object selection with a magnetic snapping effect and robot control via eye gestures. Experimental evaluation involving 13 participants demonstrated that the magnetic snapping effect significantly reduced gaze alignment time, improving task efficiency by 31%. GazeGrasp provides a robust, hands-free interface for assistive robotics, enhancing accessibility and autonomy for users.
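An illustrative sketch of one way a "magnetic snapping" step could work: the raw gaze point is snapped to the nearest detected object centre within a pixel radius. The radius and function names are assumptions, not the paper's implementation:

import math

def snap_gaze(gaze_xy, object_centres, radius_px=60.0):
    """Return the object centre closest to the gaze point, or the raw gaze
    point itself if nothing lies within the snapping radius."""
    best, best_d = None, radius_px
    for cx, cy in object_centres:
        d = math.hypot(gaze_xy[0] - cx, gaze_xy[1] - cy)
        if d < best_d:
            best, best_d = (cx, cy), d
    return best if best is not None else gaze_xy

if __name__ == "__main__":
    detections = [(320, 240), (500, 260)]      # e.g. YOLOv8 box centres
    print(snap_gaze((335, 252), detections))   # snaps to (320, 240)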
Shake-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Manipulations and Liquid Mixing
Khan, Muhammad Haris, Asfaw, Selamawit, Iarchuk, Dmitrii, Cabrera, Miguel Altamirano, Moreno, Luis, Tokmurziyev, Issatay, Tsetserukou, Dzmitry
This paper introduces Shake-VLA, a Vision-Language-Action (VLA) model-based system designed to enable bimanual robotic manipulation for automated cocktail preparation. The system integrates a vision module for detecting ingredient bottles and reading labels, a speech-to-text module for interpreting user commands, and a language model to generate task-specific robotic instructions. Force Torque (FT) sensors are employed to precisely measure the quantity of liquid poured, ensuring accuracy in ingredient proportions during the mixing process. The system architecture includes a Retrieval-Augmented Generation (RAG) module for accessing and adapting recipes, an anomaly detection mechanism to address ingredient availability issues, and bimanual robotic arms for dexterous manipulation. Experimental evaluations demonstrated a high success rate across system components, with the speech-to-text module achieving a 93% success rate in noisy environments, the vision module attaining a 91% success rate in object and label detection in a cluttered environment, the anomaly module identifying 95% of discrepancies between detected ingredients and recipe requirements, and the system achieving an overall success rate of 100% in preparing cocktails, from recipe formulation to action generation.
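A minimal sketch of the anomaly check described above: ingredients detected by the vision module are compared against the recipe retrieved by the RAG module and anything missing is flagged. The function and variable names are illustrative, not Shake-VLA's interface:

def find_missing(recipe_ingredients, detected_labels):
    """Return recipe ingredients that were not found among detected labels."""
    recipe = {i.lower() for i in recipe_ingredients}
    detected = {d.lower() for d in detected_labels}
    return sorted(recipe - detected)

if __name__ == "__main__":
    recipe = ["white rum", "lime juice", "sugar syrup", "soda water"]
    seen = ["Lime Juice", "Soda Water", "White Rum"]
    print(find_missing(recipe, seen))   # ['sugar syrup'] -> report an anomaly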
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation
Sautenkov, Oleg, Yaqoot, Yasheerah, Lykov, Artem, Mustafa, Muhammad Ahsan, Tadevosyan, Grik, Akhmetkazy, Aibek, Cabrera, Miguel Altamirano, Martynov, Mikhail, Karaf, Sausar, Tsetserukou, Dzmitry
The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with a Visual Language Model (VLM) and the powerful capabilities of GPT, UAV-VLA enables users to generate general flight-path-and-action plans through simple text requests. This system leverages the rich contextual information provided by satellite images, allowing for enhanced decision-making and mission planning. The combination of visual analysis by the VLM and natural language processing by GPT provides the user with a path-and-action set, making aerial operations more efficient and accessible. The newly developed method showed a 22% difference in the length of the created trajectories and a mean error of 34.22 m (Euclidean distance) in locating objects of interest on the map, evaluated with a K-Nearest Neighbors (KNN) approach.
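A sketch of the error metric reported above: for each generated point of interest, take the Euclidean distance to its nearest ground-truth object (K = 1 nearest neighbour) and average over all points. The coordinates are illustrative, not data from the paper:

import math

def mean_nn_error(predicted, ground_truth):
    """Mean Euclidean distance from each predicted point to its nearest
    ground-truth point."""
    def nearest(p):
        return min(math.dist(p, g) for g in ground_truth)
    return sum(nearest(p) for p in predicted) / len(predicted)

if __name__ == "__main__":
    pred = [(10.0, 5.0), (42.0, 18.0)]   # generated object locations (m)
    gt = [(12.0, 7.0), (40.0, 15.0)]     # ground-truth object locations (m)
    print(f"mean error: {mean_nn_error(pred, gt):.2f} m")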
TiltXter: CNN-based Electro-tactile Rendering of Tilt Angle for Telemanipulation of Pasteur Pipettes
Cabrera, Miguel Altamirano, Tirado, Jonathan, Fedoseev, Aleksey, Sautenkov, Oleg, Poliakov, Vladimir, Kopanev, Pavel, Tsetserukou, Dzmitry
The shape of deformable objects can change drastically during grasping by robotic grippers, causing an ambiguous perception of their alignment and hence resulting in errors in robot positioning and telemanipulation. Rendering clear tactile patterns is fundamental to increasing users' precision and dexterity through tactile haptic feedback during telemanipulation. Therefore, different methods have to be studied to decode the sensors' data into haptic stimuli. This work presents a telemanipulation system for plastic pipettes that consists of a Force Dimension Omega.7 haptic interface endowed with two electro-stimulation arrays and two tactile sensor arrays embedded in the 2-finger Robotiq gripper. We propose a novel approach based on convolutional neural networks (CNN) to detect the tilt of deformable objects. The CNN generates a tactile pattern based on recognized tilt data to render further electro-tactile stimuli provided to the user during the telemanipulation. The study has shown that using the CNN algorithm, tilt recognition by users increased from 23.13% with the downsized data to 57.9%, and the success rate during teleoperation increased from 53.12% using the downsized data to 92.18% using the tactile patterns generated by the CNN.
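An illustrative PyTorch sketch of a small CNN that maps a pair of tactile sensor frames to a tilt-angle class; the layer sizes, input resolution, and number of classes are assumptions, not the architecture used in the paper:

import torch
import torch.nn as nn

class TiltCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),  # 2 tactile arrays
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    frames = torch.randn(1, 2, 8, 16)   # one batch of paired tactile frames
    logits = TiltCNN()(frames)
    print(logits.argmax(dim=1))          # predicted tilt class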
GazeRace: Revolutionizing Remote Piloting with Eye-Gaze Control
Tokmurziyev, Issatay, Serpiva, Valerii, Fedoseev, Alexey, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
This paper introduces the GazeRace method for drone navigation, employing a computer vision interface facilitated by eye-tracking technology. This interface is designed to be compatible with a single camera and uses a convolutional neural network to convert eye movements into control commands for the drone. Experimental validation demonstrates that users equipped with the eye-tracking interface achieve performance comparable to a traditional remote control interface when completing a drone racing task. Ten participants completed flight tests in which they navigated a drone through a racing track in a Gazebo simulation environment. Using the eye-tracking interface to navigate the racing gates, users reduced the drone trajectory length by 18% (73.44 m vs. 89.29 m). The time taken to complete the route with the eye-tracking method (average of 70.01 seconds) was only 3.5% slower than with the remote control method, indicating comparable efficiency of the interface. It is also worth noting that four of the participants completed the race with an average time 25.9% faster than that of the other participants. In addition, users reported lower (better) scores for performance (M = 34.0, SD = 14.2) and frustration (M = 30.5, SD = 9.2) with the eye-tracking interface than with the baseline remote controller (performance: M = 63.0, SD = 10.1; frustration: M = 49.0, SD = 11.7). The hedonic quality was also rated highly by the users in the UEQ questionnaire (M = 1.65, SD = 0.45).
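A hypothetical sketch of the final mapping step: once the network has produced a normalised gaze position, the offset from the screen centre is turned into drone velocity commands with a small dead zone. The gains, dead-zone size, and command names are illustrative assumptions:

def gaze_to_velocity(gaze_x, gaze_y, gain=2.0, dead_zone=0.1):
    """gaze_x, gaze_y in [-1, 1] relative to the screen centre."""
    def axis(v):
        return 0.0 if abs(v) < dead_zone else gain * v
    return {"yaw_rate": axis(gaze_x), "climb_rate": axis(-gaze_y)}

if __name__ == "__main__":
    print(gaze_to_velocity(0.4, -0.05))  # {'yaw_rate': 0.8, 'climb_rate': 0.0}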
MoveTouch: Robotic Motion Capturing System with Wearable Tactile Display to Achieve Safe HRI
Alabbas, Ali, Cabrera, Miguel Altamirano, Sayed, Mohamed, Alyounes, Oussama, Liu, Qian, Tsetserukou, Dzmitry
The collaborative robot market is flourishing as there is a trend towards simplification, modularity, and increased flexibility on the production line. However, when humans and robots collaborate in a shared environment, human safety must be a priority. We introduce a novel wearable robotic system to enhance safety during Human-Robot Interaction (HRI). The proposed wearable robot is designed to hold a fiducial marker and maintain its visibility to a motion capture system, which, in turn, localizes the user's hand with good accuracy and low latency and provides vibrotactile feedback to the user's wrist. The vibrotactile feedback guides the user's hand movement during collaborative tasks in order to increase safety and enhance collaboration efficiency. A user study was conducted to assess the recognition and discriminability of ten designed vibration patterns applied to the dorsal (upper) and volar (lower) sides of the user's wrist. The results show that the pattern recognition rate on the volar side was higher, with an average of 75.64% among all users. Four patterns with a high recognition rate were chosen to be incorporated into our system. A second experiment was carried out to evaluate users' responses to the chosen patterns in real-world collaborative tasks. Results show that all participants responded to the patterns correctly, and the average response time for the patterns was between 0.24 and 2.41 seconds.
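An illustrative sketch of how wrist vibration patterns could be encoded and played back; the motor layout, timings, and pattern names are assumptions for the example, not the patterns evaluated in the study:

import time

# Each pattern is a list of (active_motor_indices, duration_s) steps.
PATTERNS = {
    "move_left": [((0,), 0.2), ((1,), 0.2), ((2,), 0.2)],  # sequential sweep
    "stop":      [((0, 1, 2, 3), 0.4)],                    # all motors at once
}

def play(pattern_name, set_motors):
    """set_motors(indices) switches on the given motors and off all others."""
    for motors, duration in PATTERNS[pattern_name]:
        set_motors(motors)
        time.sleep(duration)
    set_motors(())  # all off at the end of the pattern

if __name__ == "__main__":
    play("stop", lambda m: print("active motors:", m))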
Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning
Lykov, Artem, Cabrera, Miguel Altamirano, Gbagbe, Koffivi Fidèle, Tsetserukou, Dzmitry
This paper presents the development of a novel ethical reasoning framework for robots. "Robots Can Feel" is the first system for robots that utilizes a combination of logic and human-like emotion simulation to make decisions in morally complex situations akin to humans. The key feature of the approach is the management of the Emotion Weight Coefficient, a customizable parameter that assigns the role of emotions in robot decision-making. The system aims to serve as a tool that can equip robots of any form and purpose with ethical behavior close to human standards. The system is independent of both the robot platform and the choice of the base model. During the evaluation, the system was tested on eight state-of-the-art Large Language Models (LLMs), including both commercial and open-source models developed by various companies and countries. The research demonstrated that, regardless of the model choice, the Emotion Weight Coefficient influences the robot's decision similarly. According to ANOVA analysis, the use of different Emotion Weight Coefficients influenced the final decision in a range of situations, such as a request for a dietary violation (F(4, 35) = 11.2, p = 0.0001) and an animal compassion situation (F(4, 35) = 8.5441, p = 0.0001). A demonstration code repository is provided at: https://github.com/TemaLykov/robots_can_feel
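A toy illustration of the role of an Emotion Weight Coefficient: a logic-derived score and an emotion-derived score for each candidate action are blended with the weight and the highest-scoring action is chosen. The hand-set numbers are purely illustrative; in the paper the reasoning is carried out by an LLM, not by fixed scores:

def decide(actions, emotion_weight):
    """actions: {name: (logic_score, emotion_score)}, scores in [0, 1]."""
    def blended(scores):
        logic, emotion = scores
        return (1.0 - emotion_weight) * logic + emotion_weight * emotion
    return max(actions, key=lambda a: blended(actions[a]))

if __name__ == "__main__":
    actions = {"serve dessert": (0.3, 0.9), "refuse politely": (0.8, 0.4)}
    print(decide(actions, emotion_weight=0.0))   # refuse politely
    print(decide(actions, emotion_weight=0.9))   # serve dessert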
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations
Gbagbe, Koffivi Fidèle, Cabrera, Miguel Altamirano, Alabbas, Ali, Alyounes, Oussama, Lykov, Artem, Tsetserukou, Dzmitry
This research introduces the Bi-VLA (Vision-Language-Action) model, a novel system designed for bimanual robotic dexterous manipulation that seamlessly integrates vision for scene understanding, language comprehension for translating human instructions into executable code, and physical action generation. We evaluated the system's functionality through a series of household tasks, including the preparation of a desired salad upon human request. Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of ingredients, and execute precise bimanual actions to prepare the requested salad. We assessed the system's performance in terms of accuracy, efficiency, and adaptability to different salad recipes and human preferences through a series of experiments. Our results show a 100% success rate in generating the correct executable code by the Language Module, a 96.06% success rate in detecting specific ingredients by the Vision Module, and an overall success rate of 83.4%.
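A sketch of one way the language module's generated code could be handled before execution: the plan is parsed as a list of calls and checked against a whitelist of robot skills. The skill names and plan format are assumptions for illustration, not the Bi-VLA interface:

ALLOWED_SKILLS = {"pick", "place", "cut", "pour"}

def validate_plan(plan_lines):
    """plan_lines: e.g. ['pick(left, cucumber)', 'cut(right, cucumber)']."""
    for line in plan_lines:
        skill = line.split("(", 1)[0].strip()
        if skill not in ALLOWED_SKILLS:
            raise ValueError(f"unknown skill in generated code: {skill!r}")
    return plan_lines

if __name__ == "__main__":
    plan = ["pick(left, cucumber)", "cut(right, cucumber)", "place(left, bowl)"]
    print(validate_plan(plan))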
DogSurf: Quadruped Robot Capable of GRU-based Surface Recognition for Blind Person Navigation
Bazhenov, Artem, Berman, Vladimir, Satsevich, Sergei, Shalopanova, Olga, Cabrera, Miguel Altamirano, Lykov, Artem, Tsetserukou, Dzmitry
This paper introduces DogSurf, a new approach to using quadruped robots to help visually impaired people navigate in the real world. The presented method allows the quadruped robot to detect slippery surfaces and to use audio and haptic feedback to inform the user when to stop. A state-of-the-art GRU-based neural network architecture with a mean accuracy of 99.925% was proposed for the task of multiclass surface classification for quadruped robots. A dataset was collected on a Unitree Go1 Edu robot. The dataset and code have been posted to the public domain.
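An illustrative PyTorch sketch of a GRU classifier over a window of robot sensor readings (e.g. IMU and joint signals) producing a surface class; the feature count, hidden size, and number of classes are assumptions, not the architecture from the paper:

import torch
import torch.nn as nn

class SurfaceGRU(nn.Module):
    def __init__(self, n_features=12, hidden=64, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, features)
        _, h = self.gru(x)                 # h: (num_layers, batch, hidden)
        return self.head(h[-1])

if __name__ == "__main__":
    window = torch.randn(1, 50, 12)        # 50 time steps of 12 sensor channels
    print(SurfaceGRU()(window).argmax(dim=1))   # predicted surface class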