Konenkov, Mikhail
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
Konenkov, Mikhail, Lykov, Artem, Trinitatova, Daria, Tsetserukou, Dzmitry
The advent of immersive Virtual Reality applications has transformed various domains, yet their integration with advanced artificial intelligence technologies like Visual Language Models (VLMs) remains underexplored. This study introduces a pioneering approach utilizing VLMs within VR environments to enhance user interaction and task efficiency. Leveraging the Unity engine and a custom-developed VLM, our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions. The incorporation of speech-to-text and text-to-speech technologies allows for seamless communication between the user and the VLM, enabling the system to guide users through complex tasks effectively. Preliminary experimental results indicate that utilizing VLMs not only reduces task completion times but also improves user comfort and task engagement compared to traditional VR interaction methods.
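A minimal sketch of the interaction loop this abstract describes, assuming a speech-to-text step, a VLM query on the current headset view, and a text-to-speech reply; the function names (capture_frame, speech_to_text, query_vlm, text_to_speech) are hypothetical placeholders, not the authors' actual API.

```python
# Minimal sketch of the speech -> VLM -> speech loop described above.
# All helper functions are hypothetical placeholders, not the authors' code.

from dataclasses import dataclass

@dataclass
class VLMReply:
    guidance: str          # natural-language instruction for the user
    task_complete: bool    # whether the VLM judges the task finished

def capture_frame() -> bytes:
    """Placeholder: grab the current VR headset view (e.g. a Unity render texture)."""
    return b""

def speech_to_text(audio: bytes) -> str:
    """Placeholder: transcribe the user's spoken request."""
    return "How do I assemble this part?"

def query_vlm(image: bytes, utterance: str) -> VLMReply:
    """Placeholder: send image + transcript to the VLM and parse its answer."""
    return VLMReply(guidance="Attach the red connector first.", task_complete=False)

def text_to_speech(text: str) -> None:
    """Placeholder: synthesize and play the VLM's guidance inside the VR scene."""
    print(f"[TTS] {text}")

def interaction_step(audio: bytes) -> bool:
    """One round of the loop: listen, look, reason, speak back."""
    utterance = speech_to_text(audio)
    reply = query_vlm(capture_frame(), utterance)
    text_to_speech(reply.guidance)
    return reply.task_complete

if __name__ == "__main__":
    interaction_step(b"")   # single illustrative round
```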
Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes
Guo, Ziang, Lykov, Artem, Yagudin, Zakhar, Konenkov, Mikhail, Tsetserukou, Dzmitry
Recent research on Large Language Model-based autonomous driving solutions paints a promising picture in the planning and control fields. However, the heavy computational demands and hallucinations of Large Language Models continue to hinder the tasks of predicting precise trajectories and issuing control signals. To address this problem, we propose Co-driver, a novel autonomous driving assistant system that empowers autonomous vehicles with adjustable driving behaviors based on an understanding of road scenes. A pipeline involving the CARLA simulator and Robot Operating System 2 (ROS2) is presented to verify the effectiveness of our system, running on a single Nvidia 4090 24 GB GPU while exploiting the textual output capacity of the Visual Language Model. In addition, we contribute a dataset containing an image set and a corresponding prompt set for fine-tuning the Visual Language Model module of our system. On a real-world driving dataset, our system achieved a 96.16% success rate of reasonable predictions in night scenes and 89.7% in gloomy scenes. Our Co-driver dataset will be released at https://github.com/ZionGo6/Co-driver.
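A hedged sketch of how such a pipeline could be wired in ROS 2: a node subscribes to a camera topic, asks the VLM for a textual scene assessment, and publishes a driving-behavior suggestion. The topic names and the query_vlm helper are assumptions for illustration, not the released Co-driver code.

```python
# Hypothetical ROS 2 node in the spirit of the Co-driver pipeline.
# Topic names and query_vlm are illustrative assumptions.

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String


def query_vlm(image_msg: Image) -> str:
    """Placeholder for the fine-tuned VLM call; returns a textual behavior hint."""
    return "Night scene, low visibility: reduce speed and increase following distance."


class CoDriverNode(Node):
    def __init__(self) -> None:
        super().__init__("co_driver")
        self.sub = self.create_subscription(Image, "/camera/front", self.on_image, 10)
        self.pub = self.create_publisher(String, "/co_driver/behavior", 10)

    def on_image(self, msg: Image) -> None:
        behavior = String()
        behavior.data = query_vlm(msg)       # textual output, as in the abstract
        self.pub.publish(behavior)


def main() -> None:
    rclpy.init()
    rclpy.spin(CoDriverNode())


if __name__ == "__main__":
    main()
```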
FlockGPT: Guiding UAV Flocking with Linguistic Orchestration
Lykov, Artem, Karaf, Sausar, Martynov, Mikhail, Serpiva, Valerii, Fedoseev, Aleksey, Konenkov, Mikhail, Tsetserukou, Dzmitry
This article presents the world's first rapid drone flocking control using natural language through generative AI. The described approach enables the intuitive orchestration of a flock of any size to achieve a desired geometry. The key feature of the method is a new interface based on Large Language Models for communicating with the user and generating the target geometry descriptions. Users can interactively modify or provide comments during the construction of the flock geometry model. By combining flocking technology with a target surface defined by a signed distance function, smooth and adaptive movement of the drone swarm between target states is achieved. Our user study on FlockGPT confirmed a high level of intuitive control over drone flocking by users. Subjects who had never previously controlled a swarm of drones were able to construct complex figures in just a few iterations and could accurately distinguish the formed swarm figures. The results revealed a high recognition rate for six different geometric patterns generated through the LLM-based interface and performed by a simulated drone flock (a mean of 80%, with a maximum of 93% for the cube and tetrahedron patterns). Users reported low temporal demand (NASA-TLX score of 19.2), high performance (NASA-TLX score of 26), attractiveness (UEQ score of 1.94), and hedonic quality (UEQ score of 1.81) for the developed system. The FlockGPT demo code repository can be found at: coming soon
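The signed-distance-function idea can be illustrated with a short sketch: drones step along the SDF gradient until they settle on the zero level set of a target surface (a sphere here for simplicity). This is an illustrative toy under those assumptions, not the FlockGPT implementation.

```python
# Illustrative sketch (not the FlockGPT code): drones converge onto a target
# surface defined by a signed distance function (here a sphere of radius 5)
# by stepping along the SDF gradient toward the zero level set.

import numpy as np


def sdf_sphere(p: np.ndarray, radius: float = 5.0) -> np.ndarray:
    """Signed distance from points p (N, 3) to a sphere centered at the origin."""
    return np.linalg.norm(p, axis=1) - radius


def sdf_gradient(p: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Finite-difference gradient of the SDF at each point."""
    grad = np.zeros_like(p)
    for axis in range(3):
        offset = np.zeros(3)
        offset[axis] = eps
        grad[:, axis] = (sdf_sphere(p + offset) - sdf_sphere(p - offset)) / (2 * eps)
    return grad


def step_flock(positions: np.ndarray, gain: float = 0.5) -> np.ndarray:
    """Move each drone against the signed distance, pulling it onto the surface."""
    d = sdf_sphere(positions)[:, None]
    return positions - gain * d * sdf_gradient(positions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flock = rng.uniform(-10, 10, size=(20, 3))     # 20 drones, random start
    for _ in range(50):
        flock = step_flock(flock)
    print(np.abs(sdf_sphere(flock)).max())         # residual distance to surface
```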
HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene
Guo, Ziang, Perminov, Stepan, Konenkov, Mikhail, Tsetserukou, Dzmitry
Many established vision perception systems for autonomous driving scenarios ignore the influence of light conditions, one of the key elements of driving safety. To address this problem, we present HawkDrive, a novel perception system with hardware and software solutions. The hardware utilizes stereo vision perception, which has been demonstrated to be a more reliable way of estimating depth information than monocular vision, paired with the edge computing device Nvidia Jetson AGX Xavier. Our software for the low-light enhancement, depth estimation, and semantic segmentation tasks is a transformer-based neural network. Our software stack, which enables fast inference and noise reduction, is packaged into system modules in Robot Operating System 2 (ROS2). Our experimental results show that the proposed end-to-end system is effective in improving depth estimation and semantic segmentation performance. Our dataset and codes will be released at https://github.com/ZionGo6/HawkDrive.
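A minimal sketch of the staged perception flow described above (low-light enhancement feeding depth estimation and semantic segmentation); the three stage functions are placeholders for the transformer-based modules, which the real system packages as separate ROS2 nodes.

```python
# Minimal sketch of a staged night-perception pipeline in the spirit of HawkDrive.
# The stage functions are placeholders, not the actual transformer modules.

import numpy as np


def enhance_low_light(frame: np.ndarray) -> np.ndarray:
    """Placeholder: brighten and denoise the night-time frame."""
    return np.clip(frame * 1.8, 0, 255)


def estimate_depth(frame: np.ndarray) -> np.ndarray:
    """Placeholder: per-pixel depth (the real system uses stereo disparity)."""
    return np.full(frame.shape[:2], 10.0)        # dummy 10 m everywhere


def segment_semantics(frame: np.ndarray) -> np.ndarray:
    """Placeholder: per-pixel class labels (road, car, pedestrian, ...)."""
    return np.zeros(frame.shape[:2], dtype=np.int32)


def run_pipeline(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Enhance first, then run the downstream tasks on the enhanced frame."""
    enhanced = enhance_low_light(frame)
    return estimate_depth(enhanced), segment_semantics(enhanced)


if __name__ == "__main__":
    dark_frame = np.random.randint(0, 40, size=(480, 640, 3)).astype(np.float32)
    depth, labels = run_pipeline(dark_frame)
    print(depth.shape, labels.shape)
```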
CognitiveOS: Large Multimodal Model based System to Endow Any Type of Robot with Generative AI
Lykov, Artem, Konenkov, Mikhail, Gbagbe, Koffivi Fidèle, Litvinov, Mikhail, Peter, Robinroy, Davletshin, Denis, Fedoseev, Aleksey, Kobzarev, Oleg, Alabbas, Ali, Alyounes, Oussama, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
In cognitive robotics, the scientific community has recognized the high generalization capability of large language models (LLMs) as a key to developing a robot that could perform new tasks based on generalized knowledge derived from familiar actions expressed in natural language. However, efforts to apply LLMs in robotics faced challenges, particularly in understanding and processing the external world. Previous attempts to convey the model's understanding of the world through text-only approaches [1], [20], [8] struggled with ambiguities and the assumption that objects remain static unless interacted with. The introduction of multi-modal transformer-based models such as GPT-4 [16] and Gemini [18], capable of processing images, opened up new possibilities for robotics [5], allowing robots to comprehend their environment and enhancing their 'Embodied Experience' [15]. Cognitive robots have been developed on various platforms, ranging from mobile manipulators [5], [3] to bio-inspired humanoid robots [21] and quadrupedal robots [6]. In the latter, cognitive abilities were developed using an 'Inner Monologue' approach [10], with improvements inspired by the 'Autogen' concept [25]. The cognition of the robot is facilitated through internal communication between agent models, leveraging their strengths to provide different cognitive capabilities to the system.
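The internal communication between agent models can be sketched as a simple message bus on which specialized agents take turns responding; the agent roles and the respond() stub below are illustrative assumptions, not the actual CognitiveOS implementation.

```python
# Hedged sketch of an internal dialogue between specialized agents.
# Roles and responses are illustrative stubs, not the CognitiveOS code.

from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Agent:
    name: str
    role: str                      # e.g. "perception", "reasoning", "executor"

    def respond(self, history: list[Message]) -> Message:
        """Placeholder for a call to the underlying (multimodal) model."""
        return Message(self.name, f"[{self.role}] considered: {history[-1].content}")


def inner_dialogue(agents: list[Agent], user_task: str, rounds: int = 2) -> list[Message]:
    """Agents take turns reacting to the evolving conversation."""
    history = [Message("user", user_task)]
    for _ in range(rounds):
        for agent in agents:
            history.append(agent.respond(history))
    return history


if __name__ == "__main__":
    crew = [Agent("eyes", "perception"), Agent("planner", "reasoning"), Agent("body", "executor")]
    for msg in inner_dialogue(crew, "Bring the red cup from the table"):
        print(f"{msg.sender}: {msg.content}")
```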
CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot
Lykov, Artem, Litvinov, Mikhail, Konenkov, Mikhail, Prochii, Rinat, Burtsev, Nikita, Abdulkarim, Ali Alridha, Bazhenov, Artem, Berman, Vladimir, Tsetserukou, Dzmitry
This paper introduces CognitiveDog, a pioneering development of a quadruped robot with a Large Multi-modal Model (LMM) that is capable not only of communicating with humans verbally but also of physically interacting with the environment through object manipulation. The system was realized on a Unitree Go1 robot dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of this system, the dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset
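One way to picture the vision-and-language-to-action translation is a sketch in which the LMM is assumed to emit a short textual plan that is parsed into robot primitives (go_to, pick, place); the primitive implementations and the lmm_plan stub are hypothetical, not the released CognitiveDog stack.

```python
# Illustrative sketch: a textual plan from an LMM is parsed into robot primitives.
# All primitives and the lmm_plan stub are hypothetical placeholders.

def go_to(target: str) -> None:
    print(f"navigating (Visual-SLAM) to: {target}")

def pick(obj: str) -> None:
    print(f"grasping with custom gripper: {obj}")

def place(location: str) -> None:
    print(f"placing object at: {location}")

PRIMITIVES = {"go_to": go_to, "pick": pick, "place": place}


def lmm_plan(task: str, scene_objects: list[str]) -> list[str]:
    """Placeholder for the fine-tuned LMM call; returns one primitive per line."""
    return ["go_to(table)", "pick(bottle)", "go_to(user)", "place(user)"]


def execute(plan: list[str]) -> None:
    """Parse 'name(arg)' lines and dispatch them to the primitives."""
    for step in plan:
        name, arg = step.rstrip(")").split("(", 1)
        PRIMITIVES[name](arg)


if __name__ == "__main__":
    execute(lmm_plan("Bring me the bottle", ["table", "bottle", "chair"]))
```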
AirTouch: Towards Safe Human-Robot Interaction Using Air Pressure Feedback and IR Mocap System
Rakhmatulin, Viktor, Grankin, Denis, Konenkov, Mikhail, Davidenko, Sergei, Trinitatova, Daria, Sautenkov, Oleg, Tsetserukou, Dzmitry
The growing use of robots in urban environments has raised concerns about potential safety hazards, especially in public spaces where humans and robots may interact. In this paper, we present a system for safe human-robot interaction that combines an infrared (IR) camera with a wearable marker and an airflow potential field. IR cameras enable real-time detection and tracking of humans in challenging environments, while controlled airflow creates a physical barrier that guides humans away from dangerous proximity to robots without the need for wearable devices. A preliminary experiment was conducted to measure the accuracy of the perception of safety barriers rendered by controlled air pressure. In a second experiment, we evaluated our approach in an imitation scenario of an interaction between an inattentive person and an autonomous robotic system. Experimental results show that the proposed system significantly improves a participant's ability to maintain a safe distance from the operating robot compared to trials without the system.
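A hedged sketch of mapping a repulsive potential field to airflow intensity: the commanded air pressure rises as the tracked person crosses the safety radius around the robot. The linear ramp and its parameters are illustrative assumptions, not the paper's calibrated values.

```python
# Hedged sketch: repulsive potential field mapped to airflow intensity.
# The ramp shape and parameters are illustrative, not the paper's calibration.

import numpy as np


def airflow_intensity(person_xy: np.ndarray,
                      robot_xy: np.ndarray,
                      safe_radius: float = 1.5,
                      max_intensity: float = 1.0) -> float:
    """Linear repulsive ramp: zero at the safety boundary, maximal at contact."""
    d = float(np.linalg.norm(person_xy - robot_xy))
    return 0.0 if d >= safe_radius else max_intensity * (1.0 - d / safe_radius)


if __name__ == "__main__":
    robot = np.array([0.0, 0.0])
    for dist in (2.0, 1.2, 0.6, 0.2):        # person approaching the robot
        print(dist, airflow_intensity(np.array([dist, 0.0]), robot))
```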