Goto

Collaborating Authors

 autonomous navigation


Integrated YOLOP Perception and Lyapunov-based Control for Autonomous Mobile Robot Navigation on Track

Chen, Mo

arXiv.org Artificial Intelligence

In the 1990s, the modern scientific and technological revolution marked by computer technology, microelectronics technology, information technology, network technology, etc., entered a rapid development stage, which became the intrinsic driving force to promote the development of robotics technology, and robotics technology has developed rapidly. Among them, autonomous mobile robots(AMRs) can rely on the sensors they carry to perceive and understand the external environment, make real-time decisions according to the needs of the task, carry out closed-loop control, and operate in an autonomous or semi-autonomous manner. It is a new type of robot with certain self-learning and adaptive ability in known or unknown environment. Navigation is an important problem that needs to be solved for AMRs to realize autonomous control, which refers to the process of mobile robot sensing the environment and its own state through sensors and learning, and realizing the process of pointing to the target autonomous movement in an obstructed environment. Since the first mobile robot, Shakey, was introduced in the 1960s, mobile robot navigation has been receiving a lot of attention due to its comprehensiveness and practicality [1].


EdgeNavMamba: Mamba Optimized Object Detection for Energy Efficient Edge Devices

Aalishah, Romina, Navardi, Mozhgan, Mohsenin, Tinoosh

arXiv.org Artificial Intelligence

Deployment of efficient and accurate Deep Learning models has long been a challenge in autonomous navigation, particularly for real-time applications on resource-constrained edge devices. Edge devices are limited in computing power and memory, making model efficiency and compression essential. In this work, we propose EdgeNavMamba, a reinforcement learning-based framework for goal-directed navigation using an efficient Mamba object detection model. To train and evaluate the detector, we introduce a custom shape detection dataset collected in diverse indoor settings, reflecting visual cues common in real-world navigation. The object detector serves as a pre-processing module, extracting bounding boxes (BBOX) from visual input, which are then passed to an RL policy to control goal-oriented navigation. Experimental results show that the student model achieved a reduction of 67% in size, and up to 73% in energy per inference on edge devices of NVIDIA Jetson Orin Nano and Raspberry Pi 5, while keeping the same performance as the teacher model. EdgeNavMamba also maintains high detection accuracy in MiniWorld and IsaacLab simulators while reducing parameters by 31% compared to the baseline. In the MiniWorld simulator, the navigation policy achieves over 90% success across environments of varying complexity.


CottonSim: A vision-guided autonomous robotic system for cotton harvesting in Gazebo simulation

Thayananthan, Thevathayarajh, Zhang, Xin, Huang, Yanbo, Chen, Jingdao, Wijewardane, Nuwan K., Martins, Vitor S., Chesser, Gary D., Goodin, Christopher T.

arXiv.org Artificial Intelligence

Cotton is a major cash crop in the United States, with the country being a leading global producer and exporter. Nearly all U.S. cotton is grown in the Cotton Belt, spanning 17 states in the southern region. Harvesting remains a critical yet challenging stage, impacted by the use of costly, environmentally harmful defoliants and heavy, expensive cotton pickers. These factors contribute to yield loss, reduced fiber quality, and soil compaction, which collectively threaten long-term sustainability. To address these issues, this study proposes a lightweight, small-scale, vision-guided autonomous robotic cotton picker as an alternative. An autonomous system, built on Clearpath's Husky platform and integrated with the CottonEye perception system, was developed and tested in the Gazebo simulation environment. A virtual cotton field was designed to facilitate autonomous navigation testing. The navigation system used Global Positioning System (GPS) and map-based guidance, assisted by an RGBdepth camera and a YOLOv8nseg instance segmentation model. The model achieved a mean Average Precision (mAP) of 85.2%, a recall of 88.9%, and a precision of 93.0%. The GPS-based approach reached a 100% completion rate (CR) within a $(5e-6)^{\circ}$ threshold, while the map-based method achieved a 96.7% CR within a 0.25 m threshold. The developed Robot Operating System (ROS) packages enable robust simulation of autonomous cotton picking, offering a scalable baseline for future agricultural robotics. CottonSim code and datasets are publicly available on GitHub: https://github.com/imtheva/CottonSim


Hierarchical Decision-Making for Autonomous Navigation: Integrating Deep Reinforcement Learning and Fuzzy Logic in Four-Wheel Independent Steering and Driving Systems

Wang, Yizhi, Xu, Degang, Xie, Yongfang, Tan, Shuzhong, Zhou, Xianan, Chen, Peng

arXiv.org Artificial Intelligence

This paper presents a hierarchical decision-making framework for autonomous navigation in four-wheel independent steering and driving (4WISD) systems. The proposed approach integrates deep reinforcement learning (DRL) for high-level navigation with fuzzy logic for low-level control to ensure both task performance and physical feasibility. The DRL agent generates global motion commands, while the fuzzy logic controller enforces kinematic constraints to prevent mechanical strain and wheel slippage. Simulation experiments demonstrate that the proposed framework outperforms traditional navigation methods, offering enhanced training efficiency and stability and mitigating erratic behaviors compared to purely DRL-based solutions. Real-world validations further confirm the framework's ability to navigate safely and effectively in dynamic industrial settings. Overall, this work provides a scalable and reliable solution for deploying 4WISD mobile robots in complex, real-world scenarios.


Bio-Inspired Topological Autonomous Navigation with Active Inference in Robotics

de Tinguy, Daria, Verbelen, Tim, Gamba, Emilio, Dhoedt, Bart

arXiv.org Artificial Intelligence

Achieving fully autonomous exploration and navigation remains a critical challenge in robotics, requiring integrated solutions for localisation, mapping, decision-making and motion planning. Existing approaches either rely on strict navigation rules lacking adaptability or on pre-training, which requires large datasets. These AI methods are often computationally intensive or based on static assumptions, limiting their adaptability in dynamic or unknown environments. This paper introduces a bio-inspired agent based on the Active Inference Framework (AIF), which unifies mapping, localisation, and adaptive decision-making for autonomous navigation, including exploration and goal-reaching. Our model creates and updates a topological map of the environment in real-time, planning goal-directed trajectories to explore or reach objectives without requiring pre-training. Key contributions include a probabilistic reasoning framework for interpretable navigation, robust adaptability to dynamic changes, and a modular ROS2 architecture compatible with existing navigation systems. Our method was tested in simulated and real-world environments. The agent successfully explores large-scale simulated environments and adapts to dynamic obstacles and drift, proving to be comparable to other exploration strategies such as Gbplanner, FAEL and Frontiers. This approach offers a scalable and transparent approach for navigating complex, unstructured environments.


Autonomous Navigation of Cloud-Controlled Quadcopters in Confined Spaces Using Multi-Modal Perception and LLM-Driven High Semantic Reasoning

Ahmmad, Shoaib, Aditto, Zubayer Ahmed, Hossain, Md Mehrab, Yeasmin, Noushin, Hossain, Shorower

arXiv.org Artificial Intelligence

This paper introduces an advanced AI-driven perception system for autonomous quadcopter navigation in GPS-denied indoor environments. The proposed framework leverages cloud computing to offload computationally intensive tasks and incorporates a custom-designed printed circuit board (PCB) for efficient sensor data acquisition, enabling robust navigation in confined spaces. The system integrates YOLOv11 for object detection, Depth Anything V2 for monocular depth estimation, a PCB equipped with Time-of-Flight (ToF) sensors and an Inertial Measurement Unit (IMU), and a cloud-based Large Language Model (LLM) for context-aware decision-making. A virtual safety envelope, enforced by calibrated sensor offsets, ensures collision avoidance, while a multithreaded architecture achieves low-latency processing. Enhanced spatial awareness is facilitated by 3D bounding box estimation with Kalman filtering. Experimental results in an indoor testbed demonstrate strong performance, with object detection achieving a mean Average Precision (mAP50) of 0.6, depth estimation Mean Absolute Error (MAE) of 7.2 cm, only 16 safety envelope breaches across 42 trials over approximately 11 minutes, and end-to-end system latency below 1 second. This cloud-supported, high-intelligence framework serves as an auxiliary perception and navigation system, complementing state-of-the-art drone autonomy for GPS-denied confined spaces.


Solving Scene Understanding for Autonomous Navigation in Unstructured Environments

Renji, Naveen Mathews, K, Kruthika, Keshavamurthy, Manasa, Kumari, Pooja, Rajarajeswari, S.

arXiv.org Artificial Intelligence

Autonomous vehicles are the next revolution in the automobile industry and they are expected to revolutionize the future of transportation. Understanding the scenario in which the autonomous vehicle will operate is critical for its competent functioning. Deep Learning has played a massive role in the progress that has been made till date. Semantic Segmentation, the process of annotating every pixel of an image with an object class, is one crucial part of this scene comprehension using Deep Learning. It is especially useful in Autonomous Driving Research as it requires comprehension of drivable and non-drivable areas, roadside objects and the like. In this paper semantic segmentation has been performed on the Indian Driving Dataset which has been recently compiled on the urban and rural roads of Bengaluru and Hyderabad. This dataset is more challenging compared to other datasets like Cityscapes, since it is based on unstructured driving environments. It has a four level hierarchy and in this paper segmentation has been performed on the first level. Five different models have been trained and their performance has been compared using the Mean Intersection over Union. These are UNET, UNET+RESNET50, DeepLabsV3, PSPNet and SegNet. The highest MIOU of 0.6496 has been achieved. The paper discusses the dataset, exploratory data analysis, preparation, implementation of the five models and studies the performance and compares the results achieved in the process.


CoNav Chair: Development and Evaluation of a Shared Control based Wheelchair for the Built Environment

Xu, Yifan, Wang, Qianwei, Lillie, Jordan, Kamat, Vineet, Menassa, Carol, D'Souza, Clive

arXiv.org Artificial Intelligence

As the global population of people with disabilities (PWD) continues to grow, so will the need for mobility solutions that promote independent living and social integration. Wheelchairs are vital for the mobility of PWD in both indoor and outdoor environments. The current SOTA in powered wheelchairs is based on either manually controlled or fully autonomous modes of operation, offering limited flexibility and often proving difficult to navigate in spatially constrained environments. Moreover, research on robotic wheelchairs has focused predominantly on complete autonomy or improved manual control; approaches that can compromise efficiency and user trust. To overcome these challenges, this paper introduces the CoNav Chair, a smart wheelchair based on the Robot Operating System (ROS) and featuring shared control navigation and obstacle avoidance capabilities that are intended to enhance navigational efficiency, safety, and ease of use for the user. The paper outlines the CoNav Chair's design and presents a preliminary usability evaluation comparing three distinct navigation modes, namely, manual, shared, and fully autonomous, conducted with 21 healthy, unimpaired participants traversing an indoor building environment. Study findings indicated that the shared control navigation framework had significantly fewer collisions and performed comparably, if not superior to the autonomous and manual modes, on task completion time, trajectory length, and smoothness; and was perceived as being safer and more efficient based on user reported subjective assessments of usability. Overall, the CoNav system demonstrated acceptable safety and performance, laying the foundation for subsequent usability testing with end users, namely, PWDs who rely on a powered wheelchair for mobility.


A Compendium of Autonomous Navigation using Object Detection and Tracking in Unmanned Aerial Vehicles

Arora, Mohit, Shukla, Pratyush, Chopra, Shivali

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are one of the most revolutionary inventions of 21st century. At the core of a UAV lies the central processing system that uses wireless signals to control their movement. The most popular UAVs are quadcopters that use a set of four motors, arranged as two on either side with opposite spin. An autonomous UAV is called a drone. Drones have been in service in the US army since the 90's for covert missions critical to national security. It would not be wrong to claim that drones make up an integral part of the national security and provide the most valuable service during surveillance operations. While UAVs are controlled using wireless signals, there reside some challenges that disrupt the operation of such vehicles such as signal quality and range, real time processing, human expertise, robust hardware and data security. These challenges can be solved by programming UAVs to be autonomous, using object detection and tracking, through Computer Vision algorithms. Computer Vision is an interdisciplinary field that seeks the use of deep learning to gain a high-level understanding of digital images and videos for the purpose of automating the task of human visual system. Using computer vision, algorithms for detecting and tracking various objects can be developed suitable to the hardware so as to allow real time processing for immediate judgement. This paper attempts to review the various approaches several authors have proposed for the purpose of autonomous navigation of UAVs by through various algorithms of object detection and tracking in real time, for the purpose of applications in various fields such as disaster management, dense area exploration, traffic vehicle surveillance etc.


Human-like Semantic Navigation for Autonomous Driving using Knowledge Representation and Large Language Models

Ballardini, Augusto Luis, Sotelo, Miguel Ángel

arXiv.org Artificial Intelligence

--Achieving full automation in self-driving vehicles remains a challenge, especially in dynamic urban environments where navigation requires real-time adaptability. Existing systems struggle to handle navigation plans when faced with unpredictable changes in road layouts, spontaneous detours, or missing map data, due to their heavy reliance on predefined cartographic information. In this work, we explore the use of Large Language Models to generate Answer Set Programming rules by translating informal navigation instructions into structured, logic-based reasoning. ASP provides non-monotonic reasoning, allowing autonomous vehicles to adapt to evolving scenarios without relying on predefined maps. We present an experimental evaluation in which LLMs generate ASP constraints that encode real-world urban driving logic into a formal knowledge representation. By automating the translation of informal navigation instructions into logical rules, our method improves adaptability and explainability in autonomous navigation. Results show that LLM-driven ASP rule generation supports semantic-based decision-making, offering an explainable framework for dynamic navigation planning that aligns closely with how humans communicate navigational intent.