RoVer: Robot Reward Model as Test-Time Verifier for Vision-Language-Action Model
Dai, Mingtong, Liu, Lingbo, Bai, Yongjie, Liu, Yang, Wang, Zhouxia, Su, Rui, Chen, Chunjie, Lin, Liang, Wu, Xinyu
Vision-Language-Action (VLA) models have become a prominent paradigm for embodied intelligence, yet further performance improvements typically rely on scaling up training data and model size -- an approach that is prohibitively expensive for robotics and fundamentally limited by data collection costs. We address this limitation with $\mathbf{RoVer}$, an embodied test-time scaling framework that uses a $\mathbf{Ro}$bot Process Reward Model (PRM) as a Test-Time $\mathbf{Ver}$ifier to enhance the capabilities of existing VLA models without modifying their architectures or weights. Specifically, RoVer (i) assigns scalar-based process rewards to evaluate the reliability of candidate actions, and (ii) predicts an action-space direction for candidate expansion/refinement. During inference, RoVer generates multiple candidate actions concurrently from the base policy, expands them along PRM-predicted directions, and then scores all candidates with PRM to select the optimal action for execution. Notably, by caching shared perception features, it can amortize perception cost and evaluate more candidates under the same test-time computational budget. Essentially, our approach effectively transforms available computing resources into better action decision-making, realizing the benefits of test-time scaling without extra training overhead. Our contributions are threefold: (1) a general, plug-and-play test-time scaling framework for VLAs; (2) a PRM that jointly provides scalar process rewards and an action-space direction to guide exploration; and (3) an efficient direction-guided sampling strategy that leverages a shared perception cache to enable scalable candidate generation and selection during inference.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
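The inference loop the abstract describes (sample candidates, expand them along PRM-predicted directions, score everything, execute the best) can be sketched in a few lines; `policy`, `prm_score`, and `prm_direction` below are hypothetical interfaces standing in for the base VLA and the PRM, not the authors' actual API:

```python
import numpy as np

def rover_select(policy, prm_score, prm_direction, obs, n=8, step=0.1):
    """Best-of-N action selection with PRM-guided candidate expansion.

    policy(feat, n)        -> (n, d) candidate actions from the base VLA
    prm_direction(feat, a) -> (d,) action-space direction for refinement
    prm_score(feat, a)     -> scalar process reward for a candidate
    """
    feat = obs  # shared perception features: computed once, cached, reused
    candidates = list(policy(feat, n))
    # Expand each candidate one step along the PRM-predicted direction.
    for a in list(candidates):
        d = prm_direction(feat, a)
        candidates.append(a + step * d / (np.linalg.norm(d) + 1e-8))
    # Score all original and expanded candidates; execute the best one.
    scores = [prm_score(feat, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```

Because `feat` is computed once and shared across all candidates, the per-candidate cost is only the PRM forward pass, which is what lets the method evaluate more candidates under a fixed test-time budget.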
SubSense: VR-Haptic and Motor Feedback for Immersive Control in Subsea Telerobotics
Chen, Ruo, Blow, David, Abdullah, Adnan, Islam, Md Jahidul
This paper investigates the integration of haptic feedback and virtual reality (VR) control interfaces to enhance teleoperation and telemanipulation of underwater ROVs (remotely operated vehicles). Traditional ROV teleoperation relies on low-resolution 2D camera feeds and lacks immersive and sensory feedback, which diminishes situational awareness in complex subsea environments. We propose SubSense - a novel VR-Haptic framework incorporating a non-invasive feedback interface to an otherwise 1-DOF (degree of freedom) manipulator, which is paired with the teleoperator's glove to provide haptic feedback and grasp status. Our results highlight the potential of multisensory feedback in immersive virtual environments to significantly improve remote situational awareness and mission performance, offering more intuitive and accessible ROV operations in the field. Remotely Operated Vehicles (ROVs) are indispensable tools in the marine industry, offering a safer and more cost-effective alternative to human divers [1]. Underwater ROVs are versatile platforms supporting a range of missions, from routine imaging and infrastructure inspection to complex tasks such as environmental monitoring [2], maintaining sub-sea infrastructure [3], [4], performing mine countermeasure and explosive ordnance disposal [5], salvaging, search-and-rescue [6], and deep-water expeditions [7]. With over 79% of subsea deployments done by ROVs, they play a crucial role in commerce, military, and science - enabling us to explore beyond the limits of human scuba divers [8]. Despite growing industrial demands and recent advancements, underwater ROVs still have inherent limitations, particularly in their immersive control and interaction capabilities.
- Asia > Vietnam > Hanoi > Hanoi (0.05)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Maryland (0.04)
- Electrical Industrial Apparatus (1.00)
- Government > Military (0.55)
AquaChat++: LLM-Assisted Multi-ROV Inspection for Aquaculture Net Pens with Integrated Battery Management and Thruster Fault Tolerance
Saad, Abdelhaleem, Akram, Waseem, Hussain, Irfan
The global demand for aquaculture has surged over the past decade, driving the expansion of offshore fish farming systems such as net pens [1, 2]. These structures, while effective for large-scale fish production, are continuously exposed to harsh marine environments that can degrade structural integrity, compromise biosecurity, and increase the risk of fish escape or environmental contamination [3]. As a result, regular and reliable inspection of aquaculture net pens is critical to ensuring operational safety, productivity, and regulatory compliance [4]. Recent advances in underwater robotics, control systems, and computer vision have enabled significant progress in autonomous inspection [5, 6]. Remotely Operated Vehicles (ROVs), in particular, offer a practical platform for deploying sensing payloads such as cameras and sonars, and for performing close-range inspection in confined underwater environments [7]. However, most existing ROV-based systems operate in isolation, with limited autonomy and minimal adaptability to dynamic conditions such as power constraints, actuator degradation, and evolving mission demands [8, 9]. Moreover, mission planning and coordination typically require expert operators, limiting the scalability and responsiveness of these systems in real-world aquaculture operations [10, 11, 12]. To address these challenges, we propose AquaChat++, a novel framework that combines the reasoning capabilities of Large Language Models (LLMs) with multi-ROV coordination, battery-aware mission planning, and fault-tolerant control [13, 14]. Unlike traditional inspection pipelines that rely on fixed scripts or manual supervision, AquaChat++ enables natural language-driven task planning and dynamic allocation across multiple ROVs.
- North America > United States (0.04)
- Atlantic Ocean > North Atlantic Ocean > Norwegian Sea (0.04)
- Asia > Middle East > UAE (0.04)
- Research Report (0.82)
- Workflow (0.67)
- Electrical Industrial Apparatus (1.00)
- Food & Agriculture > Fishing (0.66)
- Government > Military (0.54)
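The battery-aware task allocation the abstract mentions could be sketched as a greedy assignment over a small fleet; this is an illustration under simplified assumptions (a scalar energy budget per ROV, known per-task costs), not the paper's actual planner, and `allocate_tasks` is a hypothetical name:

```python
def allocate_tasks(rovs, tasks):
    """Greedy battery-aware task allocation across a small ROV fleet.

    rovs:  dict mapping ROV name -> remaining battery budget (e.g. Wh)
    tasks: list of (task_name, energy_cost) pairs

    Assigns the most expensive tasks first, each to the feasible ROV
    with the largest remaining budget, so that a thruster fault or low
    battery on one vehicle degrades the plan gracefully.
    """
    plan = {}
    for task, cost in sorted(tasks, key=lambda t: -t[1]):
        feasible = [r for r, budget in rovs.items() if budget >= cost]
        if not feasible:
            plan[task] = None  # no ROV can afford it: recharge and replan
            continue
        best = max(feasible, key=lambda r: rovs[r])
        rovs[best] -= cost
        plan[task] = best
    return plan
```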
13 World War II shipwrecks captured in stunning detail
Judging by newly released photos and video, the crew aboard Ocean Exploration Trust's Nautilus research vessel had an extremely productive summer trip to the South Pacific. Over 22 days, the team completed detailed archaeological surveys of more than a dozen shipwrecks sunk amid the Solomon Islands campaign during World War II. In addition to imaging four of them for the first time, experts guided remotely operated vehicles (ROVs) towards the rediscovery of two long-lost vessels: the separated bow from the USS New Orleans as well as the Imperial Japanese Naval destroyer Teruzuki. Although researchers originally spotted some of these shipwrecks more than 34 years ago, Ocean Exploration Trust president Robert Ballard explained that the most recent trip to Iron Bottom Sound provided opportunities to document their finds using a new generation of technology including high-definition survey cameras, underwater vehicles, and imaging tools aboard the E/V Nautilus.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.26)
- Oceania > Solomon Islands > Guadalcanal Province > Guadalcanal Island > Honiara (0.06)
- Oceania > Australia (0.06)
- Asia > Japan (0.06)
AquaChat: An LLM-Guided ROV Framework for Adaptive Inspection of Aquaculture Net Pens
Akram, Waseem, Din, Muhayy Ud, Saad, Abdelhaleem, Hussain, Irfan
Inspection of aquaculture net pens is essential for maintaining the structural integrity, biosecurity, and operational efficiency of fish farming systems. Traditional inspection approaches rely on pre-programmed missions or manual control, offering limited adaptability to dynamic underwater conditions and user-specific demands. In this study, we propose AquaChat, a novel Remotely Operated Vehicle (ROV) framework that integrates Large Language Models (LLMs) for intelligent and adaptive net pen inspection. The system features a multi-layered architecture: (1) a high-level planning layer that interprets natural language user commands using an LLM to generate symbolic task plans; (2) a mid-level task manager that translates plans into ROV control sequences; and (3) a low-level motion control layer that executes navigation and inspection tasks with precision. Real-time feedback and event-triggered replanning enhance robustness in challenging aquaculture environments. The framework is validated through experiments in both simulated and controlled aquatic environments representative of aquaculture net pens. Results demonstrate improved task flexibility, inspection accuracy, and operational efficiency. AquaChat illustrates the potential of integrating language-based AI with marine robotics to enable intelligent, user-interactive inspection systems for sustainable aquaculture operations.
- North America > United States > Colorado (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > UAE (0.04)
- Electrical Industrial Apparatus (1.00)
- Food & Agriculture > Fishing (0.88)
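The three-layer architecture described above can be illustrated with a toy version of the mid-level step, which turns a symbolic plan from the LLM layer into low-level command sequences; the primitive names and the mapping are hypothetical, not AquaChat's actual command set:

```python
def plan_to_commands(symbolic_plan):
    """Mid-level task manager sketch: translate a symbolic task plan
    (produced by the LLM planning layer) into ordered ROV control
    commands for the low-level motion layer. Primitives are illustrative.
    """
    primitive_map = {
        "goto":    lambda target: [("set_waypoint", target)],
        "inspect": lambda target: [("hold_station", target),
                                   ("capture_image", target)],
        "surface": lambda target: [("ascend", target)],
    }
    commands = []
    for step, target in symbolic_plan:
        commands.extend(primitive_map[step](target))
    return commands
```

Event-triggered replanning, in this picture, simply means regenerating the symbolic plan and re-running the translation when feedback from the low-level layer reports a failure.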
REACT: Real-time Entanglement-Aware Coverage Path Planning for Tethered Underwater Vehicles
Amer, Abdelhakim, Mehindratta, Mohit, Brodskiy, Yury, Wehbe, Bilal, Kayacan, Erdal
Inspection of complex underwater structures with tethered underwater vehicles is often hindered by the risk of tether entanglement. We propose REACT (real-time entanglement-aware coverage path planning for tethered underwater vehicles), a framework designed to overcome this limitation. REACT comprises a fast geometry-based tether model using the signed distance field (SDF) map for accurate, real-time simulation of taut tether configurations around arbitrary structures in 3D. This model enables an efficient online replanning strategy by enforcing a maximum tether length constraint, thereby actively preventing entanglement. By integrating REACT into a coverage path planning framework, we achieve safe and optimal inspection paths, previously challenging due to tether constraints. The complete REACT framework's efficacy is validated in a pipe inspection scenario, demonstrating safe, entanglement-free navigation and full-coverage inspection. Simulation results show that REACT achieves complete coverage while maintaining tether constraints and completing the total mission 20% faster than conventional planners, despite a longer inspection time due to proactive avoidance of entanglement that eliminates extensive post-mission disentanglement. Real-world experiments confirm these benefits, where REACT completes the full mission, while the baseline planner fails due to physical tether entanglement.
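The maximum-tether-length constraint at the heart of REACT ultimately reduces to a length check on the simulated taut-tether configuration; a minimal sketch, assuming the taut tether is already available as a polyline of 3D contact points (the SDF-based simulation that produces those points is the paper's contribution and is not reproduced here):

```python
import math

def tether_length(points):
    """Total polyline length of a taut tether configuration,
    given as a sequence of 3D points (anchor, contacts, vehicle)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def violates_constraint(tether_points, max_length):
    """Replanning trigger: True if the taut tether would exceed its
    physical limit, so the planner must reroute before entanglement."""
    return tether_length(tether_points) > max_length
```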
TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations
Ahmed, Kawser, Fardin, Mir Shahriar, Nayem, Md Arif Faysal, Hafiz, Fahim, Shatabda, Swakkhar
The increasing demand for underwater exploration and rescue operations enforces the development of advanced wireless or semi-wireless underwater vessels equipped with manipulator arms. This paper presents the implementation of a semi-wireless underwater vehicle, "TritonZ", equipped with a manipulator arm, tailored for effective underwater exploration and rescue operations. The vehicle's compact design enables deployment in different submarine surroundings, addressing the need for wireless systems capable of navigating challenging underwater terrains. The manipulator arm can interact with the environment, allowing the robot to perform sophisticated tasks during exploration and rescue missions in emergency situations. TritonZ is equipped with various sensors, such as a Pi-Camera and humidity and temperature sensors, to send real-time environmental data. Our underwater vehicle, controlled using a customized remote controller, can navigate efficiently in the water, while the Pi-Camera enables live streaming of the surroundings. Motion control and video capture are performed simultaneously using this camera. The manipulator arm is designed to perform various tasks, such as grasping, manipulating, and collecting underwater objects. Experimental results show the efficacy of the proposed remotely operated vehicle in performing a variety of underwater exploration and rescue tasks. Additionally, the results show that TritonZ can maintain an average velocity of 13.5 cm/s with a minimal delay of 2-3 seconds. Furthermore, the vehicle can sustain waves underwater by maintaining its position as well as its average velocity. The full project details and source code can be accessed at this link: https://github.com/kawser-ahmed-byte/TritonZ
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Asia > Malaysia (0.04)
- Asia > India > Tamil Nadu > Chennai (0.04)
Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception
Kaveti, Pushyami, Waldum, Ambjorn Grimsrud, Singh, Hanumant, Ludvigsen, Martin
Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) demand robust spatial perception capabilities, including Simultaneous Localization and Mapping (SLAM), to support both remote and autonomous tasks. Vision-based systems have been integral to these advancements, capturing rich color and texture at low cost while enabling semantic scene understanding. However, underwater conditions -- such as light attenuation, backscatter, and low contrast -- often degrade image quality to the point where traditional vision-based SLAM pipelines fail. Moreover, these pipelines typically rely on monocular or stereo inputs, limiting their scalability to the multi-camera configurations common on many vehicles. To address these issues, we propose to leverage multi-modal sensing that fuses data from multiple sensors, including cameras, inertial measurement units (IMUs), and acoustic devices, to enhance situational awareness and enable robust, real-time SLAM. We explore both geometric and learning-based techniques along with semantic analysis, and conduct experiments on data collected from a work-class ROV during several field deployments in the Trondheim Fjord. Through our experimental results, we demonstrate the feasibility of real-time reliable state estimation and high-quality 3D reconstructions in visually challenging underwater conditions. We also discuss system constraints and identify open research questions, such as sensor calibration and the limitations of learning-based methods, that merit further exploration to advance large-scale underwater operations.
- Europe > Norway > Central Norway > Trøndelag > Trondheim (0.25)
- North America > United States (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
SOLAQUA: SINTEF Ocean Large Aquaculture Robotics Dataset
Ohrem, Sveinung Johan, Haugaløkken, Bent, Kelasidi, Eleni
This paper presents a dataset gathered with an underwater robot in a sea-based aquaculture setting. Data was gathered from an operational fish farm and includes data from sensors such as the Waterlinked A50 DVL, the Nortek Nucleus 1000 DVL, Sonardyne Micro Ranger 2 USBL, Sonoptix Multibeam Sonar, mono and stereo cameras, and vehicle sensor data such as power usage, IMU, pressure, temperature, and more. Data acquisition is performed during both manual and autonomous traversal of the net pen structure. The collected vision data is of undamaged nets with some fish and marine growth presence, and it is expected that both the research community and the aquaculture industry will benefit greatly from the utilization of the proposed SOLAQUA dataset. Aquaculture is and will be an important contributor to the production of protein and food in the years to come.
100 years of deep-sea filmmaking and ocean exploration
When Hans Hartman, a civil engineer, attempted to film the ocean depths in 1917, he pioneered what would become the first deep-sea ROV, or remotely operated vehicle. During an era of silent movies and wartime U-boats, Hartman's ambitious invention--a 1,500-pound electric, submarine camera--could be lowered to a depth of 1,000 feet to capture images of sunken ships and submerged treasures. Despite featuring a gyroscope for stability, a motorized propeller for controlled rotation, and an innovative light source, as Popular Science explained, it had a serious limitation: The hulking apparatus had to be operated blindly from a ship's deck, which meant it was impossible for the camera's operator to see what they were filming until the footage was viewed later. In 1925, Popular Science showcased his next breakthrough--a cylindrical apparatus (seen above) attached to a ship by a cable, housing a submersible, motor-driven camera, as well as enough room for a person who could control the camera, or communicate with crew members nearby to aid with various underwater missions, such as salvaging. The vertical, tin-can-like submarine, equipped with porthole windows and a powerful spotlight, allowed "the operator to go down into the water with a camera and photograph whatever he chooses."
- Media > Film (0.77)
- Leisure & Entertainment (0.77)