Action-Informed Estimation and Planning: Clearing Clutter on Staircases via Quadrupedal Pedipulation
Sriganesh, Prasanna, Satheeshkumar, Barath, Sabnis, Anushree, Travers, Matthew
Abstract-- For robots to operate autonomously in densely cluttered environments, they must reason about and potentially physically interact with obstacles to clear a path. Safely clearing a path on challenging terrain, such as a cluttered staircase, requires controlled interaction: for example, a quadrupedal robot may push objects out of the way with one leg while maintaining a stable stance with its other three legs. However, tightly coupled physical actions, such as one-legged pushing, create new constraints on the system that can be difficult to predict at design time. In this work, we present a new method that addresses one such constraint, wherein the object being pushed by a quadrupedal robot with one of its legs becomes occluded from the robot's sensors during manipulation. To address this challenge, we present a tightly coupled perception-action framework that enables the robot to perceive clutter, reason about feasible push paths, and execute the clearing maneuver. Our core contribution is an interaction-aware state estimation loop that uses proprioceptive feedback regarding foot contact and leg position to predict an object's displacement during the occlusion. This prediction guides the perception system to robustly re-detect the object after the interaction, closing the loop between action and sensing to enable accurate tracking even after partial pushes. Using this feedback allows the robot to learn from physical outcomes, reclassifying an object as immovable if a push fails because the object is too heavy. We present results of implementing our approach on a Boston Dynamics Spot robot that show our interaction-aware approach achieves higher task success rates and tracking accuracy in pushing objects on stairs compared to open-loop baselines.
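The proprioceptive displacement prediction described above can be sketched under a quasi-static assumption (the function name, array layout, and the assumption that the object follows the foot while in contact are illustrative, not taken from the paper):

```python
import numpy as np

def predict_object_displacement(foot_positions, in_contact):
    """Predict how far a pushed object moved while occluded by the leg.

    Quasi-static assumption: the object follows the foot whenever contact
    is detected, so foot motion is integrated over contact intervals.

    foot_positions : (T, D) array of foot positions in the world frame
    in_contact     : (T,) boolean array from the contact estimator
    """
    displacement = np.zeros(foot_positions.shape[1])
    for t in range(1, len(foot_positions)):
        # Only accumulate motion while contact persists across the step.
        if in_contact[t] and in_contact[t - 1]:
            displacement += foot_positions[t] - foot_positions[t - 1]
    return displacement
```

A prediction like this could seed the post-push re-detection search region, and sustained contact with near-zero predicted displacement would be the natural trigger for reclassifying an object as immovable.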
Real2Code: Reconstruct Articulated Objects via Code Generation
Mandi, Zhao, Weng, Yijia, Bauer, Dominik, Song, Shuran
We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model (LLM) to predict joint articulation as code. By leveraging pre-trained vision and language models, our approach scales elegantly with the number of articulated parts, and generalizes from synthetic training data to real world objects in unstructured environments. Experimental results demonstrate that Real2Code significantly outperforms previous state-of-the-art in reconstruction accuracy, and is the first approach to extrapolate beyond objects' structural complexity in the training set, and reconstructs objects with up to 10 articulated parts. When incorporated with a stereo reconstruction model, Real2Code also generalizes to real world objects from a handful of multi-view RGB images, without the need for depth or camera information.
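The interface between geometry and language model can be illustrated with a hypothetical serializer; the field names and text format below are assumptions for illustration, not Real2Code's actual prompt:

```python
def obb_prompt(parts):
    """Serialize part OBBs into a compact text prompt for a code-generating
    language model. Each part is a (center, extents, rotation) triple of
    3-vectors; the line format here is purely illustrative.
    """
    lines = []
    for i, (center, extents, rotation) in enumerate(parts):
        lines.append(
            f"part_{i}: center={list(center)} "
            f"extents={list(extents)} rpy={list(rotation)}"
        )
    return "\n".join(lines)
```

Because each part contributes one short line rather than raw point clouds, a representation in this spirit keeps the LLM input compact as the number of articulated parts grows.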
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Xiao, Zi-Kai, Yang, Guo-Ye, Yang, Xue, Mu, Tai-Jiang, Yan, Junchi, Hu, Shi-min
Considerable efforts have been devoted to Oriented Object Detection (OOD). However, one lasting issue regarding the discontinuity in Oriented Bounding Box (OBB) representation remains unresolved, which is an inherent bottleneck for extant OOD methods. This paper endeavors to completely solve this issue in a theoretically guaranteed manner and puts an end to the ad-hoc efforts in this direction. Prior studies typically can only address one of the two cases of discontinuity, rotation or aspect ratio, and often inadvertently introduce decoding discontinuity, e.g., Decoding Incompleteness (DI) and Decoding Ambiguity (DA), as discussed in the literature. Specifically, we propose a novel representation method called Continuous OBB (COBB), which can be readily integrated into existing detectors, e.g., Faster R-CNN, as a plugin. It can theoretically ensure continuity in bounding box regression, which, to the best of our knowledge, has not been achieved in the literature for rectangle-based object representation. For fairness and transparency of experiments, we have developed a modularized benchmark based on the open-source deep learning framework Jittor's detection toolbox JDet for OOD evaluation. On the popular DOTA dataset, by integrating Faster R-CNN as the same baseline model, our new method outperforms the peer method Gliding Vertex by 1.13% mAP50 (relative improvement 1.54%) and 2.46% mAP75 (relative improvement 5.91%), without any tricks.
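The rotation discontinuity this paper targets is easy to reproduce numerically: two boxes that are geometrically almost identical can sit at opposite ends of the angle range, so the regression target jumps by nearly π. A minimal demonstration, assuming the common (-90°, 90°] angle convention:

```python
import numpy as np

def obb_corners(cx, cy, w, h, theta):
    """Corner points of an oriented box (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    local = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return local @ R.T + np.array([cx, cy])

# Two boxes that are geometrically near-identical but sit at opposite
# ends of the (-90 deg, 90 deg] angle range.
box_a = (0.0, 0.0, 4.0, 2.0, np.deg2rad(89.9))
box_b = (0.0, 0.0, 4.0, 2.0, np.deg2rad(-89.9))

# Rotating by ~180 deg relabels the corners, so compare against the
# cyclically shifted corner order of box_b.
geom_gap = np.abs(
    obb_corners(*box_a) - np.roll(obb_corners(*box_b), 2, axis=0)
).max()
param_gap = abs(box_a[4] - box_b[4])
# geom_gap is below 0.01 while param_gap is nearly pi: the regressor must
# make a near-pi parameter jump for an almost-zero geometric change.
```

Aspect-ratio discontinuity (swapping w and h at square-like boxes) behaves analogously; a continuous representation such as COBB is designed so that small geometric changes always map to small target changes.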
Haptic-Enhanced Virtual Reality Simulator for Robot-Assisted Femur Fracture Surgery
Alruwaili, Fayez H., Halim-Banoub, David W., Rodgers, Jessica, Dalkilic, Adam, Haydel, Christopher, Parvizi, Javad, Iordachita, Iulian I., Abedin-Nasab, Mohammad H.
In this paper, we develop a virtual reality (VR) simulator for the Robossis robot-assisted femur fracture surgery. Due to the steep learning curve for such procedures, a VR simulator is essential for training surgeon(s) and staff. The Robossis Surgical Simulator (RSS) is designed to immerse user(s) in a realistic surgery setting using the Robossis system as completed in a previous real-world cadaveric procedure. The RSS is designed to interface the Sigma-7 Haptic Controller with the Robossis Surgical Robot (RSR) and the Meta Quest VR headset. Results show that the RSR follows user commands in 6 DOF and prevents the overlapping of bone segments. This development demonstrates a promising avenue for future implementation of the Robossis system.
A Deep Learning Approach to Teeth Segmentation and Orientation from Panoramic X-rays
Dhar, Mrinal Kanti, Deb, Mou, Madhab, D., Yu, Zeyun
Accurate teeth segmentation and orientation are fundamental in modern oral healthcare, enabling precise diagnosis, treatment planning, and dental implant design. In this study, we present a comprehensive approach to teeth segmentation and orientation from panoramic X-ray images, leveraging deep learning techniques. We build our model based on FUSegNet, a popular model originally developed for wound segmentation, and introduce modifications by incorporating grid-based attention gates into the skip connections. We introduce oriented bounding box (OBB) generation through principal component analysis (PCA) for precise tooth orientation estimation. Evaluating our approach on the publicly available DNS dataset, comprising 543 panoramic X-ray images, we achieve the highest Intersection-over-Union (IoU) score of 82.43% and Dice Similarity Coefficient (DSC) score of 90.37% among compared models in teeth instance segmentation. In OBB analysis, we obtain the Rotated IoU (RIoU) score of 82.82%. We also conduct detailed analyses of individual tooth labels and categorical performance, shedding light on strengths and weaknesses. The proposed model's accuracy and versatility offer promising prospects for improving dental diagnoses, treatment planning, and personalized healthcare in the oral domain.
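The PCA-based OBB step can be sketched generically: project the mask pixels onto their principal axes and read off the extents. This is a standard OBB-from-mask construction, not the authors' exact code:

```python
import numpy as np

def obb_from_mask(mask):
    """Estimate an oriented bounding box for a binary mask via PCA.

    Returns the box center, (length, width) along the principal axes,
    and the orientation angle of the major axis in radians.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    centered = pts - center
    # Principal axes of the pixel cloud.
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order: the major axis is last.
    major, minor = eigvecs[:, 1], eigvecs[:, 0]
    proj = centered @ np.stack([major, minor], axis=1)
    extents = proj.max(axis=0) - proj.min(axis=0)
    angle = np.arctan2(major[1], major[0])
    return center, extents, angle
```

For an elongated tooth mask, the major axis tracks the crown-to-root direction, which is what makes the angle a usable orientation estimate.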
Occlusion-Resistant LiDAR Fiducial Marker Detection
Liu, Yibo, Shan, Jinjun, Schofield, Hunter
The LiDAR fiducial marker, akin to the well-known AprilTag used in camera applications, serves as a convenient resource to impart artificial features to the LiDAR sensor, facilitating robotics applications. Unfortunately, current LiDAR fiducial marker detection methods are limited to occlusion-free point clouds. In this work, we present a novel approach for occlusion-resistant LiDAR fiducial marker detection. We first extract 3D points potentially corresponding to the markers, leveraging the 3D intensity gradients. Afterward, we analyze the 3D spatial distribution of the extracted points through clustering. Subsequently, we determine the potential marker locations by examining the geometric characteristics of these clusters. We then successively transfer the 3D points that fall within the candidate locations from the raw point cloud onto a designed intermediate plane. Finally, using the intermediate plane, we validate each location for the presence of a fiducial marker and compute the marker's pose if found. We conduct both qualitative and quantitative experiments to demonstrate that our approach is the first LiDAR fiducial marker detection method applicable to point clouds with occlusion, while achieving better accuracy than existing methods.
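The first stage, extracting candidate points from intensity structure, might look like the simplified stand-in below. A k-nearest-neighbour intensity-contrast test replaces the paper's 3D intensity gradients, and all names and thresholds are illustrative:

```python
import numpy as np

def candidate_marker_points(points, intensities, k=8, grad_thresh=0.5):
    """Flag points with strong local intensity contrast.

    A point is a candidate if its intensity differs sharply from the mean
    intensity of its k nearest neighbours -- a crude proxy for the sharp
    intensity transitions at a fiducial marker's black/white pattern.

    points      : (N, 3) array of LiDAR points
    intensities : (N,) array of per-point return intensities
    """
    diffs = points[:, None, :] - points[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point itself
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of k nearest neighbours
    local_mean = intensities[nn].mean(axis=1)
    return np.abs(intensities - local_mean) > grad_thresh
```

In the pipeline described above, points flagged this way would then be clustered and each cluster checked against the marker's known geometry before the final plane-based validation.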
Describing Spatio-Temporal Relations between Object Volumes in Video Streams
Harbi, Nouf Al (The University of Sheffield) | Gotoh, Yoshihiko (The University of Sheffield)
This paper is concerned with an extension of AngledCORE-9 by Sokeh, Gould, and Renz, a comprehensive representation of spatial information that can be efficiently extracted from interacting objects in video using their approximated bounding boxes. Spatial information is important for identifying relations between multiple objects; the work is thus a step forward for tasks such as semantic content analysis and visual information access. To that end we present an approach to incorporating the spatio-temporal volume of objects into AngledCORE-9. The approach is able to detect, track and segment object volumes from a video stream, from which spatial information is identified in an efficient manner. Accurate spatial and temporal information can be obtained by precise representation of the shape region and the oriented bounding box. A human action classification task is adopted to assess the performance of the approach. Experiments with two challenging datasets indicate that the outcome of this approach is comparable to the state of the art.
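The kind of qualitative spatial relation being extracted can be illustrated with a toy axis-aligned version; AngledCORE-9 itself uses oriented boxes and a richer nine-tile decomposition, so this sketch is only indicative (the relation names and the image-style downward y-axis are assumptions):

```python
def spatial_relation(a, b):
    """Coarse qualitative relation between two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); y grows downward as in
    image coordinates, so smaller y means higher in the frame.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax1 < bx0:
        return "left-of"
    if bx1 < ax0:
        return "right-of"
    if ay1 < by0:
        return "above"
    if by1 < ay0:
        return "below"
    if ax0 <= bx0 <= bx1 <= ax1 and ay0 <= by0 <= by1 <= ay1:
        return "contains"
    return "overlaps"
```

Evaluating such a relation per frame and recording how it changes over time yields the spatio-temporal description that feeds the action classification experiment.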