IGUANA: Immersive Guidance, Navigation, and Control for Consumer UAV
Victor, Victor, Krisanty, Tania, McGinity, Matthew, Gumhold, Stefan, Aßmann, Uwe
As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, enabling more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and a live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps users track the UAV when it is not visible to the naked eye or the visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users of manual control and suggesting improved accuracy and consistency with lower perceived workload relative to a conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by its lack of physical feedback, and (3) the spatial overlay is highly effective in enhancing users' situational awareness.
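The spatial overlay's core task is deciding where the UAV sits relative to the wearer's view. A minimal sketch of that geometry, assuming 2D local coordinates and a yaw-only headset heading (the helper names and the 90° field of view are illustrative, not IGUANA's actual implementation):

```python
import math

def overlay_bearing(user_pos, uav_pos):
    """Bearing (degrees, clockwise from north) from the user to the UAV.

    Hypothetical helper: positions are (east, north) metres in a shared
    local frame.
    """
    de = uav_pos[0] - user_pos[0]
    dn = uav_pos[1] - user_pos[1]
    return math.degrees(math.atan2(de, dn)) % 360.0

def is_in_view(heading_deg, bearing_deg, fov_deg=90.0):
    """True if the UAV falls inside the headset's horizontal field of view."""
    diff = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

# A UAV due east of the user has bearing 90 degrees; an overlay would
# draw an off-screen indicator whenever is_in_view(...) is False.
bearing = overlay_bearing((0.0, 0.0), (100.0, 0.0))
```

When the UAV is out of view, the same signed angular difference tells the overlay which screen edge the indicator arrow should hug.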
Learning to Land Anywhere: Transferable Generative Models for Aircraft Trajectories
Larsen, Olav Finne Praesteng, Ruocco, Massimiliano, Spitieris, Michail, Murad, Abdulmajid, Ragosta, Martina
Access to trajectory data is a key requirement for developing and validating Air Traffic Management (ATM) solutions, yet many secondary and regional airports face severe data scarcity. This limits the applicability of machine learning methods and the ability to perform large-scale simulations or "what-if" analyses. In this paper, we investigate whether generative models trained on data-rich airports can be efficiently adapted to data-scarce airports using transfer learning. We adapt state-of-the-art diffusion- and flow-matching-based architectures to the aviation domain and evaluate their transferability between Zurich (source) and Dublin (target) landing trajectory datasets. Models are pretrained on Zurich and fine-tuned on Dublin with varying amounts of local data, ranging from 0% to 100%. Results show that diffusion-based models achieve competitive performance with as little as 5% of the Dublin data and reach baseline-level performance around 20%, consistently outperforming models trained from scratch across metrics and visual inspections. Latent flow matching and latent diffusion models also benefit from pretraining, though with more variable gains, while flow matching models show weaker generalization. Despite challenges in capturing rare trajectory patterns, these findings demonstrate the potential of transfer learning to substantially reduce data requirements for trajectory generation in ATM, enabling realistic synthetic data generation even in environments with limited historical records.
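The evaluation protocol above (pretrain on the source airport, fine-tune on growing slices of the target data) can be sketched as a simple subset-building step. This is an illustrative harness, not the authors' code; the dataset names and sizes are placeholders:

```python
import random

def finetune_subsets(target_data, fractions, seed=0):
    """Build fine-tuning subsets of the target-airport data, one per fraction.

    Mirrors the paper's protocol at a high level: a model pretrained on the
    source airport (e.g. Zurich) is fine-tuned on 0%..100% slices of the
    target airport (e.g. Dublin).
    """
    rng = random.Random(seed)
    shuffled = target_data[:]
    rng.shuffle(shuffled)
    return {f: shuffled[: round(f * len(shuffled))] for f in fractions}

dublin = [f"traj_{i}" for i in range(1000)]  # placeholder trajectory IDs
subsets = finetune_subsets(dublin, [0.0, 0.05, 0.2, 1.0])
```

Each subset would then drive one fine-tuning run, letting performance be plotted against the fraction of local data used.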
FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo
Shariatmadar, Keivan, Osman, Ahmad, Ray, Ramin, Kim, Kisam
Fair, transparent, and explainable decision-making remains a critical challenge in Olympic and Paralympic combat sports. This paper presents FST.ai 2.0, an explainable AI ecosystem designed to support referees, coaches, and athletes in real time during Taekwondo competitions and training. The system integrates pose-based action recognition using graph convolutional networks (GCNs), epistemic uncertainty modeling through credal sets, and explainability overlays for visual decision support. A set of interactive dashboards enables human-AI collaboration in referee evaluation, athlete performance analysis, and Para-Taekwondo classification. Beyond automated scoring, FST.ai 2.0 incorporates modules for referee training, fairness monitoring, and policy-level analytics within the World Taekwondo ecosystem. Experimental validation on competition data demonstrates an 85% reduction in decision review time and 93% referee trust in AI-assisted decisions. The framework thus establishes a transparent and extensible pipeline for trustworthy, data-driven officiating and athlete assessment. By bridging real-time perception, explainable inference, and governance-aware design, FST.ai 2.0 represents a step toward equitable, accountable, and human-aligned AI in sports.
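Credal sets model epistemic uncertainty as a *set* of plausible probability distributions rather than a single one, yielding lower and upper probabilities for any event. A minimal sketch, assuming the credal set is given as a finite list of mass functions (the outcome names are invented for illustration; the paper's credal model is richer):

```python
def credal_bounds(credal_set, event):
    """Lower and upper probability of an event under a finite credal set.

    credal_set: list of probability mass functions (dicts over the same
    outcomes); event: a set of outcomes.
    """
    probs = [sum(p[o] for o in event) for p in credal_set]
    return min(probs), max(probs)

# Two plausible scoring distributions a referee-assist model might entertain
p1 = {"head_kick": 0.6, "trunk_kick": 0.3, "no_score": 0.1}
p2 = {"head_kick": 0.4, "trunk_kick": 0.4, "no_score": 0.2}
lo, hi = credal_bounds([p1, p2], {"head_kick"})
# lo == 0.4, hi == 0.6: the gap [0.4, 0.6] quantifies epistemic uncertainty
```

A wide lower/upper gap is a natural trigger for deferring the call to a human referee rather than auto-scoring.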
Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes
Elamon, Nirmal, Davoudi, Rouzbeh
The field of object detection and understanding is rapidly evolving, driven by advances in both traditional CNN-based models and emerging multi-modal large language models (LLMs). While CNNs like ResNet and YOLO remain highly effective for image-based tasks, recent transformer-based LLMs introduce new capabilities such as dynamic context reasoning, language-guided prompts, and holistic scene understanding. However, when used out-of-the-box, the full potential of LLMs remains underexploited, often resulting in suboptimal performance on specialized visual tasks. In this work, we conduct a comprehensive comparison of fine-tuned traditional CNNs, zero-shot pre-trained multi-modal LLMs, and fine-tuned multi-modal LLMs on the challenging task of artificial text overlay detection in images. A key contribution of our study is demonstrating that LLMs can be effectively fine-tuned on very limited data (fewer than 1,000 images) to achieve up to 36% accuracy improvement, matching or surpassing CNN-based baselines that typically require orders of magnitude more data. By exploring how language-guided models can be adapted for precise visual understanding with minimal supervision, our work contributes to the broader effort of bridging vision and language, offering novel insights into efficient cross-modal learning strategies. These findings highlight the adaptability and data efficiency of LLM-based approaches for real-world object detection tasks and provide actionable guidance for applying multi-modal transformers in low-resource visual environments. To support continued progress in this area, we have made the code used to fine-tune the models available in our GitHub, enabling future improvements and reuse in related applications.
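One common way to make LLM fine-tuning viable on fewer than 1,000 images is to train only a low-rank (LoRA-style) update rather than all weights; the abstract does not specify the authors' exact recipe, so this is a hedged back-of-the-envelope sketch of why such adapters suit low-data regimes:

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters: full fine-tuning vs a low-rank update.

    Full fine-tuning trains the whole d_out x d_in matrix W; a LoRA-style
    adapter trains only A (d_out x r) and B (r x d_in), with W frozen.
    Dimensions here are illustrative, not taken from the paper.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

full, lora = lora_param_counts(4096, 4096, 8)
# A rank-8 adapter on a 4096x4096 layer trains ~0.4% of the parameters,
# which is what keeps overfitting manageable with <1,000 images.
```

Fewer trainable parameters means fewer labeled examples are needed to constrain them, which is consistent with the data-efficiency result reported above.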
STARC: See-Through-Wall Augmented Reality Framework for Human-Robot Collaboration in Emergency Response
Yuan, Shenghai, Guo, Weixiang, Hu, Tianxin, Yang, Yu, Chen, Jinyu, Qian, Rui, Liu, Zhongyuan, Xie, Lihua
In emergency response missions, first responders must navigate cluttered indoor environments where occlusions block direct line-of-sight, concealing both life-threatening hazards and victims in need of rescue. We present STARC, a see-through AR framework for human-robot collaboration that fuses mobile-robot mapping with responder-mounted LiDAR sensing. A ground robot running LiDAR-inertial odometry performs large-area exploration and 3D human detection, while helmet- or handheld-mounted LiDAR on the responder is registered to the robot's global map via relative pose estimation. This cross-LiDAR alignment enables consistent first-person projection of detected humans and their point clouds - rendered in AR with low latency - into the responder's view. By providing real-time visualization of hidden occupants and hazards, STARC enhances situational awareness and reduces operator risk. Experiments in simulation, lab setups, and tactical field trials confirm robust pose alignment, reliable detections, and stable overlays, underscoring the potential of our system for fire-fighting, disaster relief, and other safety-critical operations. Code and design will be open-sourced upon acceptance.
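The cross-LiDAR alignment step boils down to composing rigid-body poses: a detection expressed in the robot's global map is re-expressed in the responder's frame via the registered relative pose. A yaw-only SE(2) sketch of that projection (the real system estimates full 6-DoF poses; matrices and values here are illustrative):

```python
import math

def se2(x, y, yaw):
    """A pose as a 3x3 homogeneous matrix (yaw-only simplification)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def invert(T):
    """Inverse of an SE(2) pose: [R t]^-1 = [R^T  -R^T t]."""
    c, s, x, y = T[0][0], T[1][0], T[0][2], T[1][2]
    return [[c, s, -(c * x + s * y)], [-s, c, s * x - c * y], [0.0, 0.0, 1.0]]

def apply(T, p):
    """Transform a 2D point by a homogeneous pose matrix."""
    return (T[0][0] * p[0] + T[0][1] * p[1] + T[0][2],
            T[1][0] * p[0] + T[1][1] * p[1] + T[1][2])

# Human detected at (5, 0) in the robot's map; responder registered at
# (2, 0) facing along +x. Project the detection into the responder frame:
T_map_responder = se2(2.0, 0.0, 0.0)
detection_responder = apply(invert(T_map_responder), (5.0, 0.0))
# -> (3.0, 0.0): rendered 3 m ahead of the responder in AR
```

Keeping every quantity in the robot's global map and inverting a single registered pose per frame is what makes the overlay stay consistent as the responder moves.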
OmniAcc: Personalized Accessibility Assistant Using Generative AI
Karki, Siddhant, Han, Ethan, Mahmud, Nadim, Bhunia, Suman, Femiani, John, Raychoudhury, Vaskar
Individuals with ambulatory disabilities often encounter significant barriers when navigating urban environments due to the lack of accessible information and tools. This paper presents OmniAcc, an AI-powered interactive navigation system that utilizes GPT-4, satellite imagery, and OpenStreetMap data to identify, classify, and map wheelchair-accessible features such as ramps and crosswalks in the built environment. OmniAcc offers personalized route planning, real-time hands-free navigation, and instant query responses regarding physical accessibility. By using zero-shot learning and customized prompts, the system ensures precise detection of accessibility features, while supporting validation through structured workflows. This paper introduces OmniAcc and explores its potential to assist urban planners and mobility-aid users, demonstrated through a case study on crosswalk detection. With a crosswalk detection accuracy of 97.5%, OmniAcc highlights the transformative potential of AI in improving navigation and fostering more inclusive urban spaces.
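Zero-shot detection with customized prompts amounts to building a structured query per image tile. A sketch of what such a prompt might look like, using the common OpenAI-style chat message schema; the wording, tile naming, and answer format are assumptions, not OmniAcc's actual prompts:

```python
def accessibility_prompt(feature, tile_id):
    """Build a zero-shot chat prompt for one accessibility feature
    (e.g. 'crosswalk', 'ramp') in one satellite tile.

    Hypothetical helper: the constrained one-word answer makes the model's
    output easy to validate in a structured workflow.
    """
    return [
        {"role": "system",
         "content": "You label wheelchair-accessibility features in "
                    "satellite imagery. Answer 'present' or 'absent'."},
        {"role": "user",
         "content": f"Tile {tile_id}: is a {feature} visible? "
                    "Reply with one word."},
    ]

messages = accessibility_prompt("crosswalk", "z18_x12345_y6789")
```

Pinning the answer space down to two tokens is what lets downstream code validate responses mechanically before they reach the map.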
SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality
Lai, Yuzhi, Yuan, Shenghai, Li, Peizheng, Lou, Jun, Zell, Andreas
Unlike existing systems that assume static or single-view settings, SEER-VAR dynamically separates cabin and road scenes via depth-guided vision-language grounding. Two SLAM branches track egocentric motion in each context, while a GPT-based module generates context-aware overlays such as dashboard cues and hazard alerts. To support evaluation, we introduce EgoSLAM-Drive, a real-world dataset featuring synchronized egocentric views, 6DoF ground-truth poses, and AR annotations across diverse driving scenarios. Experiments demonstrate that SEER-VAR achieves robust spatial alignment and perceptually coherent AR rendering across varied environments. As one of the first to explore LLM-based AR recommendation in egocentric driving, we address the lack of comparable systems through structured prompting and detailed user studies. Results show that SEER-VAR enhances perceived scene understanding, overlay relevance, and driver ease, providing an effective foundation for future research in this direction. Code and dataset will be made open source.
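The depth-guided cabin/road separation can be caricatured as a depth threshold on egocentric 3D points: everything within arm's reach of the camera belongs to the cabin branch, everything beyond it to the road branch. A deliberately simple stand-in (the 1.5 m cutoff and point format are assumptions; the paper combines depth with vision-language grounding, not a bare threshold):

```python
def split_cabin_road(points, cabin_max_depth=1.5):
    """Split egocentric points (x, y, depth) into cabin vs road sets.

    Toy heuristic: points within cabin_max_depth metres of the camera are
    routed to the cabin SLAM branch, the rest to the road branch.
    """
    cabin = [p for p in points if p[2] <= cabin_max_depth]
    road = [p for p in points if p[2] > cabin_max_depth]
    return cabin, road

pts = [(0.1, 0.0, 0.6),    # steering wheel
       (0.0, -0.2, 1.2),   # dashboard
       (2.0, 0.5, 8.0),    # vehicle ahead
       (1.0, 0.0, 30.0)]   # road sign
cabin, road = split_cabin_road(pts)
```

Running a separate SLAM branch per partition is what keeps cabin-fixed overlays (dashboard cues) stable while road-anchored overlays (hazard alerts) track the outside scene.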
DINOv3 with Test-Time Training for Medical Image Registration
Wang, Shansong, Safari, Mojtaba, Hu, Mingzhe, Li, Qiang, Chang, Chih-Wei, Qiu, Richard LJ, Yang, Xiaofeng
Prior medical image registration approaches, particularly learning-based methods, often require large amounts of training data, which constrains clinical adoption. To overcome this limitation, we propose a training-free pipeline that relies on a frozen DINOv3 encoder and test-time optimization of the deformation field in feature space. Across two representative benchmarks, the method is accurate and yields regular deformations. On Abdomen MR-CT, it attained the best mean Dice score (DSC) of 0.790 together with the lowest 95th percentile Hausdorff Distance (HD95) of 4.9±5.0 and the lowest standard deviation of Log-Jacobian (SDLogJ) of 0.08±0.02. On ACDC cardiac MRI, it improves mean DSC to 0.769 and reduces SDLogJ to 0.11 and HD95 to 4.8, a marked gain over the initial alignment. The results indicate that operating in a compact foundation feature space at test time offers a practical and general solution for clinical registration without additional training.
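Test-time optimization of a deformation in feature space means: freeze the encoder, then minimize a feature-matching loss over the deformation parameters for each new image pair. A toy 1D stand-in with a single translation parameter and finite-difference gradients (real deformation fields, features, and regularizers are far larger; everything here is illustrative):

```python
def register_1d(moving, fixed, steps=200, lr=0.1):
    """Optimize one translation t so that moving + t matches fixed
    in 'feature space' (plain squared error here).

    Sketch of per-pair test-time optimization with a frozen encoder:
    only t is trained, and only for this one pair.
    """
    def loss(t):
        return sum((m + t - f) ** 2 for m, f in zip(moving, fixed))

    t, eps = 0.0, 1e-4
    for _ in range(steps):
        # central finite-difference gradient of the matching loss
        grad = (loss(t + eps) - loss(t - eps)) / (2 * eps)
        t -= lr * grad / len(moving)
    return t

fixed = [1.0, 2.0, 3.0, 4.0]
moving = [v - 2.5 for v in fixed]   # features shifted by -2.5
t_hat = register_1d(moving, fixed)  # converges to about 2.5
```

Because nothing is learned across pairs, the same loop works on any new scan pair, which is exactly why the pipeline needs no training data.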