AITopics | overlay

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

IGUANA: Immersive Guidance, Navigation, and Control for Consumer UAV

Victor, Victor, Krisanty, Tania, McGinity, Matthew, Gumhold, Stefan, Aßmann, Uwe

arXiv.org Artificial Intelligence, Dec-10-2025

As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps track the UAV when it is not visible to the naked eye or when visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to a conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users' situational awareness.

artificial intelligence, human computer interaction, interface, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3756884.3766033

2510.07609

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > New York > New York County > New York City (0.05)
Europe > Germany > Saxony > Dresden (0.05)
(9 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study > Negative Result (0.68)

Industry:

Government (1.00)
Information Technology > Robotics & Automation (0.88)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)

Throughput-Optimal Topology Design for Cross-Silo Federated Learning

Neural Information Processing Systems, Nov-15-2025, 12:09:15 GMT

Federated learning (FL) "involves training statistical models over remote devices or siloed data …

artificial intelligence, machine learning, overlay, (19 more...)

Neural Information Processing Systems

Country:

Europe > France > Provence-Alpes-Côte d'Azur (0.05)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.94)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Learning to Land Anywhere: Transferable Generative Models for Aircraft Trajectories

Larsen, Olav Finne Praesteng, Ruocco, Massimiliano, Spitieris, Michail, Murad, Abdulmajid, Ragosta, Martina

arXiv.org Artificial Intelligence, Nov-7-2025

Access to trajectory data is a key requirement for developing and validating Air Traffic Management (ATM) solutions, yet many secondary and regional airports face severe data scarcity. This limits the applicability of machine learning methods and the ability to perform large-scale simulations or "what-if" analyses. In this paper, we investigate whether generative models trained on data-rich airports can be efficiently adapted to data-scarce airports using transfer learning. We adapt state-of-the-art diffusion- and flow-matching-based architectures to the aviation domain and evaluate their transferability between Zurich (source) and Dublin (target) landing trajectory datasets. Models are pretrained on Zurich and fine-tuned on Dublin with varying amounts of local data, ranging from 0% to 100%. Results show that diffusion-based models achieve competitive performance with as little as 5% of the Dublin data and reach baseline-level performance around 20%, consistently outperforming models trained from scratch across metrics and visual inspections. Latent flow matching and latent diffusion models also benefit from pretraining, though with more variable gains, while flow matching models show weaker generalization. Despite challenges in capturing rare trajectory patterns, these findings demonstrate the potential of transfer learning to substantially reduce data requirements for trajectory generation in ATM, enabling realistic synthetic data generation even in environments with limited historical records.

machine learning, natural language, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2511.04155

Country:

Europe > Switzerland > Zürich > Zürich (0.44)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.63)

FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo

Shariatmadar, Keivan, Osman, Ahmad, Ray, Ramin, Kim, Kisam

arXiv.org Machine Learning, Oct-23-2025

Fair, transparent, and explainable decision-making remains a critical challenge in Olympic and Paralympic combat sports. This paper presents FST.ai 2.0, an explainable AI ecosystem designed to support referees, coaches, and athletes in real time during Taekwondo competitions and training. The system integrates pose-based action recognition using graph convolutional networks (GCNs), epistemic uncertainty modeling through credal sets, and explainability overlays for visual decision support. A set of interactive dashboards enables human-AI collaboration in referee evaluation, athlete performance analysis, and Para-Taekwondo classification. Beyond automated scoring, FST.ai 2.0 incorporates modules for referee training, fairness monitoring, and policy-level analytics within the World Taekwondo ecosystem. Experimental validation on competition data demonstrates an 85% reduction in decision review time and 93% referee trust in AI-assisted decisions. The framework thus establishes a transparent and extensible pipeline for trustworthy, data-driven officiating and athlete assessment. By bridging real-time perception, explainable inference, and governance-aware design, FST.ai 2.0 represents a step toward equitable, accountable, and human-aligned AI in sports.
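The abstract does not say how its credal sets are constructed. As a rough illustration of the general idea only, the sketch below assumes a credal set represented by a finite list of class-probability vectors (for example, outputs of several hypothetical models) and reports per-class lower and upper probabilities. The function name, the two-class setup, and the numbers are illustrative assumptions, not taken from the paper.

```python
def credal_bounds(distributions):
    """Per-class lower and upper probabilities over a finite set of distributions.

    `distributions` is a list of probability vectors of equal length; treating
    such a list as the credal set is an assumption made for this sketch.
    """
    if not distributions:
        raise ValueError("need at least one distribution")
    num_classes = len(distributions[0])
    if any(len(d) != num_classes for d in distributions):
        raise ValueError("all distributions must have the same length")
    lower = [min(d[k] for d in distributions) for k in range(num_classes)]
    upper = [max(d[k] for d in distributions) for k in range(num_classes)]
    return lower, upper


# Hypothetical two-class example: index 0 = "valid point", index 1 = "no point".
predictions = [
    [0.70, 0.30],
    [0.55, 0.45],
    [0.80, 0.20],
]
lower, upper = credal_bounds(predictions)
# Interval width per class: a wider interval means the candidate
# distributions disagree more, i.e. more epistemic uncertainty.
widths = [u - l for l, u in zip(lower, upper)]
```

A wide interval for the winning class could be used as a signal to defer the call to a human referee, though whether the system described above does this is not stated in the abstract.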

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2510.18193

Country:

Asia > Middle East > UAE > Fujairah Emirate > Fujairah (0.04)
Europe > Germany (0.04)
Asia > South Korea (0.04)

Genre:

Research Report (1.00)
Instructional Material (0.68)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes

Elamon, Nirmal, Davoudi, Rouzbeh

arXiv.org Artificial Intelligence, Oct-13-2025

The field of object detection and understanding is rapidly evolving, driven by advances in both traditional CNN-based models and emerging multi-modal large language models (LLMs). While CNNs like ResNet and YOLO remain highly effective for image-based tasks, recent transformer-based LLMs introduce new capabilities such as dynamic context reasoning, language-guided prompts, and holistic scene understanding. However, when used out-of-the-box, the full potential of LLMs remains underexploited, often resulting in suboptimal performance on specialized visual tasks. In this work, we conduct a comprehensive comparison of fine-tuned traditional CNNs, zero-shot pre-trained multi-modal LLMs, and fine-tuned multi-modal LLMs on the challenging task of artificial text overlay detection in images. A key contribution of our study is demonstrating that LLMs can be effectively fine-tuned on very limited data (fewer than 1,000 images) to achieve up to 36% accuracy improvement, matching or surpassing CNN-based baselines that typically require orders of magnitude more data. By exploring how language-guided models can be adapted for precise visual understanding with minimal supervision, our work contributes to the broader effort of bridging vision and language, offering novel insights into efficient cross-modal learning strategies. These findings highlight the adaptability and data efficiency of LLM-based approaches for real-world object detection tasks and provide actionable guidance for applying multi-modal transformers in low-resource visual environments. To support continued progress in this area, we have made the code used to fine-tune the models available in our GitHub, enabling future improvements and reuse in related applications.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.08589

Country: Oceania > New Zealand (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

STARC: See-Through-Wall Augmented Reality Framework for Human-Robot Collaboration in Emergency Response

Yuan, Shenghai, Guo, Weixiang, Hu, Tianxin, Yang, Yu, Chen, Jinyu, Qian, Rui, Liu, Zhongyuan, Xie, Lihua

arXiv.org Artificial Intelligence, Sep-22-2025

In emergency response missions, first responders must navigate cluttered indoor environments where occlusions block direct line-of-sight, concealing both life-threatening hazards and victims in need of rescue. We present STARC, a see-through AR framework for human-robot collaboration that fuses mobile-robot mapping with responder-mounted LiDAR sensing. A ground robot running LiDAR-inertial odometry performs large-area exploration and 3D human detection, while helmet- or handheld-mounted LiDAR on the responder is registered to the robot's global map via relative pose estimation. This cross-LiDAR alignment enables consistent first-person projection of detected humans and their point clouds - rendered in AR with low latency - into the responder's view. By providing real-time visualization of hidden occupants and hazards, STARC enhances situational awareness and reduces operator risk. Experiments in simulation, lab setups, and tactical field trials confirm robust pose alignment, reliable detections, and stable overlays, underscoring the potential of our system for fire-fighting, disaster relief, and other safety-critical operations. Code and design will be open-sourced upon acceptance.
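The projection step described above rests on a standard rigid-body change of frame: once the responder's pose in the robot's global map is known, a detection stored in map coordinates can be re-expressed in the responder's own frame. The sketch below shows only that geometric step, under simplifying assumptions that are not from the paper: rotation about the vertical axis only, x as the responder's forward direction, and hypothetical helper names.

```python
import math


def make_pose(yaw_rad, translation):
    """Rigid transform as (3x3 rotation about z, translation vector)."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return rotation, list(translation)


def invert_pose(pose):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    rotation, translation = pose
    rot_t = [[rotation[j][i] for j in range(3)] for i in range(3)]
    inv_t = [-sum(rot_t[i][j] * translation[j] for j in range(3)) for i in range(3)]
    return rot_t, inv_t


def apply_pose(pose, point):
    """Apply a rigid transform to a 3D point: R p + t."""
    rotation, translation = pose
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Hypothetical values: responder stands at (2, 0, 0) in the robot's map,
# rotated 90 degrees so that their forward (x) axis points along map +y.
map_from_responder = make_pose(math.pi / 2, (2.0, 0.0, 0.0))
responder_from_map = invert_pose(map_from_responder)

# A person detected by the robot at map coordinates (2, 3, 0).
detection_in_map = [2.0, 3.0, 0.0]
detection_in_view = apply_pose(responder_from_map, detection_in_map)
# Up to floating-point rounding this is (3, 0, 0): three metres straight ahead.
```

A real system would use full 6-DoF poses and a camera or display projection on top of this; the sketch stops at the frame change.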

artificial intelligence, international conference, situational awareness, (16 more...)

arXiv.org Artificial Intelligence

2509.15507

Country: Asia > Singapore (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (0.93)
Law Enforcement & Public Safety (0.88)
Transportation (0.68)
Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.92)

OmniAcc: Personalized Accessibility Assistant Using Generative AI

Karki, Siddhant, Han, Ethan, Mahmud, Nadim, Bhunia, Suman, Femiani, John, Raychoudhury, Vaskar

arXiv.org Artificial Intelligence, Sep-10-2025

Individuals with ambulatory disabilities often encounter significant barriers when navigating urban environments due to the lack of accessible information and tools. This paper presents OmniAcc, an AI-powered interactive navigation system that utilizes GPT-4, satellite imagery, and OpenStreetMap data to identify, classify, and map wheelchair-accessible features such as ramps and crosswalks in the built environment. OmniAcc offers personalized route planning, real-time hands-free navigation, and instant query responses regarding physical accessibility. By using zero-shot learning and customized prompts, the system ensures precise detection of accessibility features, while supporting validation through structured workflows. This paper introduces OmniAcc and explores its potential to assist urban planners and mobility-aid users, demonstrated through a case study on crosswalk detection. With a crosswalk detection accuracy of 97.5%, OmniAcc highlights the transformative potential of AI in improving navigation and fostering more inclusive urban spaces.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.0722

Country:

North America > United States > Ohio > Butler County > Oxford (0.28)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Genre:

Workflow (1.00)
Research Report (0.82)

Industry:

Information Technology (0.68)
Transportation > Ground > Road (0.49)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.51)

SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality

Lai, Yuzhi, Yuan, Shenghai, Li, Peizheng, Lou, Jun, Zell, Andreas

arXiv.org Artificial Intelligence, Aug-26-2025

Unlike existing systems that assume static or single-view settings, SEER-VAR dynamically separates cabin and road scenes via depth-guided vision-language grounding. Two SLAM branches track egocentric motion in each context, while a GPT-based module generates context-aware overlays such as dashboard cues and hazard alerts. To support evaluation, we introduce EgoSLAM-Drive, a real-world dataset featuring synchronized egocentric views, 6DoF ground-truth poses, and AR annotations across diverse driving scenarios. Experiments demonstrate that SEER-VAR achieves robust spatial alignment and perceptually coherent AR rendering across varied environments. As one of the first to explore LLM-based AR recommendation in egocentric driving, we address the lack of comparable systems through structured prompting and detailed user studies. Results show that SEER-VAR enhances perceived scene understanding, overlay relevance, and driver ease, providing an effective foundation for future research in this direction. Code and dataset will be made open source.

large language model, machine learning, overlay, (21 more...)

arXiv.org Artificial Intelligence

2508.17255

Country: North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.88)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Security & Privacy (1.00)
Automobiles & Trucks (0.93)
Law (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(4 more...)

DINOv3 with Test-Time Training for Medical Image Registration

Wang, Shansong, Safari, Mojtaba, Hu, Mingzhe, Li, Qiang, Chang, Chih-Wei, Qiu, Richard LJ, Yang, Xiaofeng

arXiv.org Artificial Intelligence, Aug-21-2025

Prior medical image registration approaches, particularly learning-based methods, often require large amounts of training data, which constrains clinical adoption. To overcome this limitation, we propose a training-free pipeline that relies on a frozen DINOv3 encoder and test-time optimization of the deformation field in feature space. Across two representative benchmarks, the method is accurate and yields regular deformations. On Abdomen MR-CT, it attained the best mean Dice score (DSC) of 0.790 together with the lowest 95th percentile Hausdorff Distance (HD95) of 4.9 ± 5.0 and the lowest standard deviation of Log-Jacobian (SDLogJ) of 0.08 ± 0.02. On ACDC cardiac MRI, it improves mean DSC to 0.769 and reduces SDLogJ to 0.11 and HD95 to 4.8, a marked gain over the initial alignment. The results indicate that operating in a compact foundation feature space at test time offers a practical and general solution for clinical registration without additional training.
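For readers unfamiliar with the Dice score (DSC) reported above, it measures the overlap between two segmentation masks as twice the size of their intersection divided by the sum of their sizes. The sketch below is a minimal pure-Python illustration on flattened binary masks. The function name, the input format, and the toy masks are assumptions for illustration and are unrelated to the paper's evaluation code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are flat sequences of 0/1 values with the same length
    (a hypothetical input format chosen for this sketch).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention assumed here: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: an organ label in the fixed image and in the warped moving image.
fixed_label = [0, 1, 1, 1, 0, 0]
warped_label = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(fixed_label, warped_label)
# Two overlapping voxels, three labeled voxels in each mask: 2 * 2 / 6 = 2/3.
```

A score of 1.0 means the warped label matches the fixed label exactly, and 0.0 means the two do not overlap at all.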

machine learning, pattern recognition, registration, (19 more...)

arXiv.org Artificial Intelligence

2508.14809

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
