Better Together: Leveraging Multiple Digital Twins for Deployment Optimization of Airborne Base Stations
Belgiovine, Mauro, Dick, Chris, Chowdhury, Kaushik
Airborne Base Stations (ABSs) allow flexible geographical allocation of network resources under dynamically changing load, as well as rapid deployment of alternative connectivity solutions during natural disasters. Since the radio infrastructure is carried by unmanned aerial vehicles (UAVs) with limited flight time, it is important to establish the best location for each ABS without exhaustive field trials. This paper proposes a digital twin (DT)-guided approach to achieve this goal through the following key contributions: (i) implementation of an interactive software bridge between two open-source DTs, NVIDIA's Sionna and Aerial Omniverse Digital Twin (AODT), such that the same scene is evaluated with high fidelity in both, highlighting the unique features of each platform for this allocation problem; (ii) design of a back-propagation-based algorithm in Sionna that rapidly converges on the physical locations of the UAVs, the orientation of the antennas, and the transmit power to ensure efficient coverage across the UAV swarm; and (iii) numerical evaluation in AODT for large network scenarios (50 UEs, 10 ABSs) that identifies the environmental conditions under which the performance results of the two twins agree or diverge. Finally, (iv) we propose a resilience mechanism to provide consistent coverage to mission-critical devices and demonstrate a use case for bi-directional flow of information between the two DTs.
Unmanned Aerial Vehicle (UAV)-mounted Base Stations, or Airborne Base Stations (ABSs), have gained significant attention as a complement to ground-based cellular networks [1]. As UAVs become more accessible, their ability to navigate three-dimensional (3D) space provides flexibility in adapting to dynamic network demands [2], [3], enabling line-of-sight links to mission-critical units [4] and enhancing user tracking [5].
However, ABS-enabled connectivity introduces challenges such as collision avoidance, coordinated coverage, and optimal placement, given limited flight times of 20 to 100 minutes [6]. These challenges depend strongly on the RF propagation environment, making prior channel knowledge essential for effective network planning. Motivation for Digital Twins: Optimal placement of Base Stations (BSs) is traditionally handled by telecom operators relying on domain knowledge and best practices. Digital Twins (DTs), and specifically Digital Twins for Networking (DTNs) [7], have emerged as strategic tools for network simulation, performance analysis, and "what-if" scenarios.
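The gradient-based placement idea in contribution (ii) can be illustrated with a toy analogue. This is a sketch under stated assumptions, not the authors' Sionna implementation: it assumes a single ABS hovering at a fixed height h, replaces Sionna's differentiable ray tracer with a closed-form free-space path-loss term, and optimizes only the ABS ground position (not antenna orientation or transmit power) by plain gradient descent on the mean path loss to a set of ground UEs. All function names and parameter values are hypothetical.

```python
import math


def mean_path_loss_db(pos, ues, h):
    """Mean free-space path loss in dB (up to a constant) from an ABS at height h
    above ground position pos to each ground UE. All distances in km (assumption)."""
    total = 0.0
    for ux, uy in ues:
        d2 = (pos[0] - ux) ** 2 + (pos[1] - uy) ** 2 + h ** 2
        total += 10.0 * math.log10(d2)  # 20*log10(d) == 10*log10(d^2)
    return total / len(ues)


def grad_mean_path_loss(pos, ues, h):
    """Analytic gradient of mean_path_loss_db with respect to the ABS ground position."""
    k = 10.0 / math.log(10.0)  # d/dz of 10*log10(z) is k / z
    gx = gy = 0.0
    for ux, uy in ues:
        dx, dy = pos[0] - ux, pos[1] - uy
        d2 = dx * dx + dy * dy + h * h
        gx += k * 2.0 * dx / d2
        gy += k * 2.0 * dy / d2
    n = len(ues)
    return gx / n, gy / n


def place_abs(ues, h=0.3, start=(0.05, -0.03), lr=0.01, steps=500):
    """Hypothetical placement loop: gradient descent on the mean path loss."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad_mean_path_loss((x, y), ues, h)
        x, y = x - lr * gx, y - lr * gy
    return x, y
```

With four UEs placed symmetrically around the origin, the loop drifts to the centroid, which minimizes the mean path loss for that layout. In a differentiable simulator such as Sionna, the hand-derived gradient would instead come from automatic differentiation through the propagation model, and further parameters (orientation, power) could be added to the same loop.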
More Expert-like Eye Gaze Movement Patterns are Related to Better X-ray Reading
Yang, Pingjing, Cromley, Jennifer, Diesner, Jana
Understanding how novices acquire and hone visual search skills is crucial for developing and optimizing training methods across domains. Network analysis methods can be used to analyze graph representations of visual expertise. This study investigates the relationship between eye-gaze movements and learning outcomes among undergraduate dentistry students who were diagnosing dental radiographs over multiple semesters. We use network analysis techniques to model eye-gaze scanpaths as directed graphs and examine changes in network metrics over time. Using time series clustering on each metric, we identify distinct patterns of visual search strategies and explore their association with students' diagnostic performance. Our findings suggest that the network metric of transition entropy is negatively correlated with performance scores, while the number of nodes and edges as well as average PageRank are positively correlated with performance scores. Changes in network metrics for individual students over time suggest a developmental shift from intermediate to expert-level processing. These insights contribute to understanding expertise acquisition in visual tasks and can inform the design of AI-assisted learning interventions.
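To make the scanpath-as-graph idea concrete, the sketch below turns a sequence of fixated regions into a directed, weighted transition graph and computes its node count, edge count, and transition entropy. It assumes one common definition of gaze transition entropy, H = -Σ_i π_i Σ_j p_ij log2 p_ij, with π_i estimated from the empirical share of transitions leaving region i, and it assumes consecutive fixations on the same region are merged first. The study may define these metrics differently; region labels such as "A" are hypothetical.

```python
import math
from collections import Counter, defaultdict


def scanpath_graph(scanpath):
    """Build nodes and weighted directed edges from a sequence of region labels."""
    collapsed = scanpath[:1]
    for region in scanpath[1:]:
        if region != collapsed[-1]:  # merge repeated fixations on one region
            collapsed.append(region)
    edges = Counter(zip(collapsed, collapsed[1:]))  # (src, dst) -> transition count
    return set(collapsed), edges


def transition_entropy(edges):
    """H = -sum_i pi_i sum_j p_ij log2 p_ij, with pi_i from outgoing transition shares."""
    out_totals = defaultdict(int)
    for (src, _), w in edges.items():
        out_totals[src] += w
    n_transitions = sum(out_totals.values())
    h = 0.0
    for (src, _), w in edges.items():
        pi_src = out_totals[src] / n_transitions
        p_ij = w / out_totals[src]
        h -= pi_src * p_ij * math.log2(p_ij)
    return h
```

A deterministic scanpath (every region always followed by the same region) has zero transition entropy, while a scanpath that branches unpredictably has higher entropy.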
DRL4AOI: A DRL Framework for Semantic-aware AOI Segmentation in Location-Based Services
Lin, Youfang, Fu, Jinji, Wen, Haomin, Wang, Jiyuan, Wei, Zhenjie, Qiang, Yuting, Mao, Xiaowei, Wu, Lixia, Hu, Haoyuan, Liang, Yuxuan, Wan, Huaiyu
In Location-Based Services (LBS), such as food delivery, a fundamental task is segmenting Areas of Interest (AOIs), which partitions urban geographical space into non-overlapping regions. Traditional AOI segmentation algorithms rely primarily on road networks to partition urban areas. While promising in modeling geo-semantics, road network-based models overlook service-semantic goals (e.g., workload equality) in LBS. In this paper, we point out that the AOI segmentation problem can be naturally formulated as a Markov Decision Process (MDP) that gradually chooses a nearby AOI for each grid cell on the current AOI's border. Based on this MDP, we present the first attempt to generalize Deep Reinforcement Learning (DRL) to AOI segmentation, leading to a novel DRL-based framework called DRL4AOI. DRL4AOI incorporates different service-semantic goals flexibly by treating them as rewards that guide AOI generation. To evaluate the effectiveness of DRL4AOI, we develop and release an AOI segmentation system. We also present a representative implementation of DRL4AOI, TrajRL4AOI, for AOI segmentation in logistics services. It uses a Double Deep Q-learning Network (DDQN) to gradually optimize AOI generation for two specific semantic goals: (i) trajectory modularity, i.e., maximizing the tightness of trajectory connections within an AOI and the sparsity of connections between AOIs; and (ii) matchness with the road network, i.e., maximizing the match between AOIs and the road network. Quantitative and qualitative experiments on synthetic and real-world data demonstrate the effectiveness and superiority of our method. The code and system are publicly available at https://github.com/Kogler7/AoiOpt.
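The DDQN component mentioned above decouples action selection (by the online network) from action evaluation (by the target network). A minimal sketch of the standard Double DQN target follows. It assumes, as a hypothetical design for this MDP, that the valid actions at the next state are only the AOIs adjacent to the border grid cell being reassigned, so invalid actions are masked out before the argmax. This is not the TrajRL4AOI code from the linked repository, and the function name is illustrative.

```python
def double_dqn_target(reward, next_q_online, next_q_target, valid_next,
                      gamma=0.99, done=False):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    next_q_online / next_q_target: Q-values for every action at the next state.
    valid_next: indices of allowed actions (e.g. neighbouring AOIs, an assumption).
    """
    if done or not valid_next:
        return reward  # terminal state: no bootstrapping
    # Select the action with the online network, restricted to valid actions...
    best = max(valid_next, key=lambda a: next_q_online[a])
    # ...but evaluate it with the target network to reduce overestimation bias.
    return reward + gamma * next_q_target[best]
```

With online values [0.2, 0.9, 0.5], target values [1.0, 0.4, 2.0], reward 1.0, and gamma 0.9, the Double DQN target is 1.0 + 0.9 × 0.4 = 1.36, whereas a vanilla DQN target, max over the target network, would be 1.0 + 0.9 × 2.0 = 2.8.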
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
Cai, Zhixi, Cardenas, Cristian Rojas, Leo, Kevin, Zhang, Chenyuan, Backman, Kal, Li, Hanbing, Li, Boying, Ghorbanali, Mahsa, Datta, Stavya, Qu, Lizhen, Santiago, Julian Gutierrez, Ignatiev, Alexey, Li, Yuan-Fang, Vered, Mor, Stuckey, Peter J, de la Banda, Maria Garcia, Rezatofighi, Hamid
This paper addresses the problem of autonomous UAV search missions, in which a UAV must locate specific Entities of Interest (EOIs) within a time limit, based on brief descriptions, in large, hazard-prone environments with keep-out zones. The UAV must perceive, reason, and make decisions with limited and uncertain information. We propose NEUSIS, a compositional neuro-symbolic system designed for interpretable UAV search and navigation in realistic scenarios. NEUSIS integrates neuro-symbolic visual perception, reasoning, and grounding (GRiD) to process raw sensory inputs, maintains a probabilistic world model for environment representation, and uses a hierarchical planning component (SNaC) for efficient path planning. Experimental results from simulated urban search missions using AirSim and Unreal Engine show that NEUSIS outperforms a state-of-the-art (SOTA) vision-language model and a SOTA search planning model in success rate, search efficiency, and 3D localization. These results demonstrate the effectiveness of our compositional neuro-symbolic approach in handling complex, real-world scenarios, making it a promising solution for autonomous UAV systems in search missions.
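The abstract does not specify the form of NEUSIS's probabilistic world model. As a generic illustration only, the sketch below shows a textbook Bayesian search update over grid cells: after a cell is searched without detecting the EOI, its probability is discounted by an assumed detection probability p_detect and the belief is renormalized. The cell indexing, p_detect, and function name are hypothetical.

```python
def update_after_miss(belief, searched, p_detect):
    """Bayesian update of a belief over cells after searching cell `searched`
    and NOT detecting the target; returns a new, normalized list."""
    norm = 1.0 - belief[searched] * p_detect  # P(no detection)
    if norm <= 0.0:
        raise ValueError("a miss is impossible under this belief and p_detect")
    updated = []
    for i, p in enumerate(belief):
        if i == searched:
            # P(target here | miss) = p * (1 - p_detect) / P(miss)
            updated.append(p * (1.0 - p_detect) / norm)
        else:
            # Probability mass shifts toward unsearched cells.
            updated.append(p / norm)
    return updated
```

Starting from a uniform belief over four cells with p_detect = 0.8, a miss in cell 0 lowers it to 0.0625 and raises each other cell to 0.3125, which a planner could then use to pick the next cell to search.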
Uncertainty and Generalizability in Foundation Models for Earth Observation
Ramos-Pollan, Raul, Kalaitzis, Freddie, Selvam, Karthick Panner
We consider a setting in which we want to design a downstream task (such as estimating vegetation coverage) on a certain area of interest (AOI) with a limited labeling budget. When leveraging an existing Foundation Model (FM), we must decide whether to train a downstream model on a different but label-rich AOI, hoping it generalizes to our AOI, or to split the labels in our AOI for training and validation. In either case, we face choices concerning which FM to use, how to sample our AOI for labeling, etc., which affect both the performance and the uncertainty of the results. In this work, we perform a large ablation study using eight existing FMs with either Sentinel-1 or Sentinel-2 as input data, and the classes of the ESA WorldCover product as downstream tasks, across eleven AOIs. We perform repeated sampling and training, resulting in an ablation of some 500K simple linear regression models. Our results show both the limits of spatial generalizability across AOIs and the power of FMs: we obtain correlation coefficients above 0.9 between predictions and targets on different chip-level predictive tasks. Still, performance and uncertainty vary greatly across AOIs, tasks, and FMs. We believe this is a key issue in practice, because there are many design decisions behind each FM and downstream task (input modalities, sampling, architectures, pretraining, etc.), and a downstream task designer is usually aware of, and can decide upon, only a few of them. Through this work, we advocate for using the methodology described herein (large ablations on reference global labels and simple probes), both when publishing new FMs and when making informed decisions in designing downstream tasks that use them.
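The "simple probes" methodology can be sketched as repeated random splits of the labeled chips in one AOI, each split fitting a linear regression on frozen FM embeddings and scoring it by the Pearson correlation between predictions and targets; the spread across repeats serves as one proxy for uncertainty. The embeddings here would be synthetic stand-ins, and the function names are hypothetical; this is not the authors' pipeline.

```python
import numpy as np


def fit_linear_probe(X, y):
    """Least-squares linear regression on embeddings X, with an intercept column."""
    Xb = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_linear_probe(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def repeated_probe(X, y, n_train, n_repeats=20, seed=0):
    """Repeat random train/test splits; return mean and std of the test correlation."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        train, test = idx[:n_train], idx[n_train:]
        coef = fit_linear_probe(X[train], y[train])
        pred = predict_linear_probe(coef, X[test])
        scores.append(np.corrcoef(pred, y[test])[0, 1])  # Pearson r
    return float(np.mean(scores)), float(np.std(scores))
```

Cross-AOI generalization fits the same pattern: fit the probe on chips from a label-rich AOI and compute the correlation on chips from the target AOI instead of on a held-out split.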
Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
Mathe, Stefan (Institute of Mathematics of the Romanian Academy of Science)
Human eye movements provide a rich source of information about human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, which makes it difficult to develop reliable eye movement prediction systems. Our work makes three contributions toward addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition and, separately, of context recognition, in order to analyze the impact of different tasks. Our dataset is unique among eye-tracking datasets of still images in terms of its large scale (over 1 million fixations recorded in 9157 images) and its different task controls.
Multimodal Urban Areas of Interest Generation via Remote Sensing Imagery and Geographical Prior
Shi, Chuanji, Zhang, Yingying, Wang, Jiaotuan, Guo, Xin, Zhu, Qiqi
Urban area-of-interest (AOI) refers to an integrated urban functional zone with defined polygonal boundaries. The rapid development of urban commerce has led to increasing demand for highly accurate and timely AOI data. However, existing research primarily focuses on coarse-grained functional zones for urban planning or regional economic analysis, and often neglects the expiration of AOIs in the real world. Such approaches fail to meet the precision demands of Mobile Internet Online-to-Offline (O2O) businesses, which require accuracy down to a specific community, school, or hospital. In this paper, we propose AOITR, a comprehensive end-to-end multimodal deep learning framework that simultaneously detects accurate AOI boundaries and validates AOI reliability by leveraging remote sensing imagery coupled with geographical priors. Our approach differs both from conventional AOI generation methods, such as the Road-cut method that segments road networks at various levels, and from semantic segmentation algorithms that depend on pixel-level classification. Instead, AOITR begins by selecting a point of interest (POI) of a specific category and uses it to retrieve the corresponding remote sensing imagery and geographical priors, such as entrance POIs and road nodes. This information feeds a multimodal detection model based on a transformer encoder-decoder architecture that regresses the AOI polygon. Additionally, we use dynamic features from human mobility, nearby POIs, and logistics addresses to evaluate AOI reliability via a cascaded network module. The experimental results show that our algorithm achieves a significant improvement in the Intersection over Union (IoU) metric, surpassing previous methods by a large margin.
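Since AOITR is evaluated with Intersection over Union (IoU), a small sketch of polygon IoU may help. It approximates IoU by rasterizing both polygons on a regular grid, testing each cell center with even-odd ray casting, rather than computing exact polygon intersections; the grid size and resolution are assumptions, and in practice a geometry library would normally be used for exact areas. Function names are hypothetical.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray casting: count edge crossings of a ray pointing in +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly, size, res):
    """Set of (i, j) grid cells whose centers fall inside the polygon."""
    return {(i, j) for i in range(size) for j in range(size)
            if point_in_polygon((i + 0.5) * res, (j + 0.5) * res, poly)}


def polygon_iou(pred, gt, size=100, res=1.0):
    """Approximate IoU of two polygons via rasterized cell sets."""
    a, b = rasterize(pred, size, res), rasterize(gt, size, res)
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

Two 40 × 40 squares offset by half their width overlap in 800 of 2400 cells, giving an IoU of 1/3; finer grids (smaller `res`, larger `size`) trade speed for accuracy.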
Peer attention enhances student learning
Xu, Songlin, Hu, Dongyin, Wang, Ru, Zhang, Xinyu
Human visual attention is susceptible to social influences. In education, peer effects impact student learning, but their precise role in modulating attention remains unclear. Our experiment (N=311) demonstrates that displaying peer visual attention regions when students watch online course videos enhances their focus and engagement. However, students retain flexibility in whether they follow peer attention cues. Overall, guided peer attention improves learning experiences and outcomes. These findings elucidate how peer visual attention shapes students' gaze patterns, deepening understanding of peer influence on learning. They also offer insights into designing adaptive online learning interventions that leverage peer attention modelling to optimize student attentiveness and success.
Enriching Verbal Feedback from Usability Testing: Automatic Linking of Thinking-Aloud Recordings and Stimulus using Eye Tracking and Mouse Data
Murali, Supriya, Walber, Tina, Schaefer, Christoph, Lim, Sezen
The think-aloud method is an important and commonly used tool for usability optimization. However, analyzing think-aloud data can be time-consuming. In this paper, we propose an automatic analysis of verbal protocols and test the link between spoken feedback and the stimulus using eye tracking and mouse tracking. The resulting data (user feedback linked to a specific area of the stimulus) could be used to let an expert review the feedback on specific web page elements, or to visualize on which parts of the web page the feedback was given. Specifically, we test whether participants fixate on, or point with the mouse to, the web page content they are verbalizing. During testing, participants were shown three websites and asked to give their opinion verbally. The verbal responses, along with eye and cursor movements, were recorded. We compared the hit rate, defined as the percentage of verbally mentioned areas of interest (AOIs) that were fixated with gaze or pointed to with the mouse. The results revealed a significantly higher hit rate for gaze than for mouse data. Further investigation revealed that, while the mouse was mostly used passively to scroll, gaze was often directed toward relevant AOIs, establishing a strong association between spoken words and stimuli. Therefore, eye tracking data may provide more detailed information and more valuable insights about the verbalizations than mouse data.
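The hit-rate metric can be sketched as follows. The abstract does not say how a mention is matched to gaze or mouse data in time, so this sketch assumes each mention carries a start and end time and counts it as a hit if the modality lands on the mentioned AOI within a hypothetical tolerance window around the utterance. The data layout, window value, and function name are all illustrative assumptions.

```python
def hit_rate(mentions, samples, window=2.0):
    """Percentage of verbally mentioned AOIs hit by a modality.

    mentions: list of (aoi, t_start, t_end) for each verbal mention (seconds).
    samples:  list of (t, aoi) gaze fixations or mouse positions mapped to AOIs.
    window:   assumed tolerance (seconds) before and after the utterance.
    """
    if not mentions:
        return 0.0
    hits = 0
    for aoi, t0, t1 in mentions:
        # A mention is a hit if any sample lands on the same AOI near the utterance.
        if any(a == aoi and t0 - window <= t <= t1 + window for t, a in samples):
            hits += 1
    return 100.0 * hits / len(mentions)
```

Computing the metric separately for gaze samples and for mouse samples, over the same mentions, yields the two hit rates the study compares.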
Local-Global Methods for Generalised Solar Irradiance Forecasting
Cargan, Timothy, Landa-Silva, Dario, Triguero, Isaac
As the use of solar power increases, accurate and timely forecasts will be essential for smooth grid operation. Many methods have been proposed for forecasting solar irradiance and solar power production. However, many of them formulate the problem as a time series, relying on near real-time access to observations at the location of interest to generate forecasts. This requires both access to a real-time data stream and enough historical observations before these methods can be deployed. In this paper, we propose using global methods to train models in a generalised way, enabling them to generate forecasts for unseen locations. We apply this approach to both classical ML and state-of-the-art methods. Using data from 20 locations distributed throughout the UK and widely available weather data, we show that it is possible to build systems that do not require access to such location-specific observations. We utilise and compare both satellite and ground observations (e.g. temperature, pressure) of weather data. Leveraging weather observations and measurements from other locations, we show that it is possible to create models capable of accurately forecasting solar irradiance at new locations. This could facilitate planning and optimisation for both newly deployed solar farms and domestic installations from the moment they come online. Additionally, we show that training a single global model for multiple locations can produce a more robust model with more consistent and accurate results across locations.
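Evaluating a global model on unseen locations can be sketched as a leave-one-location-out loop: train one model on pooled data from all other sites and score it on the held-out site. The sketch uses ordinary least squares and would be fed synthetic weather features (e.g. cloud cover, temperature) purely as placeholders; the paper's actual models, features, and data are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    Xb = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def leave_one_location_out(data, fit, predict):
    """data: dict location -> (X, y). Train a single global model on all other
    locations, test on the held-out one, and return RMSE per location."""
    scores = {}
    for held_out in data:
        X_tr = np.vstack([data[loc][0] for loc in data if loc != held_out])
        y_tr = np.concatenate([data[loc][1] for loc in data if loc != held_out])
        model = fit(X_tr, y_tr)
        X_te, y_te = data[held_out]
        err = predict(model, X_te) - y_te
        scores[held_out] = float(np.sqrt(np.mean(err ** 2)))
    return scores
```

Passing `fit` and `predict` as arguments keeps the evaluation loop independent of the model, so a classical regressor or a deep model could be swapped in under the same protocol.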