Spatial Reasoning
Eye-SpatialNet: Spatial Information Extraction from Ophthalmology Notes
Datta, Surabhi, Kaochar, Tasneem, Lam, Hio Cheng, Nwosu, Nelly, Giancardo, Luca, Chuang, Alice Z., Feldman, Robert M., Roberts, Kirk
These findings are documented based on interpretations from imaging examinations (e.g., fundus examination), complications or outcomes associated with surgeries (e.g., cataract surgery), and experiences or symptoms shared by patients. Such findings are often described along with their exact eye locations as well as other contextual information such as their timing and status. Thus, ophthalmology notes comprise spatial relations between eye findings and their corresponding locations, and these findings are further described using different spatial characteristics such as laterality and size. Although there have been recent advances in using natural language processing (NLP) methods in the ophthalmology domain, they mainly target specific ocular conditions. Some work leveraged electronic health record text data to identify conditions such as glaucoma [1], herpes zoster ophthalmicus [2], and exfoliation syndrome [3], while another set of work extracted quantitative measures particularly related to visual acuity [4, 5] and microbial keratitis [6]. In this work, we aim to extract more comprehensive information related to all eye findings, covering both spatial and contextual information, from the ophthalmology notes. Besides automated screening and diagnosis of various ocular conditions, identifying such detailed information can aid in applications such as automated monitoring of eye findings or diseases and cohort retrieval for retrospective epidemiological studies. For this, we propose to extend our existing radiology spatial representation schema, Rad-SpatialNet [7], to the ophthalmology domain. We refer to this as the Eye-SpatialNet schema in this paper.
Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations
Kartmann, Rainer, Asfour, Tamim
Humans use semantic concepts such as spatial relations between objects to describe scenes and communicate tasks such as "Put the tea to the right of the cup" or "Move the plate between the fork and the spoon." Just like children, assistive robots must be able to learn the sub-symbolic meaning of such concepts from human demonstrations and instructions. We address the problem of incrementally learning geometric models of spatial relations from few demonstrations collected online during interaction with a human. Such models enable a robot to manipulate objects in order to fulfill desired spatial relations specified by verbal instructions. At the start, we assume the robot has no geometric model of spatial relations. Given a task as above, the robot requests the user to demonstrate the task once in order to create a model from a single demonstration, leveraging a cylindrical probability distribution as a generative representation of spatial relations. We show how this model can be updated incrementally with each new demonstration, without access to past examples and in a sample-efficient way, using incremental maximum likelihood estimation, and demonstrate the approach on a real humanoid robot.
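The idea of updating a model with each new demonstration, without keeping past examples, can be illustrated with a much simpler distribution than the cylindrical one used in the paper. The sketch below is an assumption-laden stand-in: it tracks the maximum likelihood estimate of a 1-D Gaussian over a single hypothetical feature (the distance between two objects) using a Welford-style running update. The class name, the feature, and the sample values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class IncrementalGaussian:
    """Running maximum likelihood estimate of a 1-D Gaussian (illustrative only)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford-style update: only summary statistics are stored,
        # so earlier demonstrations never need to be revisited.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # The maximum likelihood variance divides by n (not n - 1).
        return self.m2 / self.n if self.n > 0 else 0.0


# Hypothetical demonstrated distances in metres, arriving one at a time.
model = IncrementalGaussian()
for distance in [0.30, 0.34, 0.26]:
    model.update(distance)
```

After a single demonstration the estimate is a point mass at that sample (variance 0); each further demonstration refines the mean and widens or narrows the spread.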
CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations
Mai, Gengchen, Lao, Ni, He, Yutong, Song, Jiaming, Ermon, Stefano
Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are rather scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP can improve model performance on both iNat2018 and fMoW datasets. In particular, on iNat2018, CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
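As a rough illustration of a contrastive objective over a dual encoder, the following sketch computes a generic InfoNCE-style loss in which each image embedding should match its own location embedding and not the other locations in the batch. This is not claimed to be the exact objective used by CSP; the function names, the temperature value, and the toy embeddings are assumptions made for the example, and the encoders themselves are omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(image_embs, location_embs, temperature=0.1):
    """Mean cross-entropy of matching each image to its own location in the batch."""
    imgs = [normalize(v) for v in image_embs]
    locs = [normalize(v) for v in location_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        logits = [sum(a * b for a, b in zip(img, loc)) / temperature for loc in locs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_denom - logits[i]  # negative log-probability of the true pair
    return total / len(imgs)


# Toy 2-D embeddings: pairs that line up give a low loss, swapped pairs a high one.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls an image and its geo-location together in embedding space while pushing apart mismatched pairs from the same batch.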
Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings
Toumpa, Alexia (University of Leeds), Cohn, Anthony G.
Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks. As humans tend to use objects in many different ways depending on the scene and the objects' availability, learning object affordances in everyday-life scenarios is a challenging task, particularly in the presence of an open set of interactions and objects. We address the problem of affordance categorization for class-agnostic objects with an open set of interactions; we achieve this by learning similarities between object interactions in an unsupervised way and thus inducing clusters of object affordances. A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs (AGs), which abstract from the continuous representation of spatio-temporal interactions in RGB-D videos. These AGs are clustered to obtain groups of objects with similar affordances. Our experiments in a real-world scenario demonstrate that our method learns to create object affordance clusters with a high V-measure even in cluttered scenes. The proposed approach handles object occlusions by effectively capturing possible interactions, without imposing any object or scene constraints.
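For readers unfamiliar with the V-measure mentioned above, it is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls in a single cluster), both defined through conditional entropy. The sketch below computes it from paired label lists; the affordance names and cluster ids are hypothetical, and nothing here reproduces the paper's evaluation pipeline.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given), computed from paired label lists."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness."""
    h_c, h_k = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical ground-truth affordances and two candidate clusterings.
truth = ["cut", "cut", "pour", "pour"]
perfect = v_measure(truth, [1, 1, 0, 0])      # cluster ids differ, grouping matches
uninformative = v_measure(truth, [0, 1, 0, 1])  # clusters carry no class information
```

The score depends only on how items are grouped, not on the cluster ids themselves, which suits unsupervised settings where cluster labels are arbitrary.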
Spatial-Temporal Networks for Antibiogram Pattern Prediction
Fu, Xingbo, Chen, Chen, Dong, Yushun, Vullikanti, Anil, Klein, Eili, Madden, Gregory, Li, Jundong
An antibiogram is a periodic summary of antibiotic resistance results of organisms from infected patients to selected antimicrobial drugs. Antibiograms help clinicians to understand regional resistance rates and select appropriate antibiotics in prescriptions. In practice, significant combinations of antibiotic resistance may appear in different antibiograms, forming antibiogram patterns. Such patterns may imply the prevalence of some infectious diseases in certain regions. Thus, it is of crucial importance to monitor antibiotic resistance trends and track the spread of multi-drug resistant organisms. In this paper, we propose a novel problem of antibiogram pattern prediction that aims to predict which patterns will appear in the future. Despite its importance, tackling this problem encounters a series of challenges and has not yet been explored in the literature. First of all, antibiogram patterns are not i.i.d., as they may have strong relations with each other due to genomic similarities of the underlying organisms. Second, antibiogram patterns are often temporally dependent on the ones that are previously detected. Furthermore, the spread of antibiotic resistance can be significantly influenced by nearby or similar regions. To address the above challenges, we propose a novel Spatial-Temporal Antibiogram Pattern Prediction framework, STAPP, that can effectively leverage the pattern correlations and exploit the temporal and spatial information. We conduct extensive experiments on a real-world dataset with antibiogram reports of patients from 1999 to 2012 for 203 cities in the United States. The experimental results show the superiority of STAPP over several competitive baselines.
The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback
Hutchinson, Spencer, Turan, Berkay, Alizadeh, Mahnoosh
We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.
Hydra-Multi: Collaborative Online Construction of 3D Scene Graphs with Multi-Robot Teams
Chang, Yun, Hughes, Nathan, Ray, Aaron, Carlone, Luca
3D scene graphs have recently emerged as an expressive high-level map representation that describes a 3D environment as a layered graph where nodes represent spatial concepts at multiple levels of abstraction (e.g., objects, rooms, buildings) and edges represent relations between concepts (e.g., inclusion, adjacency). This paper describes Hydra-Multi, the first multi-robot spatial perception system capable of constructing a multi-robot 3D scene graph online from sensor data collected by robots in a team. In particular, we develop a centralized system capable of constructing a joint 3D scene graph by taking incremental inputs from multiple robots, effectively finding the relative transforms between the robots' frames, and incorporating loop closure detections to correctly reconcile the scene graph nodes from different robots. We evaluate Hydra-Multi on simulated and real scenarios and show it is able to reconstruct accurate 3D scene graphs online. We also demonstrate Hydra-Multi's capability of supporting heterogeneous teams by fusing different map representations built by robots with different sensor suites.
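The layered-graph description above can be made concrete with a small data structure. This is a minimal sketch under stated assumptions, not Hydra-Multi's actual representation: the `SceneGraph` class, its method names, the layer names, and the `"contains"` and `"adjacent"` relation labels are all hypothetical and chosen only to mirror the examples in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy layered graph: nodes carry a layer name, edges carry a relation."""
    layers: dict = field(default_factory=dict)  # node id -> layer name
    edges: set = field(default_factory=set)     # (source, relation, target)

    def add_node(self, node_id, layer):
        self.layers[node_id] = layer

    def add_edge(self, source, relation, target):
        if source not in self.layers or target not in self.layers:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((source, relation, target))

    def children(self, node_id):
        # Nodes one level down that this node includes.
        return sorted(t for s, r, t in self.edges if s == node_id and r == "contains")


graph = SceneGraph()
graph.add_node("building_1", "building")
graph.add_node("kitchen", "room")
graph.add_node("hallway", "room")
graph.add_node("mug", "object")
graph.add_edge("building_1", "contains", "kitchen")
graph.add_edge("building_1", "contains", "hallway")
graph.add_edge("kitchen", "contains", "mug")
graph.add_edge("kitchen", "adjacent", "hallway")

rooms = graph.children("building_1")
```

Inclusion edges connect adjacent layers (building to room, room to object), while adjacency edges connect nodes within the same layer.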
Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation
Marza, Pierre, Matignon, Laetitia, Simonin, Olivier, Wolf, Christian
In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent is able to perceive spatial relationships and regularities, and discover object characteristics. Recent work introduces learnable policies parametrized by deep neural networks and trained with Reinforcement Learning (RL). In classical RL setups, the capacity to map and reason spatially is learned end-to-end, from reward alone. In this setting, we introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. Our method significantly improves the performance of different baseline agents that build either an explicit or an implicit representation of the environment, even matching the performance of incomparable oracle agents taking ground-truth maps as input. A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge, part of the CVPR 2021 Embodied AI Workshop.
Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs
Cohn, Anthony G, Hernandez-Orallo, Jose
Language models have become very popular recently and many claims have been made about their abilities, including for commonsense reasoning. Given the increasingly better results of current language models on previous static benchmarks for commonsense reasoning, we explore an alternative dialectical evaluation. The goal of this kind of evaluation is not to obtain an aggregate performance value but to find failures and map the boundaries of the system. Dialoguing with the system gives the opportunity to check for consistency and get more reassurance of these boundaries beyond anecdotal evidence. In this paper we conduct some qualitative investigations of this kind of evaluation for the particular case of spatial reasoning (which is a fundamental aspect of commonsense reasoning). We conclude with some suggestions for future work both to improve the capabilities of language models and to systematise this kind of dialectical evaluation.
UT Southwestern teaches med students that 'gender is independent of physical structure, chromosomes, or genes'
Documents obtained by Fox News Digital show that University of Texas Southwestern medical students are being taught that gender is independent of physical structure. Fox News Digital obtained the documents via a FOIA request from Do No Harm, a national association of medical professionals that combats "woke" activism in the healthcare system. According to the University of Texas Southwestern Medical Center's Human Structure curriculum, they "explicitly acknowledge the differentiation between the terms sex and gender." "The latter is a psychological, social, and cultural construct, including self-identification. Gender is independent of physical structure, chromosomes, or genes," curriculum materials read.