Goto

Collaborating Authors

 Spatial Reasoning


Robust Change Detection Based on Neural Descriptor Fields

arXiv.org Artificial Intelligence

The ability to reason about changes in the environment is crucial for robots operating over extended periods of time. Agents are expected to capture changes during operation so that actions can be followed to ensure a smooth progression of the working session. However, varying viewing angles and accumulated localization errors make it easy for robots to falsely detect changes in the surrounding world due to low observation overlap and drifted object associations. In this paper, based on the recently proposed category-level Neural Descriptor Fields (NDFs), we develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results. Utilizing the shape completion capability and SE(3)-equivariance of NDFs, we represent objects with compact shape codes encoding full object shapes from partial observations. The objects are then organized in a spatial tree structure based on object centers recovered from NDFs for fast queries of object neighborhoods. By associating objects via shape code similarity and comparing local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noises. We conduct experiments on both synthetic and real-world sequences and achieve improved change detection results compared to multiple baseline methods. Project webpage: https://yilundu.github.io/ndf_change


Proprioceptive Slip Detection for Planetary Rovers in Perceptually Degraded Extraterrestrial Environments

arXiv.org Artificial Intelligence

Slip detection is of fundamental importance for the safety and efficiency of rovers driving on the surface of extraterrestrial bodies. Current planetary rover slip detection systems rely on visual perception on the assumption that sufficient visual features can be acquired in the environment. However, visual-based methods are prone to suffer in perceptually degraded planetary environments with dominant low terrain features such as regolith, glacial terrain, salt-evaporites, and poor lighting conditions such as dark caves and permanently shadowed regions. Relying only on visual sensors for slip detection also requires additional computational power and reduces the rover traversal rate. This paper answers the question of how to detect wheel slippage of a planetary rover without depending on visual perception. In this respect, we propose a slip detection system that obtains its information from a proprioceptive localization framework that is capable of providing reliable, continuous, and computationally efficient state estimation over hundreds of meters. This is accomplished by using zero velocity update, zero angular rate update, and non-holonomic constraints as pseudo-measurement updates on an inertial navigation system framework. The proposed method is evaluated on actual hardware and field-tested in a planetary-analog environment. The method achieves greater than 92% slip detection accuracy for distances around 150 m using only an inertial measurement unit (IMU) and wheel encoders.


Progressive Voronoi Diagram Subdivision: Towards A Holistic Geometric Framework for Exemplar-free Class-Incremental Learning

arXiv.org Artificial Intelligence

Exemplar-free Class-incremental Learning (CIL) is a challenging problem because rehearsing data from previous phases is strictly prohibited, causing catastrophic forgetting of Deep Neural Networks (DNNs). In this paper, we present iVoro, a holistic framework for CIL, derived from computational geometry. We found Voronoi Diagram (VD), a classical model for space subdivision, is especially powerful for solving the CIL problem, because VD itself can be constructed favorably in an incremental manner -- the newly added sites (classes) will only affect the proximate classes, making the non-contiguous classes hardly forgettable. Further, in order to find a better set of centers for VD construction, we colligate DNN with VD using Power Diagram and show that the VD structure can be optimized by integrating local DNN models using a divide-and-conquer algorithm. Moreover, our VD construction is not restricted to the deep feature space, but is also applicable to multiple intermediate feature spaces, promoting VD to be multi-centered VD (CIVD) that efficiently captures multi-grained features from DNN. Importantly, iVoro is also capable of handling uncertainty-aware test-time Voronoi cell assignment and has exhibited high correlations between geometric uncertainty and predictive accuracy (up to ~0.9). Putting everything together, iVoro achieves up to 25.26%, 37.09%, and 33.21% improvements on CIFAR-100, TinyImageNet, and ImageNet-Subset, respectively, compared to the state-of-the-art non-exemplar CIL approaches. In conclusion, iVoro enables highly accurate, privacy-preserving, and geometrically interpretable CIL that is particularly useful when cross-phase data sharing is forbidden, e.g. in medical applications. Our code is available at https://machunwei.github.io/ivoro.


Location retrieval using visible landmarks based qualitative place signatures

arXiv.org Artificial Intelligence

Location retrieval based on visual information is to retrieve the location of an agent (e.g. human, robot) or the area they see by comparing the observations with a certain form of representation of the environment. Existing methods generally require precise measurement and storage of the observed environment features, which may not always be robust due to the change of season, viewpoint, occlusion, etc. They are also challenging to scale up and may not be applicable for humans due to the lack of measuring/imaging devices. Considering that humans often use less precise but easily produced qualitative spatial language and high-level semantic landmarks when describing an environment, a qualitative location retrieval method is proposed in this work by describing locations/places using qualitative place signatures (QPS), defined as the perceived spatial relations between ordered pairs of co-visible landmarks from viewers' perspective. After dividing the space into place cells each with individual signatures attached, a coarse-to-fine location retrieval method is proposed to efficiently identify the possible location(s) of viewers based on their qualitative observations. The usability and effectiveness of the proposed method were evaluated using openly available landmark datasets, together with simulated observations by considering the possible perception error.


Language statistics at different spatial, temporal, and grammatical scales

arXiv.org Artificial Intelligence

Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish considering the rank diversity at different scales: temporal (from 3 to 96 hour intervals), spatial (from 3km to 3000+km radii), and grammatical (from monograms to pentagrams). We find that all three scales are relevant. However, the greatest changes come from variations in the grammatical scale. At the lowest grammatical scale (monograms), the rank diversity curves are most similar, independently on the values of other scales, languages, and countries. As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales, as well as on the language and country. We also study the statistics of Twitter-specific tokens: emojis, hashtags, and user mentions. These particular type of tokens show a sigmoid kind of behaviour as a rank diversity function. Our results are helpful to quantify aspects of language statistics that seem universal and what may lead to variations.


GE-Grasp: Efficient Target-Oriented Grasping in Dense Clutter

arXiv.org Artificial Intelligence

Grasping in dense clutter is a fundamental skill for autonomous robots. However, the crowdedness and occlusions in the cluttered scenario cause significant difficulties to generate valid grasp poses without collisions, which results in low efficiency and high failure rates. To address these, we present a generic framework called GE-Grasp for robotic motion planning in dense clutter, where we leverage diverse action primitives for occluded object removal and present the generator-evaluator architecture to avoid spatial collisions. Therefore, our GE-Grasp is capable of grasping objects in dense clutter efficiently with promising success rates. Specifically, we define three action primitives: target-oriented grasping for target capturing, pushing, and nontarget-oriented grasping to reduce the crowdedness and occlusions. The generators effectively provide various action candidates referring to the spatial information. Meanwhile, the evaluators assess the selected action primitive candidates, where the optimal action is implemented by the robot. Extensive experiments in simulated and real-world environments show that our approach outperforms the state-of-the-art methods of grasping in clutter with respect to motion efficiency and success rates. Moreover, we achieve comparable performance in the real world as that in the simulation environment, which indicates the strong generalization ability of our GE-Grasp. Supplementary material is available at: https://github.com/CaptainWuDaoKou/GE-Grasp.


Spatial-Temporal Federated Learning for Lifelong Person Re-identification on Distributed Edges

arXiv.org Artificial Intelligence

Data drift is a thorny challenge when deploying person re-identification (ReID) models into real-world devices, where the data distribution is significantly different from that of the training environment and keeps changing. To tackle this issue, we propose a federated spatial-temporal incremental learning approach, named FedSTIL, which leverages both lifelong learning and federated learning to continuously optimize models deployed on many distributed edge clients. Unlike previous efforts, FedSTIL aims to mine spatial-temporal correlations among the knowledge learnt from different edge clients. Specifically, the edge clients first periodically extract general representations of drifted data to optimize their local models. Then, the learnt knowledge from edge clients will be aggregated by centralized parameter server, where the knowledge will be selectively and attentively distilled from spatial- and temporal-dimension with carefully designed mechanisms. Finally, the distilled informative spatial-temporal knowledge will be sent back to correlated edge clients to further improve the recognition accuracy of each edge client with a lifelong learning method. Extensive experiments on a mixture of five real-world datasets demonstrate that our method outperforms others by nearly 4% in Rank-1 accuracy, while reducing communication cost by 62%. All implementation codes are publicly available on https://github.com/MSNLAB/Federated-Lifelong-Person-ReID


A Transferable Intersection Reconstruction Network for Traffic Speed Prediction

arXiv.org Artificial Intelligence

Traffic speed prediction is the key to many valuable applications, and it is also a challenging task because of its various influencing factors. Recent work attempts to obtain more information through various hybrid models, thereby improving the prediction accuracy. However, the spatial information acquisition schemes of these methods have two-level differentiation problems. Either the modeling is simple but contains little spatial information, or the modeling is complete but lacks flexibility. In order to introduce more spatial information on the basis of ensuring flexibility, this paper proposes IRNet (Transferable Intersection Reconstruction Network). First, this paper reconstructs the intersection into a virtual intersection with the same structure, which simplifies the topology of the road network. Then, the spatial information is subdivided into intersection information and sequence information of traffic flow direction, and spatiotemporal features are obtained through various models. Third, a self-attention mechanism is used to fuse spatiotemporal features for prediction. In the comparison experiment with the baseline, not only the prediction effect, but also the transfer performance has obvious advantages.


Spatial-Temporal Feature Extraction and Evaluation Network for Citywide Traffic Condition Prediction

arXiv.org Artificial Intelligence

Abstract: Traffic prediction plays an important role in the realization of traffic control and scheduling tasks in intelligent transportation systems. With the diversification of data sources, re asonably using rich traffic data to model the complex spatial-temporal dependence and nonlinear characteristics in traffic flow are the key challenge for intelligent transportation system. In addition, clearly evaluating the importance of spatialtemporal features extracted from different data becomes a challenge. A Double Layer - Spatial Temporal Feature Extraction and Evaluation (DL-STFEE) model is proposed. The lower layer of DL-STFEE is spatialtemporal feature extraction layer. The spatial and temporal features in traffic data are extracted by multi-graph graph convolution and attention mechanism, and different combinations of spatial and temporal features are generated. The upper layer of DL-STFEE is the spatial-temporal feature evaluation layer. Through the attention score matrix generated by the high-dimensional self-attention mechanism, the spatial-temporal features combinations are fused and evaluated, so as to get the impact of different combinations on prediction effect. Three sets of experiments are performed on actual traffic datasets to show that DL-STFEE can effectively capture the spatial-temporal features and evaluate the importance of different spatial-temporal feature combinations. With the continuous acceleration of urbanization, the population and vehicle ownership are also increasing, resulting in traffic congestion and other problems. In order to improve the efficiency, sustainability and security of transportation network, intelligent transportation system (ITS) [1] is proposed and becomes an advancing research field. Traffic prediction is an important step in the development of intelligent transportation [2]. It 2 aims to predict future traffic conditions by integrating historical observation data and measurement information of road sensor networks.


MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

arXiv.org Artificial Intelligence

However, it is also more challenging due to the higher complexity and wider variety of inter-object relations contained in point clouds. Existing methods only treat such relations as by-products of object feature learning in graphs without specifically encoding them, which leads to sub-optimal results. In this paper, aiming at improving 3D dense captioning via capturing and utilizing the complex relations in the 3D scene, we propose MORE, a Multi-Order RElation mining model, to support generating more descriptive and comprehensive captions. Technically, our MORE encodes object relations in a progressive manner since complex relations can be deduced from a limited number of basic ones. We first devise a novel Spatial Layout Graph Convolution (SLGC), which semantically encodes several first-order relations as edges of a graph constructed over 3D object proposals. Next, from the resulting graph, we further extract multiple triplets which encapsulate basic first-order relations as the basic unit, and construct several Object-centric Triplet Attention Graphs (OTAG) to infer multi-order relations for every target object. The updated node features from OTAG are aggregated and fed into the caption decoder to provide abundant relational cues, so that captions including diverse relations with context objects can be generated. Extensive experiments on the Scan2Cap dataset prove the effectiveness of our proposed MORE and its components, and we also outperform the current state-of-the-art method. Our code is available at https://github.com/SxJyJay/MORE.