Xu, Qian
CCTNet: A Circular Convolutional Transformer Network for LiDAR-based Place Recognition Handling Movable Objects Occlusion
Wang, Gang, Zhu, Chaoran, Xu, Qian, Zhang, Tongzhou, Zhang, Hai, Fan, XiaoPeng, Hu, Jue
Place recognition is a fundamental task for robotic applications, allowing robots to perform loop closure detection within simultaneous localization and mapping (SLAM) and to achieve re-localization on prior maps. Current range image-based networks use single-column convolution to maintain feature invariance to shifts in image columns caused by LiDAR viewpoint changes. However, this raises issues such as restricted receptive fields and an excessive focus on local regions, which degrade network performance. To address these issues, we propose a lightweight circular convolutional Transformer network, denoted CCTNet, trained with a regression loss for place recognition in scenarios with movable-object occlusion. CCTNet boosts performance by capturing structural information in point clouds and facilitating cross-dimensional interaction of spatial and channel information. It treats the range image as a ring, utilizing multi-column convolution to learn local feature details, relationships between range image columns, and circular structural features of the point clouds. Moreover, a Range Transformer module is proposed to dynamically allocate weights to different channels and pixel regions, enabling the fusion and interaction of channel and spatial information. Through extensive experiments on the KITTI and Ford Campus datasets, CCTNet surpasses comparable methods, achieving Recall@1 of 0.924 and 0.965, respectively. Results on a self-collected dataset further demonstrate the proposed method's potential for practical applications.
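The circular treatment of the range image can be illustrated with a minimal sketch: wrap-around padding along the column (yaw) axis before a standard 2D convolution, so the kernel sees across the left/right seam of the image. This is an illustrative PyTorch sketch of the idea, not the authors' released implementation; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CircularConv2d(nn.Module):
    """Convolution over a range image treated as a ring along its columns.

    Columns of a range image correspond to yaw angles, so the left and right
    edges are physically adjacent; circular padding lets the kernel see across
    that seam (illustrative sketch, not the authors' code).
    """

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.pad = kernel_size // 2
        # No built-in padding: we pad manually below.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=0)

    def forward(self, x):                              # x: (B, C, H, W) range image
        # Wrap around the width (yaw) axis, zero-pad the height axis.
        x = F.pad(x, (self.pad, self.pad, 0, 0), mode="circular")
        x = F.pad(x, (0, 0, self.pad, self.pad), mode="constant", value=0.0)
        return self.conv(x)

if __name__ == "__main__":
    img = torch.randn(1, 1, 64, 900)                   # e.g. 64-beam LiDAR range image
    out = CircularConv2d(1, 16)(img)
    print(out.shape)                                   # torch.Size([1, 16, 64, 900])
```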
Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving
Lou, Yang, Song, Qun, Xu, Qian, Tan, Rui, Wang, Jianping
Multi-modal fusion has shown initial promising results for object detection in autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into multi-modal fusion still lacks effective solutions, due primarily to the uncertainty's cross-modal incomparability and its distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE), which explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses an individual expert network to process each sensor's detection result together with its encoded uncertainty. The expert networks' outputs are then analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves maximum performance gains of 10.67%, 3.17%, and 5.40% over state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios, respectively.
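A minimal sketch of the uncertainty-encoded mixture-of-experts fusion described above, assuming simple MLP experts and a softmax gating network; feature dimensions and layer choices are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class UncertaintyMoEFusion(nn.Module):
    """Sketch of uncertainty-encoded mixture-of-experts proposal fusion.

    Each expert sees one modality's detection features concatenated with an
    encoded uncertainty vector; a gating network turns the expert outputs into
    per-modality fusion weights. Sizes are illustrative assumptions.
    """

    def __init__(self, det_dim=16, unc_dim=4, hidden=32):
        super().__init__()
        self.lidar_expert = nn.Sequential(
            nn.Linear(det_dim + unc_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.camera_expert = nn.Sequential(
            nn.Linear(det_dim + unc_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 2), nn.Softmax(dim=-1))

    def forward(self, lidar_det, lidar_unc, cam_det, cam_unc):
        e_l = self.lidar_expert(torch.cat([lidar_det, lidar_unc], dim=-1))
        e_c = self.camera_expert(torch.cat([cam_det, cam_unc], dim=-1))
        w = self.gate(torch.cat([e_l, e_c], dim=-1))   # (N, 2) per-proposal fusion weights
        # Weighted combination of the two modalities' proposal features.
        return w[:, :1] * lidar_det + w[:, 1:] * cam_det, w

fusion = UncertaintyMoEFusion()
fused, weights = fusion(torch.randn(6, 16), torch.rand(6, 4),
                        torch.randn(6, 16), torch.rand(6, 4))
```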
Neural Architecture Search for Intel Movidius VPU
Xu, Qian, Li, Victor, S, Crews Darren
Intel Movidius VPUs enable demanding computer vision and AI workloads with high efficiency. By coupling highly parallel programmable compute with workload-specific AI hardware acceleration in a unique architecture that minimizes data movement, Movidius VPUs achieve a balance of power efficiency and compute performance. However, customers' AI models are usually built for general use rather than designed for specific hardware, as the left of Fig. 1 shows. Because of the different designs of various AI accelerators, general models cannot fully utilize the hardware's capability. This creates an opportunity to design better models for the hardware: higher FPS at the same accuracy level, or higher accuracy at the same FPS. However, even for hardware specialists, the design space of possible networks is extremely large and infeasible to explore by handcrafting.
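A hardware-aware search loop of the kind implied above can be sketched as a latency-constrained random search; the search space, latency proxy, and accuracy proxy below are placeholder assumptions, since a real flow would measure latency on the VPU and train (or predict) accuracy.

```python
import random

# Toy search space: depth, width multiplier, kernel size (illustrative only).
SEARCH_SPACE = {
    "depth": [8, 12, 16],
    "width": [0.5, 0.75, 1.0],
    "kernel": [3, 5],
}

def estimate_latency_ms(cfg):
    """Placeholder latency proxy; a real flow would measure on the target VPU."""
    return cfg["depth"] * cfg["width"] * (1.0 if cfg["kernel"] == 3 else 1.4)

def train_and_eval(cfg):
    """Placeholder accuracy proxy; a real flow would train or use a predictor."""
    return 0.6 + 0.02 * cfg["depth"] * cfg["width"] - 0.001 * cfg["kernel"]

def random_search(budget=50, latency_limit_ms=12.0):
    best_cfg, best_acc = None, -1.0
    for _ in range(budget):
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        if estimate_latency_ms(cfg) > latency_limit_ms:
            continue                       # reject configs that miss the FPS target
        acc = train_and_eval(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

if __name__ == "__main__":
    print(random_search())
```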
A Survey on Vertical Federated Learning: From a Layered Perspective
Yang, Liu, Chai, Di, Zhang, Junxue, Jin, Yilun, Wang, Leye, Liu, Hao, Tian, Han, Xu, Qian, Chen, Kai
Vertical federated learning (VFL) is a promising category of federated learning for the scenario where data is vertically partitioned and distributed among parties. VFL enriches the description of samples using features from different parties to improve model capacity. Compared with horizontal federated learning, in most cases, VFL is applied in the commercial cooperation scenario of companies. Therefore, VFL contains tremendous business value. In the past few years, VFL has attracted more and more attention in both academia and industry. In this paper, we systematically investigate the current work on VFL from a layered perspective. From the hardware layer to the vertical federated system layer, researchers contribute to various aspects of VFL. Moreover, the applications of VFL cover a wide range of areas, e.g., finance and healthcare. At each layer, we categorize the existing work and explore the challenges for the convenience of further research and development of VFL. In particular, we design a novel MOSP tree taxonomy to analyze the core component of VFL, i.e., secure vertical federated machine learning algorithms. Our taxonomy considers four dimensions, i.e., the machine learning model (M), protection object (O), security model (S), and privacy-preserving protocol (P), and provides a comprehensive investigation.
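As a rough illustration of the MOSP taxonomy, an algorithm can be tagged along the four dimensions with a simple record; the field values below are illustrative examples, not the survey's official category names.

```python
from dataclasses import dataclass

@dataclass
class MOSPEntry:
    """One algorithm tagged along the survey's four MOSP dimensions.

    Field values used below are illustrative examples only.
    """
    model: str              # M: machine learning model, e.g. "GBDT", "NN", "LR"
    protection_object: str  # O: what is protected, e.g. "features", "labels"
    security_model: str     # S: adversary assumption, e.g. "semi-honest", "malicious"
    protocol: str           # P: privacy-preserving protocol, e.g. "HE", "MPC", "DP"

secureboost = MOSPEntry(model="GBDT",
                        protection_object="labels and gradients",
                        security_model="semi-honest",
                        protocol="additively homomorphic encryption")
print(secureboost)
```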
SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning
Chen, Weijing, Ma, Guoqiang, Fan, Tao, Kang, Yan, Xu, Qian, Yang, Qiang
Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As the area of privacy computation has thrived in recent years, demands for large-scale and high-performance federated learning have grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+, a novel framework improved from the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that SecureBoost+ achieves significant performance improvements on large and high-dimensional datasets compared to SecureBoost, making effective and efficient large-scale vertical federated learning possible.
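The SecureBoost-style building block that SecureBoost+ optimizes can be sketched as follows: the label holder encrypts per-sample gradient statistics with an additively homomorphic scheme, and a feature holder aggregates them per histogram bin under encryption. This sketch uses the third-party python-paillier (`phe`) package and toy data; it illustrates plain SecureBoost split-statistics aggregation, not SecureBoost+'s ciphertext and engineering optimizations.

```python
import numpy as np
from phe import paillier   # third-party python-paillier package (assumed installed)

# Guest (label holder): per-sample gradients and Hessians for the current boosting round.
grads = np.random.randn(100)
hess = np.random.rand(100)

pub, priv = paillier.generate_paillier_keypair(n_length=1024)
enc_grads = [pub.encrypt(float(g)) for g in grads]
enc_hess = [pub.encrypt(float(h)) for h in hess]

# Host (feature holder): knows only its own feature values, never the labels.
feature = np.random.rand(100)
bins = np.digitize(feature, np.linspace(0, 1, 5))   # candidate split bins

# Additive homomorphism lets the host build encrypted histogram sums per bin.
enc_bin_sums = {}
for b in np.unique(bins):
    idx = np.where(bins == b)[0]
    g_sum = sum((enc_grads[i] for i in idx), pub.encrypt(0.0))
    h_sum = sum((enc_hess[i] for i in idx), pub.encrypt(0.0))
    enc_bin_sums[int(b)] = (g_sum, h_sum)

# Back at the guest: decrypt the aggregated sums and score candidate splits.
for b, (g_enc, h_enc) in enc_bin_sums.items():
    g, h = priv.decrypt(g_enc), priv.decrypt(h_enc)
    print(f"bin {b}: sum_g={g:.3f}, sum_h={h:.3f}")
```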
Graph Random Neural Network for Semi-Supervised Learning on Graphs
Feng, Wenzheng, Zhang, Jie, Dong, Yuxiao, Han, Yu, Luan, Huanbo, Xu, Qian, Yang, Qiang, Kharlamov, Evgeny, Tang, Jie
We study the problem of semi-supervised learning on graphs, for which graph neural networks (GNNs) have been extensively explored. However, most existing GNNs inherently suffer from the limitations of over-smoothing, non-robustness, and weak-generalization when labeled nodes are scarce. In this paper, we propose a simple yet effective framework---GRAPH RANDOM NEURAL NETWORKS (GRAND)---to address these issues. In GRAND, we first design a random propagation strategy to perform graph data augmentation. Then we leverage consistency regularization to optimize the prediction consistency of unlabeled nodes across different data augmentations. Extensive experiments on graph benchmark datasets suggest that GRAND significantly outperforms state-of-the-art GNN baselines on semi-supervised node classification. Finally, we show that GRAND mitigates the issues of over-smoothing and non-robustness, exhibiting better generalization behavior than existing GNNs. The source code of GRAND is publicly available at https://github.com/Grand20/grand.
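A minimal sketch of the two ingredients named above, assuming dense tensors and illustrative hyperparameters: DropNode-style random propagation for augmentation, and a consistency loss that pulls predictions from different augmentations toward a sharpened average. The released implementation is at the repository linked above; this is not that code.

```python
import torch
import torch.nn.functional as F

def random_propagate(x, adj_norm, drop_rate=0.5, order=4):
    """DropNode-style augmentation followed by mixed-order propagation.

    x: (N, F) node features, adj_norm: (N, N) normalized adjacency.
    Illustrative sketch of the idea, not the released GRAND implementation.
    """
    mask = (torch.rand(x.size(0), 1) > drop_rate).float()
    h = x * mask / (1.0 - drop_rate)            # drop whole nodes, rescale the rest
    out, cur = h.clone(), h
    for _ in range(order):
        cur = adj_norm @ cur
        out = out + cur
    return out / (order + 1)                    # average over propagation orders

def consistency_loss(logits_list, temperature=0.5):
    """Encourage predictions from different augmentations to agree."""
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    avg = torch.stack(probs).mean(0)
    sharpened = avg ** (1.0 / temperature)
    sharpened = (sharpened / sharpened.sum(dim=-1, keepdim=True)).detach()
    return sum(((p - sharpened) ** 2).sum(dim=-1).mean() for p in probs) / len(probs)

# Example: two augmentations of the same (toy) graph.
x, adj = torch.randn(50, 16), torch.eye(50)
logits = [random_propagate(x, adj) @ torch.randn(16, 3) for _ in range(2)]
loss = consistency_loss(logits)
```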
A Learning-based Discretionary Lane-Change Decision-Making Model with Driving Style Awareness
Zhang, Yifan, Xu, Qian, Wang, Jianping, Wu, Kui, Zheng, Zuduo, Lu, Kejie
Discretionary lane change (DLC) is a basic but complex maneuver in driving, which aims at reaching a faster speed or better driving conditions, e.g., a further line of sight or better ride quality. Although many DLC decision-making models have been studied in traffic engineering and autonomous driving, the impact of human factors, which are an integral part of current and future traffic flow, is largely ignored in the existing literature. In autonomous driving, ignoring the human factors of surrounding vehicles leads to poor interaction between the ego vehicle and the surrounding vehicles and, thus, a high risk of accidents. Human factors are also crucial for simulating human-like traffic flow in traffic engineering. In this paper, we integrate human factors, represented by driving styles, to design a new DLC decision-making model. Specifically, our proposed model takes not only the contextual traffic information but also the driving styles of surrounding vehicles into consideration and makes lane-change/keep decisions. Moreover, the model can imitate human drivers' decision-making maneuvers to the greatest extent by learning the driving style of the ego vehicle. Our evaluation results show that the proposed model closely follows human decision-making maneuvers, achieving 98.66% prediction accuracy with respect to human drivers' ground-truth decisions. Besides, the lane-change impact analysis results demonstrate that our model even performs better than human drivers in terms of improving the safety and speed of traffic.
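A minimal sketch of a decision model of this shape, assuming a flat feature vector of contextual traffic information concatenated with driving-style descriptors and a small MLP classifier; the feature layout, sizes, and action set are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DLCDecisionNet(nn.Module):
    """Lane-change / lane-keep classifier over traffic context + driving styles.

    Feature layout and sizes are illustrative assumptions: `context` could hold
    gaps and relative speeds, `styles` could hold style scores of the ego and
    surrounding vehicles.
    """

    def __init__(self, context_dim=10, style_dim=5, hidden=64, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim + style_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))          # e.g. keep / change-left / change-right

    def forward(self, context, styles):
        return self.net(torch.cat([context, styles], dim=-1))

model = DLCDecisionNet()
logits = model(torch.randn(8, 10), torch.randn(8, 5))
decision = logits.argmax(dim=-1)                   # predicted maneuver per sample
```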
A Fully-Automatic Framework for Parkinson's Disease Diagnosis by Multi-Modality Images
Xu, Jiahang, Jiao, Fangyang, Huang, Yechong, Luo, Xinzhe, Xu, Qian, Li, Ling, Liu, Xueling, Zuo, Chuantao, Wu, Ping, Zhuang, Xiahai
Background: Parkinson's disease (PD) is a prevalent long-term neurodegenerative disease. Though the diagnostic criteria of PD are relatively well defined, the current medical imaging diagnostic procedures are expertise-demanding and thus call for a more highly integrated AI-based diagnostic algorithm. Methods: In this paper, we proposed an automatic, end-to-end, multi-modality diagnosis framework, including segmentation, registration, feature generation, and machine learning, to process the information of the striatum for the diagnosis of PD. Multiple modalities, including T1-weighted MRI and 11C-CFT PET, were used in the proposed framework. The reliability of this framework was then validated on a dataset from the PET center of Huashan Hospital, which contains paired T1-MRI and CFT-PET images of 18 normal (NL) subjects and 49 PD subjects. Results: We obtained an accuracy of 100% for the PD/NL classification task. In addition, we conducted several comparative experiments to validate the diagnostic ability of our framework. Conclusion: Through experiments we illustrate that (1) automatic segmentation has the same classification effect as manual segmentation, (2) multi-modality images generate better predictions than single-modality images, and (3) the volume feature is shown to be irrelevant to PD diagnosis.
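The final classification stage of such a pipeline can be sketched with region-level features feeding a standard classifier; the synthetic feature table below stands in for the striatal uptake features that the segmentation and registration stages would produce, and the SVM choice is an assumption rather than the paper's exact classifier.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder feature table: one row per subject, e.g. striatal-to-reference
# uptake ratios from CFT-PET within MRI-derived striatum labels. These are
# synthetic values, not the Huashan Hospital dataset.
rng = np.random.default_rng(0)
features_nl = rng.normal(2.5, 0.3, size=(18, 4))   # 18 normal subjects
features_pd = rng.normal(1.6, 0.3, size=(49, 4))   # 49 PD subjects (reduced uptake)
X = np.vstack([features_nl, features_pd])
y = np.array([0] * 18 + [1] * 49)

clf = SVC(kernel="rbf", C=1.0)
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```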
Federated Reinforcement Learning
Zhuo, Hankz Hankui, Feng, Wenfeng, Xu, Qian, Yang, Qiang, Lin, Yufeng
In reinforcement learning, building high-quality policies is challenging when the feature space of states is small and the training data is limited. Directly transferring data or knowledge from one agent to another will not work due to the privacy requirements on data and models. In this paper, we propose a novel reinforcement learning approach that considers these privacy requirements and builds a Q-network for each agent with the help of other agents, namely federated reinforcement learning (FRL). To protect the privacy of data and models, we exploit Gaussian differentials on the information agents share with each other when updating their local models. In the experiments, we evaluate our FRL framework in two diverse domains, Grid-world and Text2Action, by comparing it to various baselines.
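A minimal sketch of the privacy-preserving exchange, assuming agents share Gaussian-perturbed Q-values rather than raw data or model parameters; the network sizes, noise scale, and aggregation rule are illustrative assumptions, not the paper's exact FRL protocol.

```python
import torch
import torch.nn as nn

class LocalQNet(nn.Module):
    """Per-agent Q-network; sizes are illustrative."""
    def __init__(self, state_dim=8, n_actions=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, s):
        return self.net(s)

def share_q_values(qnet, states, noise_std=0.1):
    """What an agent exposes to peers: Q-values perturbed with Gaussian noise,
    so raw data and exact model outputs stay private (illustrative sketch)."""
    with torch.no_grad():
        q = qnet(states)
    return q + noise_std * torch.randn_like(q)

# A federated update could then combine the local Q-values with the noisy
# peer contribution, e.g. q_global = q_local + noisy q from the other agent.
agent_a, agent_b = LocalQNet(), LocalQNet()
states = torch.randn(5, 8)
q_from_b = share_q_values(agent_b, states)
q_global = agent_a(states) + q_from_b
```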
Neural network state estimation for full quantum state tomography
Xu, Qian, Xu, Shuqi
An efficient state estimation model, neural network estimation (NNE), empowered by machine learning techniques, is presented for full quantum state tomography (FQST). A parameterized function based on a neural network is applied to map the measurement outcomes to the estimated quantum states, and its parameters are updated with supervised learning procedures. From a computational complexity perspective, our algorithm is the most efficient among existing state estimation algorithms for full quantum state tomography. We perform numerical tests to demonstrate both the accuracy and scalability of our model.
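A minimal sketch of such an estimator, assuming a small MLP that maps measurement statistics to a Cholesky-parameterized density matrix (positive semidefinite with unit trace by construction) and is trained with a supervised Frobenius-norm loss; dimensions and the loss choice are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

DIM = 4                      # e.g. a two-qubit state (illustrative)
N_MEAS = 16                  # number of measurement settings/outcomes (illustrative)

class NNEstimator(nn.Module):
    """Map measurement statistics to a valid density matrix via a Cholesky factor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_MEAS, 128), nn.ReLU(),
            nn.Linear(128, 2 * DIM * DIM))          # real and imaginary parts of T

    def forward(self, freqs):
        out = self.net(freqs)
        re, im = out[..., :DIM * DIM], out[..., DIM * DIM:]
        t = torch.complex(re, im).reshape(-1, DIM, DIM)
        t = torch.tril(t)                           # lower-triangular Cholesky factor
        rho = t @ t.conj().transpose(-1, -2)        # positive semidefinite by construction
        tr = torch.diagonal(rho, dim1=-2, dim2=-1).sum(-1).real
        return rho / tr.reshape(-1, 1, 1)           # normalize to unit trace

# Supervised training would minimize a distance between the predicted and the
# true density matrices of simulated states, e.g. a Frobenius-norm loss:
model = NNEstimator()
freqs = torch.rand(32, N_MEAS)
rho_true = torch.eye(DIM, dtype=torch.cfloat).expand(32, DIM, DIM) / DIM
loss = ((model(freqs) - rho_true).abs() ** 2).sum(dim=(-2, -1)).mean()
```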