Wang, Yaonan
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Liu, Ruiping, Yang, Kailun, Roitberg, Alina, Zhang, Jiaming, Peng, Kunyu, Liu, Huayao, Wang, Yaonan, Stiefelhagen, Rainer
Semantic segmentation benchmarks in the realm of autonomous driving are dominated by large pre-trained transformers, yet their widespread adoption is impeded by substantial computational costs and prolonged training durations. To lift this constraint, we look at efficient semantic segmentation from the perspective of comprehensive knowledge distillation and aim to bridge the gap between multi-source knowledge extractions and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework which learns compact student transformers by distilling both feature maps and patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental and two optimization modules: (1) Cross Selective Fusion (CSF) enables knowledge transfer between cross-stage features via channel attention and feature map distillation within hierarchical transformers; (2) Patch Embedding Alignment (PEA) performs dimensional transformation within the patchifying process to facilitate the patch embedding distillation; (3) Global-Local Context Mixer (GL-Mixer) extracts both global and local information of a representative embedding; (4) Embedding Assistant (EA) acts as an embedding method to seamlessly bridge teacher and student models with the teacher's number of channels. Experiments on Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks and rivals the time-consuming pre-training method. The source code is publicly available at https://github.com/RuipingL/TransKD.
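The general idea of distilling teacher feature maps into a narrower student can be illustrated with a minimal sketch. It is not taken from the TransKD repository: the function names, the per-position linear projection used to match channel counts, and the plain mean-squared-error objective are all assumptions chosen for illustration, and the CSF, PEA, GL-Mixer, and EA modules are not modelled.

```python
# Hypothetical illustration of feature-map distillation with channel alignment.
# Not the TransKD implementation; all names and shapes are assumptions.

def project_channels(feature, weight):
    """Per-position linear map (a 1x1 convolution without bias).

    feature: list of positions, each a list of c_in channel values.
    weight:  c_in x c_out matrix as a list of rows.
    """
    c_in, c_out = len(weight), len(weight[0])
    return [
        [sum(pos[i] * weight[i][j] for i in range(c_in)) for j in range(c_out)]
        for pos in feature
    ]


def mse(a, b):
    """Mean squared error between two equally shaped [positions][channels] maps."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def feature_distillation_loss(student_feat, teacher_feat, align_weight):
    """Project student features to the teacher's channel count, then compare."""
    aligned = project_channels(student_feat, align_weight)
    return mse(aligned, teacher_feat)


# Toy data: 2 spatial positions, student has 2 channels, teacher has 3.
student = [[1.0, 2.0], [0.0, 1.0]]
align = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # assumed 2 -> 3 projection
teacher_match = [[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]]
teacher_off = [[1.0, 2.0, 4.0], [0.0, 1.0, 1.0]]

loss_match = feature_distillation_loss(student, teacher_match, align)
loss_off = feature_distillation_loss(student, teacher_off, align)
```

In a real training loop the projection weights would be learned jointly with the student, and this term would be added to the ordinary segmentation loss.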
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation
Teng, Fei, Zhang, Jiaming, Peng, Kunyu, Wang, Yaonan, Stiefelhagen, Rainer, Yang, Kailun
Light field cameras, by harnessing the power of a micro-lens array, are capable of capturing intricate angular and spatial details. This allows for acquiring complex light patterns and details from multiple angles, significantly enhancing the precision of image semantic segmentation, a critical aspect of scene interpretation in vision intelligence. However, the extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelming for the limited hardware resources of intelligent vehicles. Besides, inappropriate compression leads to information corruption and data loss. To excavate representative information, we propose a new paradigm, Omni-Aperture Fusion model (OAFuser), which leverages dense context from the central view and discovers the angular information from sub-aperture images to generate a semantically consistent result. To avoid feature loss during network propagation and simultaneously streamline the redundant information from the light field camera, we present a simple yet very effective Sub-Aperture Fusion Module (SAFM) to embed sub-aperture images into angular features without any additional memory cost. Furthermore, to address the mismatched spatial information across viewpoints, we present a Center Angular Rectification Module (CARM) to realize feature resorting and prevent feature occlusion caused by asymmetric information. Our proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and -Syn datasets and sets a new record of 84.93% in mIoU on the UrbanLF-Real Extended dataset, with a gain of +4.53%. The source code of OAFuser will be available at https://github.com/FeiBryantkit/OAFuser.
Towards Anytime Optical Flow Estimation with Event Cameras
Ye, Yaozu, Shi, Hao, Yang, Kailun, Wang, Ze, Yin, Xiaoting, Lin, Yining, Liu, Mao, Wang, Yaonan, Wang, Kaiwei
Optical flow estimation is a fundamental task in the field of autonomous driving. Event cameras are capable of responding to log-brightness changes in microseconds. Their characteristic of producing responses only in changing regions is particularly suitable for optical flow estimation. However, in contrast to the super-low-latency response of event cameras, existing datasets collected via event cameras only provide optical flow ground truth at a limited frame rate (e.g., at 10Hz), greatly restricting the potential of event-driven optical flow. To address this challenge, we put forward a high-frame-rate, low-latency event representation, Unified Voxel Grid, which is fed into the network sequentially, bin by bin. We then propose EVA-Flow, an EVent-based Anytime Flow estimation network to produce high-frame-rate event optical flow with only low-frame-rate optical flow ground truth for supervision. The key component of our EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module, which predicts temporally dense optical flow and enhances the accuracy via spatial-temporal motion refinement. The time-dense feature warping utilized in the SMR module provides implicit supervision for the intermediate optical flow. Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the unsupervised evaluation of intermediate optical flow in the absence of ground truth. This is, to the best of our knowledge, the first work focusing on anytime optical flow estimation via event cameras. A comprehensive variety of experiments on MVSEC, DSEC, and our EVA-FlowSet demonstrates that EVA-Flow achieves competitive performance, super-low-latency (5ms), fastest inference (9.2ms), time-dense motion estimation (200Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
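As a rough illustration of what a binned event representation looks like, the sketch below builds a generic temporal voxel grid from a list of events. It is the commonly used voxel grid with linear interpolation along time, not the Unified Voxel Grid of EVA-Flow, whose exact definition is not given in the abstract; the function name, the `(x, y, t, polarity)` event layout, and the normalisation are assumptions.

```python
# Generic event voxel grid with linear temporal interpolation.
# Assumed event layout: (x, y, t, polarity) with polarity in {+1, -1}.
# This is NOT the Unified Voxel Grid from the paper, only a common baseline form.

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate event polarities into num_bins temporal slices of size height x width."""
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero for simultaneous events
    for x, y, t, polarity in events:
        # Map the timestamp to a continuous bin coordinate in [0, num_bins - 1].
        pos = (t - t_min) / span * (num_bins - 1)
        lower = int(pos)
        frac = pos - lower
        # Split the polarity between the two neighbouring bins.
        grid[lower][y][x] += polarity * (1.0 - frac)
        if lower + 1 < num_bins:
            grid[lower + 1][y][x] += polarity * frac
    return grid


events = [
    (0, 0, 0.00, 1),
    (0, 1, 0.25, 1),   # falls halfway between bin 0 and bin 1
    (1, 0, 0.50, -1),
    (1, 1, 1.00, 1),
]
voxels = events_to_voxel_grid(events, num_bins=3, height=2, width=2)

# Feeding the representation "bin by bin" then amounts to iterating over slices.
slices_in_order = [bin_slice for bin_slice in voxels]
```

Each slice is indexed as `voxels[bin][y][x]`, so a network that consumes one bin at a time can produce an intermediate flow estimate after every slice.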
A Graph Reconstruction by Dynamic Signal Coefficient for Fault Classification
He, Wenbin, Mao, Jianxu, Wang, Yaonan, Li, Zhe, Fang, Qiu, Wu, Haotian
To improve fault identification for rotating machinery under strong noise, this paper presents a dynamic feature reconstruction signal graph method, which plays a key role in the proposed end-to-end fault diagnosis model. Specifically, the original mechanical signal is first decomposed by wavelet packet decomposition (WPD) to obtain multiple sub-bands together with their coefficient matrices. Then, with two newly defined feature extraction factors, MDD and DDD, a dynamic feature selection method based on the L2 energy norm (DFSL) is proposed, which dynamically selects the feature coefficient matrices of the WPD based on differences in the distribution of norm energy, enabling adaptive signal reconstruction for each sub-signal. Next, the coefficient matrices of the optimal feature sub-bands are reconstructed and reorganized to obtain the feature signal graphs. Finally, deep features are extracted from the feature signal graphs by a 2D convolutional neural network (2D-CNN). Experimental results on a public bearing data platform and on our laboratory robot grinding platform show that this method outperforms existing methods under different noise intensities.
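The decomposition-then-selection pipeline can be sketched as follows. This is a simplified stand-in, not the DFSL method: it uses a Haar wavelet packet tree and keeps the sub-bands with the largest L2 energy, whereas the paper's MDD and DDD factors are not defined in the abstract and are therefore not modelled. All function names are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

# Simplified illustration: Haar wavelet packet decomposition followed by
# energy-based sub-band selection. Not the paper's DFSL procedure.

def haar_step(signal):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet(signal, levels):
    """Full packet tree: every node is split at every level; returns the leaf sub-bands."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def l2_energy(coeffs):
    """Squared L2 norm of a coefficient vector."""
    return sum(c * c for c in coeffs)


def select_subbands(subbands, keep):
    """Indices of the `keep` highest-energy sub-bands, in ascending index order."""
    ranked = sorted(range(len(subbands)), key=lambda i: l2_energy(subbands[i]), reverse=True)
    return sorted(ranked[:keep])


signal = [1.0] * 8                      # constant toy signal
bands = wavelet_packet(signal, levels=2)
energies = [l2_energy(b) for b in bands]
chosen = select_subbands(bands, keep=1)
```

For the constant toy signal all energy ends up in the first (pure approximation) sub-band, so it is the one selected. In practice the selected coefficient vectors would then be arranged into a 2D array before being passed to a CNN.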
Towards Source-free Domain Adaptive Semantic Segmentation via Importance-aware and Prototype-contrast Learning
Cao, Yihong, Zhang, Hui, Lu, Xiao, Xiao, Zheng, Yang, Kailun, Wang, Yaonan
Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the concerns of data privacy and storage limitations in typical unsupervised domain adaptation methods. It utilizes a well-trained source model and unlabeled target data to achieve adaptation in the target domain. However, in the absence of source data and target labels, current solutions cannot sufficiently reduce the impact of domain shift and fully leverage the information from the target data. In this paper, we propose an end-to-end source-free domain adaptation semantic segmentation method via Importance-Aware and Prototype-Contrast (IAPC) learning. The proposed IAPC framework effectively extracts domain-invariant knowledge from the well-trained source model and learns domain-specific knowledge from the unlabeled target domain. Specifically, considering the problem of domain shift in the prediction of the target domain by the source model, we put forward an importance-aware mechanism for the biased target prediction probability distribution to extract domain-invariant knowledge from the source model. We further introduce a prototype-contrast strategy, which includes a prototype-symmetric cross-entropy loss and a prototype-enhanced cross-entropy loss, to learn target intra-domain knowledge without relying on labels. A comprehensive variety of experiments on two domain adaptive semantic segmentation benchmarks demonstrates that the proposed end-to-end IAPC solution outperforms existing state-of-the-art methods. Code will be made publicly available at https://github.com/yihong-97/Source-free_IAPC.
A Signed Subgraph Encoding Approach via Linear Optimization for Link Sign Prediction
Fang, Zhihong, Tan, Shaolin, Wang, Yaonan
In this paper, we consider the problem of inferring the sign of a link based on limited sign data in signed networks. To the best of our knowledge, SDGNN (Signed Directed Graph Neural Networks) currently provides the best prediction performance on this link sign prediction problem. We propose a different link sign prediction architecture called SELO (Subgraph Encoding via Linear Optimization), which achieves overall leading prediction performance compared with the state-of-the-art algorithm SDGNN. The proposed model utilizes a subgraph encoding approach to learn edge embeddings for signed directed networks. In particular, a signed subgraph encoding approach is introduced to embed each subgraph into a likelihood matrix instead of the adjacency matrix through a linear optimization method. Comprehensive experiments are conducted on six real-world signed networks with AUC, F1, micro-F1, and macro-F1 as the evaluation metrics. The experimental results show that the proposed SELO model outperforms existing feature-based and embedding-based baseline methods on all six real-world networks and on all four evaluation metrics.
Artificial intelligence empowered multi-AGVs in manufacturing systems
Li, Dong, Ouyang, Bo, Wu, Duanpo, Wang, Yaonan
Automated guided vehicles (AGVs) are driverless robotic vehicles that pick up and deliver materials. Improving efficiency while preventing deadlocks is the core issue in designing AGV systems. In this paper, we propose an approach to tackle this problem. The proposed approach includes a traditional AGV scheduling algorithm, which aims at solving deadlock problems, and an artificial neural network based component, which predicts future tasks of the AGV system and decides whether to send an AGV to the predicted starting location of an upcoming task, so as to save the time otherwise spent waiting for an AGV to travel there once the task is created. Simulation results show that the proposed method significantly improves efficiency compared with the traditional method, by up to 20% to 30%.