Query-based Temporal Fusion with Explicit Motion for 3D Object Detection

Neural Information Processing Systems 

Existing methods perform temporal fusion on either dense BEV features or sparse 3D proposal features. However, the former does not focus on foreground objects, which leads to higher computational cost and sub-optimal performance.