Goto

Collaborating Authors

 feature extraction


SAFE Quantum Machine Learning with Variational Quantum Classifiers

arXiv.org Machine Learning

We propose a variational quantum classifier operating on high dimensional deep representations via amplitude encoding, stabilized by a learnable classical pre encoding layer.By combining normalized amplitude embeddings with bounded quantum observables, the resulting model induces a structured and smooth hypothesis class with controlled sensitivity to input variations. Model reliability is assessed using SAFE-AI metrics derived from the Cramer von Mises divergence, enabling consistent evaluation across accuracy, robustness, and explainability dimensions. Empirical results show that the proposed quantum model provides competitive predictive performance compared with strong classical baselines while exhibiting a more balanced SAFE reliability profile, with improved robustness to noise and stability under structured feature removal. These findings suggest that variational quantum circuits offer a principled mechanism for stability oriented SAFE learning in safety critical settings.


Fully Sparse Detection

Neural Information Processing Systems

As the perception range of LiDAR increases, LiDAR-based 3D object detection becomes a dominant task in the long-range perception task of autonomous driving. The mainstream 3D object detectors usually build dense feature maps in the network backbone and prediction head. However, the computational and spatial costs on the dense feature map are quadratic to the perception range, which makes them hardly scale up to the long-range setting. To enable efficient long-range LiDAR-based object detection, we build a fully sparse 3D object detector (FSD). The computational and spatial cost of FSD is roughly linear to the number of points and independent of the perception range. FSD is built upon the general sparse voxel encoder and a novel sparse instance recognition (SIR) module.


UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond

Neural Information Processing Systems

To date, transformer-based frameworks have demonstrated impressive results in single-image super-resolution (SISR). However, under practical lightweight scenarios, the complex interaction of deep image feature extraction and similarity modeling limits the performance of these methods, since they require simultaneous layer-specific optimization of both two tasks. In this work, we introduce a novel Unified Projection Sharing algorithm(UPS) to decouple the feature extraction and similarity modeling, achieving notable performance. To do this, we establish a unified projection space defined by a learnable projection matrix, for similarity calculation across all self-attention layers. As a result, deep image feature extraction remains a per-layer optimization manner, while similarity modeling is carried out by projecting these image features onto the shared projection space. Extensive experiments demonstrate that our proposed UPS achieves state-of-the-art performance relative to leading lightweight SISR methods, as verified by various popular benchmarks. Moreover, our unified optimized projection space exhibits encouraging robustness performance for unseen data (degraded and depth images). Finally, UPS also demonstrates promising results across various image restoration tasks, including real-world and classic SISR, image denoising, and image deblocking.


DenoiseRep: Denoising Model for Representation Learning

Neural Information Processing Systems

The denoising model has been proven a powerful generative model but has little exploration of discriminative tasks. Representation learning is important in discriminative tasks, which is defined as . In this paper, we propose a novel Denoising Model for Representation Learning () to improve feature discrimination with joint feature extraction and denoising.


EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals

Neural Information Processing Systems

Electroencephalography (EEG) is crucial for recording brain activity, with applications in medicine, neuroscience, and brain-computer interfaces (BCI). However, challenges such as low signal-to-noise ratio (SNR), high inter-subject variability, and channel mismatch complicate the extraction of robust, universal EEG representations. We propose EEGPT, a novel 10-million-parameter pretrained transformer model designed for universal EEG feature extraction. In EEGPT, a mask-based dual self-supervised learning method for efficient feature extraction is designed. Compared to other mask-based self-supervised learning methods, EEGPT introduces spatio-temporal representation alignment. This involves constructing a self-supervised task based on EEG representations that possess high SNR and rich semantic information, rather than on raw signals. Consequently, this approach mitigates the issue of poor feature quality typically extracted from low SNR signals. Additionally, EEGPT's hierarchical structure processes spatial and temporal information separately, reducing computational complexity while increasing flexibility and adaptability for BCI applications. By training on a large mixed multi-task EEG dataset, we fully exploit EEGPT's capabilities.


Learning Spherical Convolution for Fast Features from 360 Imagery

Neural Information Processing Systems

While 360 cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make core feature extraction non-trivial. Convolutional neural networks (CNNs) trained on images from perspective cameras yield "flat filters, yet 360 images cannot be projected to a single plane without significant distortion. A naive solution that repeatedly projects the viewing sphere to all tangent planes is accurate, but much too computationally intensive for real problems. We propose to learn a spherical convolutional network that translates a planar CNN to process 360 imagery directly in its equirectangular projection. Our approach learns to reproduce the flat filter outputs on 360 data, sensitive to the varying distortion effects across the viewing sphere. The key benefits are 1) efficient feature extraction for 360 images and video, and 2) the ability to leverage powerful pre-trained networks researchers have carefully honed (together with massive labeled image training sets) for perspective images. We validate our approach compared to several alternative methods in terms of both raw CNN output accuracy as well as applying a state-of-the-art "flat object detector to 360 data. Our method yields the most accurate results while saving orders of magnitude in computation versus the existing exact reprojection solution.




DenoiseRep: DenoisingModelfor RepresentationLearning

Neural Information Processing Systems

Experimental results on various discriminative vision tasks, including re-identification (Market-1501, DukeMTMC-reID, MSMT17, CUHK-03,vehicleID),imageclassification(ImageNet,UB200,Oxford-Pet,Flowers), object detection (COCO), image segmentation (ADE20K) show stability and impressive improvements.


3198dfd0aef271d22f7bcddd6f12f5cb-Paper.pdf

Neural Information Processing Systems

Classical approaches are based on adetect-thendescribeparadigm where separate handcrafted methods areusedtofirstidentify repeatable keypoints and then represent them with a local descriptor.