 Chen, Dongdong


Self-Supervised Learning based on Heat Equation

arXiv.org Artificial Intelligence

This paper presents a new perspective on self-supervised learning based on extending the heat equation into a high-dimensional feature space. In particular, we remove the time dependence by imposing a steady-state condition, and extend the remaining 2D Laplacian from x-y isotropic to linearly correlated. Furthermore, we simplify it by splitting the x and y axes into two first-order linear differential equations. This simplification explicitly models the spatial invariance along the horizontal and vertical directions separately, supporting prediction across image blocks. It leads to a very simple masked image modeling (MIM) method, named QB-Heat. QB-Heat leaves a single block the size of a quarter of the image unmasked and linearly extrapolates the other three masked quarters. It brings MIM to CNNs without bells and whistles, and even works well for pre-training light-weight networks that are suitable for both image classification and object detection without fine-tuning. Compared with MoCo-v2 on pre-training a Mobile-Former with 5.8M parameters and 285M FLOPs, QB-Heat is on par in linear probing on ImageNet, but clearly outperforms it in non-linear probing, which adds a transformer block before the linear classifier (65.6% vs. 52.9%). When transferring to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP respectively. This work provides an insightful hypothesis on the invariance within visual representation over different shapes and textures: the linear relationship between horizontal and vertical derivatives. The code will be publicly released.
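The quarter-block masking described in the abstract can be illustrated with a minimal sketch. It only builds the visibility mask; the linear extrapolation of the masked quarters is not implemented. The function name `quarter_block_mask` and the choice of placing the visible block in one of the four image corners are assumptions made for illustration, not details taken from the paper.

```python
def quarter_block_mask(height, width, corner="top_left"):
    """Return a 2D list of booleans: True = visible pixel, False = masked pixel.

    Assumption: the visible quarter sits in one of the four corners.
    """
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    h, w = height // 2, width // 2
    offsets = {
        "top_left": (0, 0),
        "top_right": (0, w),
        "bottom_left": (h, 0),
        "bottom_right": (h, w),
    }
    top, left = offsets[corner]
    return [
        [top <= r < top + h and left <= c < left + w for c in range(width)]
        for r in range(height)
    ]


# Example: an 8x8 image keeps one 4x4 block visible (a quarter of the pixels).
mask = quarter_block_mask(8, 8, "bottom_right")
visible_pixels = sum(sum(row) for row in mask)
```

Here `visible_pixels` counts the unmasked positions, which should be one quarter of the image area.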


Sampling Theorems for Learning from Incomplete Measurements

arXiv.org Machine Learning

In many real-world settings, only incomplete measurement data are available, which can pose a problem for learning. Unsupervised learning of the signal model using a fixed incomplete measurement process is impossible in general, as there is no information in the nullspace of the measurement operator. This limitation can be overcome by using measurements from multiple operators. While this idea has been successfully applied in various applications, a precise characterization of the conditions for learning is still lacking. In this paper, we fill this gap by presenting necessary and sufficient conditions for learning the signal model, which indicate the interplay between the number of distinct measurement operators $G$, the number of measurements per operator $m$, the dimension of the model $k$ and the dimension of the signals $n$. In particular, we show that generically unsupervised learning is possible if each operator obtains at least $m>k+n/G$ measurements. Our results are agnostic of the learning algorithm and have implications for a wide range of practical algorithms, from low-rank matrix recovery to deep neural networks.
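As a small worked example, the inequality $m>k+n/G$ from the abstract can be evaluated directly. The sketch below only checks that inequality with integer arithmetic; it does not verify the genericity assumptions behind the result, and the function names are hypothetical.

```python
def satisfies_bound(n, k, m, G):
    """True if m > k + n / G, written as m*G > k*G + n to avoid floats."""
    return m * G > k * G + n


def min_measurements(n, k, G):
    """Smallest integer m with m > k + n / G."""
    return (k * G + n) // G + 1


# Signals of dimension n = 100, model dimension k = 10, G = 10 operators:
# k + n / G = 20, so each operator needs at least 21 measurements.
m_min = min_measurements(n=100, k=10, G=10)
ok_at_21 = satisfies_bound(n=100, k=10, m=21, G=10)
ok_at_20 = satisfies_bound(n=100, k=10, m=20, G=10)
```

With more operators the term $n/G$ shrinks, so the required number of measurements per operator approaches the model dimension $k$.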


Florence: A New Foundation Model for Computer Vision

arXiv.org Artificial Intelligence

Automated visual understanding of our diverse and open world demands that computer vision models generalize well with minimal customization for specific tasks, similar to human vision. Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications. While existing vision foundation models such as CLIP, ALIGN, and Wu Dao 2.0 focus mainly on mapping images and textual representations to a cross-modal shared representation, we introduce a new computer vision foundation model, Florence, to expand the representations from coarse (scene) to fine (object), from static (images) to dynamic (videos), and from RGB to multiple modalities (caption, depth). By incorporating universal visual-language representations from Web-scale image-text data, our Florence model can be easily adapted for various computer vision tasks, such as classification, retrieval, object detection, VQA, image captioning, video retrieval and action recognition. Moreover, Florence demonstrates outstanding performance in many types of transfer learning: fully sampled fine-tuning, linear probing, few-shot transfer and zero-shot transfer for novel images and objects. All of these properties are critical for our vision foundation model to serve general-purpose vision tasks. Florence achieves new state-of-the-art results in the majority of 44 representative benchmarks, e.g., ImageNet-1K zero-shot classification with a top-1 accuracy of 83.74 and a top-5 accuracy of 97.18, 62.4 mAP on COCO fine-tuning, 80.36 on VQA, and 87.8 on Kinetics-600.


Weak NAS Predictors Are All You Need

arXiv.org Machine Learning

Neural Architecture Search (NAS) finds the best network architecture by exploring the architecture-to-performance manifold. It often trains and evaluates a large number of architectures, incurring tremendous computation costs. Recent predictor-based NAS approaches attempt to solve this problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Given limited samples, however, these predictors are far from accurate enough to locate top architectures. In this paper, we shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space. It is based on the key property of the proposed weak predictors that their probabilities of sampling better architectures keep increasing. We thus sample only a few well-performing architectures guided by the previously learned predictor and estimate a new, better weak predictor. Through this coarse-to-fine iteration, the ranking of the sampling space is refined gradually, which helps find the optimal architectures eventually. Experiments demonstrate that our method requires fewer samples to find the top-performing architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves state-of-the-art ImageNet performance on the NASNet search space. The code is available at https://github.com/VITA-Group/WeakNAS
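The coarse-to-fine iteration described above can be sketched on a toy problem. This is not the WeakNAS implementation: the integer search space, the `true_accuracy` stand-in for training an architecture, and the 1-nearest-neighbour predictor are all assumptions chosen to keep the example short. It only shows the loop structure, in which each round fits a weak predictor on the evaluated samples and then evaluates the architectures it ranks highest.

```python
import random


def true_accuracy(arch):
    """Hypothetical stand-in for training and evaluating an architecture."""
    return 1.0 - ((arch - 700) / 1000.0) ** 2


def fit_weak_predictor(evaluated):
    """Build a 1-nearest-neighbour predictor from {architecture: accuracy}."""
    items = sorted(evaluated.items())

    def predict(arch):
        nearest = min(items, key=lambda item: abs(item[0] - arch))
        return nearest[1]

    return predict


def weak_predictor_search(space, initial=10, per_round=10, rounds=5, seed=0):
    rng = random.Random(seed)
    # Start from a small random set of evaluated architectures.
    evaluated = {a: true_accuracy(a) for a in rng.sample(space, initial)}
    for _ in range(rounds):
        # Fit a new weak predictor on everything evaluated so far.
        predict = fit_weak_predictor(evaluated)
        # Rank unseen architectures and evaluate only the top-predicted ones.
        unseen = [a for a in space if a not in evaluated]
        unseen.sort(key=predict, reverse=True)
        for arch in unseen[:per_round]:
            evaluated[arch] = true_accuracy(arch)
    best = max(evaluated, key=evaluated.get)
    return best, evaluated


space = list(range(1000))
best_arch, history = weak_predictor_search(space)
```

The total number of evaluations is fixed at `initial + rounds * per_round`, which is the budget a predictor-based search is trying to keep small.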


Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis

arXiv.org Machine Learning

Computer-aided breast cancer diagnosis in mammography is limited by inadequate data and the similarity between benign and cancerous masses. To address this, we propose a signed graph regularized deep neural network with adversarial augmentation, named DiagNet. Firstly, we use adversarial learning to generate positive and negative mass-contained mammograms for each mass class. After that, a signed similarity graph is built upon the expanded data to further highlight the discrimination. Finally, a deep convolutional neural network is trained by jointly optimizing the signed graph regularization and classification loss. Experiments show that the DiagNet framework outperforms the state-of-the-art in breast mass diagnosis in mammography.


A Deep DUAL-PATH Network for Improved Mammogram Image Processing

arXiv.org Machine Learning

We present, for the first time, a novel deep neural network architecture called \dcn with a dual-path connection between the input image and output class label for mammogram image processing. This architecture is built upon U-Net, which non-linearly maps the input data into a deep latent space. One path of \dcn, the locality preserving learner, is devoted to hierarchically extracting and exploiting intrinsic features of the input, while the other path, called the conditional graph learner, focuses on modeling the input-mask correlations. The learned mask is further used to improve classification results, and the two learning paths complement each other. By integrating the two learners, our new architecture provides a simple but effective way to jointly learn the segmentation and predict the class label. Benefiting from the powerful expressive capacity of deep neural networks, a more discriminative representation can be learned, in which both the semantics and structure are well preserved. Experimental results show that \dcn achieves the best mammography segmentation and classification simultaneously, outperforming recent state-of-the-art models.


A deep learning approach for Magnetic Resonance Fingerprinting

arXiv.org Machine Learning

Current popular methods for Magnetic Resonance Fingerprinting (MRF) recovery are bottlenecked by the heavy storage and computation requirements of a matched-filtering step, due to the growing size and complexity of the fingerprint dictionaries in multi-parametric quantitative MRI applications. In this abstract, we investigate and evaluate the advantages of a deep learning approach for embedding the manifold of solutions of the Bloch equations in order to address these shortcomings.


Unsupervised Multi-Manifold Clustering by Learning Deep Representation

AAAI Conferences

In this paper, we propose a novel deep manifold clustering (DMC) method for learning effective deep representations and partitioning a dataset into clusters, where each cluster contains data points from a single nonlinear manifold. Unlike previous research efforts, we adopt a deep neural network to classify and parameterize unlabeled data that lie on multiple manifolds. Firstly, motivated by the observation that nearby points lying in a local region of a manifold should possess similar representations, a locality preserving objective is defined to iteratively explore data relations and learn structure-preserving representations. Secondly, by finding the corresponding cluster centers from the representations, a clustering-oriented objective is proposed to guide the model to extract both discriminative and cluster-specific representations. Finally, by integrating the two objectives into a single model with a unified cost function and optimizing it using back propagation, we can obtain not only more powerful representations, but also more precise clusters of data. In addition, our model can be intuitively extended to cluster out-of-sample data. The experimental results and comparisons with existing state-of-the-art methods show that the proposed method consistently achieves the best performance on various benchmark datasets.


A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation

AAAI Conferences

Local neighborhood selection plays a crucial role in most representation-based manifold learning algorithms. This paper reveals that an improper selection of the neighborhood for learning representations will introduce negative components in the learnt representations. Importantly, representations with negative components will affect the preservation of the intrinsic manifold structure. In this paper, a local non-negative pursuit (LNP) method is proposed for neighborhood selection, and non-negative representations are learnt. Moreover, it is proved that the learnt representations are sparse and convex. Theoretical analysis and experimental results show that the proposed method matches or outperforms the state-of-the-art results on various manifold learning problems.