- Health & Medicine > Diagnostic Medicine > Imaging (0.70)
- Health & Medicine > Therapeutic Area > Oncology (0.69)
- Health & Medicine > Diagnostic Medicine > Imaging (0.84)
- Health & Medicine > Health Care Technology (0.65)
AutoLink: Self-supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints
Structured representations such as keypoints are widely used in pose transfer, conditional image generation, animation, and 3D reconstruction. However, their supervised learning requires expensive annotation for each target domain. We propose a self-supervised method that learns to disentangle object structure from appearance with a graph of 2D keypoints linked by straight edges. Both the keypoint locations and their pairwise edge weights are learned, given only a collection of images depicting the same object class. The resulting graph is interpretable; for example, AutoLink recovers the human skeleton topology when applied to images showing people. Our key ingredients are i) an encoder that predicts keypoint locations in an input image, ii) a shared graph as a latent variable that links the same pairs of keypoints in every image, iii) an intermediate edge map that combines the latent graph edge weights and keypoint locations in a soft, differentiable manner, and iv) an inpainting objective on randomly masked images. Although simpler, AutoLink outperforms existing self-supervised methods on the established keypoint and pose estimation benchmarks and paves the way for structure-conditioned generative models on more diverse datasets.
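The soft, differentiable edge map (ingredient iii) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `soft_edge_map`, the Gaussian falloff, the `sigma` value, and the max-composition of edges are all assumptions.

```python
import numpy as np

def soft_edge_map(keypoints, edge_weights, size=64, sigma=0.02):
    """Render a soft edge map: for every keypoint pair (i, j), draw a
    differentiable line segment scaled by edge_weights[i, j].
    keypoints: (K, 2) array of (x, y) in [0, 1]^2; edge_weights: (K, K).
    Illustrative sketch -- the Gaussian falloff is an assumption."""
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)
    grid = np.stack([xs, ys], axis=-1)                 # (H, W, 2) pixel coords
    edge_map = np.zeros((size, size))
    K = len(keypoints)
    for i in range(K):
        for j in range(i + 1, K):
            a, b = keypoints[i], keypoints[j]
            ab = b - a
            # project every pixel onto segment a-b, clamp to the segment
            t = np.clip(((grid - a) @ ab) / (ab @ ab + 1e-8), 0.0, 1.0)
            closest = a + t[..., None] * ab
            d2 = np.sum((grid - closest) ** 2, axis=-1)  # squared distance to segment
            edge_map = np.maximum(edge_map,
                                  edge_weights[i, j] * np.exp(-d2 / sigma ** 2))
    return edge_map
```

Because every operation is smooth in the keypoint coordinates (away from the clamp boundaries), gradients from a downstream inpainting loss can flow back to both the keypoint locations and the edge weights.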
- Information Technology > Artificial Intelligence > Machine Learning (0.80)
- Information Technology > Sensing and Signal Processing > Image Processing (0.60)
- Information Technology > Artificial Intelligence > Vision (0.60)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)
3D Self-Supervised Methods for Medical Imaging
Self-supervised learning methods have witnessed a recent surge of interest after proving successful in multiple application fields. In this work, we leverage these techniques and propose 3D versions of five different self-supervised methods, in the form of proxy tasks. Our methods facilitate neural network feature learning from unlabeled 3D images, aiming to reduce the cost of expert annotation. The developed algorithms are 3D Contrastive Predictive Coding, 3D Rotation prediction, 3D Jigsaw puzzles, Relative 3D patch location, and 3D Exemplar networks. Our experiments show that pretraining models with our 3D tasks yields more powerful semantic representations and enables solving downstream tasks more accurately and efficiently, compared to training the models from scratch and to pretraining them on 2D slices.
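To make the proxy-task idea concrete, here is a sketch of data generation for one of the five listed methods, 3D Rotation prediction: an unlabeled volume is rotated by a randomly chosen discrete rotation, and the rotation index becomes a free classification label. The particular rotation set, `ROTATIONS`, and `make_rotation_sample` are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# A small set of discrete 3D rotations (quarter-turns about different axis
# pairs); the choice and size of this label set is an assumption.
ROTATIONS = [(0, (0, 1)), (1, (0, 1)), (2, (0, 1)),   # k quarter-turns, axial plane
             (1, (0, 2)), (1, (1, 2))]                # turns about two other planes

def make_rotation_sample(volume, rng):
    """Rotate an unlabeled 3D volume by a random entry of ROTATIONS.
    The rotation index serves as a 'pseudo-label': a network pretrained to
    predict it must learn anatomy-aware features, with no expert annotation."""
    label = rng.integers(len(ROTATIONS))
    k, axes = ROTATIONS[label]
    return np.rot90(volume, k=k, axes=axes), label
```

The other four tasks follow the same pattern: a cheap transformation of the raw volume defines the target, and the encoder trained on it is reused for the downstream medical task.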
- Health & Medicine > Health Care Technology (0.45)
- Health & Medicine > Diagnostic Medicine > Imaging (0.45)
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Visual and audio modalities are highly correlated, yet they contain different information. Their strong correlation makes it possible to predict the semantics of one from the other with good accuracy. Their intrinsic differences make cross-modal prediction a potentially more rewarding pretext task for self-supervised learning of video and audio representations compared to within-modality learning. Based on this intuition, we propose Cross-Modal Deep Clustering (XDC), a novel self-supervised method that leverages unsupervised clustering in one modality (e.g., audio) as a supervisory signal for the other modality (e.g., video). This cross-modal supervision helps XDC utilize the semantic correlation and the differences between the two modalities. Our experiments show that XDC outperforms single-modality clustering and other multi-modal variants. XDC achieves state-of-the-art accuracy among self-supervised methods on multiple video and audio benchmarks. Most importantly, our video model pretrained on large-scale unlabeled data significantly outperforms the same model pretrained with full supervision on ImageNet and Kinetics for action recognition on HMDB51 and UCF101. To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
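The core loop can be sketched as: cluster one modality's features, then use the cluster ids as classification targets for the other modality's encoder. The helper names (`kmeans_labels`, `xdc_targets`), the plain k-means with farthest-point initialization, and all hyperparameters below are illustrative assumptions, not XDC's actual implementation.

```python
import numpy as np

def kmeans_labels(feats, k, iters=20):
    """Plain k-means returning a cluster id per sample.
    Farthest-point initialization keeps this sketch deterministic."""
    feats = feats.astype(float)
    centers = [feats[0]]
    for _ in range(k - 1):
        d = np.min([((feats - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((feats[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            pts = feats[labels == c]
            if len(pts):
                centers[c] = pts.mean(0)
    return labels

def xdc_targets(audio_feats, k):
    """Cross-modal step, schematically: cluster ids from audio features
    become the supervisory signal (class labels) for the video encoder."""
    return kmeans_labels(audio_feats, k)
```

In the full method this alternates: each modality's encoder is periodically retrained against pseudo-labels derived from the other modality's clusters.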
Cross-Scale Self-Supervised Blind Image Deblurring via Implicit Neural Representation
Blind image deblurring (BID) is an important yet challenging image recovery problem. Most existing deep learning methods require supervised training with ground truth (GT) images. This paper introduces a self-supervised method for BID that does not require GT images. The key challenge is to regularize the training to prevent over-fitting due to the absence of GT images. By leveraging an exact relationship among the blurred image, latent image, and blur kernel across consecutive scales, we propose an effective cross-scale consistency loss. This is implemented by representing the image and kernel with implicit neural representations (INRs), whose resolution-free property enables consistent yet efficient computation for network training across multiple scales. Combined with a progressively coarse-to-fine training scheme, the proposed method significantly outperforms existing self-supervised methods in extensive experiments.
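The shape of the cross-scale consistency loss can be sketched as below. This is a schematic, not the paper's formulation: the INR queries are abstracted as callables (`latent_at`, `kernel_at`), the convolution is passed in, and average pooling stands in for the true scale relationship among blurred image, latent image, and kernel.

```python
import numpy as np

def avg_pool2(img):
    """Downsample by 2 with average pooling (stand-in for the scale pyramid)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def cross_scale_loss(blurred, latent_at, kernel_at, conv, scales=3):
    """Sum over scales s of || k_s * x_s - y_s ||^2, where y_s is the observed
    blurred image downsampled to scale s and x_s, k_s are the latent image and
    kernel queried from their INRs at that resolution. Because INRs are
    resolution-free, the same two networks serve every scale."""
    loss, y = 0.0, blurred
    for s in range(scales):
        pred = conv(latent_at(s), kernel_at(s))   # re-blur at scale s
        loss += np.mean((pred - y) ** 2)
        y = avg_pool2(y)                          # move observation down a scale
    return loss
```

Tying all scales to the same latent image and kernel is what regularizes the problem: a solution that over-fits the full-resolution blurred image alone is penalized at the coarser scales.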
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)