Inductive Learning
DiFaReli: Diffusion Face Relighting
Ponglertnapakorn, Puntawat, Tritrong, Nontawat, Suwajanakorn, Supasorn
We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io
Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size
Chen, John, Dun, Chen, Kyrillidis, Anastasios
Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels. However, recent performance improvements have often come \textit{at the cost of significantly increased training computation}. To address this, we propose Curriculum Batch Size (CBS), \textit{an unlabeled batch size curriculum which exploits the natural training dynamics of deep neural networks.} A small unlabeled batch size is used in the beginning of training and is gradually increased to the end of training. A fixed curriculum is used regardless of dataset, model or number of epochs, and reduced training computations is demonstrated on all settings. We apply CBS, strong labeled augmentation, Curriculum Pseudo Labeling (CPL) \citep{FlexMatch} to FixMatch \citep{FixMatch} and term the new SSL algorithm Fast FixMatch. We perform an ablation study to show that strong labeled augmentation and/or CPL do not significantly reduce training computations, but, in synergy with CBS, they achieve optimal performance. Fast FixMatch also achieves substantially higher data utilization compared to previous state-of-the-art. Fast FixMatch achieves between $2.1\times$ - $3.4\times$ reduced training computations on CIFAR-10 with all but 40, 250 and 4000 labels removed, compared to vanilla FixMatch, while attaining the same cited state-of-the-art error rate \citep{FixMatch}. Similar results are achieved for CIFAR-100, SVHN and STL-10. Finally, Fast MixMatch achieves between $2.6\times$ - $3.3\times$ reduced training computations in federated SSL tasks and online/streaming learning SSL tasks, which further demonstrate the generializbility of Fast MixMatch to different scenarios and tasks.
ViewMix: Augmentation for Robust Representation in Self-Supervised Learning
Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, it hasn't been adopted by self-supervised methods for generating positive pairs. This is because the regional dropout methods are not suitable for the input sampling process of the self-supervised methodology. Whereas dropping informative pixels from the positive pairs can result in inefficient training, replacing patches of a specific object with a different one can steer the model from maximizing the agreement between different positive pairs. Moreover, joint embedding representation learning methods have not made robustness their primary training outcome. To this end, we propose the ViewMix augmentation policy, specially designed for self-supervised learning, upon generating different views of the same image, patches are cut and pasted from one view to another. By leveraging the different views created by this augmentation strategy, multiple joint embedding-based self-supervised methodologies obtained better localization capability and consistently outperformed their corresponding baseline methods. It is also demonstrated that incorporating ViewMix augmentation policy promotes robustness of the representations in the state-of-the-art methods. Furthermore, our experimentation and analysis of compute times suggest that ViewMix augmentation doesn't introduce any additional overhead compared to other counterparts.
Detecting Manufacturing Defects in PCBs via Data-Centric Machine Learning on Solder Paste Inspection Features
Prasad-Rao, Jubilee, Heidary, Roohollah, Williams, Jesse
Automated detection of defects in Printed Circuit Board (PCB) manufacturing using Solder Paste Inspection (SPI) and Automated Optical Inspection (AOI) machines can help improve operational efficiency and significantly reduce the need for manual intervention. In this paper, using SPI-extracted features of 6 million pins, we demonstrate a data-centric approach to train Machine Learning (ML) models to detect PCB defects at three stages of PCB manufacturing. The 6 million PCB pins correspond to 2 million components that belong to 15,387 PCBs. Using a base extreme gradient boosting (XGBoost) ML model, we iterate on the data pre-processing step to improve detection performance. Combining pin-level SPI features using component and PCB IDs, we developed training instances also at the component and PCB level. This allows the ML model to capture any inter-pin, inter-component, or spatial effects that may not be apparent at the pin level. Models are trained at the pin, component, and PCB levels, and the detection results from the different models are combined to identify defective components.
Internet Explorer: Targeted Representation Learning on the Open Web
Li, Alexander C., Brown, Ellis, Efros, Alexei A., Pathak, Deepak
Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet -- where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30--40 hours. Results, visualizations, and videos at https://internet-explorer-ssl.github.io/
Learning Multiscale Consistency for Self-supervised Electron Microscopy Instance Segmentation
Chen, Yinda, Huang, Wei, Liu, Xiaoyu, Deng, Shiyu, Chen, Qi, Xiong, Zhiwei
Instance segmentation in electron microscopy (EM) volumes is tough due to complex shapes and sparse annotations. Self-supervised learning helps but still struggles with intricate visual patterns in EM. To address this, we propose a pretraining framework that enhances multiscale consistency in EM volumes. Our approach leverages a Siamese network architecture, integrating both strong and weak data augmentations to effectively extract multiscale features. We uphold voxel-level coherence by reconstructing the original input data from these augmented instances. Furthermore, we incorporate cross-attention mechanisms to facilitate fine-grained feature alignment between these augmentations. Finally, we apply contrastive learning techniques across a feature pyramid, allowing us to distill distinctive representations spanning various scales. After pretraining on four large-scale EM datasets, our framework significantly improves downstream tasks like neuron and mitochondria segmentation, especially with limited finetuning data. It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
Variational Self-Supervised Contrastive Learning Using Beta Divergence
Yavuz, Mehmet Can, Yanikoglu, Berrin
Learning a discriminative semantic space using unlabelled and noisy data remains unaddressed in a multi-label setting. We present a contrastive self-supervised learning method which is robust to data noise, grounded in the domain of variational methods. The method (VCL) utilizes variational contrastive learning with beta-divergence to learn robustly from unlabelled datasets, including uncurated and noisy datasets. We demonstrate the effectiveness of the proposed method through rigorous experiments including linear evaluation and fine-tuning scenarios with multi-label datasets in the face understanding domain. In almost all tested scenarios, VCL surpasses the performance of state-of-the-art self-supervised methods, achieving a noteworthy increase in accuracy.
Contrastive Learning as Kernel Approximation
Tsiolis, Konstantinos Christopher
In standard supervised machine learning, it is necessary to provide a label for every input in the data. While raw data in many application domains is easily obtainable on the Internet, manual labelling of this data is prohibitively expensive. To circumvent this issue, contrastive learning methods produce low-dimensional vector representations (also called features) of high-dimensional inputs on large unlabelled datasets. This is done by training with a contrastive loss function, which enforces that similar inputs have high inner product and dissimilar inputs have low inner product in the feature space. Rather than annotating each input individually, it suffices to define a means of sampling pairs of similar and dissimilar inputs. Contrastive features can then be fed as inputs to supervised learning systems on much smaller labelled datasets to obtain high accuracy on end tasks of interest. The goal of this thesis is to provide an overview of the current theoretical understanding of contrastive learning, specifically as it pertains to the minimizers of contrastive loss functions and their relationship to prior methods for learning features from unlabelled data. We highlight popular contrastive loss functions whose minimizers implicitly approximate a positive semidefinite (PSD) kernel. The latter is a well-studied object in functional analysis and learning theory that formalizes a notion of similarity between elements of a space. PSD kernels provide an implicit definition of features through the theory of reproducing kernel Hilbert spaces.
A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images
VanBerlo, Blake, Hoey, Jesse, Wong, Alexander
Self-supervised pretraining has been observed to be effective at improving feature representations for transfer learning, leveraging large amounts of unlabelled data. This review summarizes recent research into its usage in X-ray, computed tomography, magnetic resonance, and ultrasound imaging, concentrating on studies that compare self-supervised pretraining to fully supervised learning for diagnostic tasks such as classification and segmentation. The most pertinent finding is that self-supervised pretraining generally improves downstream task performance compared to full supervision, most prominently when unlabelled examples greatly outnumber labelled examples. Based on the aggregate evidence, recommendations are provided for practitioners considering using self-supervised learning. Motivated by limitations identified in current research, directions and practices for future study are suggested, such as integrating clinical knowledge with theoretically justified self-supervised learning methods, evaluating on public datasets, growing the modest body of evidence for ultrasound, and characterizing the impact of self-supervised pretraining on generalization.
RoBoSS: A Robust, Bounded, Sparse, and Smooth Loss Function for Supervised Learning
Akhtar, Mushir, Tanveer, M., Arshad, Mohd.
In the domain of machine learning algorithms, the significance of the loss function is paramount, especially in supervised learning tasks. It serves as a fundamental pillar that profoundly influences the behavior and efficacy of supervised learning algorithms. Traditional loss functions, while widely used, often struggle to handle noisy and high-dimensional data, impede model interpretability, and lead to slow convergence during training. In this paper, we address the aforementioned constraints by proposing a novel robust, bounded, sparse, and smooth (RoBoSS) loss function for supervised learning. Further, we incorporate the RoBoSS loss function within the framework of support vector machine (SVM) and introduce a new robust algorithm named $\mathcal{L}_{rbss}$-SVM. For the theoretical analysis, the classification-calibrated property and generalization ability are also presented. These investigations are crucial for gaining deeper insights into the performance of the RoBoSS loss function in the classification tasks and its potential to generalize well to unseen data. To empirically demonstrate the effectiveness of the proposed $\mathcal{L}_{rbss}$-SVM, we evaluate it on $88$ real-world UCI and KEEL datasets from diverse domains. Additionally, to exemplify the effectiveness of the proposed $\mathcal{L}_{rbss}$-SVM within the biomedical realm, we evaluated it on two medical datasets: the electroencephalogram (EEG) signal dataset and the breast cancer (BreaKHis) dataset. The numerical results substantiate the superiority of the proposed $\mathcal{L}_{rbss}$-SVM model, both in terms of its remarkable generalization performance and its efficiency in training time.