Collaborating Authors

 Granger, Eric


Deep Weakly-Supervised Learning Methods for Classification and Localization in Histology Images: A Survey

arXiv.org Artificial Intelligence

Using deep learning models to diagnose cancer from histology data presents several challenges. Cancer grading and localization of regions of interest (ROIs) in these images normally relies on both image- and pixel-level labels, the latter requiring a costly annotation process. Deep weakly-supervised object localization (WSOL) methods provide different strategies for low-cost training of deep learning models. Using only image-class annotations, these methods can be trained to classify an image, and yield class activation maps (CAMs) for ROI localization. This paper provides a review of state-of-the-art deep learning (DL) methods for WSOL. We propose a taxonomy where these methods are divided into bottom-up and top-down methods according to the information flow in models. Although the latter have seen limited progress, recent bottom-up methods are currently driving much progress with deep WSOL methods. Early works focused on designing different spatial pooling functions. However, these methods reached limited localization accuracy, and unveiled a major limitation -- the under-activation of CAMs, which leads to high false negative localization. Subsequent works aimed to alleviate this issue and recover the complete object. Representative methods from our taxonomy are evaluated and compared in terms of classification and localization accuracy on two challenging histology datasets. Overall, the results indicate poor localization performance, particularly for generic methods that were initially designed to process natural images. Methods designed to address the challenges of histology data yielded good results. However, all methods suffer from high false positive/negative localization. Four key challenges are identified for the application of deep WSOL methods in histology -- under/over activation of CAMs, sensitivity to thresholding, and model selection.
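
As a rough illustration of how a CAM yields a localization mask, and why the result is sensitive to thresholding, the sketch below uses the common weighted-sum formulation of a CAM. This formulation is an assumption here, not necessarily the one used by any method in the survey, and the function names and toy values are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using one class's weights."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def normalize(cam):
    """Min-max normalize the map to [0, 1]."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


def binarize(cam, threshold):
    """Pixels at or above the threshold are predicted as ROI."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]


# Toy input: two 2x2 feature maps and hypothetical classifier weights.
feature_maps = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
class_weights = [1.0, 2.0]

cam = normalize(class_activation_map(feature_maps, class_weights))
mask_low = binarize(cam, 0.2)   # permissive threshold: larger predicted region
mask_high = binarize(cam, 0.5)  # strict threshold: smaller predicted region
```

On this toy input the two thresholds produce different masks from the same map, which is the sensitivity the abstract refers to.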


Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data

arXiv.org Artificial Intelligence

The re-identification (ReID) of individuals over a complex network of cameras is a challenging task, especially under real-world surveillance conditions. Several deep learning models have been proposed for visible-infrared (V-I) person ReID to recognize individuals from images captured using RGB and IR cameras. However, performance may decline considerably if RGB and IR images captured at test time are corrupted (e.g., noise, blur, and weather conditions). Although various data augmentation (DA) methods have been explored to improve the generalization capacity, these are not adapted for V-I person ReID. In this paper, a specialized DA strategy is proposed to address this multimodal setting. Given both the V and I modalities, this strategy diminishes the impact of corruption on the accuracy of deep person ReID models. Corruption may be modality-specific, and an additional modality often provides complementary information. Our multimodal DA strategy is designed specifically to encourage modality collaboration and reinforce generalization capability. For instance, occasional masking of modalities forces the model to select the informative modality. Local DA is also explored for advanced selection of features within and among modalities. The impact of training baseline fusion models for V-I person ReID using the proposed multimodal DA strategy is assessed on corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets in terms of complexity and efficiency. Results indicate that our strategy gives V-I ReID models the ability to exploit both shared and individual modality knowledge, so they can outperform models trained with no DA or with unimodal DA. GitHub code: https://github.com/art2611/ML-MDA.


Knowledge Distillation Methods for Efficient Unsupervised Adaptation Across Multiple Domains

arXiv.org Artificial Intelligence

Beyond the complexity of CNNs that require training on large annotated datasets, the domain shift between design and operational data has limited the adoption of CNNs in many real-world applications. For instance, in person re-identification, videos are captured over a distributed set of cameras with non-overlapping viewpoints. The shift between the source (e.g. lab setting) and target (e.g. cameras) domains may lead to a significant decline in recognition accuracy. Additionally, state-of-the-art CNNs may not be suitable for such real-time applications given their computational requirements. Although several techniques have recently been proposed to address domain shift problems through unsupervised domain adaptation (UDA), or to accelerate/compress CNNs through knowledge distillation (KD), we seek to simultaneously adapt and compress CNNs to generalize well across multiple target domains. In this paper, we propose a progressive KD approach for unsupervised single-target DA (STDA) and multi-target DA (MTDA) of CNNs. Our method for KD-STDA adapts a CNN to a single target domain by distilling from a larger teacher CNN, trained on both target and source domain data in order to maintain its consistency with a common representation. Our proposed approach is compared against state-of-the-art methods for compression and STDA of CNNs on the Office31 and ImageClef-DA image classification datasets. It is also compared against state-of-the-art methods for MTDA on Digits, Office31, and OfficeHome. In both settings -- KD-STDA and KD-MTDA -- results indicate that our approach can achieve the highest level of accuracy across target domains, while requiring a comparable or lower CNN complexity.


Laplacian Regularized Few-Shot Learning

arXiv.org Machine Learning

We propose a transductive Laplacian-regularized inference for few-shot tasks. Given any feature embedding learned from the base classes, we minimize a quadratic binary-assignment function containing two terms: (1) a unary term assigning query samples to the nearest class prototype, and (2) a pairwise Laplacian term encouraging nearby query samples to have consistent label assignments. Our transductive inference does not re-train the base model, and can be viewed as a graph clustering of the query set, subject to supervision constraints from the support set. We derive a computationally efficient bound optimizer of a relaxation of our function, which computes independent (parallel) updates for each query sample, while guaranteeing convergence. Following a simple cross-entropy training on the base classes, and without complex meta-learning strategies, we conducted comprehensive experiments over five few-shot learning benchmarks. Our LaplacianShot consistently outperforms state-of-the-art methods by significant margins across different models, settings, and data sets. Furthermore, our transductive inference is very fast, with computational times that are close to inductive inference, and can be used for large-scale few-shot tasks.
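
The abstract does not give the update rule, so the following is only a minimal sketch of one plausible form: each query's soft assignment is recomputed independently as a softmax of the negated unary cost plus a weighted Laplacian vote from its neighbours. The softmax form, the squared Euclidean distance, the affinity matrix, and all names are assumptions made for illustration.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def laplacian_inference(queries, prototypes, affinity, lam=1.0, n_iters=20):
    """Transductive labelling of query points (illustrative sketch only)."""
    n_q, n_c = len(queries), len(prototypes)
    # Unary term: cost of assigning each query to each class prototype.
    unary = [[sq_dist(q, p) for p in prototypes] for q in queries]
    # Start from the prototype-only (inductive) soft assignment.
    assign = [softmax([-a for a in row]) for row in unary]
    for _ in range(n_iters):
        new_assign = []
        for q in range(n_q):
            # Pairwise term: affinity-weighted vote of the other queries.
            vote = [
                sum(affinity[q][p] * assign[p][c] for p in range(n_q) if p != q)
                for c in range(n_c)
            ]
            scores = [-unary[q][c] + lam * vote[c] for c in range(n_c)]
            new_assign.append(softmax(scores))
        # Every query is updated from the previous iterate, so the loop
        # over q could run in parallel.
        assign = new_assign
    return [max(range(n_c), key=row.__getitem__) for row in assign]


# Toy task: two prototypes and three queries; the third query is slightly
# closer to prototype 1 but is linked to the first query in the graph.
prototypes = [[0.0, 0.0], [4.0, 0.0]]
queries = [[0.5, 0.0], [3.5, 0.0], [2.1, 0.0]]
affinity = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]

labels_unary_only = laplacian_inference(queries, prototypes, affinity, lam=0.0)
labels_laplacian = laplacian_inference(queries, prototypes, affinity, lam=2.0)
```

With `lam=0.0` the third query follows its nearest prototype; with `lam=2.0` the pairwise term pulls it toward the label of its graph neighbour.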


Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation

arXiv.org Machine Learning

Currently, the divergence in distributions of design and operational data, and large computational complexity are limiting factors in the adoption of CNNs in real-world applications. For instance, person re-identification systems typically rely on a distributed set of cameras, where each camera has different capture conditions. This can translate to a considerable shift between source (e.g. lab setting) and target (e.g. operational camera) domains. Given the cost of annotating image data captured for fine-tuning in each target domain, unsupervised domain adaptation (UDA) has become a popular approach to adapt CNNs. Moreover, state-of-the-art deep learning models that provide a high level of accuracy often rely on architectures that are too complex for real-time applications. Although several compression and UDA approaches have recently been proposed to overcome these limitations, they do not allow optimizing a CNN to simultaneously address both. In this paper, we propose an unexplored direction -- the joint optimization of CNNs to provide a compressed model that is adapted to perform well for a given target domain. In particular, the proposed approach performs unsupervised knowledge distillation (KD) from a complex teacher model to a compact student model, by leveraging both source and target data. It also improves upon existing UDA techniques by progressively teaching the student about domain-invariant features, instead of directly adapting a compact model on target domain data. Our method is compared against state-of-the-art compression and UDA techniques, using two popular classification datasets for UDA -- Office31 and ImageClef-DA. In both datasets, results indicate that our method can achieve the highest level of accuracy while requiring a comparable or lower time complexity.
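
The abstract does not state the distillation loss. As a hedged illustration of label-free KD, the sketch below uses the widely used temperature-softened KL divergence between teacher and student outputs; treating this as the loss is an assumption, and the function names are hypothetical. No ground-truth labels are needed, which is what allows it to be applied to unlabeled target-domain data.

```python
import math


def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # The T^2 factor is the usual rescaling that keeps gradient magnitudes
    # comparable across temperatures.
    return temperature ** 2 * kl


loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])
```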


Emotion Recognition with Spatial Attention and Temporal Softmax Pooling

arXiv.org Machine Learning

Video-based emotion recognition is a challenging task because it requires distinguishing the small deformations of the human face that represent emotions, while being invariant to stronger visual differences due to different identities. State-of-the-art methods normally use complex deep learning models such as recurrent neural networks (RNNs, LSTMs, GRUs), convolutional neural networks (CNNs, C3D, residual networks) and their combination. In this paper, we propose a simpler approach that combines a CNN pre-trained on a public dataset of facial images with (1) a spatial attention mechanism, to localize the most important regions of the face for a given emotion, and (2) temporal softmax pooling, to select the most important frames of the given video. Results on the challenging EmotiW dataset show that this approach can achieve higher accuracy than more complex approaches.
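
To make the idea of temporal softmax pooling concrete, here is a minimal sketch under stated assumptions: each frame has a feature vector and a scalar importance score, and the video-level feature is the softmax-weighted sum over frames. How the scores are produced, the `temperature` parameter, and the function name are hypothetical and not taken from the paper.

```python
import math


def temporal_softmax_pool(frame_features, frame_scores, temperature=1.0):
    """Pool per-frame features with softmax weights over the time axis."""
    peak = max(frame_scores)
    exps = [math.exp((s - peak) / temperature) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


features = [[1.0, 0.0], [3.0, 2.0]]

# Equal scores reduce to average pooling over frames.
pooled_equal, weights_equal = temporal_softmax_pool(features, [0.0, 0.0])

# A dominant score makes the pooled feature follow that single frame.
pooled_peaked, weights_peaked = temporal_softmax_pool(features, [0.0, 50.0])
```

Softmax pooling therefore interpolates between average pooling and selecting a single frame, depending on how peaked the scores are.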


Weakly Supervised Object Localization using Min-Max Entropy: an Interpretable Framework

arXiv.org Machine Learning

Weakly supervised object localization (WSOL) models aim to locate objects of interest in an image after being trained only on data with coarse image-level labels. Deep learning models for WSOL typically rely on convolutional attention maps with no constraints on the regions of interest, which allows them to select any region and makes them vulnerable to false positive regions. This issue occurs in many application domains, e.g., medical image analysis, where interpretability is central to the prediction. In order to improve the localization reliability, we propose a deep learning framework for WSOL with pixel-level localization. It is composed of two sequential sub-networks: a localizer that localizes regions of interest, followed by a classifier that classifies them. Within its end-to-end training, we incorporate the prior knowledge that, in a class-agnostic setup, an image is more likely to contain both relevant regions (object of interest) and irrelevant regions (noise). Based on the conditional entropy (CE) measured at the classifier, the localizer is driven to spot relevant regions (low CE) and irrelevant regions (high CE). Our framework is able to recover large discriminative regions using our recursive erasing algorithm that we incorporate within the backpropagation during training. Moreover, the framework intrinsically handles multiple instances. Experimental results on public datasets with medical images (GlaS colon cancer) and natural images (Caltech-UCSD Birds-200-2011, Oxford flower 102) show that, compared to state-of-the-art WSOL methods, our framework can provide significant improvements in terms of image-level classification, pixel-level localization, and robustness to overfitting when dealing with few training samples. A public reproducible PyTorch implementation is provided in: https://github.com/sbelharbi/wsol-min-max-entropy-interpretability .


An Improved Trade-off Between Accuracy and Complexity with Progressive Gradient Pruning

arXiv.org Machine Learning

Although deep neural networks (NNs) have achieved state-of-the-art accuracy in many visual recognition tasks, the growing computational complexity and energy consumption of networks remains an issue, especially for applications on platforms with limited resources and requiring real-time processing. Channel pruning techniques have recently shown promising results for the compression of convolutional NNs (CNNs). However, these techniques can result in low accuracy and complex optimisations because some only prune after training CNNs, while others prune from scratch during training by integrating sparsity constraints or modifying the loss function. The progressive soft filter pruning technique provides greater training efficiency, but its soft pruning strategy does not handle the backward pass, which is needed for better optimization. In this paper, a new Progressive Gradient Pruning (PGP) technique is proposed for iterative channel pruning during training. It relies on a criterion that measures the change in channel weights, which improves on existing progressive pruning, and on effective hard and soft pruning strategies to adapt momentum tensors during the backward propagation pass. Experimental results obtained after training various CNNs on the MNIST and CIFAR10 datasets indicate that the PGP technique can achieve a better tradeoff between classification accuracy and network (time and memory) complexity than state-of-the-art channel pruning techniques.


Clustering with Fairness Constraints: A Flexible and Scalable Approach

arXiv.org Machine Learning

This study investigates a general variational formulation of fair clustering, which can integrate fairness constraints with a large class of clustering objectives. Unlike the existing methods, our formulation can impose any desired (target) demographic proportions within each cluster. Furthermore, it enables control of the trade-off between fairness and the clustering objective. We derive an auxiliary function (tight upper bound) of our KL-based fairness penalty via its concave-convex decomposition and Lipschitz-gradient property. Our upper bound can be optimized jointly with various clustering objectives, including both prototype-based objectives such as K-means and graph-based objectives such as Normalized Cut. Interestingly, at each iteration, our general fair-clustering algorithm performs an independent update for each assignment variable, while guaranteeing convergence. Therefore, it can be easily distributed for large-scale data sets. Such scalability is important as it enables exploring different trade-off levels between fairness and clustering objectives. Unlike existing fairness-constrained spectral clustering, our formulation does not require storing an affinity matrix or computing its eigenvalue decomposition. Moreover, unlike existing prototype-based methods, our experiments reveal that fairness does not come at a significant cost of the clustering objective. In fact, several of our tests showed that our fairness penalty helped to avoid weak local minima of the clustering objective (i.e., with fairness, we obtained better clustering objectives). We demonstrate the flexibility and scalability of our algorithm with comprehensive evaluations over both synthetic and real world data sets, many of which are much larger than those used in recent fair-clustering methods.
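
A KL-based fairness penalty of the kind described can be sketched as follows. The exact form used in the paper is not given in the abstract, so this assumes the penalty is the sum over clusters of KL(target proportions || observed group proportions in the cluster), computed from soft assignments. All names, the clamping constant, and the toy data are hypothetical.

```python
import math

EPS = 1e-12  # clamp to avoid log(0) and division by zero (an assumption)


def fairness_penalty(soft_assign, groups, target):
    """Sum over clusters of KL(target || group proportions in that cluster).

    soft_assign[p][k]: soft membership of point p in cluster k
    groups[p]:         demographic group index of point p
    target[j]:         desired proportion of group j within every cluster
    """
    n_clusters = len(soft_assign[0])
    penalty = 0.0
    for k in range(n_clusters):
        cluster_mass = max(sum(row[k] for row in soft_assign), EPS)
        for j, desired in enumerate(target):
            if desired == 0.0:
                continue  # a zero target contributes nothing to the KL sum
            group_mass = sum(
                row[k] for row, g in zip(soft_assign, groups) if g == j
            )
            observed = max(group_mass / cluster_mass, EPS)
            penalty += desired * math.log(desired / observed)
    return penalty


groups = [0, 1, 0, 1]
target = [0.5, 0.5]

# Each cluster holds one point of each group: proportions match the target.
balanced = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Each cluster is dominated by a single group: proportions are skewed.
skewed = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]

penalty_balanced = fairness_penalty(balanced, groups, target)
penalty_skewed = fairness_penalty(skewed, groups, target)
```

The penalty vanishes when every cluster reproduces the target proportions and grows as clusters become demographically skewed, so a weight on this term would control the trade-off against the clustering objective.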


Scalable Laplacian K-modes

Neural Information Processing Systems

We advocate Laplacian K-modes for joint clustering and density mode finding, and propose a concave-convex relaxation of the problem, which yields a parallel algorithm that scales up to large datasets and high dimensions. We optimize a tight bound (auxiliary function) of our relaxation, which, at each iteration, amounts to computing an independent update for each cluster-assignment variable, with guaranteed convergence. Therefore, our bound optimizer can be trivially distributed for large-scale data sets. Furthermore, we show that the density modes can be obtained as byproducts of the assignment variables via simple maximum-value operations whose additional computational cost is linear in the number of data points. Our formulation does not require storing a full affinity matrix or computing its eigenvalue decomposition, nor does it perform expensive projection steps and Lagrangian-dual inner iterates for the simplex constraints of each point. Furthermore, unlike mean-shift, our density-mode estimation does not require inner-loop gradient-ascent iterates. It has a complexity independent of feature-space dimension, yields modes that are valid data points in the input set and is applicable to discrete domains as well as arbitrary kernels. We report comprehensive experiments over various data sets, which show that our algorithm yields very competitive performance in terms of optimization quality (i.e., the value of the discrete-variable objective at convergence) and clustering accuracy.
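
The claim that modes fall out of the assignment variables through maximum-value operations can be illustrated with a short sketch. It assumes that the mode of each cluster is taken to be the input point with the largest soft assignment to that cluster, which is one reading of the abstract rather than a confirmed detail; the names are hypothetical.

```python
def modes_from_assignments(points, soft_assign):
    """Pick, for each cluster, the data point with the highest assignment.

    points[p]:         feature vector of data point p
    soft_assign[p][k]: soft assignment of point p to cluster k
    """
    n_clusters = len(soft_assign[0])
    modes = []
    for k in range(n_clusters):
        # One linear scan over the data points per cluster.
        best = max(range(len(points)), key=lambda p: soft_assign[p][k])
        modes.append(points[best])
    return modes


points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
soft_assign = [[0.6, 0.4], [0.9, 0.1], [0.2, 0.8]]
modes = modes_from_assignments(points, soft_assign)
```

Because each mode is selected from the input set, it is always a valid data point, and the scan never touches the feature dimension, which is consistent with the properties listed in the abstract.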