 subimage


Domain-decomposed image classification algorithms using linear discriminant analysis and convolutional neural networks

arXiv.org Artificial Intelligence

In many modern computer application problems, the classification of image data plays an important role. Among the many supervised machine learning models, convolutional neural networks (CNNs) and linear discriminant analysis (LDA), as well as sophisticated variants thereof, are popular techniques. In this work, two different domain-decomposed CNN models are experimentally compared for different image classification problems. Both models are loosely inspired by domain decomposition methods and are additionally combined with a transfer learning strategy. The resulting models show improved classification accuracies compared to the corresponding composed global CNN model without transfer learning, and they also help to speed up the training process. Moreover, a novel decomposed LDA strategy is proposed, which also relies on a localization approach and is combined with a small neural network model. In comparison with a global LDA applied to the entire input data, the presented decomposed LDA approach shows increased classification accuracies for the considered test problems.


Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition

arXiv.org Artificial Intelligence

Deep convolutional neural networks (CNNs) have been shown to be very successful in a wide range of image processing applications. However, due to their increasing number of model parameters and an increasing availability of large amounts of training data, parallelization strategies to efficiently train complex CNNs are necessary. In previous work by the authors, a novel model parallel CNN architecture was proposed which is loosely inspired by domain decomposition. In particular, the novel network architecture is based on a decomposition of the input data into smaller subimages. For each of these subimages, local CNNs with a proportionally smaller number of parameters are trained in parallel and the resulting local classifications are then aggregated in a second step by a dense feedforward neural network (DNN). In the present work, we compare the resulting CNN-DNN architecture to less costly alternatives to combine the local classifications into a final, global decision. Additionally, we investigate the performance of the CNN-DNN trained as one coherent model as well as using a transfer learning strategy, where the parameters of the pre-trained local CNNs are used as initial values for a subsequently trained global coherent CNN-DNN model.
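The decomposition step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_into_subimages` and `average_local_scores` are hypothetical, images are plain nested lists, and simple score averaging is assumed here as one possible low-cost way to combine local classifications (the paper's own aggregation uses a trained DNN).

```python
def split_into_subimages(image, rows, cols):
    """Split a 2D image (list of equal-length rows) into rows x cols
    non-overlapping subimages, returned in row-major order."""
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    sub_h, sub_w = height // rows, width // cols
    subimages = []
    for r in range(rows):
        for c in range(cols):
            # Take the pixel rows of grid row r, then the columns of grid column c.
            block = [row[c * sub_w:(c + 1) * sub_w]
                     for row in image[r * sub_h:(r + 1) * sub_h]]
            subimages.append(block)
    return subimages


def average_local_scores(local_scores):
    """Average per-class scores produced by the local classifiers.
    Assumption: a simple stand-in for the learned DNN aggregation."""
    n = len(local_scores)
    num_classes = len(local_scores[0])
    return [sum(scores[k] for scores in local_scores) / n
            for k in range(num_classes)]


# A 4x4 toy image with pixel values 0..15, split into a 2x2 grid.
image = [[4 * r + c for c in range(4)] for r in range(4)]
subimages = split_into_subimages(image, 2, 2)
```

In a model-parallel setting, each entry of `subimages` would be passed to its own local classifier, and only the per-class scores would need to be gathered for the aggregation step.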


DDU-Net: A Domain Decomposition-based CNN for High-Resolution Image Segmentation on Multiple GPUs

arXiv.org Artificial Intelligence

The segmentation of ultra-high resolution images poses challenges such as loss of spatial information or computational inefficiency. In this work, a novel approach that combines encoder-decoder architectures with domain decomposition strategies to address these challenges is proposed. Specifically, a domain decomposition-based U-Net (DDU-Net) architecture is introduced, which partitions input images into non-overlapping patches that can be processed independently on separate devices. A communication network is added to facilitate inter-patch information exchange to enhance the understanding of spatial context. Experimental validation is performed on a synthetic dataset that is designed to measure the effectiveness of the communication network. Then, the performance is tested on the DeepGlobe land cover classification dataset as a real-world benchmark dataset. The results demonstrate that the approach, which includes inter-patch communication for images divided into 16×16 non-overlapping subimages, achieves a 2-3% higher intersection over union (IoU) score compared to the same network without inter-patch communication. The performance of the network which includes communication is equivalent to that of a baseline U-Net trained on the full image, showing that our model provides an effective solution for segmenting ultra-high-resolution images while preserving spatial context. The code is available at https://github.com/corne00/HiRes-Seg-CNN.
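For readers unfamiliar with the intersection over union (IoU) score reported above, the following is a minimal sketch of how it can be computed for one class from two label masks. The function name `intersection_over_union` and the nested-list mask format are assumptions for illustration and are not taken from the linked repository.

```python
def intersection_over_union(pred, target, label):
    """IoU of one class: |pred AND target| / |pred OR target|,
    where both masks are 2D lists of integer class labels.
    Returns None when the class is absent from both masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == label
            in_target = t == label
            if in_pred and in_target:
                intersection += 1
            if in_pred or in_target:
                union += 1
    return intersection / union if union else None


# Toy 2x2 masks: the prediction marks one extra pixel as class 1.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
iou_class_1 = intersection_over_union(pred, target, label=1)
```

A mean IoU is commonly obtained by averaging this value over all classes that occur; how absent classes are handled is a convention that varies between benchmarks.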


ComCLIP: Training-Free Compositional Image and Text Matching

arXiv.org Artificial Intelligence

Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text. However, it is still challenging to adapt vision-language pretrained models like CLIP to compositional image and text matching, a more challenging task that requires the model to understand compositional word concepts and visual components. Towards better compositional generalization in zero-shot image and text matching, in this paper, we study the problem from a causal perspective: the erroneous semantics of individual entities are essentially confounders that cause the matching failure. Therefore, we propose a novel training-free compositional CLIP model (ComCLIP). ComCLIP disentangles input images into subject, object, and action sub-images and composes CLIP's vision encoder and text encoder to perform evolving matching over compositional text embeddings and sub-image embeddings. In this way, ComCLIP can mitigate spurious correlations introduced by the pretrained CLIP models and dynamically evaluate the importance of each component. Experiments on four compositional image-text matching datasets (SVO, ComVG, Winoground, and VL-checklist) and two general image-text retrieval datasets (Flickr30K and MSCOCO) demonstrate the effectiveness of our plug-and-play method, which boosts the zero-shot inference ability of CLIP, SLIP, and BLIP2 even without further training or fine-tuning. Our code can be found at https://github.com/eric-ai-lab/ComCLIP.


Better Understanding Differences in Attribution Methods via Systematic Evaluations

arXiv.org Artificial Intelligence

Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.


Publishers use AI to catch bad scientists doctoring data

#artificialintelligence

Shady scientists trying to publish bad research may want to think twice, as academic publishers are increasingly using AI software to automatically spot signs of data tampering. Duplication of images, where the same picture of a cluster of cells, for example, is copied, flipped, rotated, shifted, or cropped, is unfortunately quite common. In cases where the errors aren't accidental, the doctored images are created to look as if the researchers have more data and conducted more experiments than they really did. Image duplication was the top reason papers were retracted for the American Association for Cancer Research (AACR) from 2016 to 2020, according to Daniel Evanko, the association's Director of Journal Operations and Systems. Having to retract a paper damages the reputations of both the authors and the publishers.


Towards Better Understanding Attribution Methods

arXiv.org Artificial Intelligence

Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.


Image Segmentation: Part 1

#artificialintelligence

In this article we will cover threshold-based and edge-based segmentation. Other segmentation techniques will be discussed in later parts. Image thresholding is a simple form of image segmentation. It creates a binary or multi-color image by applying a threshold value to the pixel intensities of the original image. In this thresholding process, we will consider the intensity histogram of all the pixels in the image.
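The thresholding idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a nested list of 8-bit grayscale values, and the function names `intensity_histogram` and `threshold_image` as well as the threshold value 128 are chosen for this example and do not come from the article.

```python
def intensity_histogram(image, levels=256):
    """Count how many pixels take each intensity value 0..levels-1."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def threshold_image(image, threshold):
    """Binary segmentation: 1 where the pixel is brighter than the
    threshold, 0 otherwise."""
    return [[1 if value > threshold else 0 for value in row]
            for row in image]


# Toy 2x2 grayscale image (assumed 8-bit intensities).
image = [[10, 200], [90, 130]]
hist = intensity_histogram(image)
binary = threshold_image(image, threshold=128)
```

In practice the histogram is inspected, or processed automatically, to pick a threshold that separates its dominant peaks; the fixed value used here is purely illustrative.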


AskReddit: Help with a guidance for my graduation thesis • /r/MachineLearning

#artificialintelligence

Hello, I'm a computer science student and will finish my CS degree this year, so I have already started my graduation thesis. I work in a Computer Vision / Robotics lab at my university, and my main field of interest, which I want to pursue academically, is machine learning / deep learning, so I thought about mixing robotics with machine learning, which is very common. My main idea is outdoor autonomous navigation: I want my robot to know what grass is, what a tree is, and what people and cars are, so it can avoid them or do whatever I set it to do. My approach so far: for every image frame I slice the image into subImages, and for each subImage I calculate its histogram, compare it with a huge database containing tons of histograms of grass/sky/trees (for example), and run a kNN/SVM to classify the subImage by its closest histograms. If everything goes according to plan, I will have a fully labeled system for the robot. But I'm facing some problems, and I'm not really an expert in the field yet, so I really want some guidance because I don't know what to do. My professor told me this will be kind of hard to do this way for a graduation thesis. I have implemented an LBP descriptor to classify some textures like grass and asphalt, but I can't use LBP for everything, and I don't even know if LBP will be accurate for grass and asphalt (if my dataset is large enough). Anyway, sorry for the long text. I just don't know what path to take now, and I don't even know if my current approach is a good one or if I'm doing something silly.
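The histogram-matching approach described in the post could look roughly like the sketch below. It is a simplification, not the poster's code: it uses a single nearest neighbour with an L1 distance instead of a full kNN/SVM, coarse 4-bin grayscale histograms, and made-up reference histograms. All names (`normalized_histogram`, `classify_patch`, `references`) are hypothetical.

```python
def normalized_histogram(patch, bins=4, max_value=256):
    """Coarse intensity histogram of a patch, normalized to sum to 1."""
    hist = [0] * bins
    count = 0
    for row in patch:
        for value in row:
            hist[value * bins // max_value] += 1
            count += 1
    return [h / count for h in hist]


def l1_distance(a, b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_patch(patch, references):
    """Return the label of the reference histogram closest to the patch.
    `references` is a list of (label, histogram) pairs."""
    hist = normalized_histogram(patch)
    label, _ = min(references, key=lambda ref: l1_distance(hist, ref[1]))
    return label


# Made-up reference histograms: bright pixels for sky, darker for grass.
references = [("sky", [0.0, 0.0, 0.0, 1.0]),
              ("grass", [0.0, 1.0, 0.0, 0.0])]
bright_patch = [[250, 240], [230, 255]]
dark_patch = [[70, 80], [100, 120]]
```

Intensity histograms alone discard spatial structure, which is one reason texture descriptors such as the LBP mentioned in the post are often combined with them.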


Fractally Finding the Odd One Out: An Analogical Strategy For Noticing Novelty

AAAI Conferences

The Odd One Out test of intelligence consists of 3x3 matrix reasoning problems organized in 20 levels of difficulty. Addressing problems on this test appears to require integration of multiple cognitive abilities usually associated with creativity, including visual encoding, similarity assessment, pattern detection, and analogical transfer. We describe a novel fractal strategy for addressing visual analogy problems on the Odd One Out test. In our strategy, the relationship between images is encoded fractally, capturing important aspects of similarity as well as inherent self-similarity. The strategy starts with fractal representations encoded at a high level of resolution, but, if that is not sufficient to resolve ambiguity, it automatically adjusts itself to the right level of resolution for addressing a given problem. Similarly, the strategy starts with searching for fractally-derived similarity between simpler relationships, but, if that is not sufficient to resolve ambiguity, it automatically shifts to search for such similarity between higher-order relationships. We present preliminary results and initial analysis from applying the fractal technique on nearly 3,000 problems from the Odd One Out test.