Goto

Collaborating Authors

Image Understanding: Instructional Materials


Satellite Image Classification With Deep Learning

#artificialintelligence

In this tutorial, you'll see how to build a satellite image classifier using Python and Tensorflow. Satellite image classification is an important task when it comes down to agriculture, crop/forest monitoring, or even in urban scenarios, with planning tasks. We're going to use the EuroSAT dataset, which consists of Sentinel-2 satellite images covering…


A Survey on Visual Transfer Learning using Knowledge Graphs

arXiv.org Artificial Intelligence

Recent approaches of computer vision utilize deep learning methods as they perform quite well if training and testing domains follow the same underlying data distribution. However, it has been shown that minor variations in the images that occur when using these methods in the real world can lead to unpredictable errors. Transfer learning is the area of machine learning that tries to prevent these errors. Especially, approaches that augment image data using auxiliary knowledge encoded in language embeddings or knowledge graphs (KGs) have achieved promising results in recent years. This survey focuses on visual transfer learning approaches using KGs. KGs can represent auxiliary knowledge either in an underlying graph-structured schema or in a vector-based knowledge graph embedding. Intending to enable the reader to solve visual transfer learning problems with the help of specific KG-DL configurations we start with a description of relevant modeling structures of a KG of various expressions, such as directed labeled graphs, hypergraphs, and hyper-relational graphs. We explain the notion of feature extractor, while specifically referring to visual and semantic features. We provide a broad overview of knowledge graph embedding methods and describe several joint training objectives suitable to combine them with high dimensional visual embeddings. The main section introduces four different categories on how a KG can be combined with a DL pipeline: 1) Knowledge Graph as a Reviewer; 2) Knowledge Graph as a Trainee; 3) Knowledge Graph as a Trainer; and 4) Knowledge Graph as a Peer. To help researchers find evaluation benchmarks, we provide an overview of generic KGs and a set of image processing datasets and benchmarks including various types of auxiliary knowledge. Last, we summarize related surveys and give an outlook about challenges and open issues for future research.


An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks

arXiv.org Artificial Intelligence

Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies. The idea of ensemble learning is to assemble diverse models or multiple predictions and, thus, boost prediction performance. However, it is still an open question to what extent as well as which ensemble learning strategies are beneficial in deep learning based medical image classification pipelines. In this work, we proposed a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolution neural network architectures. It was applied on four popular medical imaging datasets with varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions like unweighted averaging up to more complex learning-based functions like support vector machines. Our results revealed that Stacking achieved the largest performance gain of up to 13% F1-score increase. Augmenting showed consistent improvement capabilities by up to 4% and is also applicable to single model based pipelines. Cross-validation based Bagging demonstrated to be the most complex ensemble learning method, which resulted in an F1-score decrease in all analyzed datasets (up to -10%). Furthermore, we demonstrated that simple statistical pooling functions are equal or often even better than more complex pooling functions. We concluded that the integration of Stacking and Augmentation ensemble learning techniques is a powerful method for any medical image classification pipeline to improve robustness and boost performance.


Towards holistic scene understanding: Semantic segmentation and beyond

arXiv.org Artificial Intelligence

This dissertation addresses visual scene understanding and enhances segmentation performance and generalization, training efficiency of networks, and holistic understanding. First, we investigate semantic segmentation in the context of street scenes and train semantic segmentation networks on combinations of various datasets. In Chapter 2 we design a framework of hierarchical classifiers over a single convolutional backbone, and train it end-to-end on a combination of pixel-labeled datasets, improving generalizability and the number of recognizable semantic concepts. Chapter 3 focuses on enriching semantic segmentation with weak supervision and proposes a weakly-supervised algorithm for training with bounding box-level and image-level supervision instead of only with per-pixel supervision. The memory and computational load challenges that arise from simultaneous training on multiple datasets are addressed in Chapter 4. We propose two methodologies for selecting informative and diverse samples from datasets with weak supervision to reduce our networks' ecological footprint without sacrificing performance. Motivated by memory and computation efficiency requirements, in Chapter 5, we rethink simultaneous training on heterogeneous datasets and propose a universal semantic segmentation framework. This framework achieves consistent increases in performance metrics and semantic knowledgeability by exploiting various scene understanding datasets. Chapter 6 introduces the novel task of part-aware panoptic segmentation, which extends our reasoning towards holistic scene understanding. This task combines scene and parts-level semantics with instance-level object detection. In conclusion, our contributions span over convolutional network architectures, weakly-supervised learning, part and panoptic segmentation, paving the way towards a holistic, rich, and sustainable visual scene understanding.


Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered as the new electricity that is revolutionizing the world. AI is heavily invested in both industry and academy. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and related disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential and limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations for AI, methods, and machine learning are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such achievement seems impossible on the basis of present knowledge,and how AI could be improved. Finally, a summary is made of the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of contents at our own university.


Pose Estimation of Specific Rigid Objects

arXiv.org Artificial Intelligence

In this thesis, we address the problem of estimating the 6D pose of rigid objects from a single RGB or RGB-D input image, assuming that 3D models of the objects are available. This problem is of great importance to many application fields such as robotic manipulation, augmented reality, and autonomous driving. First, we propose EPOS, a method for 6D object pose estimation from an RGB image. The key idea is to represent an object by compact surface fragments and predict the probability distribution of corresponding fragments at each pixel of the input image by a neural network. Each pixel is linked with a data-dependent number of fragments, which allows systematic handling of symmetries, and the 6D poses are estimated from the links by a RANSAC-based fitting method. EPOS outperformed all RGB and most RGB-D and D methods on several standard datasets. Second, we present HashMatch, an RGB-D method that slides a window over the input image and searches for a match against templates, which are pre-generated by rendering 3D object models in different orientations. The method applies a cascade of evaluation stages to each window location, which avoids exhaustive matching against all templates. Third, we propose ObjectSynth, an approach to synthesize photorealistic images of 3D object models for training methods based on neural networks. The images yield substantial improvements compared to commonly used images of objects rendered on top of random photographs. Fourth, we introduce T-LESS, the first dataset for 6D object pose estimation that includes 3D models and RGB-D images of industry-relevant objects. Fifth, we define BOP, a benchmark that captures the status quo in the field. BOP comprises eleven datasets in a unified format, an evaluation methodology, an online evaluation system, and public challenges held at international workshops organized at the ICCV and ECCV conferences.


Self-Supervised Learning Advances Medical Image Classification

#artificialintelligence

Posted by Shekoofeh Azizi, AI Resident, Google Research In recent years, there has been increasing interest in applying deep learning to ...


Artificial Intellgence -- Application in Life Sciences and Beyond. The Upper Rhine Artificial Intelligence Symposium UR-AI 2021

arXiv.org Artificial Intelligence

The TriRhenaTech alliance presents the accepted papers of the 'Upper-Rhine Artificial Intelligence Symposium' held on October 27th 2021 in Kaiserslautern, Germany. Topics of the conference are applications of Artificial Intellgence in life sciences, intelligent systems, industry 4.0, mobility and others. The TriRhenaTech alliance is a network of universities in the Upper-Rhine Trinational Metropolitan Region comprising of the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, Offenburg and Trier, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprised of 14 'grandes \'ecoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.


Deep AUC Maximization for Medical Image Classification: Challenges and Opportunities

arXiv.org Artificial Intelligence

In this extended abstract, we will present and discuss opportunities and challenges brought about by a new deep learning method by AUC maximization (aka \underline{\bf D}eep \underline{\bf A}UC \underline{\bf M}aximization or {\bf DAM}) for medical image classification. Since AUC (aka area under ROC curve) is a standard performance measure for medical image classification, hence directly optimizing AUC could achieve a better performance for learning a deep neural network than minimizing a traditional loss function (e.g., cross-entropy loss). Recently, there emerges a trend of using deep AUC maximization for large-scale medical image classification. In this paper, we will discuss these recent results by highlighting (i) the advancements brought by stochastic non-convex optimization algorithms for DAM; (ii) the promising results on various medical image classification problems. Then, we will discuss challenges and opportunities of DAM for medical image classification from three perspectives, feature learning, large-scale optimization, and learning trustworthy AI models.


Workflow Augmentation of Video Data for Event Recognition with Time-Sensitive Neural Networks

arXiv.org Artificial Intelligence

Supervised training of neural networks requires large, diverse and well annotated data sets. In the medical field, this is often difficult to achieve due to constraints in time, expert knowledge and prevalence of an event. Artificial data augmentation can help to prevent overfitting and improve the detection of rare events as well as overall performance. However, most augmentation techniques use purely spatial transformations, which are not sufficient for video data with temporal correlations. In this paper, we present a novel methodology for workflow augmentation and demonstrate its benefit for event recognition in cataract surgery. The proposed approach increases the frequency of event alternation by creating artificial videos. The original video is split into event segments and a workflow graph is extracted from the original annotations. Finally, the segments are assembled into new videos based on the workflow graph. Compared to the original videos, the frequency of event alternation in the augmented cataract surgery videos increased by 26%. Further, a 3% higher classification accuracy and a 7.8% higher precision was achieved compared to a state-of-the-art approach. Our approach is particularly helpful to increase the occurrence of rare but important events and can be applied to a large variety of use cases.