Goto

Collaborating Authors

 holzinger


Network Module Detection from Multi-Modal Node Features with a Greedy Decision Forest for Actionable Explainable AI

Pfeifer, Bastian, Saranti, Anna, Holzinger, Andreas

arXiv.org Artificial Intelligence

Network-based algorithms are used in most domains of research and industry in a wide variety of applications and are of great practical use. In this work, we demonstrate subnetwork detection based on multi-modal node features using a new Greedy Decision Forest for better interpretability. The latter will be a crucial factor in retaining experts and gaining their trust in such algorithms in the future. To demonstrate a concrete application example, we focus in this paper on bioinformatics and systems biology with a special focus on biomedicine. However, our methodological approach is applicable in many other domains as well. Systems biology serves as a very good example of a field in which statistical data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important to reach the future goal of precision medicine, where the complexity of patients is modeled on a system level to best tailor medical decisions, health practices and therapies to the individual patient. Our glass-box approach could help to uncover disease-causing network modules from multi-omics data to better understand diseases such as cancer.


KANDINSKYPatterns -- An experimental exploration environment for Pattern Analysis and Machine Intelligence

Holzinger, Andreas, Saranti, Anna, Mueller, Heimo

arXiv.org Artificial Intelligence

Machine intelligence is very successful at standard recognition tasks when having high-quality training data. There is still a significant gap between machine-level pattern recognition and human-level concept learning. Humans can learn under uncertainty from only a few examples and generalize these concepts to solve new problems. The growing interest in explainable machine intelligence, requires experimental environments and diagnostic tests to analyze weaknesses in existing approaches to drive progress in the field. In this paper, we discuss existing diagnostic tests and test data sets such as CLEVR, CLEVERER, CLOSURE, CURI, Bongard-LOGO, V-PROM, and present our own experimental environment: The KANDINSKYPatterns, named after the Russian artist Wassily Kandinksy, who made theoretical contributions to compositivity, i.e. that all perceptions consist of geometrically elementary individual components. This was experimentally proven by Hubel &Wiesel in the 1960s and became the basis for machine learning approaches such as the Neocognitron and the even later Deep Learning. While KANDINSKYPatterns have computationally controllable properties on the one hand, bringing ground truth, they are also easily distinguishable by human observers, i.e., controlled patterns can be described by both humans and algorithms, making them another important contribution to international research in machine intelligence.


Kandinsky Patterns

Mueller, Heimo, Holzinger, Andreas

arXiv.org Artificial Intelligence

Kandinsky Figures and Kandinsky Patterns are mathematically describable, simple self-contained hence controllable test data sets for the development, validation and training of explainability in artificial intelligence. Whilst Kandinsky Patterns have these computationally manageable properties, they are at the same time easily distinguishable from human observers. Consequently, controlled patterns can be described by both humans and computers. We define a Kandinsky Pattern as a set of Kandinsky Figures, where for each figure an "infallible authority" defines that the figure belongs to the Kandinsky Pattern. With this simple principle we build training and validation data sets for automatic interpretability and context learning. In this paper we describe the basic idea and some underlying principles of Kandinsky Patterns and provide a Github repository to invite the international machine learning research community to a challenge to experiment with our Kandinsky Patterns to expand and thus make progress in the field of explainable AI and to contribute to the upcoming field of explainability and causability.


Finding a Healthier Approach to Managing Medical Data

Communications of the ACM

One of the formidable challenges healthcare providers face is putting medical data to maximum use. Somewhere between the quest to unlock the mysteries of medicine and design better treatments, therapies, and procedures, lies the real world of applying data and protecting patient privacy. "Today, there are many barriers to putting data to work in the most effective way possible," observes Drew Harris, director of health policy and population health at Thomas Jefferson University's College of Population Health in Philadelphia, PA. "The goals of protecting patients and finding answers are frequently at odds." It is a critical issue and one that will define the future of medicine. Medical advances are increasingly dependent on the analysis of enormous datasets--as well as data that extends beyond any one agency or enterprise.


The Need for Speed of AI Applications: Performance Comparison of Native vs. Browser-based Algorithm Implementations

Malle, Bernd, Giuliani, Nicola, Kieseberg, Peter, Holzinger, Andreas

arXiv.org Machine Learning

AI applications pose increasing demands on performance, so it is not surprising that the era of client-side distributed software is becoming important. On top of many AI applications already using mobile hardware, and even browsers for computationally demanding AI applications, we are already witnessing the emergence of client-side (federated) machine learning algorithms, driven by the interests of large corporations and startups alike. Apart from mathematical and algorithmic concerns, this trend especially demands new levels of computational efficiency from client environments. Consequently, this paper deals with the question of state-of-the-art performance by presenting a comparison study between native code and different browser-based implementations: JavaScript, ASM.js as well as WebAssembly on a representative mix of algorithms. Our results show that current efforts in runtime optimization push the boundaries well towards (and even beyond) native binary performance. We analyze the results obtained and speculate on the reasons behind some surprises, rounding the paper off by outlining future possibilities as well as some of our own research efforts.


What do we need to build explainable AI systems for the medical domain?

Holzinger, Andreas, Biemann, Chris, Pattichis, Constantinos S., Kell, Douglas B.

arXiv.org Machine Learning

Artificial intelligence (AI) generally and machine learning (ML) specifically demonstrate impressive practical success in many different application domains, e.g. in autonomous driving, speech recognition, or recommender systems. Deep learning approaches, trained on extremely large data sets or using reinforcement learning methods have even exceeded human performance in visual tasks, particularly on playing games such as Atari, or mastering the game of Go. Even in the medical domain there are remarkable results. The central problem of such models is that they are regarded as black-box models and even if we understand the underlying mathematical principles, they lack an explicit declarative knowledge representation, hence have difficulty in generating the underlying explanatory structures. This calls for systems enabling to make decisions transparent, understandable and explainable. A huge motivation for our approach are rising legal and privacy aspects. The new European General Data Protection Regulation entering into force on May 25th 2018, will make black-box approaches difficult to use in business. This does not imply a ban on automatic learning approaches or an obligation to explain everything all the time, however, there must be a possibility to make the results re-traceable on demand. In this paper we outline some of our research topics in the context of the relatively new area of explainable-AI with a focus on the application in medicine, which is a very special domain. This is due to the fact that medical professionals are working mostly with distributed heterogeneous and complex sources of data. In this paper we concentrate on three sources: images, *omics data and text. We argue that research in explainable-AI would generally help to facilitate the implementation of AI/ML in the medical domain, and specifically help to facilitate transparency and trust.


Towards the Augmented Pathologist: Challenges of Explainable-AI in Digital Pathology

Holzinger, Andreas, Malle, Bernd, Kieseberg, Peter, Roth, Peter M., Müller, Heimo, Reihs, Robert, Zatloukal, Kurt

arXiv.org Machine Learning

Digital pathology is not only one of the most promising fields of diagnostic medicine, but at the same time a hot topic for fundamental research. Digital pathology is not just the transfer of histopathological slides into digital representations. The combination of different data sources (images, patient records, and *omics data) together with current advances in artificial intelligence/machine learning enable to make novel information accessible and quantifiable to a human expert, which is not yet available and not exploited in current medical settings. The grand goal is to reach a level of usable intelligence to understand the data in the context of an application task, thereby making machine decisions transparent, interpretable and explainable. The foundation of such an "augmented pathologist" needs an integrated approach: While machine learning algorithms require many thousands of training examples, a human expert is often confronted with only a few data points. Interestingly, humans can learn from such few examples and are able to instantly interpret complex patterns. Consequently, the grand goal is to combine the possibilities of artificial intelligence with human intelligence and to find a well-suited balance between them to enable what neither of them could do on their own. This can raise the quality of education, diagnosis, prognosis and prediction of cancer and other diseases. In this paper we describe some (incomplete) research issues which we believe should be addressed in an integrated and concerted effort for paving the way towards the augmented pathologist.


A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop

Holzinger, Andreas, Plass, Markus, Holzinger, Katharina, Crisan, Gloria Cerasela, Pintea, Camelia-M., Palade, Vasile

arXiv.org Machine Learning

The goal of Machine Learning to automatically learn from data, extract knowledge and to make decisions without any human intervention. Such automatic (aML) approaches show impressive success. Recent results even demonstrate intriguingly that deep learning applied for automatic classification of skin lesions is on par with the performance of dermatologists, yet outperforms the average. As human perception is inherently limited, such approaches can discover patterns, e.g. that two objects are similar, in arbitrarily high-dimensional spaces what no human is able to do. Humans can deal only with limited amounts of data, whilst big data is beneficial for aML; however, in health informatics, we are often confronted with a small number of data sets, where aML suffer of insufficient training samples and many problems are computationally hard. Here, interactive machine learning (iML) may be of help, where a human-in-the-loop contributes to reduce the complexity of NP-hard problems. A further motivation for iML is that standard black-box approaches lack transparency, hence do not foster trust and acceptance of ML among end-users. Rising legal and privacy aspects, e.g. with the new European General Data Protection Regulations, make black-box approaches difficult to use, because they often are not able to explain why a decision has been made. In this paper, we present some experiments to demonstrate the effectiveness of the human-in-the-loop approach, particularly in opening the black-box to a glass-box and thus enabling a human directly to interact with an learning algorithm. We selected the Ant Colony Optimization framework, and applied it on the Traveling Salesman Problem, which is a good example, due to its relevance for health informatics, e.g. for the study of protein folding. From studies of how humans extract so much from so little data, fundamental ML-research also may benefit.