Goto

Collaborating Authors

 Expert Systems


Approximating Score-based Explanation Techniques Using Conformal Regression

arXiv.org Artificial Intelligence

Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) are the predicted intervals.


BELB: a Biomedical Entity Linking Benchmark

arXiv.org Artificial Intelligence

Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base. It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad coverage knowledge base UMLS, leaving their performance to more specialized ones, e.g. genes or variants, understudied. We therefore developed BELB, a Biomedical Entity Linking Benchmark, providing access in a unified format to 11 corpora linked to 7 knowledge bases and spanning six entity types: gene, disease, chemical, species, cell line and variant. BELB greatly reduces preprocessing overhead in testing BEL systems on multiple corpora offering a standardized testbed for reproducible experiments. Using BELB we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture showing that neural approaches fail to perform consistently across entity types, highlighting the need of further studies towards entity-agnostic models.


Multi-Source Domain Adaptation for Cross-Domain Fault Diagnosis of Chemical Processes

arXiv.org Artificial Intelligence

Fault diagnosis is an essential component in process supervision. Indeed, it determines which kind of fault has occurred, given that it has been previously detected, allowing for appropriate intervention. Automatic fault diagnosis systems use machine learning for predicting the fault type from sensor readings. Nonetheless, these models are sensible to changes in the data distributions, which may be caused by changes in the monitored process, such as changes in the mode of operation. This scenario is known as Cross-Domain Fault Diagnosis (CDFD). We provide an extensive comparison of single and multi-source unsupervised domain adaptation (SSDA and MSDA respectively) algorithms for CDFD. We study these methods in the context of the Tennessee-Eastmann Process, a widely used benchmark in the chemical industry. We show that using multiple domains during training has a positive effect, even when no adaptation is employed. As such, the MSDA baseline improves over the SSDA baseline classification accuracy by 23% on average. In addition, under the multiple-sources scenario, we improve classification accuracy of the no adaptation setting by 8.4% on average.


Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees

arXiv.org Artificial Intelligence

Neuro-symbolic hybrid systems are promising for integrating machine learning and symbolic reasoning, where perception models are facilitated with information inferred from a symbolic knowledge base through logical reasoning. Despite empirical evidence showing the ability of hybrid systems to learn accurate perception models, the theoretical understanding of learnability is still lacking. Hence, it remains unclear why a hybrid system succeeds for a specific task and when it may fail given a different knowledge base. In this paper, we introduce a novel way of characterising supervision signals from a knowledge base, and establish a criterion for determining the knowledge's efficacy in facilitating successful learning. This, for the first time, allows us to address the two questions above by inspecting the knowledge base under investigation. Our analysis suggests that many knowledge bases satisfy the criterion, thus enabling effective learning, while some fail to satisfy it, indicating potential failures. Comprehensive experiments confirm the utility of our criterion on benchmark tasks.


NeSyFOLD: Neurosymbolic Framework for Interpretable Image Classification

arXiv.org Artificial Intelligence

Deep learning models such as CNNs have surpassed human performance in computer vision tasks such as image classification. However, despite their sophistication, these models lack interpretability which can lead to biased outcomes reflecting existing prejudices in the data. We aim to make predictions made by a CNN interpretable. Hence, we present a novel framework called NeSyFOLD to create a neurosymbolic (NeSy) model for image classification tasks. The model is a CNN with all layers following the last convolutional layer replaced by a stratified answer set program (ASP). A rule-based machine learning algorithm called FOLD-SE-M is used to derive the stratified answer set program from binarized filter activations of the last convolutional layer. The answer set program can be viewed as a rule-set, wherein the truth value of each predicate depends on the activation of the corresponding kernel in the CNN. The rule-set serves as a global explanation for the model and is interpretable. A justification for the predictions made by the NeSy model can be obtained using an ASP interpreter. We also use our NeSyFOLD framework with a CNN that is trained using a sparse kernel learning technique called Elite BackProp (EBP). This leads to a significant reduction in rule-set size without compromising accuracy or fidelity thus improving scalability of the NeSy model and interpretability of its rule-set. Evaluation is done on datasets with varied complexity and sizes. To make the rule-set more intuitive to understand, we propose a novel algorithm for labelling each kernel's corresponding predicate in the rule-set with the semantic concept(s) it learns. We evaluate the performance of our "semantic labelling algorithm" to quantify the efficacy of the semantic labelling for both the NeSy model and the NeSy-EBP model.


SDF's harassment consultation system seeing limited use

The Japan Times

Over 60% of the Self-Defense Forces personnel who claimed to have been harassed did not use a consultation system set up to aid in such cases, a survey by the Defense Ministry showed Friday. The survey found that many SDF personnel are distrustful of the consultation system. According to the survey, 1,325 cases of harassment have been reported. Power harassment accounted for 77% of the total and sexual harassment for 12%. Of the total, 850 cases, or 64.2%, did not use the harassment consultation system, according to the survey.


Predictive Authoring for Brazilian Portuguese Augmentative and Alternative Communication

arXiv.org Artificial Intelligence

Individuals with complex communication needs (CCN) often rely on augmentative and alternative communication (AAC) systems to have conversations and communique their wants. Such systems allow message authoring by arranging pictograms in sequence. However, the difficulty of finding the desired item to complete a sentence can increase as the user's vocabulary increases. This paper proposes using BERTimbau, a Brazilian Portuguese version of BERT, for pictogram prediction in AAC systems. To finetune BERTimbau, we constructed an AAC corpus for Brazilian Portuguese to use as a training corpus. We tested different approaches to representing a pictogram for prediction: as a word (using pictogram captions), as a concept (using a dictionary definition), and as a set of synonyms (using related terms). We also evaluated the usage of images for pictogram prediction. The results demonstrate that using embeddings computed from the pictograms' caption, synonyms, or definitions have a similar performance. Using synonyms leads to lower perplexity, but using captions leads to the highest accuracies. This paper provides insight into how to represent a pictogram for prediction using a BERT-like model and the potential of using images for pictogram prediction.


Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities

arXiv.org Artificial Intelligence

Recent advancements in AI applications to healthcare have shown incredible promise in surpassing human performance in diagnosis and disease prognosis. With the increasing complexity of AI models, however, concerns regarding their opacity, potential biases, and the need for interpretability. To ensure trust and reliability in AI systems, especially in clinical risk prediction models, explainability becomes crucial. Explainability is usually referred to as an AI system's ability to provide a robust interpretation of its decision-making logic or the decisions themselves to human stakeholders. In clinical risk prediction, other aspects of explainability like fairness, bias, trust, and transparency also represent important concepts beyond just interpretability. In this review, we address the relationship between these concepts as they are often used together or interchangeably. This review also discusses recent progress in developing explainable models for clinical risk prediction, highlighting the importance of quantitative and clinical evaluation and validation across multiple common modalities in clinical practice. It emphasizes the need for external validation and the combination of diverse interpretability methods to enhance trust and fairness. Adopting rigorous testing, such as using synthetic datasets with known generative factors, can further improve the reliability of explainability methods. Open access and code-sharing resources are essential for transparency and reproducibility, enabling the growth and trustworthiness of explainable research. While challenges exist, an end-to-end approach to explainability in clinical risk prediction, incorporating stakeholders from clinicians to developers, is essential for success.


MDDial: A Multi-turn Differential Diagnosis Dialogue Dataset with Reliability Evaluation

arXiv.org Artificial Intelligence

Dialogue systems for Automatic Differential Diagnosis (ADD) have a wide range of real-life applications. These dialogue systems are promising for providing easy access and reducing medical costs. Building end-to-end ADD dialogue systems requires dialogue training datasets. However, to the best of our knowledge, there is no publicly available ADD dialogue dataset in English (although non-English datasets exist). Driven by this, we introduce MDDial, the first differential diagnosis dialogue dataset in English which can aid to build and evaluate end-to-end ADD dialogue systems. Additionally, earlier studies present the accuracy of diagnosis and symptoms either individually or as a combined weighted score. This method overlooks the connection between the symptoms and the diagnosis. We introduce a unified score for the ADD system that takes into account the interplay between symptoms and diagnosis. This score also indicates the system's reliability. To the end, we train two moderate-size of language models on MDDial. Our experiments suggest that while these language models can perform well on many natural language understanding tasks, including dialogue tasks in the general domain, they struggle to relate relevant symptoms and disease and thus have poor performance on MDDial. MDDial will be released publicly to aid the study of ADD dialogue research.


Decidability of Querying First-Order Theories via Countermodels of Finite Width

arXiv.org Artificial Intelligence

We propose a generic framework for establishing the decidability of a wide range of logical entailment problems (briefly called querying), based on the existence of countermodels that are structurally simple, gauged by certain types of width measures (with treewidth and cliquewidth as popular examples). As an important special case of our framework, we identify logics exhibiting width-finite finitely universal model sets, warranting decidable entailment for a wide range of homomorphism-closed queries, subsuming a diverse set of practically relevant query languages. As a particularly powerful width measure, we propose Blumensath's partitionwidth, which subsumes various other commonly considered width measures and exhibits highly favorable computational and structural properties. Focusing on the formalism of existential rules as a popular showcase, we explain how finite-partitionwidth sets of rules subsume other known abstract decidable classes but - leveraging existing notions of stratification - also cover a wide range of new rulesets. We expose natural limitations for fitting the class of finite-unification sets into our picture and provide several options for remedy.