Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems

arXiv.org Artificial Intelligence

Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet, current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: (1) using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks. Further, the subjective measures on evaluations with actual decision-making tasks did not predict the objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.


Search-Based Software Engineering for Self-Adaptive Systems: One Survey, Five Disappointments and Six Opportunities

arXiv.org Artificial Intelligence

Search-Based Software Engineering (SBSE) is a promising paradigm that exploits computational search to optimize different processes when engineering complex software systems. Self-adaptive systems (SASs) are one category of such complex systems: they allow different functional and non-functional objectives/criteria to be optimized under a changing environment (e.g., requirements and workload), which involves problems that are subject to search. Over the years, a considerable amount of work has investigated SBSE for SASs. In this paper, we provide the first systematic and comprehensive survey exclusively on SBSE for SASs, covering 3,740 papers in 27 venues from 7 repositories, which eventually leads to several key statistics from the 73 most notable primary studies in this particular field of research. Surprisingly, our results reveal five disappointing issues that are of utmost importance but have been overwhelmingly ignored in existing studies. We provide evidence to justify our arguments about these disappointments and highlight six emerging but currently under-explored opportunities for future work on SBSE for SASs. By mitigating the disappointing issues revealed in this work, together with the highlighted opportunities, we hope to spur much more significant growth in this particular research direction.


DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction

arXiv.org Artificial Intelligence

Clinical trials are essential for drug development but often suffer from expensive, inaccurate, and insufficient patient recruitment. The core problem of patient-trial matching is to find qualified patients for a trial, where patient information is stored in electronic health records (EHR) while trial eligibility criteria (EC) are described in text documents available on the web. How should longitudinal patient EHR be represented? How can complex logical rules be extracted from EC? Most existing works rely on manual rule-based extraction, which is time-consuming and inflexible for complex inference. To address these challenges, we propose DeepEnroll, a cross-modal inference learning model that jointly encodes enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference. DeepEnroll applies a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to encode clinical trial information into sentence embeddings, and uses a hierarchical embedding model to represent longitudinal patient EHR. In addition, DeepEnroll is augmented by a numerical information embedding and entailment module to reason over numerical information in both EC and EHR. These encoders are trained jointly to optimize the patient-trial matching score. We evaluated DeepEnroll on the trial-patient matching task using real-world datasets. DeepEnroll outperformed the best baseline by up to 12.4% in average F1.
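DeepEnroll learns its numerical entailment module jointly with the other encoders, and the abstract gives no details of that module. The sketch below is a rough, hand-written stand-in that only illustrates the task; it is not the paper's method. It takes a numeric eligibility criterion, assumed to be already parsed into a hypothetical `NumericCriterion` (attribute, operator, threshold), checks it against a patient record, and returns an NLI-style label. All names are hypothetical. Note that this rule-based check is exactly the kind of manual extraction the paper aims to replace with learned inference.

```python
import operator
from dataclasses import dataclass

# Comparison operators a parsed numeric criterion may use (assumed set).
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class NumericCriterion:
    """Hypothetical structured form of a criterion such as 'age >= 18'."""
    attribute: str
    op: str
    threshold: float


def entailment(criterion: NumericCriterion, record: dict) -> str:
    """Return 'entailment', 'contradiction', or 'neutral' (value missing)."""
    value = record.get(criterion.attribute)
    if value is None:
        return "neutral"  # the EHR says nothing about this attribute
    holds = OPS[criterion.op](value, criterion.threshold)
    return "entailment" if holds else "contradiction"


def eligible(criteria, record: dict) -> bool:
    """A patient stays eligible unless some criterion is contradicted."""
    return all(entailment(c, record) != "contradiction" for c in criteria)


criteria = [NumericCriterion("age", ">=", 18), NumericCriterion("hba1c", "<", 9.0)]
patient = {"age": 54, "hba1c": 7.2}
print(eligible(criteria, patient))
```

Treating a missing value as `neutral` rather than `contradiction` is a design choice of this sketch. With it, a patient is excluded only when the record positively conflicts with a criterion.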


Causality based Feature Fusion for Brain Neuro-Developmental Analysis

arXiv.org Artificial Intelligence

Abstract: Human brain development is a complex and dynamic process that is affected by several factors, such as genetics, sex hormones, and environmental changes. A number of recent studies on brain development have examined functional connectivity (FC), defined by the temporal correlation between time series of different brain regions. We propose to add the directional flow of information during brain maturation. To do so, we extract effective connectivity (EC) through Granger causality (GC) for two different groups of subjects, i.e., children and young adults. The motivation is that including causal interactions may further discriminate brain connections between the two age groups and help to discover new connections between brain regions. The contributions of this study are threefold. First, there has been a lack of attention to EC-based feature extraction in the context of brain development. To this end, we propose a new kernel-based GC (KGC) method to learn the nonlinearity of complex brain networks, where a reduced sine hyperbolic polynomial (RSP) neural network was used as our proposed learner. Second, we used causality values as the weights for the directional connectivity between brain regions. Our findings indicated that the strength of connections was significantly higher in young adults relative to children. In addition, our new EC-based feature outperformed FC-based analysis on the Philadelphia neurocohort (PNC) study, with better discrimination of the different age groups. Moreover, the fusion of these two sets of features (FC and EC) improved brain age prediction accuracy by more than 4%, indicating that they should be used together for brain development studies.

Introduction: Human brain development is a prolonged process that begins in the third gestational week (GW) and continues to late adolescence, and presumably across the entire lifespan [1].
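The proposed KGC method uses a learned nonlinear (RSP) model, and the abstract does not specify it in enough detail to reproduce. The sketch below therefore swaps in classic linear Granger causality to show the underlying idea: x Granger-causes y if adding x's past to an autoregressive model of y reduces the residual variance. It then uses the resulting values as directed edge weights, as the abstract describes for EC. The lag order `p` and the function names are assumptions of this sketch.

```python
import numpy as np


def lagged(series: np.ndarray, p: int) -> np.ndarray:
    """Columns are series[t-1], ..., series[t-p] for t = p, ..., T-1."""
    T = len(series)
    return np.column_stack([series[p - k : T - k] for k in range(1, p + 1)])


def granger_index(x: np.ndarray, y: np.ndarray, p: int = 2) -> float:
    """Linear Granger causality x -> y as log(var_restricted / var_full)."""
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])  # y's own past only
    full = np.hstack([restricted, lagged(x, p)])  # y's past plus x's past
    beta_r = np.linalg.lstsq(restricted, target, rcond=None)[0]
    beta_f = np.linalg.lstsq(full, target, rcond=None)[0]
    res_r = target - restricted @ beta_r
    res_f = target - full @ beta_f
    return float(np.log(np.var(res_r) / np.var(res_f)))


def ec_matrix(ts: np.ndarray, p: int = 2) -> np.ndarray:
    """ts has shape (regions, time); entry [i, j] weights the edge i -> j."""
    n = ts.shape[0]
    ec = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                ec[i, j] = granger_index(ts[i], ts[j], p)
    return ec


# Synthetic pair of "regions": y is driven by the previous value of x.
rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
ec = ec_matrix(np.vstack([x, y]))
print(np.round(ec, 2))
```

On this synthetic pair, `ec[0, 1]` (x to y) comes out large while `ec[1, 0]` stays near zero. A symmetric correlation-based FC matrix cannot express that asymmetry.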


Benchmarking Symbolic Execution Using Constraint Problems -- Initial Results

arXiv.org Artificial Intelligence

Symbolic execution is a powerful technique for bug finding and program testing, and it has been successful in finding bugs in real-world code. Its core reasoning techniques are constraint solving, path exploration, and search, which are also the techniques used to solve combinatorial problems such as finite-domain constraint satisfaction problems (CSPs). We propose CSP instances as more challenging benchmarks for evaluating the effectiveness of the core techniques in symbolic execution. We transform CSP benchmarks into C programs suitable for testing the reasoning capabilities of symbolic execution tools; depending on the transformation choice, a single CSP P is transformed into different C programs. Preliminary testing with the KLEE, Tracer-X, and LLBMC tools shows substantial runtime differences arising from the choice of transformation and solver. Our C benchmarks are effective in showing the limitations of existing symbolic execution tools. The motivation for this work is our belief that benchmarks of this form can spur the development and engineering of improved core reasoning in symbolic execution engines.
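The abstract does not say how the transformation works. The sketch below assumes a KLEE-style encoding. `klee_make_symbolic` and `klee_assume` are real KLEE intrinsics; the generator, its name and its encoding scheme are hypothetical. One plausible scheme works as follows:

- Make each CSP variable symbolic.
- Restrict each variable to its domain.
- Place an assertion failure behind the constraints, so that any path reaching it corresponds to a solution.

The `nested` flag illustrates how a single CSP can yield different C programs: one conjunctive branch versus one branch per constraint.

```python
from typing import Dict, List, Tuple

Domain = Tuple[int, int]  # inclusive integer bounds (lo, hi)


def csp_to_c(domains: Dict[str, Domain], constraints: List[str],
             nested: bool = False) -> str:
    """Emit a C program whose failing assertion is reachable iff the CSP,
    given as domains plus C boolean expressions, has a solution."""
    lines = ["#include <assert.h>", "#include <klee/klee.h>", "", "int main(void) {"]
    for name, (lo, hi) in domains.items():
        lines.append(f"  int {name};")
        lines.append(f'  klee_make_symbolic(&{name}, sizeof({name}), "{name}");')
        lines.append(f"  klee_assume({name} >= {lo});")
        lines.append(f"  klee_assume({name} <= {hi});")
    if nested:
        # One branch per constraint: more, smaller branching decisions.
        indent = "  "
        for c in constraints:
            lines.append(f"{indent}if ({c}) {{")
            indent += "  "
        lines.append(f'{indent}assert(0 && "solution found");')
        for _ in constraints:
            indent = indent[:-2]
            lines.append(f"{indent}}}")
    else:
        # A single conjunction: one branch with a large path condition.
        cond = " && ".join(f"({c})" for c in constraints)
        lines.append(f"  if ({cond}) {{")
        lines.append('    assert(0 && "solution found");')
        lines.append("  }")
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)


domains = {"x": (1, 3), "y": (1, 3)}
constraints = ["x != y", "x + y == 4"]
print(csp_to_c(domains, constraints, nested=True))
```

Running KLEE on a generated file should report the assertion failure as an error. The test case it writes for that path then holds a satisfying assignment. Which encoding is faster depends on the tool and solver, which is the kind of runtime difference the abstract reports.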


StarAI: Reducing incompleteness in the game of Bridge using PLP

arXiv.org Artificial Intelligence

Bridge is a trick-taking card game that requires the ability to evaluate probabilities, since it is a game of incomplete information in which each player sees only their own cards. In order to choose a strategy, a player needs to gather information about the hidden cards in the other players' hands. We present a methodology that allows us to model a part of card playing in Bridge using Probabilistic Logic Programming.
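The abstract gives no details of the probabilistic logic program. The sketch below is a plain Python illustration of the kind of probability a declarer reasons about; it is not the paper's PLP model. It computes how the cards missing in a suit are likely to split between the two unseen defenders' hands. It assumes the 26 cards outside declarer's hand and dummy were dealt uniformly at random. The function names are hypothetical.

```python
from math import comb


def split_probability(missing: int, west: int, hand_size: int = 13) -> float:
    """P(West holds exactly `west` of the `missing` cards in a suit), with the
    2 * hand_size unseen cards dealt uniformly between West and East."""
    unseen = 2 * hand_size
    return (comb(missing, west) * comb(unseen - missing, hand_size - west)
            / comb(unseen, hand_size))


def split_table(missing: int) -> dict:
    """Merge mirror-image splits (e.g. 3-1 and 1-3) into one 'a-b' entry."""
    table = {}
    for west in range(missing + 1):
        a, b = sorted((west, missing - west), reverse=True)
        key = f"{a}-{b}"
        table[key] = table.get(key, 0.0) + split_probability(missing, west)
    return table


print({k: round(v, 3) for k, v in split_table(4).items()})
# {'4-0': 0.096, '3-1': 0.497, '2-2': 0.407}
```

Information gathered during play, such as a defender showing out of a suit, would update these priors. That kind of conditioning on evidence is what a probabilistic logic program can state declaratively.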


Q-Learning in enormous action spaces via amortized approximate maximization

arXiv.org Artificial Intelligence

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization over all actions with a maximization over a small subset of possible actions sampled from a learned proposal distribution. The resulting approach, which we dub Amortized Q-learning (AQL), is able to handle discrete, continuous, or hybrid action spaces while maintaining the benefits of Q-learning. Our experiments on continuous control tasks with up to 21-dimensional actions show that AQL outperforms D3PG (Barth-Maron et al, 2018) and QT-Opt (Kalashnikov et al, 2018). Experiments on structured discrete action spaces demonstrate that AQL can efficiently learn good policies in spaces with thousands of discrete actions.
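The following is a minimal NumPy sketch of the amortized maximization step. It assumes a box-shaped continuous action space and a Gaussian proposal whose mean and standard deviation are simply given; in AQL the proposal is learned. Mixing in uniformly sampled candidates is an assumption of this sketch. `q_fn` stands in for a trained Q-network evaluated at a fixed state, and all names are hypothetical.

```python
import numpy as np


def amortized_argmax(q_fn, proposal_mean, proposal_std, low, high,
                     n_proposal=64, n_uniform=64, rng=None):
    """Approximate argmax_a Q(s, a) over the box [low, high] by scoring only
    a sampled candidate set instead of searching the whole action space."""
    rng = np.random.default_rng() if rng is None else rng
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    d = low.shape[0]
    # Candidates drawn from the Gaussian proposal, clipped to the valid box.
    from_proposal = np.clip(
        rng.normal(proposal_mean, proposal_std, size=(n_proposal, d)), low, high)
    # Uniform candidates (an assumption here) cover actions the proposal ignores.
    from_uniform = rng.uniform(low, high, size=(n_uniform, d))
    candidates = np.vstack([from_proposal, from_uniform])
    values = np.asarray(q_fn(candidates))  # one Q-value per candidate row
    best = int(np.argmax(values))
    return candidates[best], float(values[best])


# Toy usage: a 21-dimensional action space and a hand-written Q function
# that peaks at a fixed target action.
dim = 21
target = np.full(dim, 0.3)


def toy_q(actions: np.ndarray) -> np.ndarray:
    return -np.sum((actions - target) ** 2, axis=1)


action, value = amortized_argmax(toy_q, proposal_mean=target, proposal_std=0.05,
                                 low=-np.ones(dim), high=np.ones(dim),
                                 rng=np.random.default_rng(0))
print(action.shape, round(value, 3))
```

The cost is one batched Q evaluation over `n_proposal + n_uniform` candidates, whatever the size of the action space, which keeps the maximization tractable. The returned action is only an approximate maximizer.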


ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

arXiv.org Artificial Intelligence

We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities: text, images, and tables. We collect our data by scraping Wikipedia and then use crowdsourcing to collect question-answer pairs. Our questions are ambiguous, in that the modality that contains the answer is not easily determined from the question alone. To demonstrate this ambiguity, we construct a modality selector (or disambiguator) network; this model achieves substantially lower accuracy on our challenge set than on existing datasets, indicating that our questions are more ambiguous. By analyzing this model, we investigate which words in the question are indicative of the modality. Next, we construct a simple baseline ManyModalQA model which, based on the prediction from the modality selector, invokes the corresponding pre-trained state-of-the-art unimodal QA model. We focus on providing the community with a new manymodal evaluation set and provide only a fine-tuning set, with the expectation that existing datasets and approaches will be transferred for most of the training, to encourage low-resource generalization without large, monolithic training sets for each new task. There is a significant gap between our baseline models and human performance; therefore, we hope that this challenge encourages research in end-to-end modality disambiguation and multimodal QA models, as well as transfer learning. Code and data available at: https://github.com/hannandarryl/ManyModalQA
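Below is a minimal sketch of the routing baseline described above. The keyword-based `select_modality` is a hypothetical stand-in for the paper's trained modality selector. The QA callables are placeholders for pre-trained unimodal models, and all names are illustrative.

```python
from typing import Callable, Dict

# Hypothetical cue phrases; the paper's selector is a trained network.
CUES = {
    "image": {"color", "wearing", "picture", "shown", "look"},
    "table": {"how many", "year", "score", "rank", "total"},
}


def select_modality(question: str) -> str:
    """Pick the modality whose cue phrases occur most often; default to text."""
    q = question.lower()
    scores = {m: sum(cue in q for cue in cues) for m, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "text"


def answer(question: str, context: Dict[str, object],
           qa_models: Dict[str, Callable[[str, object], str]]) -> str:
    """Route the question to the unimodal QA model for the chosen modality."""
    modality = select_modality(question)
    return qa_models[modality](question, context[modality])


# Placeholder "models" that just report which modality they were given.
qa_models = {
    "text": lambda q, ctx: f"text model on {ctx}",
    "image": lambda q, ctx: f"image model on {ctx}",
    "table": lambda q, ctx: f"table model on {ctx}",
}
context = {"text": "article.txt", "image": "photo.jpg", "table": "stats.csv"}
print(answer("What color is the car shown in the picture?", context, qa_models))
```

Because routing is a hard decision, a wrong modality prediction makes the downstream QA model fail regardless of its quality. This is why the ambiguity of the questions matters for this baseline.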


A Neural Architecture for Person Ontology population

arXiv.org Artificial Intelligence

A person ontology comprising concepts, attributes, and relationships of people has a number of applications in data protection, de-identification, population of knowledge graphs for business intelligence, and fraud prevention. While artificial neural networks have led to improvements in Entity Recognition, Entity Classification, and Relation Extraction, creating an ontology largely remains a manual process, because it requires a fixed set of semantic relations between concepts. In this work, we present a system for automatically populating a person ontology graph from unstructured data using neural models for Entity Classification and Relation Extraction. We introduce a new dataset for these tasks and discuss our results.

Introduction: We define a Personal Data Entity (PDE) as any information about a person.
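The sketch below covers only the final population step, and it assumes the entity classifier and relation extractor have already produced their predictions. The `SCHEMA` of allowed relations and all names are hypothetical. Checking triples against a fixed schema mirrors the abstract's point that an ontology relies on a fixed set of semantic relations between concepts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical schema: the entity classes each relation may connect.
SCHEMA = {
    "works_for": ("Person", "Organization"),
    "lives_in": ("Person", "Location"),
    "spouse_of": ("Person", "Person"),
}

Triple = Tuple[str, str, str]  # (head, relation, tail)


def populate(entities: Dict[str, str], relations: Iterable[Triple]):
    """entities maps each mention to its predicted class. Relations are kept
    only if both endpoints have the classes the schema expects."""
    graph: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    rejected: List[Triple] = []
    for head, rel, tail in relations:
        expected = SCHEMA.get(rel)
        actual = (entities.get(head), entities.get(tail))
        if expected is not None and actual == expected:
            graph[head].add((rel, tail))
        else:
            rejected.append((head, rel, tail))  # unknown relation or type clash
    return dict(graph), rejected


entities = {"Alice": "Person", "Acme": "Organization", "Paris": "Location"}
relations = [("Alice", "works_for", "Acme"),
             ("Alice", "lives_in", "Paris"),
             ("Alice", "lives_in", "Acme")]
graph, rejected = populate(entities, relations)
print(graph, rejected)
```

Rejecting rather than silently dropping schema violations keeps a record of extractor errors that a curator could review.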


FULLY BOOKED: "A world without work: technology, automation and how…

#artificialintelligence

New technologies have always provoked panic about workers being replaced by machines. In the past, such fears have been misplaced, and many economists maintain that they remain so today. Yet in A World Without Work, Daniel Susskind shows why this time really is different. Advances in artificial intelligence mean that all kinds of jobs are increasingly at risk. Susskind argues that machines no longer need to reason like us in order to outperform us.