Davis, Jesse
DMON: A Simple yet Effective Approach for Argument Structure Learning
Sun, Wei, Li, Mingxiao, Sun, Jingyuan, Davis, Jesse, Moens, Marie-Francine
Argument structure learning~(ASL) entails predicting relations between arguments. Because it can structure a document to facilitate its understanding, it has been widely applied in many fields~(medical, commercial, and scientific domains). Despite its broad utilization, ASL remains a challenging task because it involves examining the complex relationships between the sentences in a potentially unstructured discourse. To resolve this problem, we have developed a simple yet effective approach called Dual-tower Multi-scale cOnvolution neural Network~(DMON) for the ASL task. Specifically, we organize arguments into a relationship matrix that together with the argument embeddings forms a relationship tensor and design a mechanism to capture relations with contextual arguments. Experimental results on three different-domain argument mining datasets demonstrate that our framework outperforms state-of-the-art models. The code is available at https://github.com/VRCMF/DMON.git .
Faster Repeated Evasion Attacks in Tree Ensembles
Cascioli, Lorenzo, Devos, Laurens, Kuลพelka, Ondลej, Davis, Jesse
Tree ensembles are one of the most widely used model classes. However, these models are susceptible to adversarial examples, i.e., slightly perturbed examples that elicit a misprediction. There has been significant research on designing approaches to construct such examples for tree ensembles. But this is a computationally challenging problem that often must be solved a large number of times (e.g., for all examples in a training set). This is compounded by the fact that current approaches attempt to find such examples from scratch. In contrast, we exploit the fact that multiple similar problems are being solved. Specifically, our approach exploits the insight that adversarial examples for tree ensembles tend to perturb a consistent but relatively small set of features. We show that we can quickly identify this set of features and use this knowledge to speedup constructing adversarial examples.
Deep Neural Network Benchmarks for Selective Classification
Pugnana, Andrea, Perini, Lorenzo, Davis, Jesse, Ruggieri, Salvatore
With the increasing deployment of machine learning models in many socially-sensitive tasks, there is a growing demand for reliable and trustworthy predictions. One way to accomplish these requirements is to allow a model to abstain from making a prediction when there is a high risk of making an error. This requires adding a selection mechanism to the model, which selects those examples for which the model will provide a prediction. The selective classification framework aims to design a mechanism that balances the fraction of rejected predictions (i.e., the proportion of examples for which the model does not make a prediction) versus the improvement in predictive performance on the selected predictions. Multiple selective classification frameworks exist, most of which rely on deep neural network architectures. However, the empirical evaluation of the existing approaches is still limited to partial comparisons among methods and settings, providing practitioners with little insight into their relative merits. We fill this gap by benchmarking 18 baselines on a diverse set of 44 datasets that includes both image and tabular data. Moreover, there is a mix of binary and multiclass tasks. We evaluate these approaches using several criteria, including selective error rate, empirical coverage, distribution of rejected instance's classes, and performance on out-of-distribution instances. The results indicate that there is not a single clear winner among the surveyed baselines, and the best method depends on the users' objectives.
Biases in Expected Goals Models Confound Finishing Ability
Davis, Jesse, Robberechts, Pieter
Expected Goals (xG) has emerged as a popular tool for evaluating finishing skill in soccer analytics. It involves comparing a player's cumulative xG with their actual goal output, where consistent overperformance indicates strong finishing ability. However, the assessment of finishing skill in soccer using xG remains contentious due to players' difficulty in consistently outperforming their cumulative xG. In this paper, we aim to address the limitations and nuances surrounding the evaluation of finishing skill using xG statistics. Specifically, we explore three hypotheses: (1) the deviation between actual and expected goals is an inadequate metric due to the high variance of shot outcomes and limited sample sizes, (2) the inclusion of all shots in cumulative xG calculation may be inappropriate, and (3) xG models contain biases arising from interdependencies in the data that affect skill measurement. We found that sustained overperformance of cumulative xG requires both high shot volumes and exceptional finishing, including all shot types can obscure the finishing ability of proficient strikers, and that there is a persistent bias that makes the actual and expected goals closer for excellent finishers than it really is. Overall, our analysis indicates that we need more nuanced quantitative approaches for investigating a player's finishing ability, which we achieved using a technique from AI fairness to learn an xG model that is calibrated for multiple subgroups of players. As a concrete use case, we show that (1) the standard biased xG model underestimates Messi's GAX by 17% and (2) Messi's GAX is 27% higher than the typical elite high-shot-volume attacker, indicating that Messi is even a more exceptional finisher than people commonly believed.
Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence
Sun, Wei, Li, Mingxiao, Sileo, Damien, Davis, Jesse, Moens, Marie-Francine
Medical Question Answering~(medical QA) systems play an essential role in assisting healthcare workers in finding answers to their questions. However, it is not sufficient to merely provide answers by medical QA systems because users might want explanations, that is, more analytic statements in natural language that describe the elements and context that support the answer. To do so, we propose a novel approach for generating natural language explanations for answers predicted by medical QA systems. As high-quality medical explanations require additional medical knowledge, so that our system extract knowledge from medical textbooks to enhance the quality of explanations during the explanation generation process. Concretely, we designed an expectation-maximization approach that makes inferences about the evidence found in these texts, offering an efficient way to focus attention on lengthy evidence passages. Experimental results, conducted on two datasets MQAE-diag and MQAE, demonstrate the effectiveness of our framework for reasoning with textual evidence. Our approach outperforms state-of-the-art models, achieving a significant improvement of \textbf{6.86} and \textbf{9.43} percentage points on the Rouge-1 score; \textbf{8.23} and \textbf{7.82} percentage points on the Bleu-4 score on the respective datasets.
A Markov Framework for Learning and Reasoning About Strategies in Professional Soccer
Van Roy, Maaike (a:1:{s:5:"en_US";s:9:"KU Leuven";}) | Robberechts, Pieter | Yang, Wen-Chi | De Raedt, Luc | Davis, Jesse
Strategy-optimization is a fundamental element of dynamic and complex team sports such as soccer, American football, and basketball. As the amount of data that is collected from matches in these sports has increased, so has the demand for data-driven decisionmaking support. If alternative strategies need to be balanced, a data-driven approach can uncover insights that are not available from qualitative analysis. This could tremendously aid teams in their match preparations. In this work, we propose a novel Markov modelbased framework for soccer that allows reasoning about the specific strategies teams use in order to gain insights into the efficiency of each strategy. The framework consists of two components: (1) a learning component, which entails modeling a teamโs offensive behavior by learning a Markov decision process (MDP) from event data that is collected from the teamโs matches, and (2) a reasoning component, which involves a novel application of probabilistic model checking to reason about the efficacy of the learned strategies of each team. In this paper, we provide an overview of this framework and illustrate it on several use cases using real-world event data from three leagues. Our results show that the framework can be used to reason about the shot decision-making of teams and to optimise the defensive strategies used when playing against a particular team. The general ideas presented in this framework can easily be extended to other sports.
How to Allocate your Label Budget? Choosing between Active Learning and Learning to Reject in Anomaly Detection
Perini, Lorenzo, Giannuzzi, Daniele, Davis, Jesse
Anomaly detection attempts at finding examples that deviate from the expected behaviour. Usually, anomaly detection is tackled from an unsupervised perspective because anomalous labels are rare and difficult to acquire. However, the lack of labels makes the anomaly detector have high uncertainty in some regions, which usually results in poor predictive performance or low user trust in the predictions. One can reduce such uncertainty by collecting specific labels using Active Learning (AL), which targets examples close to the detector's decision boundary. Alternatively, one can increase the user trust by allowing the detector to abstain from making highly uncertain predictions, which is called Learning to Reject (LR). One way to do this is by thresholding the detector's uncertainty based on where its performance is low, which requires labels to be evaluated. Although both AL and LR need labels, they work with different types of labels: AL seeks strategic labels, which are evidently biased, while LR requires i.i.d. labels to evaluate the detector's performance and set the rejection threshold. Because one usually has a unique label budget, deciding how to optimally allocate it is challenging. In this paper, we propose a mixed strategy that, given a budget of labels, decides in multiple rounds whether to use the budget to collect AL labels or LR labels. The strategy is based on a reward function that measures the expected gain when allocating the budget to either side. We evaluate our strategy on 18 benchmark datasets and compare it to some baselines.
Lifted Reasoning for Combinatorial Counting
Totis, Pietro (a:1:{s:5:"en_US";s:9:"KU Leuven";}) | Davis, Jesse (KU Leuven) | de Raedt, Luc | Kimmig, Angelika
Combinatorics math problems are often used as a benchmark to test human cognitive and logical problem-solving skills. These problems are concerned with counting the number of solutions that exist in a specific scenario that is sketched in natural language. Humans are adept at solving such problems as they can identify commonly occurring structures in the questions for which a closed-form formula exists for computing the answer. These formulas exploit the exchangeability of objects and symmetries to avoid a brute-force enumeration of all possible solutions. Unfortunately, current AI approaches are still unable to solve combinatorial problems in this way. This paper aims to fill this gap by developing novel AI techniques for representing and solving such problems. It makes the following five contributions. First, we identify a class of combinatorics math problems which traditional lifted counting techniques fail to model or solve efficiently. Second, we propose a novel declarative language for this class of problems. Third, we propose novel lifted solving algorithms bridging probabilistic inference techniques and constraint programming. Fourth, we implement them in a lifted solver that solves efficiently the class of problems under investigation. Finally, we evaluate our contributions on a real-world combinatorics math problems dataset and synthetic benchmarks.
Machine Learning with a Reject Option: A survey
Hendrickx, Kilian, Perini, Lorenzo, Van der Plas, Dries, Meert, Wannes, Davis, Jesse
Machine learning models always make a prediction, even when it is likely to be inaccurate. This behavior should be avoided in many decision support applications, where mistakes can have severe consequences. Albeit already studied in 1970, machine learning with a reject option recently gained interest. This machine learning subfield enables machine learning models to abstain from making a prediction when likely to make a mistake. This survey aims to provide an overview on machine learning with a reject option. We introduce the conditions leading to two types of rejection, ambiguity and novelty rejection. Moreover, we define the existing architectures for models with a reject option, describe the standard learning strategies to train such models and relate traditional machine learning techniques to rejection. Additionally, we review strategies to evaluate a model's predictive and rejective quality. Finally, we provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.
Leaving Goals on the Pitch: Evaluating Decision Making in Soccer
Van Roy, Maaike, Robberechts, Pieter, Yang, Wen-Chi, De Raedt, Luc, Davis, Jesse
Analysis of the popular expected goals (xG) metric in soccer has determined that a (slightly) smaller number of high-quality attempts will likely yield more goals than a slew of low-quality ones. This observation has driven a change in shooting behavior. Teams are passing up on shots from outside the penalty box, in the hopes of generating a better shot closer to goal later on. This paper evaluates whether this decrease in long-distance shots is warranted. Therefore, we propose a novel generic framework to reason about decision-making in soccer by combining techniques from machine learning and artificial intelligence (AI). First, we model how a team has behaved offensively over the course of two seasons by learning a Markov Decision Process (MDP) from event stream data. Second, we use reasoning techniques arising from the AI literature on verification to each team's MDP. This allows us to reason about the efficacy of certain potential decisions by posing counterfactual questions to the MDP. Our key conclusion is that teams would score more goals if they shot more often from outside the penalty box in a small number of team-specific locations. The proposed framework can easily be extended and applied to analyze other aspects of the game.