AITopics | Flach, Peter

Collaborating Authors

Flach, Peter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

Kull, Meelis, Perello-Nieto, Miquel, Kängsepp, Markus, Filho, Telmo Silva, Song, Hao, Flach, Peter

arXiv.org Machine LearningOct-28-2019

Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method to learn a single corrective multiplicative factor for inputs to the last softmax layer. On non-neural models the existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insights to the biases in the uncalibrated model.

artificial intelligence, neural network, well-calibrated multiclass probability, (2 more...)

arXiv.org Machine Learning

1910.12656

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Add feedback

FACE: Feasible and Actionable Counterfactual Explanations

Poyiadzi, Rafael, Sokol, Kacper, Santos-Rodriguez, Raul, De Bie, Tijl, Flach, Peter

arXiv.org Machine LearningSep-20-2019

Work in Counterfactual Explanations tends to focus on the principle of ``the closest possible world'' that identifies small changes leading to the desired outcome. In this paper we argue that while this approach might initially seem intuitively appealing it exhibits shortcomings not addressed in the current literature. First, a counterfactual example generated by the state-of-the-art systems is not necessarily representative of the underlying data distribution, and may therefore prescribe unachievable goals(e.g., an unsuccessful life insurance applicant with severe disability may be advised to do more sports). Secondly, the counterfactuals may not be based on a ``feasible path'' between the current state of the subject and the suggested one, making actionable recourse infeasible (e.g., low-skilled unsuccessful mortgage applicants may be told to double their salary, which may be hard without first increasing their skill level). These two shortcomings may render counterfactual explanations impractical and sometimes outright offensive. To address these two major flaws, first of all, we propose a new line of Counterfactual Explanations research aimed at providing actionable and feasible paths to transform a selected instance into one that meets a certain goal. Secondly, we propose FACE: an algorithmically sound way of uncovering these ``feasible paths'' based on the shortest path distances defined via density-weighted metrics. Our approach generates counterfactuals that are coherent with the underlying data distribution and supported by the ``feasible paths'' of change, which are achievable and can be tailored to the problem at hand.

artificial intelligence, banking & finance, counterfactual, (19 more...)

arXiv.org Machine Learning

1909.09369

Genre: Research Report (0.50)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency

Sokol, Kacper, Santos-Rodriguez, Raul, Flach, Peter

arXiv.org Artificial IntelligenceSep-11-2019

Machine learning algorithms can take important decisions, sometimes legally binding, about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified. Given the potential harm that these algorithms can cause, qualities such as fairness, accountability and transparency of predictive systems are of paramount importance. Recent literature suggested voluntary self-reporting on these aspects of predictive systems -- e.g., data sheets for data sets -- but their scope is often limited to a single component of a machine learning pipeline, and producing them requires manual labour. To resolve this impasse and ensure high-quality, fair, transparent and reliable machine learning systems, we developed an open source toolbox that can inspect selected fairness, accountability and transparency aspects of these systems to automatically and objectively report them back to their engineers and users. We describe design, scope and usage examples of this Python toolbox in this paper. The toolbox provides functionality for inspecting fairness, accountability and transparency of all aspects of the machine learning process: data (and their features), models and predictions. It is available to the public under the BSD 3-Clause open source licence.

artificial intelligence, fairness, upstream oil & gas, (18 more...)

arXiv.org Artificial Intelligence

1909.05167

Country:

Europe > United Kingdom (0.46)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.93)
Energy > Oil & Gas > Upstream (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

HyperStream: a Workflow Engine for Streaming Data

Diethe, Tom, Kull, Meelis, Twomey, Niall, Sokol, Kacper, Song, Hao, Perello-Nieto, Miquel, Tonkin, Emma, Flach, Peter

arXiv.org Machine LearningAug-7-2019

Journal of Machine Learning Research 1 (2019) 1-48 Submitted 8/19; Published 10/00 HyperStream: a Workflow Engine for Streaming Data Tom Diethe tdiethe@amazon.com Intelligent Systems Laboratory, University of Bristol, BS8 1UB, UK Editor: A. N. Other Abstract This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments. HyperStream is a general purpose tool that is well-suited for the design, development, and deployment of Machine Learning algorithms and predictive models in a wide space of sequential predictive problems. Introduction Scientific workflow systems are designed to compose and execute a series of computational or data manipulation operations (workflow) (Deelman et al., 2009).

artificial intelligence, data mining, hyperstream, (16 more...)

arXiv.org Machine Learning

1908.02858

Country:

Europe > United Kingdom (0.37)
North America > United States (0.29)

Genre: Workflow (1.00)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Distribution Calibration for Regression

Song, Hao, Diethe, Tom, Kull, Meelis, Flach, Peter

arXiv.org Machine LearningMay-15-2019

We are concerned with obtaining well-calibrated output distributions from regression models. Such distributions allow us to quantify the uncertainty that the model has regarding the predicted target value. We introduce the novel concept of distribution calibration, and demonstrate its advantages over the existing definition of quantile calibration. We further propose a post-hoc approach to improving the predictions from previously trained regression models, using multi-output Gaussian Processes with a novel Beta link function. The proposed method is experimentally verified on a set of common regression models and shows improvements for both distribution-level and quantile-level calibration.

artificial intelligence, calibration, machine learning, (18 more...)

arXiv.org Machine Learning

1905.06023

Country:

North America > United States (0.67)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

$\beta^3$-IRT: A New Item Response Model and its Applications

Chen, Yu, Filho, Telmo Silva, Prudêncio, Ricardo B. C., Diethe, Tom, Flach, Peter

arXiv.org Machine LearningMar-13-2019

Item Response Theory (IRT) aims to assess latent abilities of respondents based on the correctness of their answers in aptitude test items with different difficulty levels. In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curve (ICC). In experiments we applied the proposed model to data from an online exam platform, and show our model outperforms a more standard 2PL-ND model on all datasets. Furthermore, we show how to apply $\beta^3$-IRT to assess the ability of machine learning classifiers. This novel application results in a new metric for evaluating the quality of the classifier's probability estimates, based on the inferred difficulty and discrimination of data instances.

artificial intelligence, discrimination, neural network, (15 more...)

arXiv.org Machine Learning

1903.04016

Country:

South America > Brazil (0.14)
Asia > Japan (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Non-Parametric Calibration of Probabilistic Regression

Song, Hao, Kull, Meelis, Flach, Peter

arXiv.org Machine LearningJun-20-2018

The task of calibration is to retrospectively adjust the outputs from a machine learning model to provide better probability estimates on the target variable. While calibration has been investigated thoroughly in classification, it has not yet been well-established for regression tasks. This paper considers the problem of calibrating a probabilistic regression model to improve the estimated probability densities over the real-valued targets. We propose to calibrate a regression model through the cumulative probability density, which can be derived from calibrating a multi-class classifier. We provide three non-parametric approaches to solve the problem, two of which provide empirical estimates and the third providing smooth density estimates. The proposed approaches are experimentally evaluated to show their ability to improve the performance of regression models on the predictive likelihood.

artificial intelligence, calibration, machine learning, (18 more...)

arXiv.org Machine Learning

1806.0769

Country:

North America > United States (0.46)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback

Probabilistic Sensor Fusion for Ambient Assisted Living

Diethe, Tom, Twomey, Niall, Kull, Meelis, Flach, Peter, Craddock, Ian

arXiv.org Machine LearningFeb-3-2017

There is a widely-accepted need to revise current forms of health-care provision, with particular interest in sensing systems in the home. Given a multiple-modality sensor platform with heterogeneous network connectivity, as is under development in the Sensor Platform for HEalthcare in Residential Environment (SPHERE) Interdisciplinary Research Collaboration (IRC), we face specific challenges relating to the fusion of the heterogeneous sensor modalities. We introduce Bayesian models for sensor fusion, which aims to address the challenges of fusion of heterogeneous sensor modalities. Using this approach we are able to identify the modalities that have most utility for each particular activity, and simultaneously identify which features within that activity are most relevant for a given activity. We further show how the two separate tasks of location prediction and activity recognition can be fused into a single model, which allows for simultaneous learning an prediction for both tasks. We analyse the performance of this model on data collected in the SPHERE house, and show its utility. We also compare against some benchmark models which do not have the full structure,and show how the proposed model compares favourably to these methods

bayesian inference, information fusion, prediction, (17 more...)

arXiv.org Machine Learning

1702.01209

Country:

Europe (0.28)
North America > United States > New York (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Consumer Health (0.50)
Health & Medicine > Health Care Providers & Services (0.41)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Precision-Recall-Gain Curves: PR Analysis Done Right

Flach, Peter, Kull, Meelis

Neural Information Processing SystemsDec-31-2015

Precision-Recall analysis abounds in applications of binary classification where true negatives do not add value and hence should not affect assessment of the classifier's performance. Perhaps inspired by the many advantages of receiver operating characteristic (ROC) curves and the area under such curves for accuracy-based performance assessment, many researchers have taken to report Precision-Recall (PR) curves and associated areas as performance metric. We demonstrate in this paper that this practice is fraught with difficulties, mainly because of incoherent scale assumptions -- e.g., the area under a PR curve takes the arithmetic mean of precision values whereas the $F_{\beta}$ score applies the harmonic mean. We show how to fix this by plotting PR curves in a different coordinate system, and demonstrate that the new Precision-Recall-Gain curves inherit all key advantages of ROC curves. In particular, the area under Precision-Recall-Gain curves conveys an expected $F_1$ score on a harmonic scale, and the convex hull of a Precision-Recall-Gain curve allows us to calibrate the classifier's scores so as to determine, for each operating point on the convex hull, the interval of $\beta$ values for which the point optimises $F_{\beta}$. We demonstrate experimentally that the area under traditional PR curves can easily favour models with lower expected $F_1$ score than others, and so the use of Precision-Recall-Gain curves will result in better model selection.

artificial intelligence, classifier, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.28)

Genre: Instructional Material (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Threshold Choice Methods: the Missing Link

Hernández-Orallo, José, Flach, Peter, Ferri, Cèsar

arXiv.org Artificial IntelligenceJan-28-2012

Many performance metrics have been introduced for the evaluation of classification performance, with different origins and niches of application: accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the absolute error, and the Brier score (with its decomposition into refinement and calibration). One way of understanding the relation among some of these metrics is the use of variable operating conditions (either in the form of misclassification costs or class proportions). Thus, a metric may correspond to some expected loss over a range of operating conditions. One dimension for the analysis has been precisely the distribution we take for this range of operating conditions, leading to some important connections in the area of proper scoring rules. However, we show that there is another dimension which has not received attention in the analysis of performance metrics. This new dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the loss of these methods for a uniform range of operating conditions we get the 0-1 loss, the absolute error, the Brier score (mean squared error), the AUC and the refinement loss respectively. This provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, namely: take a model, apply several threshold choice methods consistent with the information which is (and will be) available about the operating condition, and compare their expected losses. In order to assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.

artificial intelligence, machine learning, threshold choice method, (15 more...)

arXiv.org Artificial Intelligence

1112.264

Country:

Europe (0.45)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback