Accuracy
The Impact of Algorithmic Risk Assessments on Human Predictions and its Analysis via Crowdsourcing Studies
Fogliato, Riccardo, Chouldechova, Alexandra, Lipton, Zachary
As algorithmic risk assessment instruments (RAIs) are increasingly adopted to assist decision makers, their predictive performance and potential to promote inequity have come under scrutiny. However, while most studies examine these tools in isolation, researchers have come to recognize that assessing their impact requires understanding the behavior of their human interactants. In this paper, building off of several recent crowdsourcing works focused on criminal justice, we conduct a vignette study in which laypersons are tasked with predicting future re-arrests. Our key findings are as follows: (1) Participants often predict that an offender will be rearrested even when they deem the likelihood of re-arrest to be well below 50%; (2) Participants do not anchor on the RAI's predictions; (3) The time spent on the survey varies widely across participants and most cases are assessed in less than 10 seconds; (4) Judicial decisions, unlike participants' predictions, depend in part on factors that are orthogonal to the likelihood of re-arrest. These results highlight the influence of several crucial but often overlooked design decisions and concerns around generalizability when constructing crowdsourcing studies to analyze the impacts of RAIs.
Artificial Intelligence in Dry Eye Disease
Storås, Andrea M., Strümke, Inga, Riegler, Michael A., Grauslund, Jakob, Hammer, Hugo L., Yazidi, Anis, Halvorsen, Pål, Gundersen, Kjell G., Utheim, Tor P., Jackson, Catherine
Dry eye disease (DED) has a prevalence of between 5 and 50\%, depending on the diagnostic criteria used and population under study. However, it remains one of the most underdiagnosed and undertreated conditions in ophthalmology. Many tests used in the diagnosis of DED rely on an experienced observer for image interpretation, which may be considered subjective and result in variation in diagnosis. Since artificial intelligence (AI) systems are capable of advanced problem solving, use of such techniques could lead to more objective diagnosis. Although the term `AI' is commonly used, recent success in its applications to medicine is mainly due to advancements in the sub-field of machine learning, which has been used to automatically classify images and predict medical outcomes. Powerful machine learning techniques have been harnessed to understand nuances in patient data and medical images, aiming for consistent diagnosis and stratification of disease severity. This is the first literature review on the use of AI in DED. We provide a brief introduction to AI, report its current use in DED research and its potential for application in the clinic. Our review found that AI has been employed in a wide range of DED clinical tests and research applications, primarily for interpretation of interferometry, slit-lamp and meibography images. While initial results are promising, much work is still needed on model development, clinical testing and standardisation.
Computing Graph Descriptors on Edge Streams
Hassan, Zohair Raza, Khan, Imdadullah, Shabbir, Mudassir, Abbas, Waseem
Graph feature extraction is a fundamental task in graphs analytics. Using feature vectors (graph descriptors) in tandem with data mining algorithms that operate on Euclidean data, one can solve problems such as classification, clustering, and anomaly detection on graph-structured data. This idea has proved fruitful in the past, with spectral-based graph descriptors providing state-of-the-art classification accuracy on benchmark datasets. However, these algorithms do not scale to large graphs since: 1) they require storing the entire graph in memory, and 2) the end-user has no control over the algorithm's runtime. In this paper, we present single-pass streaming algorithms to approximate structural features of graphs (counts of subgraphs of order $k \geq 4$). Operating on edge streams allows us to avoid keeping the entire graph in memory, and controlling the sample size enables us to control the time taken by the algorithm. We demonstrate the efficacy of our descriptors by analyzing the approximation error, classification accuracy, and scalability to massive graphs. Our experiments showcase the effect of the sample size on approximation error and predictive accuracy. The proposed descriptors are applicable on graphs with millions of edges within minutes and outperform the state-of-the-art descriptors in classification accuracy.
RF-LighGBM: A probabilistic ensemble way to predict customer repurchase behaviour in community e-commerce
Yang, Liping, Niu, Xiaxia, Wu, Jun
It is reported that the number of online payment users in China has reached 854 million; with the emergence of community e-commerce platforms, the trend of integration of e-commerce and social applications is increasingly intense. Community e-commerce is not a mature and sound comprehensive e-commerce with fewer categories and low brand value. To effectively retain community users and fully explore customer value has become an important challenge for community e-commerce operators. Given the above problems, this paper uses the data-driven method to study the prediction of community e-commerce customers' repurchase behaviour. The main research contents include 1. Given the complex problem of feature engineering, the classic model RFM in the field of customer relationship management is improved, and an improved model is proposed to describe the characteristics of customer buying behaviour, which includes five indicators. 2. In view of the imbalance of machine learning training samples in SMOTE-ENN, a training sample balance using SMOTE-ENN is proposed. The experimental results show that the machine learning model can be trained more effectively on balanced samples. 3. Aiming at the complexity of the parameter adjustment process, an automatic hyperparameter optimization method based on the TPE method was proposed. Compared with other methods, the model's prediction performance is improved, and the training time is reduced by more than 450%. 4. Aiming at the weak prediction ability of a single model, the soft voting based RF-LightgBM model was proposed. The experimental results show that the RF-LighTGBM model proposed in this paper can effectively predict customer repurchase behaviour, and the F1 value is 0.859, which is better than the single model and previous research results.
Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions
Kluger, Dan M., Wang, Sherrie, Lobell, David B.
Crop type mapping at the field level is critical for a variety of applications in agricultural monitoring, and satellite imagery is becoming an increasingly abundant and useful raw input from which to create crop type maps. Still, in many regions crop type mapping with satellite data remains constrained by a scarcity of field-level crop labels for training supervised classification models. When training data is not available in one region, classifiers trained in similar regions can be transferred, but shifts in the distribution of crop types as well as transformations of the features between regions lead to reduced classification accuracy. We present a methodology that uses aggregate-level crop statistics to correct the classifier by accounting for these two types of shifts. To adjust for shifts in the crop type composition we present a scheme for properly reweighting the posterior probabilities of each class that are output by the classifier. To adjust for shifts in features we propose a method to estimate and remove linear shifts in the mean feature vector. We demonstrate that this methodology leads to substantial improvements in overall classification accuracy when using Linear Discriminant Analysis (LDA) to map crop types in Occitanie, France and in Western Province, Kenya. When using LDA as our base classifier, we found that in France our methodology led to percent reductions in misclassifications ranging from 2.8% to 42.2% (mean = 21.9%) over eleven different training departments, and in Kenya the percent reductions in misclassification were 6.6%, 28.4%, and 42.7% for three training regions. While our methodology was statistically motivated by the LDA classifier, it can be applied to any type of classifier. As an example, we demonstrate its successful application to improve a Random Forest classifier.
Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report
Alevizos, Elias, Artikis, Alexander, Paliouras, Georgios
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events. However, there is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine. We present a formal framework that attempts to address the issue of Complex Event Forecasting (CEF). Our framework combines two formalisms: a) symbolic automata which are used to encode complex event patterns; and b) prediction suffix trees which can provide a succinct probabilistic description of an automaton's behavior. We compare our proposed approach against state-of-the-art methods and show its advantage in terms of accuracy and efficiency. In particular, prediction suffix trees, being variable-order Markov models, have the ability to capture long-term dependencies in a stream by remembering only those past sequences that are informative enough. Our experimental results demonstrate the benefits, in terms of accuracy, of being able to capture such long-term dependencies. This is achieved by increasing the order of our model beyond what is possible with full-order Markov models that need to perform an exhaustive enumeration of all possible past sequences of a given order. We also discuss extensively how CEF solutions should be best evaluated on the quality of their forecasts.
Federated Learning: Issues in Medical Application
Yoo, Joo Hun, Jeong, Hyejun, Lee, Jaehyeok, Chung, Tai-Myoung
Since the federated learning, which makes AI learning possible without moving local data around, was introduced by google in 2017 it has been actively studied particularly in the field of medicine. In fact, the idea of machine learning in AI without collecting data from local clients is very attractive because data remain in local sites. However, federated learning techniques still have various open issues due to its own characteristics such as non identical distribution, client participation management, and vulnerable environments. In this presentation, the current issues to make federated learning flawlessly useful in the real world will be briefly overviewed. They are related to data/system heterogeneity, client management, traceability, and security. Also, we introduce the modularized federated learning framework, we currently develop, to experiment various techniques and protocols to find solutions for aforementioned issues. The framework will be open to public after development completes.
Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs Credit Default
Crosato, Lisa, Liberati, Caterina, Repetto, Marco
The economy of the European Union (EU) is deeply grounded into Small and Medium Enterprises (SMEs). SMEs represent about 99.8% of the active enterprises in the EU-28 non-financial business sector (NFBS), accounting for almost 60% of value-added within the NFBS and fostering the workforce of the EU with two out of every three jobs (European Commission, 2019a). Thus, there is a wide literature covering various economic aspects of SMEs, with a particular attention to default prediction (for an up-to-date review see Ciampi et al., 2021), which is of interest not only for scholars but also for practitioners such as financial intermediaries and for policy makers in their effort to support SMEs and to ease credit constraints to which they are naturally exposed (Andries et al., 2018; Cornille et al., 2019). Whether it is for private credit-risk assessment or for public funding, independently on the type of data imputed to measure the health status of a firm, prediction of default should success in two aspects: maximise correct classification and clarify the role of the variables involved in the process. Most of the times, the contributions based on Machine Learning (ML) techniques neglect the latter aspect, being rather focused on the former, often with better results with respect to parametric techniques that provide, on the contrary, a clear framework for interpretation.
Sarcasm Detection in Twitter -- Performance Impact while using Data Augmentation: Word Embeddings
Handoyo, Alif Tri, Hidayaturrahman, null, Suhartono, Derwin
Sarcasm is the use of words usually used to either mock or annoy someone, or for humorous purposes. Sarcasm is largely used in social networks and microblogging websites, where people mock or censure in a way that makes it difficult even for humans to tell if what is said is what is meant. Failure to identify sarcastic utterances in Natural Language Processing applications such as sentiment analysis and opinion mining will confuse classification algorithms and generate false results. Several studies on sarcasm detection have utilized different learning algorithms. However, most of these learning models have always focused on the contents of expression only, leaving the contextual information in isolation. As a result, they failed to capture the contextual information in the sarcastic expression. Moreover, some datasets used in several studies have an unbalanced dataset which impacting the model result. In this paper, we propose a contextual model for sarcasm identification in twitter using RoBERTa, and augmenting the dataset by applying Global Vector representation (GloVe) for the construction of word embedding and context learning to generate more data and balancing the dataset. The effectiveness of this technique is tested with various datasets and data augmentation settings. In particular, we achieve performance gain by 3.2% in the iSarcasm dataset when using data augmentation to increase 20% of data labeled as sarcastic, resulting F-score of 40.4% compared to 37.2% without data augmentation.
FADE: FAir Double Ensemble Learning for Observable and Counterfactual Outcomes
Mishler, Alan, Kennedy, Edward
Methods for building fair predictors often involve tradeoffs between fairness and accuracy and between different fairness criteria, but the nature of these tradeoffs varies. Recent work seeks to characterize these tradeoffs in specific problem settings, but these methods often do not accommodate users who wish to improve the fairness of an existing benchmark model without sacrificing accuracy, or vice versa. These results are also typically restricted to observable accuracy and fairness criteria. We develop a flexible framework for fair ensemble learning that allows users to efficiently explore the fairness-accuracy space or to improve the fairness or accuracy of a benchmark model. Our framework can simultaneously target multiple observable or counterfactual fairness criteria, and it enables users to combine a large number of previously trained and newly trained predictors. We provide theoretical guarantees that our estimators converge at fast rates. We apply our method on both simulated and real data, with respect to both observable and counterfactual accuracy and fairness criteria. We show that, surprisingly, multiple unfairness measures can sometimes be minimized simultaneously with little impact on accuracy, relative to unconstrained predictors or existing benchmark models.