Performance Analysis
Identifying biases in legal data: An algorithmic fairness perspective
Sargent, Jackson, Weber, Melanie
As artificial intelligence enters the legal space, it is essential to recognize biases in legal data and ensure that they are not replicated and reinforced by legal technology [7, 13, 18]. Furthermore, understanding biases in legal data and developing discrimination-free technology could help make the legal space fairer and more widely accessible. We typically find two types of biases in legal data. First, representation biases, i.e., certain social groups are over- or underrepresented in a data set. Second, sentencing disparities, i.e., the outcome of legal proceedings for similar cases varies across social groups. Representation biases may reflect disparities in policing (arrest rates) or in offense rates.
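The two bias types above can be made concrete with simple descriptive measures. The sketch below is an illustration under stated assumptions, not the authors' method: representation bias is taken as the ratio of a group's share in the data set to its share in some reference population, and sentencing disparity as the difference in mean outcome between two groups within the same case stratum, where the hypothetical `stratum` field is only a crude stand-in for "similar cases".

```python
from collections import Counter


def representation_ratios(dataset_groups, population_shares):
    """Share of each group in the data set divided by its share in a reference population.

    A ratio above 1 suggests over-representation; below 1, under-representation.
    """
    counts = Counter(dataset_groups)
    n = len(dataset_groups)
    return {g: (counts[g] / n) / share for g, share in population_shares.items()}


def outcome_gaps(records, group_a, group_b):
    """Mean outcome of group_a minus mean outcome of group_b within each case stratum.

    Each record is a dict with hypothetical keys "group", "stratum" (a crude proxy
    for "similar cases", e.g. offense category) and "outcome" (e.g. sentence in months).
    """
    gaps = {}
    for s in sorted({r["stratum"] for r in records}):
        a = [r["outcome"] for r in records if r["stratum"] == s and r["group"] == group_a]
        b = [r["outcome"] for r in records if r["stratum"] == s and r["group"] == group_b]
        if a and b:  # compare only strata in which both groups appear
            gaps[s] = sum(a) / len(a) - sum(b) / len(b)
    return gaps


print(representation_ratios(["A", "A", "A", "B"], {"A": 0.5, "B": 0.5}))  # {'A': 1.5, 'B': 0.5}
```

Stratifying before comparing outcomes matters: a raw difference in mean sentences could simply reflect different offense mixes, which is the representation question rather than the disparity question.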
Audio Interval Retrieval using Convolutional Neural Networks
Kuzminykh, Ievgeniia, Shevchuk, Dan, Shiaeles, Stavros, Ghita, Bogdan
Modern streaming services are increasingly labeling videos based on their visual or audio content. This typically augments the use of AI and ML technologies by allowing users to search by keywords and video descriptions using natural speech. Prior research has provided a number of successful speech-to-text solutions for human speech; this article instead investigates possible solutions for retrieving sound events based on a natural language query, and estimates how effective and accurate they are. In this study, we specifically focus on the YamNet, AlexNet, and ResNet-50 pre-trained models, which we use to automatically classify audio samples into a number of predefined classes based on their mel spectrograms. The predefined classes can represent sounds associated with actions within a video fragment. Two tests are conducted to evaluate the performance of the models on two separate problems: audio classification and interval retrieval based on a natural language query. Results show that the benchmarked models perform comparably, with YamNet slightly outperforming the other two. YamNet classified single fixed-size audio samples with 92.7% accuracy and 68.75% precision, while its average accuracy and precision on interval retrieval were 71.62% and 41.95%, respectively. The investigated method may be embedded into an automated event-marking architecture for streaming services.
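One plausible way to turn per-sample classification into interval retrieval is to classify successive fixed-size windows and merge consecutive windows whose predicted class matches the query. The sketch below assumes exactly that; the naive keyword matching in `query_to_class` and both function names are hypothetical stand-ins, not the natural-language handling used in the study.

```python
import re


def query_to_class(query, classes):
    """Naive keyword match: return the first class name that appears as a word in the query."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    matches = [c for c in classes if c.lower() in words]
    return matches[0] if matches else None


def retrieve_intervals(window_labels, target, window_sec=1.0, hop_sec=None):
    """Merge consecutive windows predicted as `target` into (start_sec, end_sec) intervals.

    window_labels: per-window class predictions in temporal order, e.g. the argmax
    outputs of an audio classifier applied to successive mel-spectrogram windows.
    """
    hop = window_sec if hop_sec is None else hop_sec
    intervals = []
    start = None
    for i, label in enumerate(window_labels):
        if label == target and start is None:
            start = i * hop  # a matching run begins at this window
        elif label != target and start is not None:
            # the run ended at the previous window; close the interval at its end time
            intervals.append((start, (i - 1) * hop + window_sec))
            start = None
    if start is not None:  # a run that lasts until the final window
        intervals.append((start, (len(window_labels) - 1) * hop + window_sec))
    return intervals


labels = ["speech", "dog", "dog", "car", "dog"]
target = query_to_class("When does the dog bark?", ["car", "dog", "speech"])
print(retrieve_intervals(labels, target))  # [(1.0, 3.0), (4.0, 5.0)]
```

With non-overlapping windows (`hop_sec` left as `None`), an interval's boundaries can only be as precise as the window length, which is one reason retrieval scores can trail single-sample classification scores.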
Algorithmic Fairness Verification with Graphical Models
Ghosh, Bishwamittra, Basu, Debabrota, Meel, Kuldeep S.
In recent years, machine learning (ML) algorithms have been deployed in safety-critical and high-stakes decision-making, where the fairness of algorithms is of paramount importance. Fairness in ML centers on detecting bias towards certain demographic populations induced by an ML classifier and proposes algorithmic solutions to mitigate the bias with respect to different fairness definitions. To this end, several fairness verifiers have been proposed that compute the bias in the prediction of an ML classifier, essentially beyond a finite dataset, given the probability distribution of input features. In the context of verifying linear classifiers, existing fairness verifiers are limited in accuracy, due to imprecise modelling of correlations among features, and in scalability, due to restrictive formulations of the classifiers as SSAT or SMT formulas or to their reliance on sampling. In this paper, we propose an efficient fairness verifier, called FVGM, that encodes the correlations among features as a Bayesian network. In contrast to existing verifiers, FVGM uses a stochastic subset-sum based approach to verify linear classifiers. Experimentally, we show that FVGM leads to an accurate and scalable assessment for more diverse families of fairness-enhancing algorithms, fairness attacks, and group/causal fairness metrics than the state-of-the-art. We also demonstrate that FVGM facilitates the computation of fairness influence functions as a stepping stone to detect the source of bias induced by subsets of features.
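The stochastic subset-sum idea can be illustrated in a deliberately simplified setting. FVGM encodes feature correlations in a Bayesian network; the sketch below drops that and assumes Boolean features that are independent given the sensitive group, which is the kind of simplification the abstract criticizes. It only shows the core computation: for a linear classifier that predicts 1 when the weighted sum of features reaches a threshold, the distribution of that sum is built one feature at a time, and the per-group positive rates give a statistical parity difference. All names and inputs are hypothetical.

```python
from collections import defaultdict


def prob_positive(weights, threshold, feature_probs):
    """P(sum_i w_i * x_i >= threshold) for independent Boolean x_i with P(x_i = 1) = feature_probs[i].

    Dynamic programming over the distribution of partial sums (a stochastic subset-sum).
    """
    dist = {0: 1.0}  # partial weighted sum -> probability
    for w, p in zip(weights, feature_probs):
        nxt = defaultdict(float)
        for s, q in dist.items():
            nxt[s] += q * (1 - p)  # x_i = 0 leaves the sum unchanged
            nxt[s + w] += q * p    # x_i = 1 adds the weight
        dist = nxt
    return sum(q for s, q in dist.items() if s >= threshold)


def statistical_parity_difference(weights, threshold, probs_by_group):
    """Largest gap in positive-prediction rate across groups, plus the per-group rates."""
    rates = {g: prob_positive(weights, threshold, ps) for g, ps in probs_by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


spd, rates = statistical_parity_difference(
    weights=[1, 1], threshold=2,
    probs_by_group={"group_a": [0.5, 0.5], "group_b": [1.0, 0.5]},
)
print(spd, rates)  # 0.25 {'group_a': 0.25, 'group_b': 0.5}
```

Small integer weights keep the table of distinct partial sums small. With arbitrary real-valued weights the number of distinct sums can grow exponentially in the number of features, so a practical verifier needs a more careful encoding.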
Machine Learning-Based Estimation and Goodness-of-Fit for Large-Scale Confirmatory Item Factor Analysis
Urban, Christopher J., Bauer, Daniel J.
We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle user-defined constraints on loadings and factor correlations. For GOF assessment, we explore new simulation-based tests and indices. In particular, we consider extensions of the classifier two-sample test (C2ST), a method that tests whether a machine learning classifier can distinguish between observed data and synthetic data sampled from a fitted IFA model. The C2ST provides a flexible framework that integrates overall model fit, piece-wise fit, and person fit. Proposed extensions include a C2ST-based test of approximate fit in which the user specifies what percentage of observed data can be distinguished from synthetic data as well as a C2ST-based relative fit index that is similar in spirit to the relative fit indices used in structural equation modeling. Via simulation studies, we first show that the confirmatory extension of Urban and Bauer's (2021) algorithm produces more accurate parameter estimates as the sample size increases and obtains comparable estimates to a state-of-the-art confirmatory IFA estimation procedure in less time. We next show that the C2ST-based test of approximate fit controls the empirical type I error rate and detects when the number of latent factors is misspecified. Finally, we empirically investigate how the sampling distribution of the C2ST-based relative fit index depends on the sample size.
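The basic C2ST logic can be sketched in a few lines: pool observed and synthetic response vectors with labels, train a classifier on one split, and ask whether its held-out accuracy beats chance. The sketch assumes equal-sized samples and uses a nearest-centroid classifier purely for brevity; the paper's classifiers and its exact approximate-fit test are not reproduced here. Raising `null_acc` above 0.5 only loosely mirrors the idea of tolerating some distinguishability. All names are hypothetical.

```python
import math
import random


def _centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def c2st(observed, synthetic, test_frac=0.5, null_acc=0.5, seed=0):
    """Classifier two-sample test with a nearest-centroid classifier.

    Observed rows get label 1 and synthetic rows label 0. Returns (held-out accuracy,
    one-sided binomial p-value for H0: accuracy <= null_acc).
    """
    rng = random.Random(seed)
    data = [(x, 1) for x in observed] + [(x, 0) for x in synthetic]
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    test, train = data[:n_test], data[n_test:]
    c1 = _centroid([x for x, y in train if y == 1])
    c0 = _centroid([x for x, y in train if y == 0])
    # a test row is predicted "observed" when it lies closer to the observed centroid
    correct = sum((1 if _sq_dist(x, c1) < _sq_dist(x, c0) else 0) == y for x, y in test)
    acc = correct / n_test
    # P(K >= correct) for K ~ Binomial(n_test, null_acc)
    p_value = sum(
        math.comb(n_test, k) * null_acc**k * (1 - null_acc) ** (n_test - k)
        for k in range(correct, n_test + 1)
    )
    return acc, p_value


rng = random.Random(1)
obs = [[int(rng.random() < 0.8) for _ in range(10)] for _ in range(200)]  # toy binary items
syn = [[int(rng.random() < 0.2) for _ in range(10)] for _ in range(200)]  # badly misfitting model
print(c2st(obs, syn))
```

If the fitted model reproduced the observed data well, held-out accuracy should hover near 0.5 and the p-value would typically be large; the deliberately mismatched toy samples above are easy to separate.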
Deep Quantile Regression for Uncertainty Estimation in Unsupervised and Supervised Lesion Detection
Akrami, Haleh, Joshi, Anand, Aydore, Sergul, Leahy, Richard
Despite impressive state-of-the-art performance on a wide variety of machine learning tasks in multiple applications, deep learning methods can produce over-confident predictions, particularly with limited training data. Therefore, quantifying uncertainty is especially important in critical applications such as anomaly or lesion detection and clinical diagnosis, where a realistic assessment of uncertainty is essential in determining surgical margins, disease status and appropriate treatment. In this work, we focus on using quantile regression to estimate aleatoric uncertainty and use it for estimating uncertainty in both supervised and unsupervised lesion detection problems. In the unsupervised setting, we apply quantile regression to a lesion detection task using a Variational AutoEncoder (VAE). The VAE models the output as a conditionally independent Gaussian characterized by means and variances for each output dimension. Unfortunately, joint optimization of both mean and variance in the VAE leads to the well-known problem of shrinkage or underestimation of variance. We describe an alternative VAE model, Quantile-Regression VAE (QR-VAE), that avoids this variance shrinkage problem by estimating conditional quantiles for the given input image. Using the estimated quantiles, we compute the conditional mean and variance for input images under the conditionally Gaussian model. We then compute the reconstruction probability using this model as a principled approach to outlier or anomaly detection applications. In the supervised setting, we develop binary quantile regression (BQR) for the supervised lesion segmentation task. BQR segmentation can capture uncertainty in label boundaries. We show how quantile regression can be used to characterize expert disagreement in the location of lesion boundaries.
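Two pieces of the QR-VAE pipeline can be shown without a neural network: the pinball loss that drives a network output toward a chosen quantile, and the conversion from quantiles back to a Gaussian mean and standard deviation. Under a Gaussian, the tau-quantile equals mu + sigma * Phi^{-1}(tau), so the median gives mu and any other quantile gives sigma. Which quantiles the paper's decoder actually predicts is not stated here; the sketch assumes the median and the 0.15 quantile, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss; its expectation is minimized when q_pred is the tau-quantile of y."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1) * diff)


def gaussian_from_quantiles(q_low, q_mid, tau_low):
    """Recover (mean, std) of a Gaussian from its median q_mid and its tau_low-quantile q_low."""
    z_low = NormalDist().inv_cdf(tau_low)  # negative when tau_low < 0.5
    mu = q_mid                             # the median of a Gaussian is its mean
    sigma = (q_low - mu) / z_low           # from q_low = mu + sigma * z_low
    return mu, sigma


def gaussian_log_likelihood(xs, mus, sigmas):
    """Sum of per-pixel Gaussian log-densities; low values flag poorly explained (anomalous) inputs."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for x, m, s in zip(xs, mus, sigmas)
    )


true = NormalDist(3.0, 2.0)
print(gaussian_from_quantiles(true.inv_cdf(0.15), true.inv_cdf(0.5), 0.15))  # approximately (3.0, 2.0)
```

Because each quantile is fitted with its own pinball loss, the spread estimate comes from the gap between two separately trained outputs rather than from a jointly optimized variance term, which is the shrinkage the abstract describes.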
Harnessing the Power of Ego Network Layers for Link Prediction in Online Social Networks
Toprak, Mustafa, Boldrini, Chiara, Passarella, Andrea, Conti, Marco
Being able to recommend links between users in online social networks is important for users to connect with like-minded individuals as well as for the platforms themselves and third parties leveraging social media information to grow their business. Predictions are typically based on unsupervised or supervised learning, often leveraging simple yet effective graph topological information, such as the number of common neighbors. However, we argue that richer information about the personal social structure of individuals might lead to better predictions. In this paper, we propose to leverage well-established social cognitive theories to improve link prediction performance. According to these theories, individuals arrange their social relationships along, on average, five concentric circles of decreasing intimacy. We postulate that relationships in different circles have different importance in predicting new links. To validate this claim, we focus on popular feature-extraction prediction algorithms (both unsupervised and supervised) and extend them to include social-circle awareness. We validate the prediction performance of these circle-aware algorithms against several benchmarks (including their baseline versions as well as node-embedding- and GNN-based link prediction), using two Twitter datasets comprising a community of video gamers and generic users. We show that social awareness generally provides significant improvements in prediction performance, also beating state-of-the-art solutions like node2vec and SEAL, without increasing the computational complexity. Finally, we show that social awareness can be used in place of a classifier (which may be costly or impractical) for targeting a specific category of users.
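To see how circle awareness can change a common-neighbors score, consider weighting each shared neighbor by the circle it occupies in each endpoint's ego network. This is a hypothetical illustration, not the paper's formulation. Circles are assigned here by ranking alters by contact count against assumed cumulative circle sizes. The per-circle weights are also made up; the paper may derive circles differently, for example by clustering contact frequencies.

```python
def assign_circles(contact_counts, sizes=(2, 5, 15, 50, 150)):
    """Map each alter to a circle index (0 = innermost) by rank of contact count.

    sizes are hypothetical cumulative circle sizes; alters ranked beyond the last size are dropped.
    """
    ranked = sorted(contact_counts, key=contact_counts.get, reverse=True)
    circles = {}
    for rank, alter in enumerate(ranked):
        for idx, cap in enumerate(sizes):
            if rank < cap:
                circles[alter] = idx
                break
    return circles


def circle_aware_common_neighbors(circles, u, v, weights=(1.0, 0.8, 0.6, 0.4, 0.2)):
    """Sum over common neighbors z of weights[circle of z for u] * weights[circle of z for v]."""
    common = set(circles[u]) & set(circles[v])
    return sum(weights[circles[u][z]] * weights[circles[v][z]] for z in common)


circles = {
    "u": {"a": 0, "b": 4},
    "v": {"a": 0, "b": 4, "c": 1},
    "w": {"a": 4, "b": 4},
}
# Plain common neighbors ties v and w (two shared alters each); the circle-aware score does not.
print(circle_aware_common_neighbors(circles, "u", "v"))  # about 1.04
print(circle_aware_common_neighbors(circles, "u", "w"))  # about 0.24
```

The extra cost over plain common neighbors is one lookup per shared neighbor, consistent with the claim that circle awareness need not increase computational complexity.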
Model-Based Approach for Measuring the Fairness in ASR
Liu, Zhe, Veliche, Irina-Elena, Peng, Fuchun
The issue of fairness arises when automatic speech recognition (ASR) systems do not perform equally well for all subgroups of the population. In any fairness measurement study for ASR, the open questions of how to control for nuisance factors, how to handle unobserved heterogeneity across speakers, and how to trace the source of any word error rate (WER) gap among subgroups are especially important; if they are not appropriately accounted for, incorrect conclusions will be drawn. In this paper, we introduce mixed-effects Poisson regression to better measure and interpret any WER difference among subgroups of interest. In particular, the presented method can effectively address the three problems raised above and is very flexible to use in practical disparity analyses. We demonstrate the validity of the proposed model-based approach on both synthetic and real-world speech data.
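A generic mixed-effects Poisson model for word errors shows why the approach addresses the three questions above. The exact covariates and random-effect structure used in the paper are not given here, so the following is a standard textbook form rather than the authors' specification. Let $E_{ij}$ be the number of word errors in utterance $i$ of speaker $j$, $N_{ij}$ the number of reference words, $\mathbf{x}_{ij}$ the subgroup indicators and nuisance covariates, and $u_j$ a speaker-level random intercept:

```latex
\begin{aligned}
E_{ij} &\sim \operatorname{Poisson}(\lambda_{ij}), \\
\log \lambda_{ij} &= \log N_{ij} + \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \\
\mathbb{E}\!\left[\frac{E_{ij}}{N_{ij}} \,\middle|\, u_j\right] &= \exp\!\left(\beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j\right).
\end{aligned}
```

The offset $\log N_{ij}$ turns the error count into a rate, so $\exp(\beta_k)$ is the multiplicative change in expected WER associated with covariate $k$, holding the other covariates (the nuisance factors) and the speaker effect fixed. The variance $\sigma_u^2$ absorbs unobserved heterogeneity across speakers, and comparing fits with and without particular covariates is one way to trace which factors account for a raw WER gap.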
AI Method Improves Detection of Rare Whale Calls
The North Atlantic Right Whale (Right whale) is one of the most endangered whale species in the world, with only about 368 remaining off the east coast of North America. A declining population trend and low reproduction rates, combined with high levels of human activity such as shipping and fishing, underscore their precarious situation. Efficient tracking of their numbers, migration paths and habitat use is vital to lowering the number of preventable injuries and deaths and promoting their recovery. One frequently used method for monitoring whales is passive acoustics. Right whales vocalize a variety of low-frequency sounds such as moans, groans, pulses and even belches.
Development of patients triage algorithm from nationwide COVID-19 registry data based on machine learning
Hwang, Hyung Ju, Jung, Seyoung, Park, Min Sue, Jo, Hyeontae
A prompt severity assessment model for confirmed patients with infectious diseases could enable efficient diagnosis and alleviate the burden on the medical system. This paper describes the development of a severity assessment model using machine learning techniques and its application to SARS-CoV-2 patients. We highlight that our model requires only patients' basic personal data, allowing them to judge their own severity. We selected a boosting-based decision tree model as the classifier and interpreted mortality as a probability score after modeling. Specifically, the hyperparameters that determine the structure of the tree model were tuned using Bayesian optimization without any knowledge of medical information. We then measured model performance and, through the model, identified the variables affecting severity. Finally, we aim to establish a medical system that allows patients to check their own severity and directs them to the appropriate clinic based on the past treatment details of other patients with similar severity.
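To illustrate how a boosting-based tree model yields a mortality probability score, here is a from-scratch sketch of gradient boosting with depth-1 trees (stumps) under logistic loss. It is not the authors' model or library, and the Bayesian optimization of hyperparameters such as `n_rounds` and `lr` is not shown. The toy features (age, a comorbidity flag) and all names are hypothetical.

```python
import math


def _sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def _fit_stump(X, residuals, hessians):
    """Pick the single-feature threshold split that best fits the residuals in squared error.

    Assumes at least one feature takes more than one value.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                mean = sum(residuals[i] for i in idx) / len(idx)
                sse += sum((residuals[i] - mean) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    _, j, t, left, right = best

    def leaf(idx):  # Newton step for logistic loss: sum(y - p) / sum(p * (1 - p))
        return sum(residuals[i] for i in idx) / max(sum(hessians[i] for i in idx), 1e-12)

    return j, t, leaf(left), leaf(right)


def fit_boosted_stumps(X, y, n_rounds=20, lr=0.3):
    prior = sum(y) / len(y)
    f0 = math.log(prior / (1 - prior))  # start from the log-odds of the base rate
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        p = [_sigmoid(f) for f in F]
        residuals = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
        hessians = [pi * (1 - pi) for pi in p]
        j, t, v_left, v_right = _fit_stump(X, residuals, hessians)
        stumps.append((j, t, v_left, v_right))
        F = [f + lr * (v_left if row[j] <= t else v_right) for f, row in zip(F, X)]
    return f0, stumps, lr


def predict_proba(model, row):
    """Probability score: sigmoid of the base log-odds plus the shrunken stump outputs."""
    f0, stumps, lr = model
    return _sigmoid(f0 + sum(lr * (vl if row[j] <= t else vr) for j, t, vl, vr in stumps))


X = [[30, 0], [45, 0], [50, 1], [65, 1], [70, 0], [80, 1], [35, 0], [75, 1]]  # [age, comorbidity]
y = [0, 0, 0, 1, 1, 1, 0, 1]  # toy outcome labels
model = fit_boosted_stumps(X, y)
print(predict_proba(model, [78, 1]), predict_proba(model, [32, 0]))
```

The summed tree outputs live on the log-odds scale, so passing them through the sigmoid gives the kind of probability score the abstract describes. The learning rate `lr` and the number of rounds are exactly the sort of structural hyperparameters an optimizer would tune.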
A Comprehensive Overview of Recommender System and Sentiment Analysis
AL-Ghuribi, Sumaia Mohammed, Noah, Shahrul Azman Mohd
Recommender systems have proven crucial in many fields and are widely used across domains. Most conventional recommender systems rely on the numeric rating a user gives to reflect their opinion about a consumed item; however, such ratings are not available in many domains. As a result, user-generated reviews have been incorporated into the recommendation process as a new source of information to compensate for the missing ratings. Reviews contain rich information about the whole item or about specific features, which can be extracted using sentiment analysis. This paper gives a comprehensive overview to help researchers who aim to work with recommender systems and sentiment analysis. It first provides background on recommender systems, including the phases, approaches, and performance metrics used in them. It then discusses sentiment analysis, covering its levels and approaches, with a focus on aspect-based sentiment analysis.
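A minimal sketch of how review text can stand in for missing numeric ratings: score each review with a sentiment lexicon, rescale the polarity to a 1-5 "virtual" rating, and evaluate predictions with MAE and RMSE, two metrics commonly used in recommender evaluation. The tiny lexicon and the linear rescaling are hypothetical simplifications; document-level averaging like this ignores the per-aspect distinctions that aspect-based sentiment analysis targets.

```python
import math
import re

# Hypothetical toy lexicon; real systems use curated lexicons or trained sentiment models.
LEXICON = {"great": 1.0, "excellent": 1.0, "good": 0.5, "ok": 0.0,
           "poor": -0.5, "bad": -1.0, "terrible": -1.0}


def review_to_rating(text, lexicon=LEXICON):
    """Map a review to a virtual 1-5 rating from the mean polarity of its lexicon words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    polarity = sum(scores) / len(scores) if scores else 0.0  # in [-1, 1]; neutral if no cue words
    return 3.0 + 2.0 * polarity  # linear rescale of [-1, 1] to [1, 5]


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


print(review_to_rating("Great battery, excellent screen"))  # 5.0
print(review_to_rating("Bad support"))                       # 1.0
```

RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does.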