Collaborating Authors

Case-Based Reasoning

SMOTE and Edited Nearest Neighbors Undersampling for Imbalanced Datasets


Imbalanced datasets are a special case of the classification problem in which the class distribution is not uniform across the classes. One technique for handling imbalanced datasets is data sampling. The Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique that generates synthetic samples from the minority class to match the size of the majority class; it is used to obtain a synthetically class-balanced, or nearly class-balanced, training set. SMOTE works by selecting examples that are close in the feature space, drawing a line between them, and generating a new sample at a point along that line.
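The interpolation step described above can be sketched in a few lines. This is a minimal illustrative sketch in plain Python, not the reference implementation (libraries such as imbalanced-learn provide a production version); the function name and parameters are chosen here for illustration.

```python
import random
import math

def smote(minority, k=3, n_synthetic=4, seed=0):
    """Generate synthetic minority-class samples by interpolating
    between a random sample and one of its k nearest neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbors of x among the other minority samples
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbors)
        # new point at a random position along the line from x to nn
        gap = rng.random()
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority)
print(new_points)  # 4 synthetic points, each on a line between two real ones
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies, which is what makes SMOTE preferable to naive duplication of minority examples.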

IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography Artificial Intelligence

Interpretability in machine learning models is important in high-stakes decisions, such as whether to order a biopsy based on a mammographic exam. Mammography poses important challenges that are not present in other computer vision tasks: datasets are small, confounding information is present, and it can be difficult even for a radiologist to decide between watchful waiting and biopsy based on a mammogram alone. In this work, we present a framework for interpretable machine learning-based mammography. In addition to predicting whether a lesion is malignant or benign, our work aims to follow the reasoning processes of radiologists in detecting clinically relevant semantic features of each image, such as the characteristics of the mass margins. The framework includes a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Our algorithm can incorporate a combination of data with whole image labelling and data with pixel-wise annotations, leading to better accuracy and interpretability even with a small number of images. Our interpretable models are able to highlight the classification-relevant parts of the image, whereas other methods highlight healthy tissue and confounding information. Our models are decision aids, rather than decision makers, aimed at better overall human-machine collaboration. We do not observe a loss in mass margin classification accuracy over a black box neural network trained on the same data.

Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges Machine Learning

Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the "Rashomon set" of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.

KNN (K-Nearest Neighbors) is Dead!


I'm talking about the demise of the popular KNN algorithm that is taught in pretty much every Data Science course! Read on to find out what's replacing this staple in every Data Scientist's toolkit. Finding the "K" most similar items to any given item is widely known in the machine learning community as "similarity" search or "nearest neighbor" (NN) search. The most widely known NN search algorithm is the K-Nearest Neighbours (KNN) algorithm. In KNN, given a collection of objects, such as an e-commerce catalog of mobile phones, we can find a small number (K) of nearest neighbors from the entire catalog for any new search query.
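The brute-force search the article is describing can be sketched directly. This is a toy example on 2-D feature vectors, assuming Euclidean distance; real catalogs use higher-dimensional embeddings, and the function name here is made up for illustration.

```python
import math

def knn_search(query, items, k=3):
    """Return the k items nearest to the query in feature space,
    by exhaustively ranking every item (the brute-force KNN search)."""
    return sorted(items, key=lambda item: math.dist(query, item))[:k]

# a tiny "catalog" of items represented as 2-D feature vectors
catalog = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (0.2, 0.1), (4.9, 5.1)]
nearest = knn_search((0.0, 0.1), catalog, k=3)
print(nearest)
```

The exhaustive scan is exactly why KNN struggles at scale: every query touches every item, which is what approximate nearest neighbor methods are designed to avoid.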

Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs Artificial Intelligence

Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two interface modules to facilitate a more intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors in the training dataset. Using an interactive editor, users can manipulate this input in semantically-meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 9 physicians are better able to align the model's uncertainty with clinically relevant factors and build intuition about its capabilities and limitations.

Nearest Neighbor-based Importance Weighting Machine Learning

Importance weighting is widely applicable in machine learning in general and in techniques dealing with data covariate shift problems in particular. A novel, direct approach to determine such importance weighting is presented. It relies on a nearest neighbor classification scheme and is relatively straightforward to implement. Comparative experiments on various classification tasks demonstrate the effectiveness of our so-called nearest neighbor weighting (NNeW) scheme. Considering its performance, our procedure can act as a simple and effective baseline method for importance weighting.
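One simple way to realize a nearest-neighbor-based weighting, in the spirit of the scheme described above, is to weight each training point by how many test points fall closest to it. This is a heavily simplified sketch under that assumption; the paper's exact NNeW estimator may differ, and the function name is chosen here for illustration.

```python
import math

def nn_importance_weights(train, test):
    """Assign each training point a weight proportional to the number of
    test points whose nearest training neighbor it is (1-NN counting),
    normalized so the weights average to 1 over the training set."""
    counts = [0] * len(train)
    for t in test:
        nearest = min(range(len(train)), key=lambda i: math.dist(train[i], t))
        counts[nearest] += 1
    total = sum(counts)
    return [c * len(train) / total for c in counts]

train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
test = [(0.1, 0.0), (0.9, 1.1), (1.1, 0.9), (4.8, 5.2)]
print(nn_importance_weights(train, test))  # [0.75, 1.5, 0.75]
```

Training points sitting in regions dense with test data receive weights above 1, so a classifier reweighted this way emphasizes the part of the training distribution that matters under covariate shift.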

Using AI-enhanced music-supported therapy to assist stroke patients


Stroke currently ranks as the second most common cause of death and the second most common cause of disability worldwide. Motor deficits of the upper extremity (hemiparesis) are the most common and debilitating consequences of stroke, affecting around 80% of patients. These deficits limit the accomplishment of daily activities, affect social participation, cause significant emotional distress, and have profound detrimental effects on quality of life. Stroke rehabilitation aims to improve and maintain functional ability through restitution, substitution and compensation of functions. The restoration of motor deficits and improvements in motor function typically occur during the first months following a stroke, and therefore major efforts are devoted to this acute stage.

A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations Artificial Intelligence

Counterfactual explanations provide a potentially significant solution to the Explainable AI (XAI) problem, but good, native counterfactuals have been shown to rarely occur in most datasets. Hence, the most popular methods generate synthetic counterfactuals using blind perturbation. However, such methods have several shortcomings: the resulting counterfactuals (i) may not be valid data-points (they often use features that do not naturally occur), (ii) may lack the sparsity of good counterfactuals (if they modify too many features), and (iii) may lack diversity (if the generated counterfactuals are minimal variants of one another). We describe a method designed to overcome these problems, one that adapts native counterfactuals in the original dataset, to generate sparse, diverse synthetic counterfactuals from naturally occurring features. A series of experiments are reported that systematically explore parametric variations of this novel method on common datasets to establish the conditions for optimal performance.

Grip2u iPhone Case Review: Boost And Slim Cases Feature Signature Band To Prevent Drops

International Business Times

Who are the Grip2u Phone Cases for? Cell phones are expensive, delicate devices and nobody knows that better than cell phone case manufacturer Grip2u. Their entire line of products is dedicated to protecting cell phones while also making them easier to use thanks to the signature band found on the back of the case. So how does a Grip2u case compare to a standard cell phone case? We looked at two different Grip2u models, the Boost and Slim, and compared them to a very basic case using an iPhone XR.

THUIR@COLIEE-2020: Leveraging Semantic Understanding and Exact Matching for Legal Case Retrieval and Entailment Artificial Intelligence

We participated in the two case law tasks, i.e., the legal case retrieval task and the legal case entailment task. Task 1 (the retrieval task) aims to automatically identify supporting cases from the case law corpus given a new case, and Task 2 (the entailment task) aims to identify the specific paragraphs that entail the decision of a new case in a relevant case. In both tasks, we employed neural models for semantic understanding and traditional retrieval models for exact matching. As a result, our team ("TLIR") ranked 2nd among all teams in Task 1 and 3rd among teams in Task 2. Experimental results suggest that combining models of semantic understanding and exact matching benefits the legal case retrieval task, while the legal case entailment task relies more on semantic understanding.