 Case-Based Reasoning


Edited Nearest Neighbors (ENN)

#artificialintelligence

The Edited Nearest Neighbors (ENN) rule for undersampling uses K=3 nearest neighbors to locate the examples in a dataset that are misclassified; these are removed before a K=1 classification rule is applied. This approach to resampling and classification was first proposed by Dennis Wilson in his 1972 paper "Asymptotic Properties of Nearest Neighbor Rules Using Edited Data." When used as an undersampling procedure, the rule can be applied to each example in the majority class, so that examples misclassified as belonging to the minority class are removed and those correctly classified remain. Like CNN (Condensed Nearest Neighbors), ENN gives the best results when combined with an oversampling method such as SMOTE.
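The undersampling rule described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `edited_nearest_neighbours`, the `majority_label` parameter, the Euclidean distance, and the toy data are all assumptions made for the example. Only majority-class samples are candidates for removal, and a sample is dropped when the vote of its k nearest neighbours disagrees with its label.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Return indices of the samples kept by an ENN-style undersampling pass.

    A majority-class sample is dropped when the majority vote of its k
    nearest neighbours (excluding itself) disagrees with its own label.
    Minority-class samples are always kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never remove minority samples
            continue
        # Rank every other sample by Euclidean distance to xi.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )
        votes = Counter(y[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted == yi:
            keep.append(i)
    return keep


# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
     (5.05, 5.05)]  # a majority point sitting inside the minority cluster
y = [0, 0, 0, 0, 1, 1, 1, 0]

kept = edited_nearest_neighbours(X, y, majority_label=0, k=3)
print(kept)  # the last sample (index 7) is edited out
```

The brute-force neighbour search is quadratic in the number of samples, which is fine for a toy example; a library implementation would typically use a spatial index instead.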



Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation

arXiv.org Artificial Intelligence

Learning from class-imbalanced datasets poses challenges for many machine learning algorithms. Many real-world domains are, by definition, class imbalanced by virtue of having a majority class that naturally has many more instances than its minority class (e.g. genuine bank transactions occur much more often than fraudulent ones). Many methods have been proposed to solve the class imbalance problem, among the most popular being oversampling techniques (such as SMOTE). These methods generate synthetic instances in the minority class to balance the dataset, performing data augmentations that improve the performance of predictive machine learning (ML) models. In this paper we advance a novel data augmentation method (adapted from eXplainable AI) that generates synthetic, counterfactual instances in the minority class. Unlike other oversampling techniques, this method adaptively combines existing instances from the dataset, using actual feature values rather than interpolating values between instances. Several experiments using four different classifiers and 25 datasets are reported, which show that this Counterfactual Augmentation method (CFA) generates useful synthetic data points in the minority class. The experiments also show that CFA is competitive with many other oversampling methods, many of which are variants of SMOTE. The basis for CFA's performance is discussed, along with the conditions under which it is likely to perform better or worse in future tests.


Nearest neighbor process: weak convergence and non-asymptotic bound

arXiv.org Machine Learning

An empirical measure that results from the nearest neighbors to a given point - the nearest neighbor measure - is introduced and studied as a central statistical quantity. First, the resulting empirical process is shown to satisfy a uniform central limit theorem under a (local) bracketing entropy condition on the underlying class of functions (reflecting the localizing nature of the nearest neighbor algorithm). Second, a uniform non-asymptotic bound is established under a well-known condition, often referred to as Vapnik-Chervonenkis, on the uniform entropy numbers.


Applying Regression Conformal Prediction with Nearest Neighbors to time series data

arXiv.org Machine Learning

In this paper, we apply conformal prediction to time series data. Conformal prediction is a method that produces predictive regions given a confidence level. The output regions are always valid under the exchangeability assumption. However, this assumption does not hold for time series data because there is a link among past, current, and future observations. Consequently, the challenge of applying conformal predictors to time series data lies in the fact that observations of a time series are dependent and therefore do not meet the exchangeability assumption. This paper aims to present a way of constructing reliable prediction intervals by using conformal predictors in the context of time series. We use the nearest neighbors method based on the fast parameters tuning technique in the weighted nearest neighbors (FPTO-WNN) approach as the underlying algorithm. Data analysis demonstrates the effectiveness of the proposed approach.
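To make "predictive regions given a confidence level" concrete, here is a minimal sketch of standard split conformal regression wrapped around a plain k-nearest-neighbour averager. It is not the paper's FPTO-WNN method and it does not address the time series dependence the abstract discusses; it assumes exchangeable data, 1-D inputs, and absolute residuals as the nonconformity score. All names (`knn_predict`, `split_conformal_interval`, `train`, `calib`) and the toy data are hypothetical.

```python
import math


def knn_predict(train, x, k=3):
    """Average the targets of the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def split_conformal_interval(train, calib, x_new, alpha=0.1, k=3):
    """Return (lower, upper) for x_new at nominal coverage 1 - alpha.

    Assumes the calibration points and the new point are exchangeable.
    """
    # Nonconformity score: absolute residual on the held-out calibration set.
    scores = sorted(abs(t - knn_predict(train, x, k)) for x, t in calib)
    n = len(scores)
    # Finite-sample corrected quantile, expressed as a 1-based rank.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[rank - 1]
    centre = knn_predict(train, x_new, k)
    return (centre - q, centre + q)


# Toy data (hypothetical): targets follow y = 2x.
train = [(float(x), 2.0 * x) for x in range(10)]
calib = [(x + 0.5, 2.0 * x + 1.0) for x in range(9)]

lower, upper = split_conformal_interval(train, calib, 4.2, alpha=0.2)
print(lower, upper)
```

The interval is symmetric around the point prediction and has the same width everywhere; its coverage guarantee rests on exchangeability, which is the assumption the abstract says time series data violates.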


Terminal Embeddings in Sublinear Time

arXiv.org Machine Learning

Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) . \end{equation*} In the case that $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside is that evaluating the embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $n^{1-\Theta(\epsilon^2)+o(1)} + dn^{o(1)}$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.


A guided journey through non-interactive automatic story generation

arXiv.org Artificial Intelligence

We present a literature survey on non-interactive computational story generation. The article starts with the presentation of requirements for creative systems, three types of models of creativity (computational, socio-cultural, and individual), and models of human creative writing. Then it reviews each class of story generation approach according to the technology used: story-schemas, analogy, rules, planning, evolutionary algorithms, implicit knowledge learning, and explicit knowledge learning. Before the concluding section, the article analyses the contributions of the reviewed work to improving the quality of the generated stories. This analysis addresses the description of the story characters, the use of narrative knowledge (including knowledge about character believability), and the possible lack of more comprehensive or more detailed knowledge or creativity models. Finally, the article presents concluding remarks in the form of suggestions of research topics that might have a significant impact on the advancement of the state of the art on autonomous non-interactive story generation systems. The article concludes that the autonomous generation and adoption of the main idea to be conveyed and the autonomous design of the creativity-ensuring criteria are possibly two of the most important topics for future research.


Dynamic Ranking with the BTL Model: A Nearest Neighbor based Rank Centrality Method

arXiv.org Machine Learning

Many applications such as recommendation systems or sports tournaments involve pairwise comparisons within a collection of $n$ items, the goal being to aggregate the binary outcomes of the comparisons in order to recover the latent strength and/or global ranking of the items. In recent years, this problem has received significant interest from a theoretical perspective with a number of methods being proposed, along with associated statistical guarantees under the assumption of a suitable generative model. These results typically collect the pairwise comparisons as one comparison graph $G$; in many applications, however, such as the outcomes of soccer matches during a tournament, the nature of pairwise outcomes can evolve with time. Theoretical results for such a dynamic setting are relatively limited compared to the aforementioned static setting. We study in this paper an extension of the classic BTL (Bradley-Terry-Luce) model for the static setting to our dynamic setup under the assumption that the probabilities of the pairwise outcomes evolve smoothly over the time domain $[0,1]$. Given a sequence of comparison graphs $(G_{t'})_{t' \in \mathcal{T}}$ on a regular grid $\mathcal{T} \subset [0,1]$, we aim at recovering the latent strengths of the items $w_t \in \mathbb{R}^n$ at any time $t \in [0,1]$. To this end, we adapt the Rank Centrality method - a popular spectral approach for ranking in the static case - by locally averaging the available data on a suitable neighborhood of $t$. When $(G_{t'})_{t' \in \mathcal{T}}$ is a sequence of Erd\"os-Renyi graphs, we provide non-asymptotic $\ell_2$ and $\ell_{\infty}$ error bounds for estimating $w_t^*$ which in particular establishes the consistency of this method in terms of $n$, and the grid size $\lvert\mathcal{T}\rvert$. We also complement our theoretical analysis with experiments on real and synthetic data.


DisCERN: Discovering Counterfactual Explanations using Relevance Features from Neighbourhoods

arXiv.org Artificial Intelligence

Counterfactual explanations focus on "actionable knowledge" to help end-users understand how a machine learning outcome could be changed to a more desirable outcome. For this purpose a counterfactual explainer needs to discover input dependencies that relate to outcome changes. Identifying the minimum subset of feature changes needed to action an output change in the decision is an interesting challenge for counterfactual explainers. The DisCERN algorithm introduced in this paper is a case-based counterfactual explainer. Here counterfactuals are formed by replacing feature values with those from a nearest unlike neighbour (NUN) until an actionable change is observed. We show how widely adopted feature relevance-based explainers (i.e. LIME, SHAP) can inform DisCERN to identify the minimum subset of "actionable features". We demonstrate our DisCERN algorithm on five datasets in a comparative study with the widely used optimisation-based counterfactual approach DiCE. Our results demonstrate that DisCERN is an effective strategy to minimise actionable changes necessary to create good counterfactual explanations.
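The NUN-based substitution described in this abstract can be illustrated with a short sketch. It is a loose reading of the abstract, not the authors' implementation: the function names, the Euclidean distance, the toy classifier `toy_predict`, and the fixed `feature_order` (which in the paper would come from a relevance explainer such as LIME or SHAP) are all assumptions made for the example. It also assumes at least one training point has a different label from the query.

```python
from math import dist


def nearest_unlike_neighbour(query, query_label, X, y):
    """Return the training point closest to `query` whose label differs."""
    candidates = [x for x, label in zip(X, y) if label != query_label]
    return min(candidates, key=lambda x: dist(query, x))


def counterfactual_from_nun(query, predict, X, y, feature_order):
    """Copy NUN feature values into the query, one feature at a time,
    stopping as soon as the predicted class changes."""
    original = predict(query)
    nun = nearest_unlike_neighbour(query, original, X, y)
    candidate = list(query)
    changed = []
    for f in feature_order:
        if candidate[f] == nun[f]:
            continue  # nothing to substitute for this feature
        candidate[f] = nun[f]
        changed.append(f)
        if predict(candidate) != original:
            return candidate, changed
    return None, changed  # no class change found along this ordering


def toy_predict(x):
    """Hypothetical stand-in classifier: class 1 when the first two features sum above 10."""
    return 1 if x[0] + x[1] > 10 else 0


# Toy training data (hypothetical), labelled consistently with toy_predict.
X = [(1, 1, 5), (2, 1, 5), (8, 6, 5), (9, 9, 1)]
y = [0, 0, 1, 1]

query = (4, 3, 5)
cf, changed = counterfactual_from_nun(query, toy_predict, X, y, feature_order=[0, 1, 2])
print(cf, changed)  # [8, 3, 5] [0]
```

Here the nearest unlike neighbour is (8, 6, 5), and substituting only the first feature is enough to flip the toy classifier, so a single feature change is reported.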


The Importance and Challenges of Ethical AI

#artificialintelligence

In thinking about how artificial intelligence works, it is not difficult to arrive at the analogy of a human brain, learning over time from the information it is provided, seeking patterns in that information to optimize its ability to apply those learnings to similar or never-before-seen problems. However, the power of AI lies in its ability to process far greater volumes of information, including streaming data, to detect patterns that may otherwise never be detectable to the human brain. This kind of superpower can be useful when processing over one hundred billion transactions per year and seeking, in real time, to detect costly fraud. This is how, using artificial intelligence technologies such as smart agents, neural networks, and case-based reasoning, Brighterion has been able to transform how fraud is detected and prevented across payment, healthcare and credit risk lifecycle ecosystems. As AI continues to enable, improve and automate a growing number of tasks and processes across different industries, it is not only shifting how companies conduct business, it is also increasingly curating our daily experiences and shaping how we as individuals interact with our world.