Accuracy


A Self-Adaptive Synthetic Over-Sampling Technique for Imbalanced Classification

arXiv.org Artificial Intelligence

Traditionally, in supervised machine learning, a significant part of the available data (usually 50% to 80%) is used for training and the rest for validation. In many problems, however, the data are highly imbalanced across classes or do not cover the feasible data space well, which in turn creates problems in the validation and usage phases. In this paper, we propose a technique for synthesising feasible and likely data to help balance the classes and to boost performance, especially for the minority classes, both per class (as measured by the confusion matrix) and overall. The idea, in a nutshell, is to synthesise data samples in close vicinity to the actual data samples, specifically for the less represented (minority) classes. This also has implications for the so-called fairness of machine learning. The method is generic and can be applied to different base algorithms, e.g. support vector machines, k-nearest neighbours, deep networks, rule-based classifiers, decision trees, etc. The results demonstrate that: i) significantly more balanced (and fair) classification results can be achieved; and ii) both the overall performance and the per-class performance measured by the confusion matrix can be boosted. In addition, this approach can be very valuable when the number of available labelled data is small, which is itself one of the problems of contemporary machine learning.
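The core idea of synthesising samples "in close vicinity" to real minority samples can be illustrated with a minimal, SMOTE-style interpolation sketch. This is not the paper's self-adaptive method: the function name `oversample_minority`, the k-nearest-neighbour interpolation rule and the binary-class balancing target are all assumptions made for illustration.

```python
import numpy as np

def oversample_minority(X, y, minority_label, k=5, rng=None):
    """Hypothetical sketch: add synthetic minority samples by interpolating
    between each chosen minority point and one of its k nearest minority
    neighbours, until the minority count matches the rest of the data."""
    rng = np.random.default_rng(rng)
    Xm = X[y == minority_label]
    if len(Xm) < 2:
        raise ValueError("need at least two minority samples")
    # Number of synthetic points needed to balance a binary problem (assumption).
    n_new = max(0, int((y != minority_label).sum()) - len(Xm))
    # Pairwise distances inside the minority class only.
    d = np.linalg.norm(Xm[:, None, :] - Xm[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    k = min(k, len(Xm) - 1)
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest minority neighbours
    base = rng.integers(0, len(Xm), size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))         # position along the segment base -> neighbour
    X_new = Xm[base] + lam * (Xm[nbr] - Xm[base])
    y_new = np.full(n_new, minority_label)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

Because each synthetic point is a convex combination of two real minority points, it always stays inside the region already occupied by the minority class, which is one simple way to keep synthesised data "feasible and likely".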


Interpretable Charge Prediction for Criminal Cases with Dynamic Rationale Attention

Journal of Artificial Intelligence Research

Charge prediction, which aims to determine appropriate charges for criminal cases based on textual fact descriptions, is an important technology in the field of AI&Law. Previous works focus on improving prediction accuracy but ignore interpretability, which limits their applicability. In this work, we propose a deep neural framework that extracts short but charge-decisive text snippets (rationales) from the input fact description as the interpretation of charge prediction. To address the scarcity of rationale-annotated corpora, rationales are extracted in a reinforcement-learning style, with charge labels as the only supervision. We further propose a dynamic rationale attention mechanism to better utilize the information in the extracted rationales and predict the charges. Experimental results show that, besides providing an interpretation of charge prediction, our approach also captures subtle details that help charge prediction.
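To make "using the information in extracted rationales" concrete, here is a generic masked attention-pooling sketch that lets only rationale tokens contribute to the case representation. It is not the paper's dynamic rationale attention: the scoring vector `w`, the binary `mask` and the function name `rationale_attention` are hypothetical, and the real mechanism may condition attention on additional signals.

```python
import numpy as np

def rationale_attention(H, mask, w):
    """Hypothetical sketch of attention restricted to rationale tokens.
    H: (T, d) token representations; mask: (T,) 1 for rationale tokens;
    w: (d,) scoring vector. Returns the pooled vector and the weights."""
    keep = mask.astype(bool)
    if not keep.any():
        raise ValueError("at least one rationale token is required")
    scores = H @ w
    # Non-rationale tokens get -inf so their softmax weight is exactly zero.
    scores = np.where(keep, scores, -np.inf)
    scores = scores - scores[keep].max()   # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum()
    return alpha @ H, alpha
```

For example, with `H = [[1, 0], [0, 1], [1, 1]]`, `mask = [1, 0, 1]` and `w = [1, 2]`, the middle token receives zero weight and the third token receives the largest weight.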


Host-based anomaly detection using Eigentraces feature extraction and one-class classification on system call trace data

arXiv.org Machine Learning

This paper proposes a methodology for host-based anomaly detection on system call trace data that uses a semi-supervised algorithm, namely a one-class classifier, combined with a PCA-based feature extraction technique called Eigentraces. The one-class classification is based on generating a set of artificial data from a reference distribution and combining the target class probability function with the artificial class density function to estimate the target class density function through Bayes' formula. The benchmark dataset ADFA-LD is employed for the simulation study; it contains thousands of system call traces collected during various normal and attack processes in a Linux operating system environment. For pre-processing and feature extraction, the system call traces are windowed and then processed by principal component analysis; this combination is called Eigentraces. The target class probability function is modeled separately by a Radial Basis Function neural network and by a Random Forest for performance comparison. The simulation study showed that the proposed intrusion detection system achieves high performance in detecting anomalies and normal activities with respect to a set of well-accepted metrics, including detection rate, accuracy, and missed and false alarm rates.
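The pipeline above has two numerical pieces that can be sketched briefly: windowing plus PCA (Eigentraces), and the Bayes inversion that turns a classifier's target-class probability into a target density. In the sketch below, treating each sliding window of raw system-call IDs as a feature vector is an assumption (the paper may encode windows differently), and `theta` is assumed to be the fraction of target examples in the combined target-plus-artificial training set. From $P(T\mid x) = \theta\,p(x\mid T) / (\theta\,p(x\mid T) + (1-\theta)\,p(x\mid A))$ it follows that $p(x\mid T) = \frac{(1-\theta)\,P(T\mid x)}{\theta\,(1-P(T\mid x))}\,p(x\mid A)$.

```python
import numpy as np

def windows(trace, w):
    """Sliding windows of length w over a sequence of system-call IDs."""
    trace = np.asarray(trace)
    return np.array([trace[i:i + w] for i in range(len(trace) - w + 1)])

def eigentraces(W, n_components):
    """PCA via SVD of the centred window matrix; rows of comps are principal axes."""
    mean = W.mean(axis=0)
    _, _, Vt = np.linalg.svd(W - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(W, mean, comps):
    """Low-dimensional Eigentrace features for each window."""
    return (W - mean) @ comps.T

def target_density(p_target_given_x, p_x_given_artificial, theta):
    """Bayes inversion: recover p(x|T) from P(T|x), p(x|A) and the prior theta."""
    odds = p_target_given_x / (1.0 - p_target_given_x)
    return ((1.0 - theta) / theta) * odds * p_x_given_artificial
```

In a detector built this way, `P(T|x)` would come from the RBF network or Random Forest, `p(x|A)` from the known reference distribution used to generate artificial data, and a window would be flagged as anomalous when its estimated target density falls below a chosen threshold.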


Making Learners (More) Monotone

arXiv.org Machine Learning

Learning performance can show non-monotonic behavior: more data does not necessarily lead to better models, even on average. We propose three algorithms that take a supervised learning model and make it behave more monotonically. We prove consistency and monotonicity with high probability, and evaluate the algorithms on scenarios where non-monotone behavior occurs. Our proposed algorithm $\text{MT}_{\text{HT}}$ makes fewer than $1\%$ non-monotone decisions on MNIST while remaining competitive in error rate with several baselines.
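One generic way to make a learner "more monotone" is to keep the current model and switch to a newly trained one only when validation evidence says the new model is better by a confidence margin. The sketch below uses a one-sided Hoeffding margin for paired 0/1 losses; it is an illustration of that idea under stated assumptions, not a reproduction of any of the three algorithms in the paper, and the function names are hypothetical.

```python
import math

def hoeffding_margin(n_val, delta=0.05):
    """One-sided Hoeffding margin for the mean of paired loss differences,
    each lying in [-1, 1] (range 2): t = sqrt(2 ln(1/delta) / n)."""
    return math.sqrt(2.0 * math.log(1.0 / delta) / n_val)

def monotone_select(candidate_errors, n_val, delta=0.05):
    """candidate_errors[t] is the validation error of the model trained on the
    t-th (growing) training set. Returns, for each step, the index of the model
    that is kept; a new model replaces the old one only if it is better by
    more than the confidence margin."""
    margin = hoeffding_margin(n_val, delta)
    chosen = [0]
    best = candidate_errors[0]
    for t, err in enumerate(candidate_errors[1:], start=1):
        if best - err > margin:
            best = err
            chosen.append(t)
        else:
            chosen.append(chosen[-1])
    return chosen
```

With errors `[0.3, 0.32, 0.2, 0.21, 0.1]` and 1000 validation points the margin is about 0.077, so the kept models are `[0, 0, 2, 2, 4]` and the error of the deployed model never increases. A rigorous version would also have to account for reusing the same validation set across steps.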


Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling

arXiv.org Machine Learning

Principal component analysis (PCA) is one of the most widely used dimension reduction and multivariate statistical techniques. From a probabilistic perspective, PCA seeks a low-dimensional representation of data in the presence of independent and identically distributed (i.i.d.) Gaussian noise. Probabilistic PCA (PPCA) and its variants have been studied extensively for decades, and most of them assume that the underlying noise follows some i.i.d. distribution. Real-world noise, however, is usually complicated and structured. Some non-linear variants of PPCA have been proposed to address this challenge, but those methods are generally difficult to interpret. We therefore propose a powerful and intuitive PCA method, MN-PCA, that models the graphical noise with the matrix normal distribution, which enables us to explore the structure of the noise in both the feature space and the sample space. MN-PCA obtains a low-rank representation of the data and the structure of the noise simultaneously, and it can be interpreted as approximating the data under a generalized Mahalanobis distance. We develop two algorithms to solve this model: one maximizes the regularized likelihood, and the other, which is more robust, exploits the Wasserstein distance. Extensive experiments on various data demonstrate their effectiveness.
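The matrix normal distribution $X \sim \mathcal{MN}_{n\times p}(M, U, V)$ uses an $n\times n$ covariance $U$ over samples and a $p\times p$ covariance $V$ over features, and is equivalent to $\operatorname{vec}(X) \sim \mathcal{N}(\operatorname{vec}(M), V \otimes U)$ with column-stacking $\operatorname{vec}$. The quadratic form $\operatorname{tr}\!\left[V^{-1}(X-M)^\top U^{-1}(X-M)\right]$ in its log-density is a generalized Mahalanobis distance, presumably of the kind the abstract refers to. The sketch below (function names are illustrative, and it is not the MN-PCA fitting algorithm) computes the log-density both ways to show they agree.

```python
import numpy as np

def matrix_normal_logpdf(X, M, U, V):
    """Log-density of MN(M, U, V): U is the (n x n) sample-space covariance,
    V the (p x p) feature-space covariance."""
    n, p = X.shape
    R = X - M
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    # Generalized Mahalanobis distance tr(V^{-1} R^T U^{-1} R).
    maha = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    return -0.5 * (n * p * np.log(2 * np.pi) + p * logdet_U + n * logdet_V + maha)

def vec_normal_logpdf(X, M, U, V):
    """Same density written as a multivariate normal on vec(X), cov = V kron U."""
    x = (X - M).flatten(order="F")       # column-stacking vec
    S = np.kron(V, U)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(S, x))
```

The matrix form never builds the $np \times np$ Kronecker covariance, which is why modelling noise this way stays tractable even when both the sample and feature dimensions are large.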


Understanding Classification Thresholds Using Isocurves

#artificialintelligence

You are in a conference room, presenting your work on a classification problem. You demonstrate all the magic you performed with feature engineering, predictor selection, model selection, hyperparameter tuning, and ensembling. You conclude your presentation with the predicted probabilities and the ROC curve, and a fantastic AUC. You sit down confident in a job well done. And the manager says, "What am I supposed to do with all this probability and ROC stuff? I just want to know if I should do x or y."


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

#artificialintelligence

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations.


DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles

arXiv.org Machine Learning

Selecting and combining the outlier scores of different base detectors within outlier ensembles can be quite challenging in the absence of ground truth. In this paper, an unsupervised outlier detector combination framework called DCSO is proposed, demonstrated and assessed for the dynamic selection of the most competent base detectors, with an emphasis on data locality. The proposed DCSO framework first defines the local region of a test instance by its k nearest neighbors and then identifies the top-performing base detectors within that local region. Experimental results on ten benchmark datasets demonstrate that DCSO provides consistent performance improvement over existing static combination approaches in mining outlying objects. To facilitate interpretability and reliability, DCSO is analyzed using both theoretical frameworks and visualization techniques, and is presented alongside empirical parameter-setting instructions that can be used to improve overall performance.
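The two steps described above (define a local region by k nearest neighbours, then pick the best detector in that region) can be sketched without ground truth by scoring each detector against a pseudo target. The assumptions here are ours: the pseudo target is the mean of the standardized detector scores, competence is the Pearson correlation with that target inside the local region, and the names `dcso_select` and `standardize` are hypothetical.

```python
import numpy as np

def standardize(train_scores, scores):
    """Z-score each detector's scores using training statistics."""
    mu = train_scores.mean(axis=0)
    sd = train_scores.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)       # guard against constant detectors
    return (scores - mu) / sd

def dcso_select(train_X, train_scores, test_X, test_scores, k=10):
    """train_scores/test_scores: (n_samples, n_detectors) outlier scores.
    Returns the selected detector per test point and its standardized score."""
    z_train = standardize(train_scores, train_scores)
    z_test = standardize(train_scores, test_scores)
    pseudo = z_train.mean(axis=1)        # pseudo ground truth (assumption)
    chosen = np.empty(len(test_X), dtype=int)
    for t, x in enumerate(test_X):
        # Local region: the k training points nearest to the test instance.
        region = np.argsort(np.linalg.norm(train_X - x, axis=1))[:k]
        competence = [
            np.corrcoef(z_train[region, j], pseudo[region])[0, 1]
            for j in range(z_train.shape[1])
        ]
        chosen[t] = int(np.nanargmax(competence))
    combined = z_test[np.arange(len(test_X)), chosen]
    return chosen, combined
```

The design choice that makes this "dynamic" is that the selected detector can differ from one test instance to the next, whereas a static combination such as averaging applies the same rule everywhere.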


Schema Matching using Machine Learning

arXiv.org Artificial Intelligence

Schema matching is a method of finding attributes that are either linguistically similar to each other or represent the same information. In this project, we take a hybrid approach to solving this problem, making use of both the provided data and the schema names to perform one-to-one schema matching, and we introduce the creation of a global dictionary to achieve one-to-many schema matching. We experiment with two methods of one-to-one matching and compare them on F-score, precision and recall. We also compare our method with previously suggested ones and highlight the differences between them. The schema of a database is the skeleton that represents its logical view. In other words, a schema describes the data contained in a database: a relation's schema contains the name of each attribute in the relation and its data type. The need to match the schemas of multiple relations arises whenever the different tables maintained by a peer management system need to be linked to each other, when one branch of a company is closed down and all its data needs to be redistributed to the databases maintained by other branches, or when one company takes over another and all data of the child company needs to be integrated with that of the parent company. Consider Tables I and II. Here, the ideal schema mappings would be: FName LName Name, Major Maj Stream and Address House No St Name.
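A hybrid one-to-one matcher that uses both attribute names and data values can be sketched with the standard library. Everything specific here is an assumption for illustration: name similarity via `difflib.SequenceMatcher`, value similarity via Jaccard overlap of normalized values, the equal weighting `w_name=0.5`, the `threshold`, and greedy assignment. It does not implement the paper's global dictionary for one-to-many matching.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Character-level similarity of two attribute names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(xs, ys):
    """Jaccard overlap of the normalized value sets of two columns."""
    a = {str(x).strip().lower() for x in xs}
    b = {str(y).strip().lower() for y in ys}
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)

def match_schemas(src, dst, w_name=0.5, threshold=0.3):
    """src, dst: dicts mapping attribute name -> list of sample values.
    Greedy one-to-one matching on a weighted name + value score."""
    scored = []
    for s, s_vals in src.items():
        for d, d_vals in dst.items():
            score = (w_name * name_similarity(s, d)
                     + (1 - w_name) * value_overlap(s_vals, d_vals))
            scored.append((score, s, d))
    scored.sort(reverse=True)            # best-scoring pairs first
    mapping, used = {}, set()
    for score, s, d in scored:
        if score < threshold:
            break
        if s not in mapping and d not in used:
            mapping[s] = d
            used.add(d)
    return mapping
```

For example, matching `{"FName": ["Ann", "Bob"], "Major": ["CS", "EE"]}` against `{"Maj": ["CS", "ME"], "FirstName": ["Ann", "Cy"]}` yields `{"Major": "Maj", "FName": "FirstName"}`: neither names nor values alone are conclusive, but together they separate the correct pairs from the cross pairs.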


Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality

arXiv.org Machine Learning

Granger causality is a widely used criterion for analyzing interactions in large-scale networks. As most physical interactions are inherently nonlinear, we consider the problem of inferring the existence of pairwise Granger causality between nonlinearly interacting stochastic processes from their time series measurements. Our proposed approach models the embedded nonlinearities in the measurements using a component-wise time series prediction model based on Statistical Recurrent Units (SRUs). We make the case that the network topology of Granger causal relations is directly inferable from a structured sparse estimate of the internal parameters of the SRU networks trained to predict the processes' time series measurements. We propose a variant of SRU, called economy-SRU, which by design has considerably fewer trainable parameters and is therefore less prone to overfitting. The economy-SRU computes a low-dimensional sketch of its high-dimensional hidden state in the form of random projections to generate the feedback for its recurrent processing. Additionally, the internal weight parameters of the economy-SRU are strategically regularized in a group-wise manner to help the network extract meaningful predictive features that are highly time-localized, mimicking real-world causal events. Extensive experiments demonstrate that the proposed economy-SRU-based time series prediction model outperforms the MLP-, LSTM- and attention-gated CNN-based time series models previously considered for inferring Granger causality.
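The claim that the causal topology can be read off a structured sparse estimate can be illustrated as follows: if the model predicting series $i$ groups all input weights coming from series $j$ into one column, then $j$ Granger-causes $i$ exactly when that column is non-zero after group-sparse regularization. The sketch uses the standard group-lasso proximal step (group soft-thresholding); the weight layout, the tolerance `tol` and the function names are assumptions, and the economy-SRU's random-projection feedback is not modelled.

```python
import numpy as np

def group_soft_threshold(W, lam):
    """Proximal operator of lam * sum_j ||W[:, j]||_2: shrinks each column
    (all weights from one input series) and zeroes columns with norm <= lam."""
    norms = np.linalg.norm(W, axis=0)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale

def granger_graph(weights, tol=1e-8):
    """weights[i]: (hidden, p) input-weight matrix of the model predicting
    series i. Returns A with A[j, i] = True meaning series j -> series i."""
    p = len(weights)
    A = np.zeros((p, p), dtype=bool)
    for i, W in enumerate(weights):
        A[:, i] = np.linalg.norm(W, axis=0) > tol
    return A
```

Group-wise (rather than element-wise) sparsity is what makes this readout meaningful: it forces the model to drop or keep an entire input series, which is the yes/no question Granger causality asks.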