AITopics

2006.00567

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial Intelligence Oct-31-2020

Fair Classification with Group-Dependent Label Noise

Wang, Jialu, Liu, Yang, Levy, Caleb

This work examines how to train fair classifiers in settings where training labels are corrupted with random noise, and where the error rates of corruption depend both on the label class and on the membership function for a protected subgroup. Heterogeneous label noise models systematic biases towards particular groups when generating annotations. We begin by presenting analytical results which show that naively imposing parity constraints on demographic disparity measures, without accounting for heterogeneous and group-dependent error rates, can decrease both the accuracy and the fairness of the resulting classifier. Our experiments demonstrate these issues arise in practice as well. We address these problems by performing empirical risk minimization with carefully defined surrogate loss functions and surrogate constraints that help avoid the pitfalls introduced by heterogeneous label noise. We provide both theoretical and empirical justifications for the efficacy of our methods. We view our results as an important example of how imposing fairness on biased data sets without proper care can do at least as much harm as it does good.
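The abstract does not spell out the surrogate loss itself. One standard way to build a loss that stays unbiased under class-conditional label noise is the correction of Natarajan et al. (2013). The sketch below applies that correction per protected group and assumes the group-dependent flip rates are known. It illustrates the idea of a noise-aware surrogate loss and is not necessarily the authors' construction. `group_rates` and its values are hypothetical.

```python
import math

def logistic_loss(score: float, label: int) -> float:
    """Logistic loss for a real-valued score and a label in {-1, +1}."""
    return math.log1p(math.exp(-label * score))

def corrected_loss(score, noisy_label, rho_pos, rho_neg, loss=logistic_loss):
    """Noise-corrected surrogate loss in the style of Natarajan et al. (2013).

    rho_pos = P(noisy = -1 | clean = +1), rho_neg = P(noisy = +1 | clean = -1).
    Its expectation over the label noise equals the clean loss.
    """
    denom = 1.0 - rho_pos - rho_neg
    if denom <= 0:
        raise ValueError("flip rates must satisfy rho_pos + rho_neg < 1")
    rho = {+1: rho_pos, -1: rho_neg}
    y = noisy_label
    return ((1 - rho[-y]) * loss(score, y) - rho[y] * loss(score, -y)) / denom

# Hypothetical group-dependent flip rates: (rho_pos, rho_neg) for each group.
group_rates = {"group_a": (0.1, 0.05), "group_b": (0.3, 0.2)}

def group_corrected_risk(scores, noisy_labels, groups):
    """Average corrected loss, using each example's own group's flip rates."""
    total = 0.0
    for s, y, g in zip(scores, noisy_labels, groups):
        rho_pos, rho_neg = group_rates[g]
        total += corrected_loss(s, y, rho_pos, rho_neg)
    return total / len(scores)

def expected_corrected_loss(score, clean_label, rho_pos, rho_neg):
    """Expectation of the corrected loss when the clean label is flipped at random."""
    flip = rho_pos if clean_label == +1 else rho_neg
    return ((1 - flip) * corrected_loss(score, clean_label, rho_pos, rho_neg)
            + flip * corrected_loss(score, -clean_label, rho_pos, rho_neg))

# Unbiasedness check: averaging over the label noise recovers the clean loss.
print(expected_corrected_loss(0.7, +1, *group_rates["group_b"]))
print(logistic_loss(0.7, +1))
```

Individual corrected losses can be negative. That is expected: only their average over the noise is guaranteed to match the clean loss.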

constraint, data mining, machine learning

arXiv.org Artificial Intelligence

2011.00379

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence Oct-30-2020, 23:45:30 GMT

The Beginners' Guide to the ROC Curve and AUC

In the previous article, you learned about classification evaluation metrics such as Accuracy, Precision, Recall, and F1-Score. In this article, we will go through another important evaluation metric: the AUC-ROC score. The ROC curve (Receiver Operating Characteristic curve) is a graph showing the performance of a classification model at different probability thresholds. It is created by plotting TPR (True Positive Rate) against FPR (False Positive Rate) for probability thresholds ranging from 0.0 to 1.0, with FPR on the x-axis and TPR on the y-axis.
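The threshold sweep described above can be sketched in plain Python. The sketch assumes binary 0/1 labels and predicts positive when score >= threshold. The area under the resulting curve is then computed with the trapezoidal rule. This is a minimal illustration rather than a library API, and the toy labels and scores are made up.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists from sweeping the threshold over all distinct scores.

    labels are 0/1; an example is predicted positive when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Thresholds from high to low; start above the max so the curve begins at (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    fpr, tpr = [], []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve(labels, scores)
print(fpr)            # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)            # [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc(fpr, tpr))  # 0.75
```

The sweep is quadratic in the number of examples, which is fine for illustration. Production implementations sort the scores once and accumulate counts instead.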

artificial intelligence, machine learning, threshold

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence Oct-30-2020

Dealing with Imbalanced Data in Machine Learning - KDnuggets

As an ML engineer or data scientist, sooner or later you will find yourself in a situation where you have hundreds of records for one class label and thousands of records for another. You train your model and obtain an accuracy above 90%. Then you realize that the model is predicting every record as belonging to the majority class. Fraud detection and churn prediction are classic examples of this, since most of the records belong to the negative class. What do you do in such a scenario?
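A few lines make the accuracy trap concrete, using a hypothetical fraud-style dataset with 950 negatives and 50 positives. The `balanced_class_weights` helper implements the common n_samples / (n_classes * class_count) heuristic, which is the same formula scikit-learn uses for `class_weight="balanced"`. It is one standard way to make minority-class errors count more during training.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model catches."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return tp / actual if actual else 0.0

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical toy dataset: 950 legitimate (0) and 50 fraudulent (1) records.
y_true = [0] * 950 + [1] * 50
# A "model" that always predicts the majority class.
y_majority = [0] * len(y_true)

print(accuracy(y_true, y_majority))   # 0.95: looks excellent...
print(recall(y_true, y_majority))     # 0.0: ...but every fraud case is missed
weights = balanced_class_weights(y_true)
print(weights[1])                     # 10.0: each minority example counts ten times
```

Passing such weights to a weighted loss is only one option. Resampling the training set and switching to recall-oriented metrics are common alternatives.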

artificial intelligence, imbalanced data, machine learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Mukeri, Amir, Shaikh, Habibullah, Gaikwad, D. P.

Financial Data Analysis Using Expert Bayesian Framework For Bankruptcy Prediction

arXiv.org Artificial Intelligence Oct-30-2020

In recent years, bankruptcy forecasting has gained a lot of attention from researchers as well as practitioners in the field of financial risk management. For bankruptcy prediction, the various approaches proposed in the past and currently in practice rely on accounting ratios and use statistical modeling or machine learning methods. These models have had varying degrees of success. Models such as Linear Discriminant Analysis or Artificial Neural Networks employ discriminative classification techniques and lack an explicit provision for including prior expert knowledge. In this paper, we propose another route: generative modeling using an Expert Bayesian framework. The biggest advantage of the proposed framework is the explicit inclusion of expert judgment in the modeling process. The proposed methodology also provides a way to quantify uncertainty in prediction. As a result, the model built using the Bayesian framework is highly flexible, interpretable, and intuitive. The proposed approach is well suited for highly regulated or safety-critical applications such as finance or medical diagnosis. In such cases, prediction accuracy is not the only concern for decision makers; they and other stakeholders are also interested in the uncertainty of the prediction and in the interpretability of the model. We empirically demonstrate these benefits of the proposed framework on a real-world dataset using Stan, a probabilistic programming language. We found that the proposed model is either comparable or superior to other existing methods. The resulting model also has a much lower false positive rate than many existing state-of-the-art methods. The corresponding R code for the experiments is available in a GitHub repository.
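The paper builds its model in Stan and R. As a much smaller illustration of the two advantages claimed (explicit expert judgment and quantified uncertainty), the sketch below uses a conjugate Beta-Binomial model in Python. The prior encodes a hypothetical expert belief about the bankruptcy rate and the observed counts are made up. The credible interval is estimated by Monte Carlo sampling. This is not the authors' model.

```python
import random

def beta_binomial_update(a, b, successes, trials):
    """Conjugate update of a Beta(a, b) prior after `successes` out of `trials`."""
    return a + successes, b + trials - successes

def beta_mean(a, b):
    return a / (a + b)

def credible_interval(a, b, level=0.9, draws=20000, seed=0):
    """Equal-tailed credible interval for Beta(a, b), estimated by sampling."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi

# Hypothetical expert judgment: a bankruptcy rate near 5%, held with the
# confidence of roughly 40 prior observations -> Beta(2, 38).
prior_a, prior_b = 2, 38
# Hypothetical data: 3 bankruptcies observed among 20 firms.
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes=3, trials=20)

print(beta_mean(prior_a, prior_b))        # 0.05 (prior mean)
print(beta_mean(post_a, post_b))          # 5/60, about 0.083 (posterior mean)
print(credible_interval(post_a, post_b))  # uncertainty band around that estimate
```

A stronger expert prior (larger a + b) pulls the posterior harder toward 5%. The width of the credible interval is the uncertainty measure a decision maker would see alongside the point estimate.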

depreciation, liability, probability

arXiv.org Artificial Intelligence

2010.13892

Country:

Asia > India > Maharashtra > Pune (0.05)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Credit (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Nguyen, Timothy, Chen, Zhourong, Lee, Jaehoon

Dataset Meta-Learning from Kernel Ridge-Regression

One of the most fundamental aspects of any machine learning algorithm is the training data used by the algorithm. We introduce the novel concept of ε-approximation of datasets, obtaining datasets which are much smaller than, or are significant corruptions of, the original training data while maintaining similar model performance. We introduce a meta-learning algorithm called Kernel Inducing Points (KIP) for obtaining such remarkable datasets, inspired by recent developments in the correspondence between infinitely wide neural networks and kernel ridge-regression (KRR). For KRR tasks, we demonstrate that KIP can compress datasets by one or two orders of magnitude, significantly improving on previous dataset distillation and subset selection methods while obtaining state-of-the-art results for MNIST and CIFAR-10 classification. Furthermore, our KIP-learned datasets are transferable to the training of finite-width neural networks even beyond the lazy-training regime, which leads to state-of-the-art results for neural network dataset distillation, with potential applications to privacy preservation. Datasets are a pivotal component in any machine learning task. Typically, a machine learning problem regards a dataset as given and uses it to train a model according to some specific objective. In this work, we depart from the traditional paradigm by instead optimizing a dataset with respect to a learning objective, from which the resulting dataset can be used in a range of downstream learning tasks. Our work is directly motivated by several challenges in existing learning methods. Kernel methods or instance-based learning (Vinyals et al., 2016; Snell et al., 2017; Kaya & Bilge, 2019) in general require a support dataset to be deployed at inference time. Achieving good prediction accuracy typically requires a large support set, which inevitably increases both memory footprint and latency at inference time: the scalability issue.
It can also raise privacy concerns when deploying a support set of original examples, e.g., distributing raw images to user devices. Additional challenges to scalability include, for instance, the desire for rapid hyper-parameter search (Shleifer & Prokop, 2019) and minimizing the resources consumed when replaying data for continual learning (Borsos et al., 2020). A valuable contribution to all these problems would be to find surrogate datasets that can mitigate the challenges which occur for naturally occurring datasets without a significant sacrifice in performance.
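The central object here is a KRR model whose support set is itself optimized. The sketch below makes several assumptions:

- An RBF kernel stands in for the infinite-width network kernels the paper uses.
- Finite-difference gradient descent with backtracking replaces automatic differentiation.
- Both the support inputs and the support labels are updated.

The KIP-style loss fits KRR on a small learnable support set (xs, ys) and measures its squared error on target data (xt, yt). All function names and the toy sin(x) data are hypothetical.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def krr_predict(xs, ys, xt, lam=1e-3, gamma=1.0):
    """Kernel ridge regression fit on the support set (xs, ys), evaluated at xt."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, ys)
    return [sum(rbf(x, xs[j], gamma) * alpha[j] for j in range(n)) for x in xt]

def kip_loss(xs, ys, xt, yt):
    """KIP-style objective: squared error of support-set KRR on the target data."""
    preds = krr_predict(xs, ys, xt)
    return 0.5 * sum((p - y) ** 2 for p, y in zip(preds, yt))

def kip_step(xs, ys, xt, yt, lr=0.5, eps=1e-5):
    """One gradient step on support inputs and labels (finite differences + backtracking)."""
    n, d = len(xs), len(xs[0])
    params = [v for x in xs for v in x] + list(ys)

    def unpack(p):
        return [p[i * d:(i + 1) * d] for i in range(n)], p[n * d:]

    base = kip_loss(xs, ys, xt, yt)
    grad = []
    for i in range(len(params)):
        bumped = params[:]
        bumped[i] += eps
        grad.append((kip_loss(*unpack(bumped), xt, yt) - base) / eps)
    # Halve the step until the loss actually goes down.
    while lr > 1e-8:
        new_xs, new_ys = unpack([p - lr * g for p, g in zip(params, grad)])
        if kip_loss(new_xs, new_ys, xt, yt) < base:
            return new_xs, new_ys
        lr /= 2
    return xs, ys

# Target data: 25 samples of sin(x) on [-3, 3].
xt = [[i / 4] for i in range(-12, 13)]
yt = [math.sin(x[0]) for x in xt]
# Learnable support set: only 3 points, initialised with zero labels.
xs, ys = [[-2.0], [0.0], [2.0]], [0.0, 0.0, 0.0]
before = kip_loss(xs, ys, xt, yt)
for _ in range(20):
    xs, ys = kip_step(xs, ys, xt, yt)
after = kip_loss(xs, ys, xt, yt)
print(before, after)
```

The design point is that the dataset, not the model, holds the trainable parameters. At inference time only the three learned support points need to be shipped, which speaks directly to the memory and latency concern above.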

artificial intelligence, dataset, machine learning

2011.0005

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.34)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

van Loon, Wouter, Fokkema, Marjolein, Szabo, Botond, de Rooij, Mark

View selection in multi-view stacking: Choosing the meta-learner

Multi-view stacking is a framework for combining information from different views (i.e. different feature sets) describing the same set of objects. In this framework, a base-learner algorithm is trained on each view separately, and their predictions are then combined by a meta-learner algorithm. In a previous study, stacked penalized logistic regression, a special case of multi-view stacking, has been shown to be useful in identifying which views are most important for prediction. In this article we expand this research by considering seven different algorithms to use as the meta-learner, and evaluating their view selection and classification performance in simulations and two applications on real gene-expression data sets. Our results suggest that if both view selection and classification accuracy are important to the research at hand, then the nonnegative lasso, nonnegative adaptive lasso and nonnegative elastic net are suitable meta-learners. Exactly which among these three is to be preferred depends on the research context. The remaining four meta-learners, namely nonnegative ridge regression, nonnegative forward selection, stability selection and the interpolating predictor, show little advantage that would justify preferring them over the other three.
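In multi-view stacking, the meta-learner sees one column of cross-validated base-learner predictions per view. A view counts as "selected" when its meta-learner weight is nonzero. Below is a minimal sketch of a nonnegative lasso meta-learner. It assumes squared-error loss and proximal gradient descent; the authors' own fitting procedure may differ, for example by using a logistic loss. The prediction matrix is made up for illustration.

```python
def nonnegative_lasso(Z, y, lam=0.01, step=0.4, iters=2000):
    """Fit y ~ b + Z w with w >= 0 and an L1 penalty on w, by proximal gradient.

    Objective: (1/2n) * sum((y - b - Z w)^2) + lam * sum(w), subject to w >= 0.
    The intercept b is unpenalized and unconstrained.
    """
    n, k = len(Z), len(Z[0])
    w = [0.0] * k
    b = sum(y) / n
    for _ in range(iters):
        r = [y[i] - b - sum(Z[i][j] * w[j] for j in range(k)) for i in range(n)]
        grad_w = [-sum(Z[i][j] * r[i] for i in range(n)) / n for j in range(k)]
        grad_b = -sum(r) / n
        # Proximal step for lam * sum(w) restricted to w >= 0: shift by lam, clip at zero.
        w = [max(0.0, w[j] - step * (grad_w[j] + lam)) for j in range(k)]
        b -= step * grad_b
    return b, w

# Hypothetical cross-validated predictions from one base learner per view
# (columns = views) for 8 objects with binary labels y.
y = [1, 1, 1, 1, 0, 0, 0, 0]
view_preds = [
    # view 0 (informative), view 1 (uninformative), view 2 (anti-informative)
    [0.90, 0.5, 0.1],
    [0.80, 0.4, 0.2],
    [0.85, 0.6, 0.1],
    [0.95, 0.5, 0.2],
    [0.10, 0.5, 0.9],
    [0.20, 0.6, 0.8],
    [0.15, 0.4, 0.9],
    [0.05, 0.5, 0.8],
]
b, w = nonnegative_lasso(view_preds, y)
selected_views = [j for j, wj in enumerate(w) if wj > 0]
print(w, selected_views)
```

The nonnegativity constraint is what makes the weights interpretable as view importances. A view whose predictions are anti-correlated with the outcome cannot enter with a negative weight, and the L1 penalty can set weak views exactly to zero.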

artificial intelligence, machine learning, selection

2010.16271

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Amiridi, Magda, Kargas, Nikos, Sidiropoulos, Nicholas D.

Information-theoretic Feature Selection via Tensor Decomposition and Submodularity

Feature selection by maximizing high-order mutual information between the selected feature vector and a target variable is the gold standard in terms of selecting the best subset of relevant features that maximizes the performance of prediction models. However, such an approach typically requires knowledge of the multivariate probability distribution of all features and the target, and involves a challenging combinatorial optimization problem. Recent work has shown that any joint Probability Mass Function (PMF) can be represented as a naive Bayes model, via Canonical Polyadic (tensor rank) Decomposition. In this paper, we introduce a low-rank tensor model of the joint PMF of all variables and indirect targeting as a way of mitigating complexity and maximizing the classification performance for a given number of features. Through low-rank modeling of the joint PMF, it is possible to circumvent the curse of dimensionality by learning principal components of the joint distribution. By indirectly aiming to predict the latent variable of the naive Bayes model instead of the original target variable, it is possible to formulate the feature selection problem as maximization of a monotone submodular function subject to a cardinality constraint, which can be tackled using a greedy algorithm that comes with performance guarantees. Numerical experiments with several standard datasets suggest that the proposed approach compares favorably to the state of the art for this important problem.
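The final step, maximizing a monotone submodular function under a cardinality constraint, is handled by the classic greedy algorithm. For such objectives, greedy is guaranteed to reach at least a (1 - 1/e) fraction of the optimum (Nemhauser, Wolsey and Fisher). A generic sketch follows. The paper's objective is a mutual-information quantity computed from the low-rank tensor model. The coverage function and the `covers` table below are therefore hypothetical stand-ins that happen to be monotone and submodular.

```python
def greedy_max(candidates, f, k):
    """Greedy maximization of a set function f subject to |S| <= k."""
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for c in candidates:
            if c in selected:
                continue
            # Marginal gain of adding c to the current selection.
            gain = f(selected + [c]) - f(selected)
            if best is None or gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        selected.append(best)
    return selected

# Hypothetical stand-in objective: each feature "covers" some latent states;
# f(S) = number of distinct states covered (monotone and submodular).
covers = {"f1": {1, 2, 3}, "f2": {3, 4}, "f3": {4, 5, 6, 7}, "f4": {1, 7}}

def coverage(S):
    return len(set().union(*(covers[c] for c in S)))

chosen = greedy_max(list(covers), coverage, 2)
print(chosen, coverage(chosen))  # ['f3', 'f1'] 7
```

Any objective with the same two properties can be plugged in as `f` without changing the loop. Only the guarantee depends on monotonicity and submodularity.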

artificial intelligence, machine learning, selection

2010.16181

Country:

North America > United States > Virginia (0.05)
North America > United States > Minnesota (0.04)
North America > Canada > Quebec (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)

Bartlett, Peter L., Long, Philip M.

Failures of model-dependent generalization bounds for least-norm interpolation

Deep learning methodology has revealed some striking deficiencies of classical statistical learning theory: large neural networks, trained to zero empirical risk on noisy training data, have good predictive accuracy on independent test data. These methods are overfitting (that is, fitting to the training data better than the noise should allow), but the overfitting is benign (that is, prediction performance is good). It is an important open problem to understand why this is possible. The presence of noise is key to why the success of interpolating algorithms is mysterious. Generalization of algorithms that produce a perfect fit in the absence of noise has been studied for decades (see [Haussler, 1992] and its references). A number of recent papers have provided generalization bounds for interpolating algorithms in the absence of noise, either for deep networks or in abstract frameworks motivated by deep networks [Li and Liang, 2018, Arora et al., 2019, Cao and Gu, 2019, Feldman, 2020]. The generalization bounds in these papers either do not hold or become vacuous in the presence of noise: Assumption A1 in [Li and Liang, 2018] rules out noisy data; the data-dependent bound in Arora et al. [2019, Theorem 5.1] becomes vacuous when independent noise is added to the y

artificial intelligence, generalization, machine learning
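For overparameterized linear regression (more features d than samples n), the least-norm interpolant is w = X^T (X X^T)^{-1} y. It fits every training label exactly, noisy or not, and has the smallest Euclidean norm among all interpolating solutions. The toy sketch below uses a made-up 2 x 4 design matrix and noisy labels in plain Python to show both properties. It illustrates the estimator under discussion, not the paper's bounds.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def min_norm_interpolant(X, y):
    """w = X^T (X X^T)^{-1} y: the least-norm w with X w = y (X must have full row rank)."""
    n, d = len(X), len(X[0])
    gram = [[sum(X[i][k] * X[j][k] for k in range(d)) for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(d)]

def matvec(X, w):
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]

def sq_norm(w):
    return sum(v * v for v in w)

# Hypothetical data: n = 2 samples, d = 4 features, noisy labels.
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 1.0]]
y = [1.0, 2.0]
w = min_norm_interpolant(X, y)
print(w)             # approximately [0.2, 0.6, 0.8, 0.6]
print(matvec(X, w))  # approximately [1.0, 2.0]: zero training error
v = [1.0, 1.0, -1.0, 0.0]          # X v = 0, so w + v also interpolates
w_other = [a + b for a, b in zip(w, v)]
print(sq_norm(w), sq_norm(w_other))  # about 1.4 versus 4.4
```

Zero empirical risk comes for free here whenever d >= n. The open question the abstract raises is why such fits can still predict well on fresh data despite the noise in y.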

2010.08479

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

He, Yang-Hui, Lee, Kyu-Hwan, Oliver, Thomas

Machine-Learning the Sato--Tate Conjecture

We apply some of the latest techniques from machine learning to the arithmetic of hyperelliptic curves. More precisely, we show that, with impressive accuracy and confidence (between 99 and 100 percent precision) and in a very short time (a matter of seconds on an ordinary laptop), a Bayesian classifier can distinguish between Sato-Tate groups given a small number of Euler factors for the L-function. Our observations are in keeping with the Sato-Tate conjecture for curves of low genus. For elliptic curves, this amounts to distinguishing generic curves (with Sato-Tate group SU(2)) from those with complex multiplication. In genus 2, a principal component analysis is observed to separate the generic Sato-Tate group USp(4) from the non-generic groups. Furthermore, in this case, for which there are many more non-generic possibilities than for elliptic curves, we demonstrate an accurate characterisation of several Sato-Tate groups with the same identity component. Throughout, our observations are verified using known results from the literature and the data available in the LMFDB. The results in this paper suggest that a machine can be trained to learn the Sato-Tate distributions and may be able to classify curves much more efficiently than the methods available in the literature.
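The paper's Bayesian classifier is trained on Euler-factor coefficients from the LMFDB. The sketch below is a much simpler, self-contained illustration of the signal it exploits for elliptic curves. It counts points to get a_p = p + 1 - #E(F_p), then uses a known fact: a curve with complex multiplication has a_p = 0 on a density-1/2 set of primes, while for a non-CM curve such primes have density 0. For y^2 = x^3 - x (CM by Z[i]), a_p = 0 exactly when p ≡ 3 (mod 4). The curve y^2 = x^3 + x + 1 has non-integral j-invariant 6912/31 and hence no CM. Both curves and the 0.25 decision threshold are hypothetical choices for illustration, not the paper's method.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def trace_of_frobenius(a, b, p):
    """a_p for y^2 = x^3 + a x + b over F_p, by naive point counting."""
    # How many y in F_p square to each residue.
    sq_count = [0] * p
    for y in range(p):
        sq_count[y * y % p] += 1
    affine = sum(sq_count[(x ** 3 + a * x + b) % p] for x in range(p))
    # #E(F_p) = affine points + 1 point at infinity, so a_p = p + 1 - #E = p - affine.
    return p - affine

def good_primes(a, b, bound):
    """Primes p > 3 up to bound where the curve has good reduction."""
    disc = 4 * a ** 3 + 27 * b ** 2
    return [p for p in primes_up_to(bound) if p > 3 and disc % p != 0]

def normalized_traces(a, b, bound=200):
    """a_p / (2 sqrt(p)), which lies in [-1, 1] by Hasse's bound."""
    return [trace_of_frobenius(a, b, p) / (2 * math.sqrt(p)) for p in good_primes(a, b, bound)]

def zero_trace_fraction(a, b, bound=200):
    """Fraction of good primes p <= bound with a_p = 0."""
    ps = good_primes(a, b, bound)
    return sum(trace_of_frobenius(a, b, p) == 0 for p in ps) / len(ps)

cm = zero_trace_fraction(0, -1)      # y^2 = x^3 - x
generic = zero_trace_fraction(1, 1)  # y^2 = x^3 + x + 1
print(cm, generic)
# Hypothetical decision rule: flag CM when a_p = 0 for over a quarter of primes.
print(cm > 0.25, generic > 0.25)
```

The normalized traces are the quantities whose limiting distribution the Sato-Tate conjecture describes: a semicircle for generic curves and a different law for CM curves. Feeding them to a trained classifier, as the paper does, replaces this single hand-picked feature.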

artificial intelligence, machine learning, sato-tate group

2010.01213

Country:

North America > United States > Connecticut > Tolland County > Storrs (0.14)
North America > United States > Illinois > Champaign County > Champaign (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.70)