AITopics | Accuracy

Entropia: A Family of Entropy-Based Conformance Checking Measures for Process Mining

Polyvyanyy, Artem, Alkhammash, Hanan, Di Ciccio, Claudio, García-Bañuelos, Luciano, Kalenkova, Anna, Leemans, Sander J. J., Mendling, Jan, Moffat, Alistair, Weidlich, Matthias

arXiv.org Artificial Intelligence, Aug-21-2020

This paper presents a command-line tool, called Entropia, that implements a family of conformance checking measures for process mining founded on the notion of entropy from information theory. The measures quantify the classical non-deterministic and stochastic precision and recall quality criteria for process models automatically discovered from traces executed by IT systems and recorded in their event logs. A process model has "good" precision with respect to the log it was discovered from if it does not encode many traces that are not part of the log, and has "good" recall if it encodes most of the traces from the log. By definition, the measures possess useful properties and can often be computed quickly.
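The precision and recall criteria described in this abstract can be illustrated with a deliberately simple set-based sketch. This is not the entropy-based measure that Entropia implements; it only counts distinct traces, and it assumes the model's traces can be enumerated as a finite set. The function name and the example traces are hypothetical.

```python
def trace_precision_recall(model_traces, log_traces):
    """Set-based illustration of precision and recall over traces.

    Assumption: both arguments are finite collections of traces,
    each trace given as a tuple of activity labels.
    """
    model = set(model_traces)
    log = set(log_traces)
    shared = model & log
    # Precision: share of the model's traces that also occur in the log.
    precision = len(shared) / len(model) if model else 0.0
    # Recall: share of the log's distinct traces that the model encodes.
    recall = len(shared) / len(log) if log else 0.0
    return precision, recall


# Hypothetical event log (with a repeated trace) and model behaviour.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
model = [("a", "b", "c"), ("a", "c", "b"), ("a", "d"), ("a", "b", "b", "c")]

precision, recall = trace_precision_recall(model, log)
print(precision, recall)
```

In this example the model encodes every distinct trace of the log but also two traces the log never contains, so recall is high while precision is lower.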

data mining, Entropia, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2008.09558

Country: Oceania > Australia > Queensland (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.38)

Beyond Individual and Group Fairness

Awasthi, Pranjal, Cortes, Corinna, Mansour, Yishay, Mohri, Mehryar

arXiv.org Machine Learning, Aug-21-2020

Learning algorithms trained on large amounts of data are increasingly adopted in applications with significant individual and social consequences such as selecting loan applicants, filtering resumes of job applicants, estimating the likelihood for a defendant to commit future crimes, or deciding where to deploy police officers. Analyzing the risk of bias in these systems is therefore crucial. In fact, it is also critical for seemingly less socially consequential applications such as ad placement, recommendation systems, speech recognition, and many other common applications of machine learning. Such biases can appear due to the way the training data has been collected, due to an improper choice of the loss function optimized, or as a result of some other algorithmic choices.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2008.09490

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Industry: Law (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Counterfactual-based minority oversampling for imbalanced classification

Luo, Hao, Liu, Li

arXiv.org Machine Learning, Aug-21-2020

A key challenge of oversampling in imbalanced classification is that the generation of new minority samples often neglects the majority classes, so that most new minority samples spread across the whole minority space. In view of this, we present a new oversampling framework based on counterfactual theory. Our framework introduces a counterfactual objective by leveraging the rich inherent information of majority classes and explicitly perturbing majority samples to generate new samples in the territory of the minority space. It can be analytically shown that the new minority samples satisfy the minimum inversion, and therefore most of them lie near the decision boundary. Empirical evaluations on benchmark datasets suggest that our approach significantly outperforms the state-of-the-art methods.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2008.09488

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Evaluating Machine Learning Models for the Fast Identification of Contingency Cases

Schaefer, Florian, Menke, Jan-Hendrik, Braun, Martin

arXiv.org Artificial Intelligence, Aug-21-2020

Fast approximations of power flow results are beneficial in power system planning and live operation. In planning, millions of power flow calculations are necessary if multiple years, different control strategies or contingency policies are to be considered. In live operation, grid operators must assess in a short time whether grid states comply with contingency requirements. In this paper, we compare regression and classification methods to either predict multi-variable results, e.g. bus voltage magnitudes and line loadings, or binary classifications of time steps to identify critical loading situations. We test the methods on three realistic power systems based on one-year time series at 15 min and 5 min resolution. We compare different machine learning models, such as multilayer perceptrons (MLPs), decision trees, k-nearest neighbours, and gradient boosting, and evaluate the required training and prediction times as well as the prediction errors. We additionally determine the amount of training data needed for each method and show results, including the approximation of untrained curtailment of generation. Regarding the compared methods, we identified the MLPs as most suitable for the task. The MLP-based models can predict critical situations with an accuracy of 97-98 % and a very low number of false negative predictions of 0.0-0.64 %.
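The two figures quoted at the end of this abstract, accuracy and the share of false negative predictions, can be derived from the confusion counts of a binary classifier. The sketch below is only an illustration: the abstract does not say whether the false negative percentage is taken relative to all predictions or only to the critical time steps, so the first reading is assumed here, and the counts are invented rather than taken from the paper.

```python
def accuracy_and_false_negative_share(tp, tn, fp, fn):
    """Return (accuracy, false negative share) for a binary classifier.

    Assumption: the false negative share is measured against all
    predictions, not only against the truly critical time steps.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    fn_share = fn / total
    return accuracy, fn_share


# Hypothetical counts for 1000 classified time steps.
accuracy, fn_share = accuracy_and_false_negative_share(tp=40, tn=940, fp=15, fn=5)
print(f"accuracy: {accuracy:.1%}, false negatives: {fn_share:.2%}")
```

A low false negative share matters in this setting because a missed critical loading situation is typically more costly than a false alarm.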

artificial intelligence, machine learning, time step, (17 more...)

arXiv.org Artificial Intelligence

2008.09384

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Daisy's Theory of Risk - Daisy Intelligence

#artificialintelligence, Aug-20-2020, 18:10:55 GMT

Detect fraud and determine risk by analyzing 100% of your claims. Fraud is becoming more pervasive in the insurance industry. Traditional approaches to fraud detection rely on rules-based alerts, which are ineffective at dealing with social networks. Data and patterns also continue to change dynamically, and the risk continues to grow as processes move online and into real time. Daisy's proprietary Theory of Risk measures the causal relationships between all factors and the ripple effects that impact a business decision.

artificial intelligence, daisy intelligence, machine learning, (5 more...)

#artificialintelligence

Industry:

Law Enforcement & Public Safety > Fraud (0.97)
Information Technology (0.76)
Banking & Finance > Insurance (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.39)

Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting

Zhang, Guanhua, Bai, Bing, Zhang, Junqi, Bai, Kun, Zhu, Conghui, Zhao, Tiejun

arXiv.org Machine Learning, Aug-20-2020

With the recent proliferation of the use of text classifications, researchers have found that there are certain unintended biases in text classification datasets. For example, texts containing some demographic identity-terms (e.g., "gay", "black") are more likely to be abusive in existing abusive language detection datasets. As a result, models trained with these datasets may consider sentences like "She makes me happy to be gay" as abusive simply because of the word "gay." In this paper, we formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution. Based on this formalization, we further propose a model-agnostic debiasing training framework by recovering the non-discrimination distribution using instance weighting, which does not require any extra resources or annotations apart from a pre-defined set of demographic identity-terms. Experiments demonstrate that our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.

dataset, proceedings, unintended bias, (14 more...)

arXiv.org Machine Learning

2004.14088

Country: Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Exact Tests for Offline Changepoint Detection in Multichannel Binary and Count Data with Application to Networks

De, Shyamal K., Mukherjee, Soumendu Sundar

arXiv.org Machine Learning, Aug-20-2020

We consider offline detection of a single changepoint in binary and count time-series. We compare exact tests based on the cumulative sum (CUSUM) and the likelihood ratio (LR) statistics, and a new proposal that combines exact two-sample conditional tests with multiplicity correction, against standard asymptotic tests based on the Brownian bridge approximation to the CUSUM statistic. We see empirically that the exact tests are much more powerful in situations where normal approximations driving asymptotic tests are not trustworthy: (i) small sample settings; (ii) sparse parametric settings; (iii) time-series with changepoint near the boundary. We also consider a multichannel version of the problem, where channels can have different changepoints. Controlling the False Discovery Rate (FDR), we simultaneously detect changes in multiple channels. This "local" approach is shown to be more advantageous than multivariate global testing approaches when the number of channels with changepoints is much smaller than the total number of channels. As a natural application, we consider network-valued time-series and use our approach with (a) edges as binary channels and (b) node-degrees or other local subgraph statistics as count channels. The local testing approach is seen to be much more informative than global network changepoint algorithms.
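Two ingredients named in this abstract, the CUSUM statistic and False Discovery Rate control across channels, can be sketched in a few lines. This is a minimal illustration, not the exact tests proposed in the paper: the CUSUM variant below is the plain centered cumulative sum without normalization, and Benjamini-Hochberg is assumed as the FDR procedure because the abstract does not name one. All function names and data are hypothetical.

```python
def cusum_changepoint(series):
    """Locate a single changepoint with an unnormalized CUSUM statistic.

    For each split t the statistic is |sum(x[:t]) - t * mean(x)|.
    Returns (t_hat, statistic), where the change is estimated to
    occur after the first t_hat observations.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(series)
    best_t, best_stat = 1, -1.0
    running = 0.0
    for t in range(1, n):
        running += series[t - 1]
        stat = abs(running - t * total / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of channels rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical binary channel whose success rate jumps after 6 steps.
series = [0] * 6 + [1] * 6
t_hat, stat = cusum_changepoint(series)

# Hypothetical per-channel p-values from some changepoint test.
rejected = benjamini_hochberg([0.001, 0.20, 0.012, 0.60], alpha=0.05)
print(t_hat, stat, rejected)
```

Testing each channel separately and then correcting for multiplicity corresponds to the "local" approach the abstract contrasts with global multivariate testing.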

artificial intelligence, machine learning, mukherjee exact changepoint detection, (14 more...)

arXiv.org Machine Learning

2008.09083

Country:

North America > United States > Arkansas (0.04)
North America > United States > Arizona (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.82)

Industry: Government (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

The foundations of cost-sensitive causal classification

Verbeke, Wouter, Olaya, Diego, Berrevoets, Jeroen, Maldonado, Sebastián

arXiv.org Artificial Intelligence, Aug-19-2020

Classification is a well-studied machine learning task which concerns the assignment of instances to a set of outcomes. Classification models support the optimization of managerial decision-making across a variety of operational business processes. For instance, customer churn prediction models are adopted to increase the efficiency of retention campaigns by optimizing the selection of customers that are to be targeted. Cost-sensitive and causal classification methods have independently been proposed to improve the performance of classification models. The former considers the benefits and costs of correct and incorrect classifications, such as the benefit of a retained customer, whereas the latter estimates the causal effect of an action, such as a retention campaign, on the outcome of interest. This study integrates cost-sensitive and causal classification by elaborating a unifying evaluation framework. The framework encompasses a range of existing and novel performance measures for evaluating both causal and conventional classification models in a cost-sensitive as well as a cost-insensitive manner. We prove that conventional classification is a specific case of causal classification in terms of a range of performance measures when the number of actions is equal to one. The framework is shown to instantiate to application-specific cost-sensitive performance measures that have been recently proposed for evaluating customer retention and response uplift models, and allows maximizing profitability when adopting a causal classification model for optimizing decision-making. The proposed framework paves the way toward the development of cost-sensitive causal learning methods and opens a range of opportunities for improving data-driven business decision-making.

artificial intelligence, classification model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2007.12582

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.90)

Industry: Marketing (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Generalizing Fault Detection Against Domain Shifts Using Stratification-Aware Cross-Validation

Tan, Yingshui, Jin, Baihong, Cui, Qiushi, Yue, Xiangyu, Vincentelli, Alberto Sangiovanni

arXiv.org Machine Learning, Aug-19-2020

Incipient anomalies present milder symptoms compared to severe ones, and are more difficult to detect and diagnose due to their close resemblance to normal operating conditions. The lack of incipient anomaly examples in the training data can pose severe risks to anomaly detection methods that are built upon Machine Learning (ML) techniques, because these anomalies can be easily mistaken for normal operating conditions. To address this challenge, we propose to utilize the uncertainty information available from ensemble learning to identify potentially misclassified incipient anomalies. We show in this paper that ensemble learning methods can give improved performance on incipient anomalies and identify common pitfalls in these models through extensive experiments on two real-world datasets. Then, we discuss how to design more effective ensemble models for detecting incipient anomalies.

artificial intelligence, dataset, machine learning, (13 more...)

arXiv.org Machine Learning

2008.08713

Country:

North America > United States > California > Alameda County > Berkeley (0.15)
Asia > Singapore (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Energy > Power Industry (1.00)
Construction & Engineering > HVAC (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.42)

Using Ensemble Classifiers to Detect Incipient Anomalies

Jin, Baihong, Tan, Yingshui, Liu, Albert, Yue, Xiangyu, Chen, Yuxin, Vincentelli, Alberto Sangiovanni

arXiv.org Machine Learning, Aug-19-2020

Incipient anomalies present milder symptoms compared to severe ones, and are more difficult to detect and diagnose due to their close resemblance to normal operating conditions. The lack of incipient anomaly examples in the training data can pose severe risks to anomaly detection methods that are built upon Machine Learning (ML) techniques, because these anomalies can be easily mistaken for normal operating conditions. To address this challenge, we propose to utilize the uncertainty information available from ensemble learning to identify potentially misclassified incipient anomalies. We show in this paper that ensemble learning methods can give improved performance on incipient anomalies and identify common pitfalls in these models through extensive experiments on two real-world datasets. Then, we discuss how to design more effective ensemble models for detecting incipient anomalies.

anomaly, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2008.08710

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
Asia > Singapore (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)
