 Performance Analysis


Personalized 3D printed models in optimizing cardiac computed tomography imaging protocols

#artificialintelligence

Patient-specific or personalised 3D printed models created from cardiac imaging data can be applied to research areas beyond the current domains of 3D printing in cardiovascular disease, which mainly focus on pre-surgical planning and simulation, medical education and training, and doctor-patient communication. These areas represent the most common applications of 3D printed models. In particular, the use of 3D printed models in congenital heart disease is a very promising field, with sufficient evidence provided by randomised controlled trials. Further, 3D printed heart models have been shown to play an important role in guiding patients' surgical planning and treatment, as reported by single- and multi-center studies. In addition to these reported applications, the realistic physical models serve as a valuable tool for studying appropriate cardiac CT protocols in order to optimize CT scanning techniques. Zhonghua Sun, a professor and medical imaging researcher from Curtin University, Australia, has been searching for new ways to acquire accurate and efficient medical images.


Using AI to fight hand-crafted Business Email Compromise

#artificialintelligence

Younghoo Lee is a Senior Data Scientist at Sophos. Together with Joshua Saxe, Sophos Chief Scientist, he recently presented these findings at DEFCON 28 AI Village. Business Email Compromise (BEC) is a form of targeted phishing where attackers disguise themselves as senior executives to dupe employees into doing something they absolutely shouldn't, like wiring money. It started out as an evolution of the fraudulent international money transfer scams, and the messages were often riddled with poor punctuation and grammar, misspelt names and other errors that made them relatively easy to identify. Yet they still made money.


Entropia: A Family of Entropy-Based Conformance Checking Measures for Process Mining

arXiv.org Artificial Intelligence

This paper presents a command-line tool, called Entropia, that implements a family of conformance checking measures for process mining founded on the notion of entropy from information theory. The measures allow quantifying classical non-deterministic and stochastic precision and recall quality criteria for process models automatically discovered from traces executed by IT-systems and recorded in their event logs. A process model has "good" precision with respect to the log it was discovered from if it does not encode many traces that are not part of the log, and has "good" recall if it encodes most of the traces from the log. By definition, the measures possess useful properties and can often be computed fast.
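The precision and recall criteria described above can be illustrated with a deliberately simplified sketch. It treats the model and the log as finite sets of traces and counts overlap. This is not the entropy-based measure that Entropia implements, and the function name `trace_precision_recall` and the sample traces are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# A trace is an ordered sequence of activity labels.
Trace = Tuple[str, ...]


def trace_precision_recall(
    model_traces: Iterable[Trace], log_traces: Iterable[Trace]
) -> Tuple[float, float]:
    """Set-based precision and recall over finite trace sets.

    Assumption: both the model and the log can be enumerated as finite
    sets of traces. Entropy-based measures are designed to handle cases
    where this does not hold (for example, models with loops).
    """
    model: Set[Trace] = set(model_traces)
    log: Set[Trace] = set(log_traces)
    shared = model & log

    # Precision drops when the model encodes traces that the log never shows.
    precision = len(shared) / len(model) if model else 0.0
    # Recall drops when the model fails to encode traces seen in the log.
    recall = len(shared) / len(log) if log else 0.0
    return precision, recall


if __name__ == "__main__":
    log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
    model = [("a", "b", "c"), ("a", "c", "b"), ("a", "d"), ("a", "b", "b", "c")]
    precision, recall = trace_precision_recall(model, log)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model encodes every distinct trace in the log, so recall is perfect, but half of the model's traces never occur in the log, so precision is lower.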


Beyond Individual and Group Fairness

arXiv.org Machine Learning

Learning algorithms trained on large amounts of data are increasingly adopted in applications with significant individual and social consequences such as selecting loan applicants, filtering resumes of job applicants, estimating the likelihood for a defendant to commit future crimes, or deciding where to deploy police officers. Analyzing the risk of bias in these systems is therefore crucial. In fact, that is also critical for seemingly less socially consequential applications such as ads placement, recommendation systems, speech recognition, and many other common applications of machine learning. Such biases can appear due to the way the training data has been collected, due to an improper choice of the loss function optimized, or as a result of some other algorithmic choices.


Counterfactual-based minority oversampling for imbalanced classification

arXiv.org Machine Learning

A key challenge of oversampling in imbalanced classification is that the generation of new minority samples often neglects the usage of majority classes, resulting in most new minority samples spreading across the whole minority space. In view of this, we present a new oversampling framework based on the counterfactual theory. Our framework introduces a counterfactual objective by leveraging the rich inherent information of majority classes and explicitly perturbing majority samples to generate new samples in the territory of minority space. It can be analytically shown that the new minority samples satisfy the minimum inversion, and therefore most of them lie near the decision boundary. Empirical evaluations on benchmark datasets suggest that our approach significantly outperforms the state-of-the-art methods.


Evaluating Machine Learning Models for the Fast Identification of Contingency Cases

arXiv.org Artificial Intelligence

Fast approximations of power flow results are beneficial in power system planning and live operation. In planning, millions of power flow calculations are necessary if multiple years, different control strategies or contingency policies are to be considered. In live operation, grid operators must assess if grid states comply with contingency requirements in a short time. In this paper, we compare regression and classification methods to either predict multi-variable results, e.g. bus voltage magnitudes and line loadings, or binary classifications of time steps to identify critical loading situations. We test the methods on three realistic power systems based on time series in 15 min and 5 min resolution of one year. We compare different machine learning models, such as multilayer perceptrons (MLPs), decision trees, k-nearest neighbours, and gradient boosting, and evaluate the required training time and prediction times as well as the prediction errors. We additionally determine the amount of training data needed for each method and show results, including the approximation of untrained curtailment of generation. Regarding the compared methods, we identified the MLPs as most suitable for the task. The MLP-based models can predict critical situations with an accuracy of 97-98% and a very low number of false negative predictions of 0.0-0.64%.


Daisy's Theory of Risk - Daisy Intelligence

#artificialintelligence

Detect fraud and determine risk by analyzing 100% of your claims. Fraud is becoming more pervasive in the insurance industry. Traditional approaches to fraud detection rely on rules-based alerts, which are ineffective at dealing with social networks. Data and patterns also continue to change dynamically, and the risk continues to grow as processes move online and in real time. Daisy's proprietary Theory of Risk measures the causal relationships between all factors and the ripple effects that impact a business decision.


Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting

arXiv.org Machine Learning

With the recent proliferation of the use of text classifications, researchers have found that there are certain unintended biases in text classification datasets. For example, texts containing some demographic identity-terms (e.g., "gay", "black") are more likely to be abusive in existing abusive language detection datasets. As a result, models trained with these datasets may consider sentences like "She makes me happy to be gay" as abusive simply because of the word "gay." In this paper, we formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution. Based on this formalization, we further propose a model-agnostic debiasing training framework by recovering the non-discrimination distribution using instance weighting, which does not require any extra resources or annotations apart from a pre-defined set of demographic identity-terms. Experiments demonstrate that our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.
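The idea of instance weighting can be sketched as follows. The abstract does not spell out the weight formula, so the choice here, w = P(y) / P(y | z) with z indicating whether a text contains a pre-defined identity term, is an assumption made for illustration. The function name `instance_weights` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def instance_weights(
    labels: Sequence[int], has_identity_term: Sequence[bool]
) -> List[float]:
    """Per-example weights w = P(y) / P(y | z), estimated by counting.

    Assumption: this particular weight formula is an illustrative choice,
    not confirmed by the abstract. z is True when the text contains one
    of the pre-defined demographic identity terms.
    """
    if len(labels) != len(has_identity_term):
        raise ValueError("labels and has_identity_term must have equal length")

    n = len(labels)
    label_counts = Counter(labels)
    group_counts = Counter(has_identity_term)
    joint_counts = Counter(zip(labels, has_identity_term))

    weights = []
    for y, z in zip(labels, has_identity_term):
        p_y = label_counts[y] / n
        p_y_given_z = joint_counts[(y, z)] / group_counts[z]
        # Examples whose label is over-represented within their group
        # are down-weighted; under-represented ones are up-weighted.
        weights.append(p_y / p_y_given_z)
    return weights


if __name__ == "__main__":
    # Toy data: label 1 = abusive; texts with identity terms are mostly abusive.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    has_term = [True, True, True, True, False, False, False, False]
    for y, z, w in zip(labels, has_term, instance_weights(labels, has_term)):
        print(y, z, round(w, 3))
```

The resulting weights would typically be passed to a per-example weighted loss during training. When the label is statistically independent of the identity-term indicator, every weight equals 1 and training is unchanged.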


Automating the assessment of biofouling in images using expert agreement as a gold standard

arXiv.org Machine Learning

Biofouling is the accumulation of organisms on surfaces immersed in water. It is of particular concern to the international shipping industry because fouling increases the drag on vessels as they move through the water, resulting in higher fuel costs, and presents a biosecurity risk by providing a pathway for marine non-indigenous species (NIS) to establish in new areas. There is growing interest within jurisdictions to strengthen biofouling risk-management regulations, but it is expensive to conduct in-water inspections and assess the collected data to determine the biofouling state of vessel hulls. Machine learning is well suited to tackle the latter challenge, and here we apply so-called deep learning to automate the classification of images from in-water inspections for the presence and severity of biofouling. We combined images collected from in-water surveys conducted by the Australian Department of Agriculture, Water and the Environment, the New Zealand Ministry for Primary Industries and the California State Lands Commission, and annotated them using the Amazon Mechanical Turk (MTurk) crowdsourcing platform. We compared the annotations from three biofouling experts on a 120-sample subset of these images, and found that for two tasks, identifying images containing fouling, and identifying images containing heavy fouling, they showed 89% agreement (95% CI: 87-92%). It was found that the MTurk labelling approach achieved similar agreement with experts, which we defined as performing at most 5% worse than experts (p=0.004-0.020). Our deep learning model trained with the MTurk annotations also showed reasonable performance in comparison to expert agreement, although at a lower significance level (p=0.071-0.093). We also demonstrate that significantly better performance than expert agreement can be achieved if a classifier with high recall or precision is required.


Exact Tests for Offline Changepoint Detection in Multichannel Binary and Count Data with Application to Networks

arXiv.org Machine Learning

We consider offline detection of a single changepoint in binary and count time-series. We compare exact tests based on the cumulative sum (CUSUM) and the likelihood ratio (LR) statistics, and a new proposal that combines exact two-sample conditional tests with multiplicity correction, against standard asymptotic tests based on the Brownian bridge approximation to the CUSUM statistic. We see empirically that the exact tests are much more powerful in situations where normal approximations driving asymptotic tests are not trustworthy: (i) small sample settings; (ii) sparse parametric settings; (iii) time-series with changepoint near the boundary. We also consider a multichannel version of the problem, where channels can have different changepoints. Controlling the False Discovery Rate (FDR), we simultaneously detect changes in multiple channels. This "local" approach is shown to be more advantageous than multivariate global testing approaches when the number of channels with changepoints is much smaller than the total number of channels. As a natural application, we consider network-valued time-series and use our approach with (a) edges as binary channels and (b) node-degrees or other local subgraph statistics as count channels. The local testing approach is seen to be much more informative than global network changepoint algorithms.
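For readers unfamiliar with the CUSUM statistic mentioned above, here is a minimal sketch of how it locates a single changepoint in a binary or count series. It computes only an unstandardized statistic (no variance scaling) and the split that maximizes it. It does not implement the exact tests, the Brownian bridge p-value, or the FDR procedure from the abstract. The function name `cusum_changepoint` is hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def cusum_changepoint(x: Sequence[float]) -> Tuple[int, float]:
    """Return (t, stat): the split point maximizing the CUSUM statistic.

    t is the number of observations before the estimated change, so the
    series is split into x[:t] and x[t:]. The statistic is
    |S_t - (t / n) * S_n| / sqrt(n), where S_t is the sum of the first
    t observations. No significance test is performed here.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    total = sum(x)
    best_t, best_stat = 1, 0.0
    running = 0.0
    for t in range(1, n):
        running += x[t - 1]
        # Deviation of the partial sum from its expected value under
        # the no-change hypothesis (a constant mean throughout).
        stat = abs(running - t * total / n) / sqrt(n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


if __name__ == "__main__":
    series = [0] * 10 + [1] * 10  # binary channel with a change after 10 steps
    t, stat = cusum_changepoint(series)
    print(f"estimated changepoint after {t} observations, statistic {stat:.3f}")
```

In the multichannel setting described above, a statistic like this would be computed per channel, with a separate test and multiplicity correction deciding which channels actually contain a change.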