AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

Why resampling outperforms reweighting for correcting sampling bias

arXiv.org Machine LearningSep-28-2020

A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for potential biases. We consider two commonly-used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Though statistically equivalent, it has been observed that reweighting outperforms resampling when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together while addressing the sampling bias.

artificial intelligence, machine learning, proportion, (17 more...)

arXiv.org Machine Learning

2009.13447

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Balancing thermal comfort datasets: We GAN, but should we?

Quintana, Matias, Schiavon, Stefano, Tham, Kwok Wai, Miller, Clayton

arXiv.org Machine LearningSep-28-2020

Thermal comfort assessment for the built environment has become more available to analysts and researchers due to the proliferation of sensors and subjective feedback methods. These data can be used for modeling comfort behavior to support design and operations towards energy efficiency and well-being. By nature, occupant subjective feedback is imbalanced as indoor conditions are designed for comfort, and responses indicating otherwise are less common. This situation creates a scenario for the machine learning workflow where class balancing as a pre-processing step might be valuable for developing predictive thermal comfort classification models with high-performance. This paper investigates the various thermal comfort dataset class balancing techniques from the literature and proposes a modified conditional Generative Adversarial Network (GAN), $\texttt{comfortGAN}$, to address this imbalance scenario. These approaches are applied to three publicly available datasets, ranging from 30 and 67 participants to a global collection of thermal comfort datasets, with 1,474; 2,067; and 66,397 data points, respectively. This work finds that a classification model trained on a balanced dataset, comprised of real and generated samples from $\texttt{comfortGAN}$, has higher performance (increase between 4% and 17% in classification accuracy) than other augmentation methods tested. However, when classes representing discomfort are merged and reduced to three, better imbalanced performance is expected, and the additional increase in performance by $\texttt{comfortGAN}$ shrinks to 1-2%. These results illustrate that class balancing for thermal comfort modeling is beneficial using advanced techniques such as GANs, but its value is diminished in certain scenarios. A discussion is provided to assist potential users in determining which scenarios this process is useful and which method works best.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Machine Learning

2009.13154

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
Asia > Singapore (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Construction & Engineering (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Deep Learning-Based Automatic Detection of Poorly Positioned Mammograms to Minimize Patient Return Visits for Repeat Imaging: A Real-World Application

Gupta, Vikash, Taylor, Clayton, Bonnet, Sarah, Prevedello, Luciano M., Hawley, Jeffrey, White, Richard D, Flores, Mona G, Erdal, Barbaros Selnur

arXiv.org Artificial IntelligenceSep-28-2020

Screening mammograms are a routine imaging exam performed to detect breast cancer in its early stages to reduce morbidity and mortality attributed to this disease. In order to maximize the efficacy of breast cancer screening programs, proper mammographic positioning is paramount. Proper positioning ensures adequate visualization of breast tissue and is necessary for effective breast cancer detection. Therefore, breast-imaging radiologists must assess each mammogram for the adequacy of positioning before providing a final interpretation of the examination; this often necessitates return patient visits for additional imaging. In this paper, we propose a deep learning-algorithm method that mimics and automates this decision-making process to identify poorly positioned mammograms. Our objective for this algorithm is to assist mammography technologists in recognizing inadequately positioned mammograms real-time, improve the quality of mammographic positioning and performance, and ultimately reducing repeat visits for patients with initially inadequate imaging. The proposed model showed a true positive rate for detecting correct positioning of 91.35% in the mediolateral oblique view and 95.11% in the craniocaudal view. In addition to these results, we also present an automatically generated report which can aid the mammography technologist in taking corrective measures during the patient visit.

artificial intelligence, machine learning, positioning, (14 more...)

arXiv.org Artificial Intelligence

2009.1358

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.14)
North America > United States > West Virginia > Monongalia County > Morgantown (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RENT -- Repeated Elastic Net Technique for Feature Selection

Jenul, Anna, Schrunner, Stefan, Liland, Kristian Hovde, Indahl, Ulf Geir, Futsaether, Cecilia Marie, Tomic, Oliver

arXiv.org Machine LearningSep-27-2020

In this study we present the RENT feature selection method for binary classification and regression problems. We compare the performance of RENT to a number of other state-of-the-art feature selection methods on eight datasets (six for binary classification and two for regression) to illustrate RENT's performance with regard to prediction and reduction of total number of features. At its core RENT trains an ensemble of unique models using regularized elastic net to select features. Each model in the ensemble is trained with a unique and randomly selected subset from the full training data. From these models one can acquire weight distributions for each feature that contain rich information on the stability of feature selection and from which several adjustable classification criteria may be defined. Moreover, we acquire distributions of class predictions for each sample across many models in the ensemble. Analysis of these distributions may provide useful insight into which samples are more difficult to classify correctly than others. Overall, results from the tested datasets show that RENT not only can compete on-par with the best performing feature selection methods in this study, but also provides valuable insights into the stability of feature selection and sample classification.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

2009.1278

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

Programming Fairness in Algorithms

#artificialintelligenceSep-26-2020, 20:41:06 GMT

"Being good is easy, what is difficult is being just." "We need to defend the interests of those whom we've never met and never will." Note: This article is intended for a general audience to try and elucidate the complicated nature of unfairness in machine learning algorithms. As such, I have tried to explain concepts in an accessible way with minimal use of mathematics, in the hope that everyone can get something out of reading this. Supervised machine learning algorithms are inherently discriminatory. They are discriminatory in the sense that they use information embedded in the features of data to separate instances into distinct categories -- indeed, this is their designated purpose in life. This is reflected in the name for these algorithms which are often referred to as discriminative algorithms (splitting data into categories), in contrast to generative algorithms (generating data from a given category). When we use supervised machine learning, this "discrimination" is used as an aid to help us categorize our data into distinct categories within the data distribution, as illustrated below. Whilst this occurs when we apply discriminative algorithms -- such as support vector machines, forms of parametric regression (e.g.

algorithm, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Country: North America > United States (0.47)

Industry:

Law (1.00)
Government (0.94)
Information Technology (0.93)
Health & Medicine > Public Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback

Will AutoML software replace Data Scientists?

#artificialintelligenceSep-25-2020, 20:31:25 GMT

AutoML is a generic expression to indicate pieces of software that perform Machine Learning tasks automatically. Such pieces of software can be Python libraries like Auto-Sklearn or software programs like Data Robot. AutoML pieces of software replace all the boring steps that take more time to a Data Scientist's work. They actually make all the combinations of the several parameters of a pipeline (e.g. the blank filling values, scaling algorithm, model type, model hyperparameters) and select the best combination that maximizes some performance metrics (like RMSE or Area under the ROC Curve) in k-fold cross-validation using some search algorithm (like Grid or Random Search). They can really simplify the life of somebody that has to create a model from scratch and sometimes they explore combinations and scenarios that a Data Scientist may not have thought of.

artificial intelligence, machine learning, scientist, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.56)

Add feedback

Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination

Zhang, Tao, Zhu, Tianqing, Li, Jing, Han, Mengde, Zhou, Wanlei, Yu, Philip S.

arXiv.org Machine LearningSep-25-2020

A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for building fair models with sacrifice in accuracy, most are geared toward either supervised or unsupervised learning. Yet two observations inspired us to wonder whether semi-supervised learning might be useful to solve discrimination problems. First, previous study showed that increasing the size of the training set may lead to a better trade-off between fairness and accuracy. Second, the most powerful models today require an enormous of data to train which, in practical terms, is likely possible from a combination of labeled and unlabeled data. Hence, in this paper, we present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data, a re-sampling method to obtain multiple fair datasets and lastly, ensemble learning to improve accuracy and decrease discrimination. A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning. A set of experiments on real-world and synthetic datasets show that our method is able to use unlabeled data to achieve a better trade-off between accuracy and discrimination.

dataset, discrimination, unlabeled data, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TKDE.2020.3002567

2009.1204

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
(8 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine (0.68)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

AI, Protests, and Justice

#artificialintelligenceSep-24-2020, 01:05:28 GMT

Editor's Note: The use of face recognition technology in policing has been a long-standing subject of concern, even more-so now after the murder of George Floyd and the demonstrations that have followed. In this article, Mike Loukides, VP of Content Strategy at O'Reilly Media, reviews how companies and cities have addressed these concerns, as well as ways in which individuals can mitigate face recognition technology or even use it to increase accountability. We'd love to hear from you about what you think about this piece. Largely on the impetus of the Black Lives Matter movement, the public's response to the murder of George Floyd, and the subsequent demonstrations, we've seen increased concern about the use of facial identification in policing. First, in a highly publicized wave of announcements, IBM, Microsoft, and Amazon have announced that they will not sell face recognition technology to police forces.

artificial intelligence, face recognition technology, machine learning, (13 more...)

#artificialintelligence

Country:

North America > United States > Oregon > Multnomah County > Portland (0.05)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)
Asia > Middle East > Yemen (0.05)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Add feedback

Self-Weighted Robust LDA for Multiclass Classification with Edge Classes

Yan, Caixia, Chang, Xiaojun, Luo, Minnan, Zheng, Qinghua, Zhang, Xiaoqin, Li, Zhihui, Nie, Feiping

arXiv.org Machine LearningSep-24-2020

Linear discriminant analysis (LDA) is a popular technique to learn the most discriminative features for multi-class classification. A vast majority of existing LDA algorithms are prone to be dominated by the class with very large deviation from the others, i.e., edge class, which occurs frequently in multi-class classification. First, the existence of edge classes often makes the total mean biased in the calculation of between-class scatter matrix. Second, the exploitation of l2-norm based between-class distance criterion magnifies the extremely large distance corresponding to edge class. In this regard, a novel self-weighted robust LDA with l21-norm based pairwise between-class distance criterion, called SWRLDA, is proposed for multi-class classification especially with edge classes. SWRLDA can automatically avoid the optimal mean calculation and simultaneously learn adaptive weights for each class pair without setting any additional parameter. An efficient re-weighted algorithm is exploited to derive the global optimum of the challenging l21-norm maximization problem. The proposed SWRLDA is easy to implement, and converges fast in practice. Extensive experiments demonstrate that SWRLDA performs favorably against other compared methods on both synthetic and real-world datasets, while presenting superior computational efficiency in comparison with other techniques.

artificial intelligence, edge class, machine learning, (19 more...)

arXiv.org Machine Learning

2009.12362

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Guangxi Province > Nanning (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

BreachRadar: Automatic Detection of Points-of-Compromise

Araujo, Miguel, Almeida, Miguel, Ferreira, Jaime, Silva, Luis, Bizarro, Pedro

arXiv.org Machine LearningSep-24-2020

Bank transaction fraud results in over $13B annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Points-of-Compromise (POC) detection procedure. BreachRadar is a distributed alternating algorithm that assigns a probability of being compromised to the different possible locations. We implement this method using Apache Spark and show its linear scalability in the number of machines and transactions. BreachRadar is applied to two datasets with billions of real transaction records and fraud labels where we provide multiple examples of real Points-of-Compromise we are able to detect. We further show the effectiveness of our method when injecting Points-of-Compromise in one of these datasets, simultaneously achieving over 90% precision and recall when only 10% of the cards have been victims of fraud.

artificial intelligence, machine learning, probability, (17 more...)

arXiv.org Machine Learning

doi: 10.1137/1.9781611974973.63

2009.11751

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Portugal (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Law Enforcement & Public Safety > Fraud (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Security & Privacy (0.89)
Information Technology > e-Commerce (0.88)

Add feedback