AITopics | Performance Analysis

Performance Analysis

A Modified Bayesian Optimization based Hyper-Parameter Tuning Approach for Extreme Gradient Boosting

arXiv.org Machine Learning, Apr-10-2020

The literature has established that proper hyper-parameter optimization greatly affects the performance of a machine learning algorithm. Manual search is one way to perform it, but it is time consuming. Common automated approaches include grid search, random search, and Bayesian optimization using Hyperopt. In this paper, we propose a new approach to hyper-parameter optimization, Randomized-Hyperopt, and tune the hyper-parameters of XGBoost (the Extreme Gradient Boosting algorithm) on ten datasets by applying random search, Randomized-Hyperopt, Hyperopt, and grid search. The performance of these four techniques was compared by taking both prediction accuracy and execution time into consideration. We find that Randomized-Hyperopt performs better than the other three conventional methods for hyper-parameter optimization of XGBoost.
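To make the difference between two of the baselines in this abstract concrete, here is a minimal sketch of grid search and random search over a small hyper-parameter space. It does not reproduce the paper's Randomized-Hyperopt method, whose details are not given here, and it does not call XGBoost or Hyperopt. The objective `toy_score`, the parameter names, and the grid values are all hypothetical stand-ins for a real validation score.

```python
import itertools
import random


def toy_score(learning_rate, max_depth):
    """Hypothetical validation score (assumption): peaks at learning_rate=0.1, max_depth=6."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(grid, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations; return (best_params, best_score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space: 4 x 3 = 12 combinations.
grid = {"learning_rate": [0.01, 0.05, 0.1, 0.3], "max_depth": [3, 6, 9]}
grid_best = grid_search(grid)                 # 12 evaluations, exhaustive
random_best = random_search(grid, n_trials=5)  # 5 evaluations, may miss the optimum
print(grid_best, random_best)
```

Grid search always finds the best point in the grid but its cost grows multiplicatively with each added parameter, while random search trades that guarantee for a fixed evaluation budget. This is the accuracy versus execution time trade-off the abstract refers to.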

dataset, hyperparameter, optimization, (13 more...)

arXiv.org Machine Learning

doi: 10.1109/ICInPro47689.2019.9092025

2004.05041

Country:

Asia > India > Karnataka > Bengaluru (0.06)
Europe > France (0.04)
Asia > Taiwan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Visual Spoofing in content based spam detection

Sokolov, Mark, Olufowobi, Kehinde, Herndon, Nic

arXiv.org Machine Learning, Apr-10-2020

"Subject: Please send money Body: I am so distraught. I thought i could reach out to you to help me out. I came down to United Kingdom for a short vacation unfortunately i was mugged at the park of the hotel i stayed, all cash, credit card and cell phone was stolen from me but luckily for me i still have my passport with me. I've been to the embassy and to the police here but they're not helping issues at all and, my flight leaves in few hours time from now but. I am having problems settling the hotel bills and the hotel manager won't let me leave until i settle my hotel bills. I'm freaked out at the moment." As expected, this email, which definitely seems to be spam, ends up in the junk email folder. However, in this paper we show that visual spoofing achieved by substituting some confusables (characters that look similar) into the above email text will enable the same email to bypass the spam filter. We also propose ways to address this loophole.

algorithm, classifier, email, (16 more...)
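The visual spoofing idea in the abstract above can be illustrated with a small sketch. The confusable table, the keyword list, and the keyword-matching "filter" below are all hypothetical simplifications. The paper's actual spam filter and confusable set are not described in this listing. The sketch swaps a few Latin letters for look-alike Cyrillic code points, shows that a naive keyword match no longer fires, and shows that mapping the confusables back restores detection.

```python
# Hypothetical confusable table (assumption): Latin letter -> look-alike Cyrillic letter.
CONFUSABLES = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
}
REVERSE = {spoofed: latin for latin, spoofed in CONFUSABLES.items()}

# Hypothetical keyword list for a toy content-based filter.
SPAM_KEYWORDS = {"money", "passport", "hotel"}


def spoof(text):
    """Replace each mapped Latin letter with its Cyrillic look-alike."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def unspoof(text):
    """Map known confusables back to Latin before filtering."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


def naive_filter(text):
    """Flag the text as spam if any word exactly matches a keyword."""
    words = (word.strip(".,:;!?") for word in text.lower().split())
    return any(word in SPAM_KEYWORDS for word in words)


original = "Please send money"
spoofed = spoof(original)

print(naive_filter(original))          # keyword "money" is matched
print(naive_filter(spoofed))           # looks the same to a reader, but no keyword matches
print(naive_filter(unspoof(spoofed)))  # normalizing confusables restores the match
```

Note that standard Unicode normalization forms such as NFKC do not fold Cyrillic letters into Latin ones, which is why this sketch uses an explicit mapping table.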

arXiv.org Machine Learning

2004.05265

Country:

Europe > United Kingdom (0.24)
Asia > Middle East > Iran (0.15)
North America > United States > North Carolina > Pitt County > Greenville (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(5 more...)

Multimodal Categorization of Crisis Events in Social Media

Abavisani, Mahdi, Wu, Liwei, Hu, Shengli, Tetreault, Joel, Jaimes, Alejandro

arXiv.org Artificial Intelligence, Apr-10-2020

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample by sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training to better regularize the learning process as well as dealing with limited training data by constructing new matched pairs from different samples. We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.

image and text, information, modality, (14 more...)

arXiv.org Artificial Intelligence

2004.04917

Country:

North America > United States > California > Yolo County > Davis (0.14)
Asia > Sri Lanka (0.05)
North America > United States > New York > New York County > New York City (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

False negative coronavirus tests could be due to how healthcare workers are collecting samples

Daily Mail - Science & tech, Apr-9-2020, 09:35:10 GMT

The US has tested more than 1.2 million Americans for coronavirus, but some have received negative results despite being infected. The coronavirus is a disease that forms in the lungs, but it sometimes sits in a cavity between the nose and throat where a swab is unable to reach. Although RT-polymerase chain reaction (rRT-PCR) detection is the 'gold standard' for testing, it can produce a false negative if the sample is not taken properly. Experts also believe that because hospitals and drive-thru testing sites are being flooded by people, healthcare workers are rushing to tend to as many individuals as possible and are not collecting the samples properly.

false negative, false negative coronavirus test, healthcare worker, (11 more...)

Daily Mail - Science & tech

Country:

North America > United States > California > Yolo County > Davis (0.05)
Asia > China (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Latent regularization for feature selection using kernel methods in tumor classification

Palazzo, Martin, Yankilevich, Patricio, Beauseroy, Pierre

arXiv.org Machine Learning, Apr-9-2020

The transcriptomics of cancer tumors are characterized by tens of thousands of gene expression features. Patient prognosis or tumor stage can be assessed by machine learning techniques such as supervised classification, given a gene expression profile. Feature selection is a useful approach to select the key genes that help to classify tumors. In this work we propose a feature selection method based on Multiple Kernel Learning that results in a reduced subset of genes and a custom kernel that improves the classification performance when used in support vector classification. During the feature selection process this method performs a novel latent regularisation: it relaxes the supervised target problem by introducing unsupervised structure obtained from the latent space learned by a nonlinear dimensionality reduction model. An improvement of the generalization capacity is obtained and assessed by the tumor classification performance on new unseen test samples when the classifier is trained with the features selected by the proposed method in comparison with other supervised feature selection approaches.

feature selection, kernel, selection, (16 more...)

arXiv.org Machine Learning

2004.04866

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Multiclass Classification via Class-Weighted Nearest Neighbors

Khim, Justin, Xu, Ziyu, Singh, Shashank

arXiv.org Machine Learning, Apr-9-2020

Classification is a fundamental problem in statistics and machine learning that arises in many scientific and engineering problems. Scientific applications include identifying plant and animal species from body measurements, determining cancer types based on gene expression, and satellite image processing (Fisher, 1936, 1938; Khan et al., 2001; Lee et al., 2004); in modern engineering contexts, credit card fraud detection, handwritten digit recognition, word sense disambiguation, and object detection in images are all examples of classification tasks. These applications have brought two new challenges: multiclass classification with a potentially large number of classes and imbalanced data. For example, in online retailing, websites have hundreds of thousands or millions of products, and they may like to categorize these products within a preexisting taxonomy based on product descriptions (Lin et al., 2018). While the number of classes alone makes the problem difficult, an added difficulty with text data is that it is usually highly imbalanced, meaning that a few classes may constitute a large fraction of the data while many classes have only a few examples. In fact, Feldman (2019) notes that if the data follows the classical Zipf distribution for text data (Zipf, 1936), i.e., the class probabilities satisfy a power-law distribution, then up to 35% of seen examples may appear only once in the training data. Additionally, natural image data also seems to have the problems of many classes and imbalanced data (Salakhutdinov et al., 2011; Zhu et al., 2014). Focusing on the problem of imbalanced data, researchers have found that a few heuristics help "do better," and the most principled and studied of these is weighting. There are a number of forms of weighting; we consider the most basic, in which misclassifying an example of a given class incurs a loss equal to that class's weight, and refer to this method as class-weighting.
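As a rough illustration of class-weighting combined with nearest neighbors, the sketch below rescales each class's vote count among the k nearest neighbors by a per-class weight before taking the argmax. This is one simple way to realize the idea and is not necessarily the estimator analyzed in the paper. The data, the weights, and the function name `weighted_knn_predict` are hypothetical.

```python
from collections import Counter


def weighted_knn_predict(train, query, k, class_weights):
    """Predict a label by class-weighted voting among the k nearest training points.

    train: list of (feature_tuple, label) pairs.
    class_weights: dict mapping label -> weight; missing labels default to 1.0.
    """
    def squared_distance(item):
        features, _ = item
        return sum((a - b) ** 2 for a, b in zip(features, query))

    neighbors = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in neighbors)
    # Multiply each class's vote count by its weight, then pick the largest.
    return max(votes, key=lambda label: class_weights.get(label, 1.0) * votes[label])


# Hypothetical imbalanced 1-D data: four "common" points, one "rare" point.
train = [
    ((0.0,), "common"),
    ((0.2,), "common"),
    ((0.4,), "common"),
    ((0.6,), "common"),
    ((0.5,), "rare"),
]
query = (0.45,)

# The 3 nearest neighbors are two "common" points and the single "rare" point.
unweighted = weighted_knn_predict(train, query, k=3, class_weights={})
weighted = weighted_knn_predict(train, query, k=3, class_weights={"rare": 3.0})
print(unweighted, weighted)
```

With uniform weights the majority class wins the vote two to one. Up-weighting the minority class by a factor of three flips the prediction, which is the effect class-weighting is meant to have on imbalanced data.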

classification, prec, probability, (15 more...)

arXiv.org Machine Learning

2004.04715

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Law Enforcement & Public Safety > Fraud (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Diagnosing COVID-19 from X-Ray and Images using Deep Learning Algorithms Learn Neural Networks

#artificialintelligence, Apr-8-2020, 12:20:10 GMT

Throughout history, epidemics and chronic diseases have claimed the lives of many people and caused major crises that have taken a long time to overcome. The 2019 novel coronavirus (COVID-19) pandemic appeared in Wuhan, China in December 2019 and has become a serious public health problem worldwide. It is an acute resolved disease, but it can also be deadly, with a 2% case fatality rate. The early and automatic diagnosis of COVID-19 may be beneficial for timely referral of the patient to quarantine, and for monitoring the spread of the disease. With some tests requiring significant time (days) to produce results, and a projected false positive rate of up to 30%, other timely approaches to diagnosis are worthy of investigation.

covid-19, covid-19 case, neural network, (13 more...)

#artificialintelligence

Country: Asia > China > Hubei Province > Wuhan (0.25)

Genre: Research Report (0.36)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bayesian Interpolants as Explanations for Neural Inferences

McMillan, Kenneth L.

arXiv.org Machine Learning, Apr-8-2020

The notion of Craig interpolant, used as a form of explanation in automated reasoning, is adapted from logical inference to statistical inference and used to explain inferences made by neural networks. The method produces explanations that are at the same time concise, understandable and precise.

explanation, interpolant, precision, (15 more...)

arXiv.org Machine Learning

2004.04198

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Veneto > Venice (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Model-Agnostic Characterization of Fairness Trade-offs

Kim, Joon Sik, Chen, Jiahao, Talwalkar, Ameet

arXiv.org Machine Learning, Apr-8-2020

There exist several inherent trade-offs in designing a fair model, such as those between the model's predictive performance and fairness, or even among different notions of fairness. In practice, exploring these trade-offs requires significant human and computational resources. We propose a diagnostic that enables practitioners to explore these trade-offs without training a single model. Our work hinges on the observation that many widely-used fairness definitions can be expressed via the fairness-confusion tensor, an object obtained by splitting the traditional confusion matrix according to protected data attributes. Optimizing accuracy and fairness objectives directly over the elements in this tensor yields a data-dependent yet model-agnostic way of understanding several types of trade-offs. We further leverage this tensor-based perspective to generalize existing theoretical impossibility results to a wider range of fairness definitions. Finally, we demonstrate the usefulness of the proposed diagnostic on synthetic and real datasets.
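The abstract describes the fairness-confusion tensor as the confusion matrix split by protected attribute. A minimal sketch of that construction for binary labels follows. The dictionary layout, the toy data, and the helper names are assumptions made for illustration, and the paper's exact tensor shape and normalization are not given here. The sketch also reads one fairness quantity (the gap in positive prediction rates between groups) and overall accuracy directly from the per-group counts.

```python
def fairness_confusion_tensor(y_true, y_pred, group):
    """Return per-group confusion counts: {group: {"TP", "FP", "FN", "TN"}}."""
    tensor = {}
    for truth, pred, g in zip(y_true, y_pred, group):
        cell = tensor.setdefault(g, {"TP": 0, "FP": 0, "FN": 0, "TN": 0})
        if truth == 1 and pred == 1:
            cell["TP"] += 1
        elif truth == 0 and pred == 1:
            cell["FP"] += 1
        elif truth == 1 and pred == 0:
            cell["FN"] += 1
        else:
            cell["TN"] += 1
    return tensor


def positive_rate(cell):
    """Fraction of a group's examples that received a positive prediction."""
    return (cell["TP"] + cell["FP"]) / sum(cell.values())


def accuracy(tensor):
    """Overall accuracy computed from the per-group counts."""
    correct = sum(cell["TP"] + cell["TN"] for cell in tensor.values())
    total = sum(sum(cell.values()) for cell in tensor.values())
    return correct / total


# Hypothetical toy data with a binary protected attribute ("a" or "b").
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

tensor = fairness_confusion_tensor(y_true, y_pred, group)
parity_gap = abs(positive_rate(tensor["a"]) - positive_rate(tensor["b"]))
print(tensor, parity_gap, accuracy(tensor))
```

Because both the accuracy and the parity gap are functions of the same eight counts, trade-offs between them can be explored over the counts themselves, which is the model-agnostic viewpoint the abstract describes.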

accuracy, fairness, fairness constraint, (15 more...)

arXiv.org Machine Learning

2004.03424

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)

Genre: Research Report (0.82)

Industry:

Banking & Finance (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science (0.93)

Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks

Shamsolmoali, Pourya, Zareapoor, Masoumeh, Shen, Linlin, Sadka, Abdul Hamid, Yang, Jie

arXiv.org Machine Learning, Apr-8-2020

The fact that image datasets are often imbalanced poses an intense challenge for deep learning techniques. In this paper, we propose a method to restore the balance in imbalanced image datasets by coalescing two concurrent methods, generative adversarial networks (GANs) and capsule networks. In our model, the generative and discriminative networks play a novel competitive game, in which the generator generates samples towards specific classes from a multivariate probability distribution. The discriminator of our model is designed so that, while recognizing real and fake samples, it is also required to assign classes to the inputs. Since GAN approaches require fully observed data during training, when the training samples are imbalanced, these approaches might generate similar samples, which leads to data overfitting. This problem is addressed by providing all the available information from both the class components jointly in the adversarial training. It improves learning from imbalanced data by incorporating the majority distribution structure in the generation of new minority samples. Furthermore, the generator is trained with a feature matching loss function to improve the training convergence. In addition, the method prevents the generation of outliers and does not affect the majority class space. The evaluations show the effectiveness of our proposed methodology; in particular, the coalescing of capsule-GAN is effective at recognizing highly overlapping classes with much fewer parameters compared with the convolutional-GAN.

dataset, discriminator, generator, (14 more...)

arXiv.org Machine Learning

2004.02182

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
