Performance Analysis


A Novel Weighted Combination Method for Feature Selection using Fuzzy Sets

arXiv.org Machine Learning

In this paper, we propose a novel weighted combination feature selection method using bootstrap and fuzzy sets. The proposed method consists of three processes: fuzzy set generation using bootstrap, weighted combination of fuzzy sets, and feature ranking based on defuzzification. We implemented the proposed method by combining four state-of-the-art feature selection methods and evaluated the performance on three publicly available biomedical datasets using five-fold cross validation. Based on the feature selection results, our proposed method produced comparable (if not better) classification accuracies to the best of the individual feature selection methods for all evaluated datasets. More importantly, we also applied standard deviation and Pearson's correlation to measure the stability of the methods. Remarkably, our combination method achieved significantly higher stability than the four individual methods when variations and size reductions were introduced to the datasets.
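One way to read the stability measurement is as the agreement between feature scores obtained from different resampled versions of a dataset. The sketch below computes Pearson's correlation between score vectors and averages it over all pairs of runs. The function names, the averaging over pairs, and the example scores are assumptions made for illustration, not details taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def mean_pairwise_stability(score_runs):
    """Average Pearson correlation over every pair of runs (hypothetical stability index)."""
    pairs = [
        (i, j)
        for i in range(len(score_runs))
        for j in range(i + 1, len(score_runs))
    ]
    if not pairs:
        raise ValueError("need at least two runs")
    total = sum(pearson(score_runs[i], score_runs[j]) for i, j in pairs)
    return total / len(pairs)


# Made-up scores for four features from three resampled runs.
runs = [
    [0.90, 0.70, 0.20, 0.10],
    [0.80, 0.75, 0.25, 0.05],
    [0.85, 0.60, 0.30, 0.15],
]
print(round(mean_pairwise_stability(runs), 3))
```

A value close to 1 means the runs score the features almost identically. Lower values indicate that the selection changes when the data are perturbed.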


A Neural Network Looks at Leonardo's(?) Salvator Mundi

arXiv.org Artificial Intelligence

We use convolutional neural networks (CNNs) to analyze authorship questions surrounding the works of Leonardo da Vinci -- in particular, Salvator Mundi, the world's most expensive painting and among the most controversial. Trained on the works of an artist under study and visually comparable works of other artists, our system can identify likely forgeries and shed light on attribution controversies. Leonardo's few extant paintings test the limits of our system and require corroborative techniques of testing and analysis.


Repurpose Open Data to Discover Therapeutics for COVID-19 using Deep Learning

arXiv.org Machine Learning

There have been more than 850,000 confirmed cases and over 48,000 deaths from the human coronavirus disease 2019 (COVID-19) pandemic, caused by novel severe acute respiratory syndrome coronavirus (SARS-CoV-2), in the United States alone. However, there are currently no proven effective medications against COVID-19. Drug repurposing offers a promising way to develop prevention and treatment strategies for COVID-19. This study reports an integrative, network-based deep learning methodology to identify repurposable drugs for COVID-19 (termed CoV-KGE). Specifically, we built a comprehensive knowledge graph that includes 15 million edges across 39 types of relationships connecting drugs, diseases, genes, pathways, and expressions, from a large scientific corpus of 24 million PubMed publications. Using Amazon AWS computing resources, we identified 41 repurposable drugs (including indomethacin, toremifene and niclosamide) whose therapeutic associations with COVID-19 were validated by transcriptomic and proteomic data in SARS-CoV-2 infected human cells and data from ongoing clinical trials. While this study by no means recommends specific drugs, it demonstrates a powerful deep learning methodology to prioritize existing drugs for further investigation, which holds the potential of accelerating therapeutic development for COVID-19.


Your Ultimate Data Science Statistics & Mathematics Cheat Sheet

#artificialintelligence

Classifier metrics are used to evaluate the performance of machine learning classifiers -- models that assign each example to one of several discrete categories. A confusion matrix is a matrix that summarizes a classifier's predictions against the actual labels. For a binary classifier it contains four cells, each corresponding to one combination of a predicted true or false and an actual true or false. Many classifier metrics are based on the confusion matrix, so it's helpful to keep an image of it stored in your mind. Sensitivity/Recall is the fraction of actual positives that were correctly predicted.
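To make the four cells concrete, here is a minimal sketch that counts them for a binary classifier and derives recall as true positives divided by all actual positives. The function names and the example labels are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels, where truthy means positive."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1  # actual positive, predicted positive
        elif not a and p:
            fp += 1  # actual negative, predicted positive
        elif a and not p:
            fn += 1  # actual positive, predicted negative
        else:
            tn += 1  # actual negative, predicted negative
    return tp, fp, fn, tn


def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)  # 3 1 1 1
print(recall(tp, fn))  # 0.75
```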


Column: I got tested for COVID-19. Should you?

Los Angeles Times

The last time I traveled along Stadium Way I was headed to a Dodger game, but on Monday afternoon I drove to the fire training center near the ballpark for a much less enjoyable experience. Just a cotton swab and a five-minute drive-through, with results to follow in a few days. I was conflicted about being tested, for two reasons. First, while we definitely needed to ramp up testing back at the beginning of this crisis, I'm wondering if the county has now gone overboard in offering free testing to all residents, whether or not they have symptoms. Second, I'm pretty sure that my minor allergy-like symptoms are just that: allergies.


Automated Copper Alloy Grain Size Evaluation Using a Deep-learning CNN

arXiv.org Machine Learning

Moog Inc. has automated the evaluation of copper (Cu) alloy grain size using a deep-learning convolutional neural network (CNN). The proof-of-concept automated image acquisition and batch-wise image processing offers the potential for significantly reduced labor, improved accuracy of grain evaluation, and decreased overall turnaround times for approving Cu alloy bar stock for use in flight critical aircraft hardware. A classification accuracy of 91.1% on individual sub-images of the Cu alloy coupons was achieved. Process development included minimizing the variation in acquired image color, brightness, and resolution to create a dataset with 12300 sub-images, and then optimizing the CNN hyperparameters on this dataset using statistical design of experiments (DoE). During development of the automated Cu alloy grain size evaluation, a degree of "explainability" in the artificial intelligence (XAI) output was realized: because each large raw image is decomposed into many smaller dataset sub-images, the CNN ensemble output for an image can be explained by inspecting the classification results of the individual sub-images.
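The abstract does not say how sub-image results are combined into an image-level output. As a rough illustration only, the sketch below assumes a simple majority vote and keeps the per-class tally so the image-level decision can be inspected. The function name and class labels are hypothetical.

```python
from collections import Counter


def aggregate_sub_image_labels(labels):
    """Majority vote over sub-image labels, plus the vote share and full tally.

    Assumption: majority voting stands in for whatever ensemble rule the
    authors actually used, which the abstract does not specify.
    """
    if not labels:
        raise ValueError("need at least one sub-image label")
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels), dict(counts)


# Hypothetical grain-size classes predicted for four sub-images of one coupon.
label, share, tally = aggregate_sub_image_labels(["fine", "fine", "coarse", "fine"])
print(label, share, tally)  # fine 0.75 {'fine': 3, 'coarse': 1}
```

Returning the tally alongside the winning label is what allows the kind of inspection described above: a reviewer can see which sub-images disagreed with the overall result.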


InfoScrub: Towards Attribute Privacy by Targeted Obfuscation

arXiv.org Artificial Intelligence

Personal photos of individuals, when shared online, exhibit a myriad of memorable details but also reveal a wide range of private information and potentially entail privacy risks (e.g., online harassment, tracking). To mitigate such risks, it is crucial to study techniques that allow individuals to limit the private information leaked in visual data. We tackle this problem in a novel image obfuscation framework: to maximize entropy on inferences over targeted privacy attributes, while retaining image fidelity. We approach the problem based on an encoder-decoder style architecture, with two key novelties: (a) introducing a discriminator to perform bi-directional translation simultaneously from multiple unpaired domains; (b) predicting an image interpolation which maximizes uncertainty over a target set of attributes. We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2$\times$ (or up to 0.85 bits) over the non-obfuscated counterparts.
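Uncertainty over an attribute inference can be quantified as the Shannon entropy of the predicted distribution, measured in bits. The sketch below is a generic entropy computation, not code from the paper. The function name and the example distributions are assumptions.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing, by the usual convention.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# A confident inference about a binary attribute carries little uncertainty.
print(entropy_bits([0.99, 0.01]))
# A maximally uncertain inference about a binary attribute is exactly one bit.
print(entropy_bits([0.5, 0.5]))  # 1.0
```

For a binary attribute the entropy cannot exceed one bit, so pushing a confident prediction toward an even split is what maximizing uncertainty amounts to in this simplified setting.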


Turns out converting files into images is a highly effective way to detect malware

#artificialintelligence

A branch of artificial intelligence called machine learning is all around us. Facebook employs it to help curate content (and target us with ads), Google uses it to filter millions of spam messages each day, and it's part of what enabled the OpenAI bot to beat the reigning Dota 2 champions last year in two out of three matches. There are seemingly endless uses. Adding one more to the pile, Microsoft and Intel have come up with a clever machine learning framework that is surprisingly accurate at detecting malware through a grayscale image conversion process. Microsoft detailed the technology, which it calls static malware-as-image network analysis, or STAMINA, in a blog post (via ZDNet).
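The basic idea of turning a file into a grayscale image can be sketched by treating each byte as one pixel intensity from 0 to 255 and wrapping the byte stream into rows of fixed width. This is a minimal illustration, not the STAMINA pipeline: the function name, the fixed width, and the zero padding of the last row are all assumptions.

```python
def bytes_to_grayscale(data, width):
    """Reshape raw bytes into rows of pixel intensities (0-255).

    Assumptions: the row width is chosen by the caller, and the final
    row is padded with zeros when the data does not fill it.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad a short final row
        rows.append(row)
    return rows


# Five made-up bytes laid out as a 2-pixel-wide image.
print(bytes_to_grayscale(b"MZ\x90\x00\x03", 2))  # [[77, 90], [144, 0], [3, 0]]
```

The resulting grid of intensities is what an image classifier would then consume in place of the raw file.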


Synthesizing Unrestricted False Positive Adversarial Objects Using Generative Models

arXiv.org Machine Learning

Adversarial examples are data points misclassified by neural networks. Originally, adversarial examples were limited to adding small perturbations to a given image. Recent work introduced the generalized concept of unrestricted adversarial examples, without limits on the added perturbations. In this paper, we introduce a new category of attacks that create unrestricted adversarial examples for object detection. Our key idea is to generate adversarial objects that are unrelated to the classes identified by the target object detector. Different from previous attacks, we use off-the-shelf Generative Adversarial Networks (GAN), without requiring any further training or modification. Our method consists of searching over the latent normal space of the GAN for adversarial objects that are wrongly identified by the target object detector. We evaluate this method on the commonly used Faster R-CNN ResNet-101, Inception v2 and SSD Mobilenet v1 object detectors using logo generative iWGAN-LC and SNGAN trained on CIFAR-10. The empirical results show that the generated adversarial objects are indistinguishable from non-adversarial objects generated by the GANs, transferable between the object detectors and robust in the physical world. This is the first work to study unrestricted false positive adversarial examples for object detection.


Fast cross-validation for multi-penalty ridge regression

arXiv.org Machine Learning

Prediction based on multiple high-dimensional data types needs to account for the potentially strong differences in predictive signal. Ridge regression is a simple, yet versatile and interpretable model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, in particular in dense settings. Moreover, it allows using a specific penalty per data type to account for differences between those. Then, the largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional loop for fitting the model by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in the low-dimensional sample space. We show that our approach is several orders of magnitude faster than more naive ones. We developed a very flexible framework that includes prediction of several types of response, allows for unpenalized covariates, can optimize several performance criteria and implements repeated CV. Moreover, extensions to pair data types and to allow a preferential order of data types are included and illustrated on several cancer genomics survival prediction problems. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.
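For intuition about why the computations can stay in the low-dimensional sample space, consider only the linear, unweighted case. This is a standard matrix identity, not the paper's sample-weighted formula. The notation is assumed for illustration: $X_b$ is the $n \times p_b$ block of covariates for data type $b$, and $\lambda_b$ is its penalty.

```latex
\begin{aligned}
X &= [X_1, \ldots, X_B], \qquad
\Lambda = \operatorname{diag}(\lambda_1 I_{p_1}, \ldots, \lambda_B I_{p_B}), \\
\Sigma &= X \Lambda^{-1} X^\top = \sum_{b=1}^{B} \lambda_b^{-1} X_b X_b^\top \in \mathbb{R}^{n \times n}, \\
H &= X \left( X^\top X + \Lambda \right)^{-1} X^\top = \Sigma \left( \Sigma + I_n \right)^{-1}.
\end{aligned}
```

The second equality for $H$ follows from $(X^\top X + \Lambda)\,\Lambda^{-1} X^\top = X^\top (\Sigma + I_n)$. Each Gram matrix $X_b X_b^\top$ needs to be computed only once, so trying new penalties involves only $n \times n$ operations, regardless of how large the $p_b$ are.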