Performance Analysis
OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution
Nguyen, Eric, Bui, Tu, Swaminathan, Vishy, Collomosse, John
Images tell powerful stories but cannot always be trusted. Matching images back to trusted sources (attribution) enables users to make a more informed judgment of the images they encounter online. We propose a robust image hashing algorithm to perform such matching. Our hash is sensitive to manipulation of subtle, salient visual details that can substantially change the story told by an image. Yet the hash is invariant to benign transformations (changes in quality, codecs, sizes, shapes, etc.) experienced by images during online redistribution. Our key contribution is OSCAR-Net (Object-centric Scene Graph Attention for Image Attribution Network), a robust image hashing model inspired by recent successes of Transformers in the visual domain. OSCAR-Net constructs a scene graph representation that attends to fine-grained changes in every object's visual appearance and in the objects' spatial relationships. The network is trained via contrastive learning on a dataset of original and manipulated images, yielding a state-of-the-art image hash for content fingerprinting that scales to millions of images.
Using Undervolting as an On-Device Defense Against Adversarial Machine Learning Attacks
Majumdar, Saikat, Samavatian, Mohammad Hossein, Barber, Kristin, Teodorescu, Radu
Deep neural network (DNN) classifiers are powerful tools that drive a broad spectrum of important applications, from image recognition to autonomous vehicles. Unfortunately, DNNs are known to be vulnerable to adversarial attacks that affect virtually all state-of-the-art models. These attacks make small imperceptible modifications to inputs that are sufficient to induce the DNNs to produce the wrong classification. In this paper we propose a novel, lightweight adversarial correction and/or detection mechanism for image classifiers that relies on undervolting (running a chip at a voltage that is slightly below its safe margin). We propose using controlled undervolting of the chip running the inference process in order to introduce a limited number of compute errors. We show that these errors disrupt the adversarial input in a way that can be used either to correct the classification or detect the input as adversarial. We evaluate the proposed solution in an FPGA design and through software simulation. We evaluate 10 attacks and show average detection rates of 77% and 90% on two popular DNNs.
SMOTified-GAN for class imbalanced pattern classification problems
Sharma, Anuraganand, Singh, Prabhat Kumar, Chandra, Rohitash
Class imbalance in a dataset is a major problem for classifiers that results in poor prediction with a high true positive rate (TPR) but a low true negative rate (TNR) for a majority positive training dataset. Generally, the pre-processing technique of oversampling the minority class(es) is used to overcome this deficiency. Our focus is on hybridizing the Generative Adversarial Network (GAN) and the Synthetic Minority Over-Sampling Technique (SMOTE) to address class-imbalanced problems. We propose a novel two-phase oversampling approach that has the synergy of SMOTE and GAN. The initial data of the minority class(es) generated by SMOTE is further enhanced by a GAN that produces better quality samples. We named it SMOTified-GAN because the GAN works on pre-sampled minority data produced by SMOTE rather than randomly generating the samples itself. The experimental results show that the sample quality of the minority class(es) has been improved in a variety of tested benchmark datasets. Its performance is improved by up to 9% over the next best algorithm tested on F1-score measurements. Its time complexity is also reasonable, at around $O(N^2d^2T)$ for a sequential algorithm.
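The SMOTE phase described above can be illustrated with a minimal sketch. It shows only the interpolation idea behind SMOTE (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours). The GAN refinement phase is not shown. The function name `smote_sample`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import random

def smote_sample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-dimensional minority samples.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote_sample(minority, n_new=6)
```

Because each synthetic point is a convex combination of two real minority points, it always stays inside the region spanned by the minority class. In the two-phase approach, such points would then serve as the starting data for the GAN instead of random noise.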
Beyond Cuts in Small Signal Scenarios - Enhanced Sneutrino Detectability Using Machine Learning
Alvestad, Daniel, Fomin, Nikolai, Kersten, Jörn, Maeland, Steffen, Strümke, Inga
The absence of a signal of new particles at the Large Hadron Collider (LHC) may suggest that new physics is realized in a scenario that is hard to detect due to the absence or very large mass of new colored particles. Hence, this study focuses on setups with dominant electroweak production of color-neutral new particles and multi-lepton signals from their decays. The conventional approach to searches for new physics, also known as "cut-and-count analysis", is to apply a set of constraints on different kinematic variables (called "cuts" or "selection") that improve the signal-to-background ratio. However, the scenarios we consider can be challenging for this standard approach due to the small production cross section and the similarity of signal and background features. For such problems, machine learning (ML) offers a promising alternative [1-6]. We investigate how much ML can increase the discovery reach, and whether machine learning models can be trained in such a way that they work in a large region of parameter space and not just for a single point. This is an important issue, in particular in new physics scenarios with many free parameters, as signal kinematics vary from point to point. As a concrete example, we consider a supersymmetry (SUSY) scenario with a gravitino lightest supersymmetric particle (LSP) whose mass is in the GeV range.
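The cut-and-count approach mentioned above can be sketched in a few lines. The event variables (`met`, `n_leptons`), the cut values, and the toy events are hypothetical. The figure of merit s/sqrt(b) is a common simple choice assumed here, not necessarily the one used in the study.

```python
import math

def cut_and_count(events, cuts):
    """Count events that pass every cut; cuts maps variable -> (low, high)."""
    return sum(
        all(low <= event[var] <= high for var, (low, high) in cuts.items())
        for event in events
    )

# Hypothetical toy events with two kinematic variables each.
signal = [
    {"met": 120.0, "n_leptons": 3},
    {"met": 80.0, "n_leptons": 3},
    {"met": 150.0, "n_leptons": 4},
]
background = [
    {"met": 30.0, "n_leptons": 2},
    {"met": 110.0, "n_leptons": 3},
    {"met": 40.0, "n_leptons": 3},
    {"met": 20.0, "n_leptons": 2},
]

# Selection: require large missing energy and at least three leptons.
cuts = {"met": (100.0, math.inf), "n_leptons": (3, math.inf)}

s = cut_and_count(signal, cuts)
b = cut_and_count(background, cuts)
significance = s / math.sqrt(b) if b > 0 else math.inf
```

When signal and background look alike in every individual variable, no rectangular selection of this kind separates them well. That is the situation in which a learned classifier may offer an advantage.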
A.I. Systems Diagnosing Sepsis: Is It Ready for Prime Time?
Sepsis remains one of the most costly and deadly of medical conditions. Sepsis is not a disease per se, but a syndrome, a collection of signs and symptoms that indicate the presence of an overwhelming infection. Many, if not all, severely ill patients with COVID-19 had viral sepsis. Bacterial causes are more common, but sepsis in all its microbial forms carries a high mortality. Academics have long tortured clinical hospital data to find some statistical means of identifying sepsis or its incipient signs, because early intervention is associated with better outcomes.
Recommending Insurance products by using Users' Sentiments
Parasrampuria, Rohan, Ghosh, Ayan, Dutta, Suchandra, Sarkar, Dhrubasish
In today's tech-savvy world, every industry is trying to formulate methods for recommending products by combining several techniques and algorithms to produce the most effective predictive models. Along these lines, our paper focuses on the application of sentiment analysis to recommendation in the insurance domain. We built three machine learning models, namely Logistic Regression, Multinomial Naive Bayes, and Random Forest, to analyze the polarity of a feedback line given by a customer. We then used this polarity along with other attributes such as Age, Gender, Locality, Income, and the list of other products already purchased by our existing customers as input for our recommendation model. We matched the polarity score with the users' profiles and generated the list of insurance products to be recommended in descending order. Despite our model's simplicity and the lack of key data sets, the results seemed logical and realistic. By developing the model with more advanced methods and with access to better, real data gathered from the insurance industry, the sector could benefit considerably from combining sentiment analysis with recommendation.
Unsupervised Detection of Lung Nodules in Chest Radiography Using Generative Adversarial Networks
Bhatt, Nitish, Prados, David Ramon, Hodzic, Nedim, Karanassios, Christos, Tizhoosh, H. R.
Lung nodules are commonly missed in chest radiographs. We propose and evaluate P-AnoGAN, an unsupervised anomaly detection approach for lung nodules in radiographs. P-AnoGAN modifies the fast anomaly detection generative adversarial network (f-AnoGAN) by utilizing a progressive GAN and a convolutional encoder-decoder-encoder pipeline. Model training uses only unlabelled healthy lung patches extracted from the Indiana University Chest X-Ray Collection. External validation and testing are performed using healthy and unhealthy patches extracted from the ChestX-ray14 and Japanese Society for Radiological Technology datasets, respectively. Our model robustly identifies patches containing lung nodules in external validation and test data with ROC-AUC of 91.17% and 87.89%, respectively. These results show unsupervised methods may be useful in challenging tasks such as lung nodule detection in radiographs.
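For readers unfamiliar with the ROC-AUC metric reported above, a minimal sketch follows. It uses the rank interpretation of the metric: the probability that a randomly chosen positive (nodule) patch receives a higher anomaly score than a randomly chosen negative (healthy) patch. The scores are invented toy values, not results from the paper.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a correct pair."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical anomaly scores: higher means "more anomalous".
nodule_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.1, 0.3, 0.5, 0.2]
auc = roc_auc(nodule_scores, healthy_scores)
```

In this toy case 11 of the 12 pairs are ordered correctly. The single error is the nodule patch scored 0.4 against the healthy patch scored 0.5.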
A Machine-Learning-Ready Dataset Prepared from the Solar and Heliospheric Observatory Mission
Shneider, Carl, Hu, Andong, Tiwari, Ajay K., Bobra, Monica G., Battams, Karl, Teunissen, Jannis, Camporeale, Enrico
We present a Python tool to generate a standard dataset from solar images that allows for user-defined selection criteria and a range of pre-processing steps. Our Python tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission's multi-spectral images which is free of missing or corrupt data as well as planetary transits in coronagraph images, and is temporally synced making it ready for input to a machine learning system. Machine-learning-ready images are a valuable resource for the community because they can be used, for example, for forecasting space weather parameters. We illustrate the use of this data with a 3-5 day-ahead forecast of the north-south component of the interplanetary magnetic field (IMF) observed at Lagrange point one (L1). For this use case, we apply a deep convolutional neural network (CNN) to a subset of the full SoHO dataset and compare with baseline results from a Gaussian Naive Bayes classifier.
Multi-Label Gold Asymmetric Loss Correction with Single-Label Regulators
Pene, Cosmin Octavian, Ghiassi, Amirmasoud, Younesian, Taraneh, Birke, Robert, Chen, Lydia Y.
Multi-label learning is an emerging extension of multi-class classification in which an image carries multiple labels. Acquiring a clean and fully labeled dataset for multi-label learning is extremely expensive, and many of the actual labels are corrupted or missing due to automated or non-expert annotation techniques. Noisy label data decrease the prediction performance drastically. In this paper, we propose a novel Gold Asymmetric Loss Correction with Single-Label Regulators (GALC-SLR) that is robust against noisy labels. GALC-SLR estimates the noise confusion matrix using single-label samples, then constructs an asymmetric loss correction via the estimated confusion matrix to avoid overfitting to the noisy labels. Empirical results show that our method outperforms the state-of-the-art original asymmetric loss multi-label classifier under all corruption levels, showing a mean average precision improvement of up to 28.67% on the real-world MS-COCO dataset, yielding better generalization to unseen data and increased prediction performance.
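The general idea of correcting a loss with a noise confusion matrix can be sketched as follows. This is a generic single-label forward correction, shown only to illustrate the principle. The abstract does not spell out how GALC-SLR builds its asymmetric multi-label correction, so the function `corrected_nll`, the matrix layout, and the numbers below are all assumptions.

```python
import math

def corrected_nll(clean_probs, noise_matrix, noisy_label):
    """Forward-corrected negative log-likelihood.

    noise_matrix[i][j] is assumed to hold P(observed label j | true label i).
    The model's clean-label probabilities are pushed through this matrix so
    the loss is evaluated against the noisy label distribution.
    """
    n = len(clean_probs)
    noisy_probs = [
        sum(clean_probs[i] * noise_matrix[i][j] for i in range(n))
        for j in range(n)
    ]
    return -math.log(noisy_probs[noisy_label])

# Hypothetical two-class example: the model is confident in class 0,
# but the observed (possibly corrupted) label is class 1.
clean_probs = [0.9, 0.1]
identity = [[1.0, 0.0], [0.0, 1.0]]   # no label noise
noisy = [[0.8, 0.2], [0.3, 0.7]]      # assumed estimated confusion matrix

loss_uncorrected = corrected_nll(clean_probs, identity, 1)
loss_corrected = corrected_nll(clean_probs, noisy, 1)
```

With the correction, the penalty for disagreeing with a label that may well be corrupted is smaller, which reduces the pressure on the model to fit noisy labels.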
Signature Verification using Geometrical Features and Artificial Neural Network Classifier
Jain, Anamika, Singh, Satish Kumar, Singh, Krishna Pratap
Signature verification has been one of the most researched areas in the field of computer vision. Many financial and legal organizations use signature verification for access control and authentication. Signature images are not rich in texture; however, they carry much vital geometrical information. In this work, we propose a signature verification methodology that is simple yet effective. The technique presented in this paper harnesses the geometrical features of a signature image, such as center, isolated points, and connected components, and uses an Artificial Neural Network (ANN) classifier to classify signature images based on these features. The publicly available datasets MCYT and BHSig260 (which contains images in two regional languages, Bengali and Hindi) have been used in this paper to test the effectiveness of the proposed method. We obtained a lower Equal Error Rate (EER) on the MCYT 100 dataset and higher accuracy on the BHSig260 dataset.