 Performance Analysis


Beyond Visual Image: Automated Diagnosis of Pigmented Skin Lesions Combining Clinical Image Features with Patient Data

arXiv.org Artificial Intelligence

Among the most common types of skin cancer are basal cell carcinoma, squamous cell carcinoma and melanoma. According to the WHO (2018), between 2 and 3 million non-melanoma skin cancers and 132,000 melanoma skin cancers currently occur worldwide every year. Melanoma is by far the most dangerous form of skin cancer, causing more than 75% of all skin cancer deaths (Allen, 2016). Early diagnosis of the disease plays an important role in reducing the mortality rate, with a chance of cure greater than 90% (SBD, 2018). Pigmented skin lesions (PSLs) can be diagnosed by invasive and non-invasive methods. One of the most common non-invasive methods was presented by Soyer et al. (1987). The method allows the visualization of morphological structures not visible to the naked eye with the use of an instrument called a dermatoscope. Compared with clinical diagnosis, use of the dermatoscope by experts makes the diagnosis of PSLs easier, increasing diagnostic sensitivity by 10-27% (Mayer et al., 1997).


Pre-Trained Language Transformers are Universal Image Classifiers

arXiv.org Artificial Intelligence

Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The pretrained GPT-2 transformer is trained to generate text and then fine-tuned to classify facial images. During fine-tuning with images, most of the layers of GPT-2 are frozen during backpropagation, and the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of FPT to facial images. We also use our FPT to classify encrypted images. Our FPT shows high accuracy on both raw facial images and encrypted images. We hypothesize, with theory and experiments, that the FPT gained its meta-learning capacity because of its large size and large-scale training. GPT-2, trained to generate a single word token at a time through the autoregressive process, is forced toward a heavy-tailed distribution. The FPT then uses the heavy-tail property as its meta-learning capacity for classifying images. Our work shows one way to avoid bias during the machine classification of images. The FPT encodes worldly knowledge because of its pretraining on text, which it uses during classification. The statistical error of classification is reduced because of the added context gained from the text. Our paper shows the ethical dimension of using encrypted data for classification. Criminal images are sensitive to share across boundaries, but encrypting them largely evades this ethical concern. The good classification accuracy of FPT on encrypted images shows promise for further research on privacy-preserving machine learning.


Introduction To Machine Learning

#artificialintelligence

The Wikipedia definition of ML is: "Machine learning is the study of computer algorithms that can improve automatically through experience and by the use of data." But what does it really mean? Machine learning is using data to make predictions, or using data in any way to extract knowledge. I'll give a brief intro to these steps right now and go into more detail in the upcoming articles. There are 4 major types in which data will be available for use i.e.


Learning Optimal Fair Classification Trees

arXiv.org Artificial Intelligence

The increasing use of machine learning in high-stakes domains -- where people's livelihoods are impacted -- creates an urgent need for interpretable and fair algorithms. In these settings it is also critical for such algorithms to be accurate. With these needs in mind, we propose a mixed integer optimization (MIO) framework for learning optimal classification trees of fixed depth that can be conveniently augmented with arbitrary domain specific fairness constraints. We benchmark our method against the state-of-the-art approach for building fair trees on popular datasets; given a fixed discrimination threshold, our approach improves out-of-sample (OOS) accuracy by 2.3 percentage points on average and obtains a higher OOS accuracy on 88.9% of the experiments. We also incorporate various algorithmic fairness notions into our method, showcasing its versatile modeling power that allows decision makers to fine-tune the trade-off between accuracy and fairness.
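The idea of a "discrimination threshold" in the abstract above can be made concrete with a small sketch. The abstract does not say which fairness notion is thresholded, so the code below assumes statistical parity (the gap between groups' positive-prediction rates); the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Return the fraction of positive (== 1) predictions for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def statistical_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def satisfies_threshold(predictions, groups, delta):
    """True if the parity gap does not exceed the discrimination threshold delta."""
    return statistical_parity_gap(predictions, groups) <= delta


# Toy data (made up for illustration): group "a" gets 3/4 positives, group "b" 1/4.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = statistical_parity_gap(preds, groups)
```

In an optimization framework of the kind described, a condition like `satisfies_threshold` would be imposed as a constraint during training rather than checked afterwards; the check here only illustrates what the constraint measures.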


What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

arXiv.org Artificial Intelligence

Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains, e.g., autonomous driving. While there has been a vast body of AE defense solutions, to the best of our knowledge, they all suffer from some weaknesses, e.g., defending against only a subset of AEs or causing a relatively high accuracy loss for legitimate inputs. Moreover, most existing solutions cannot defend against adaptive attacks, wherein attackers are knowledgeable about the defense mechanisms and craft AEs accordingly. In this paper, we propose a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model. To be specific, the proposed solution, namely ContraNet, models such contradiction by first taking both the input and the inference result to a generator to obtain a synthetic output and then comparing it against the original input. For legitimate inputs that are correctly inferred, the synthetic output tries to reconstruct the input. On the contrary, for AEs, instead of reconstructing the input, the synthetic output would be created to conform to the wrong label whenever possible. Consequently, by measuring the distance between the input and the synthetic output with metric learning, we can differentiate AEs from legitimate inputs. We perform comprehensive evaluations under various AE attack scenarios, and experimental results show that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks. Moreover, our analysis shows that successful AEs that can bypass ContraNet tend to have much-weakened adversarial semantics. We have also shown that ContraNet can be easily combined with adversarial training techniques to achieve further improved AE defense capabilities.
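The detection rule described above (regenerate the input from the inferred label, then compare) can be sketched in a few lines. This is only a toy illustration under stated assumptions: `toy_generator` is a hypothetical stand-in that returns a fixed class prototype, plain Euclidean distance replaces the learned metric, and the threshold value is arbitrary. None of these names come from ContraNet itself.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_adversarial(x, predicted_label, generator, threshold, distance=euclidean):
    """Flag x when the label-conditioned synthesis is far from the input."""
    synthetic = generator(x, predicted_label)
    return distance(x, synthetic) > threshold


# Hypothetical class prototypes standing in for a trained conditional generator.
PROTOTYPES = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}


def toy_generator(x, label):
    """Toy generator: ignores x and returns the prototype of the given label."""
    return PROTOTYPES[label]


sample = [0.9, 0.1]  # looks like a "cat"
flag_correct_label = is_adversarial(sample, "cat", toy_generator, threshold=0.5)
flag_wrong_label = is_adversarial(sample, "dog", toy_generator, threshold=0.5)
```

When the inferred label matches the input's semantics the synthesis stays close to the input and the sample passes; when the label is wrong the synthesis conforms to the wrong class and the distance exceeds the threshold.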


A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals

arXiv.org Artificial Intelligence

Data available in scholarly knowledge graphs (SKGs) - i.e., "a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent potentially different relations between these entities" [14] - is growing continuously, leading to a plethora of challenges concerning, for instance, article exploration and visualization [17], article recommendation [3], citation recommendation [11], and Author Name Disambiguation (AND) [24], which is relevant for the purposes of the present article. In particular, AND refers to a specific task of entity resolution which aims at resolving author mentions in bibliographic references to real-world people. Author persistent identifiers, such as ORCIDs and VIAFs, simplify the AND activity since such identifiers can be used for reconciling entities defined as different objects and representing the same real-world person. However, the availability of such persistent identifiers in SKGs - such as OpenCitations (OC) [22], AMiner [27] and Microsoft Academic Knowledge Graph (MAKG) [10] - is characterized by very low coverage and, as such, additional and computationally-oriented techniques must be adopted to identify different authors as the same person. In the past, many automatic approaches have been developed to address AND by using publication metadata (e.g., title, abstract, keywords, venue, affiliation, etc.) to extract features that can be used in the disambiguation task. These methods vary widely from supervised learning methods to unsupervised learning, including recently developed deep neural network-based architectures [31]. However, the existing SKGs do not provide all the relevant contextual information necessary to reuse such approaches, which often rely on pure textual data, effectively and efficiently.
In contrast with the approaches mentioned above, this study focuses on performing AND for scholarly data represented as linked data or included in SKGs by considering the multi-modal information available in such collections, i.e., the structural information consisting of entities and relations between them as well as text or numeric values associated with the authors and publications defined in the form of literals (family name, given name, publication title, venue title, year of publication, etc.). The proposed framework to address this task is named Literally Author Name Disambiguation (LAND), which focuses on tackling the following research questions: - Can Knowledge Graph Embeddings (KGEs) - i.e. a technique that enables the creation of a "dense representation of the graph in a continuous, low-dimensional vector space that can then be used for machine learning tasks"[13] - be used effectively for the downstream task of clustering, more specifically for author name disambiguation?


Perform Sliced (Tiled) Inference and Detailed Error Analysis using YOLOv5 Models

#artificialintelligence

This post will walk you through installation, sliced inference, error analysis, and interactive visualization steps for your YOLOv5 models. Create error analysis plots using the created result.json:


Robust Wavelet-based Assessment of Scaling with Applications

arXiv.org Machine Learning

A number of approaches have dealt with statistical assessment of self-similarity, and many of those are based on multiscale concepts. Most rely on certain distributional assumptions which are usually violated by real data traces, often characterized by large temporal or spatial mean level shifts, missing values or extreme observations. A novel, robust approach based on Theil-type weighted regression is proposed for estimating self-similarity in two-dimensional data (images). The method is compared to two traditional estimation techniques that use wavelet decompositions: ordinary least squares (OLS) and the Abry-Veitch bias-correcting estimator (AV). As an application, the suitability of the self-similarity estimate resulting from the robust approach is illustrated as a predictive feature in the classification of digitized mammogram images as cancerous or non-cancerous. The diagnostic employed here is based on the properties of image backgrounds, which are typically an unused modality in breast cancer screening. Classification results show nearly 68% accuracy, varying slightly with the choice of wavelet basis and the range of multiresolution levels used.
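To illustrate why a Theil-type regression is more robust than OLS when fitting a slope to log-energies across multiresolution levels, here is a minimal sketch. It assumes the plain, unweighted Theil-Sen estimator (the median of all pairwise slopes), not the weighted variant the abstract proposes, and the level and log-energy values are made up for illustration.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Unweighted Theil-Sen estimate: the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)


def ols_slope(x, y):
    """Ordinary least squares slope, for comparison."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical log2 wavelet energies with true slope -2 and one extreme level.
levels = [1, 2, 3, 4, 5, 6]
log_energy = [-2.0 * j for j in levels]
log_energy[4] = 5.0  # a single outlying level

robust = theil_sen_slope(levels, log_energy)
classical = ols_slope(levels, log_energy)
```

With one corrupted level the median of pairwise slopes still recovers the underlying slope, while the OLS fit is pulled noticeably toward the outlier. Converting the slope into a self-similarity (Hurst) estimate depends on the dimension and normalization convention, so that step is left out here.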


Survival Prediction of Children Undergoing Hematopoietic Stem Cell Transplantation Using Different Machine Learning Classifiers by Performing Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis

arXiv.org Artificial Intelligence

Bone Marrow Transplant (BMT), a gradational rescue for a wide range of disorders emanating from the bone marrow, is an efficacious surgical treatment. Several risk factors, such as post-transplant illnesses, new malignancies, and even organ damage, can impair long-term survival. Therefore, technologies like Machine Learning are deployed for investigating the survival prediction of BMT recipients along with the influences that limit their resilience. In this study, an efficient survival classification model is presented in a comprehensive manner, incorporating the Chi-squared feature selection method to address the dimensionality problem and Hyper Parameter Optimization (HPO) to increase accuracy. A synthetic dataset is generated by imputing the missing values, transforming the data using dummy variable encoding, and compressing the dataset from 59 features to the 11 most correlated features using Chi-squared feature selection. The dataset was split into train and test sets at a ratio of 80:20, and the hyperparameters were optimized using Grid Search Cross-Validation. Several supervised ML methods were trained in this regard, like Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, Gradient Boosting Classifier, Ada Boost, and XG Boost. The simulations were performed for both the default and optimized hyperparameters using the original and reduced synthetic datasets. After ranking the features using the Chi-squared test, it was observed that the top 11 features with HPO resulted in the same prediction accuracy (94.73%) as the entire dataset with default parameters. Moreover, this approach requires less time and fewer resources for predicting the survivability of children undergoing BMT. Hence, the proposed approach may aid in the development of a computer-aided diagnostic system with satisfactory accuracy and minimal computation time by utilizing medical data records.
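The Chi-squared ranking step described above can be sketched as follows. This is a minimal illustration assuming Pearson's chi-squared statistic on the contingency table of a categorical feature against the class label; the abstract does not specify the exact variant or library used, and the feature names and toy values below are hypothetical.

```python
from collections import Counter


def chi_squared(feature, label):
    """Pearson chi-squared statistic for a categorical feature vs. a label."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


def top_k_features(columns, label, k):
    """Return the names of the k features with the highest chi-squared score."""
    scores = {name: chi_squared(values, label) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (made up): one feature tracks the label, the other is unrelated.
survived = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "informative": [1, 1, 1, 1, 0, 0, 0, 0],
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
}
selected = top_k_features(columns, survived, k=1)
```

In the study's setting, `k` would be 11 and the columns would be the 59 dummy-encoded features; the reduced set would then be passed to the classifiers and the grid search.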


How can AI Prevent Fraud?

#artificialintelligence

Multinational technology corporation IBM calculated that 72% of business leaders cited fraud as a growing concern in the last year, that $44 billion will be lost worldwide due to fraud by 2024, and that a quarter of e-commerce sales transactions that were declined by artificial intelligence (AI) were false positives. AI has become the leading tool for fighting fraud, but it can still be improved upon. In the past, rule-based engines and simple predictive models were used to computationally identify the majority of fraud attempts. But these methods have not kept up with the increasingly sophisticated nature of fraud attacks today. With a proliferation of digital technologies at criminals' disposal, fraud has grown in both scale and severity over the last few decades. Large criminal organizations and even state-sponsored groups use AI-like machine learning (ML) algorithms to defraud digital businesses for millions of dollars each year.