Goto

Collaborating Authors

 Performance Analysis


r/MachineLearning - [R] Destruction of Image Steganography using Generative Adversarial Networks

#artificialintelligence

Abstract: Digital image steganalysis, or the detection of image steganography, has been studied in depth for years and is driven by Advanced Persistent Threat (APT) groups', such as APT37 Reaper, utilization of steganographic techniques to transmit additional malware to perform further post-exploitation activity on a compromised host. However, many steganalysis algorithms are constrained to work with only a subset of all possible images in the wild or are known to produce a high false positive rate. This results in blocking any suspected image being an unreasonable policy. A more feasible policy is to filter suspicious images prior to reception by the host machine. However, how does one optimally filter specifically to obfuscate or remove image steganography while avoiding degradation of visual image quality in the case that detection of the image was a false positive?


5 Great New Features in Latest Scikit-learn Release - KDnuggets

#artificialintelligence

The latest release of Python's workhorse machine learning library includes a number of new features and bug fixes. You can find a full accounting of these changes from the official Scikit-learn 0.22 release highlights, and can read find the change log here. Here are 5 new features in the latest release of Scikit-learn which are worth your attention. A new plotting API is available, working without requiring any recomputation. Supported plots include, among others, partial dependence plots, confusion matrix, and ROC curves.


Estimation of the spatial weighting matrix for regular lattice data -- An adaptive lasso approach with cross-sectional resampling

arXiv.org Machine Learning

Spatial econometric research typically relies on the assumption that the spatial dependence structure is known in advance and is represented by a deterministic spatial weights matrix. Contrary to classical approaches, we investigate the estimation of sparse spatial dependence structures for regular lattice data. In particular, an adaptive least absolute shrinkage and selection operator (lasso) is used to select and estimate the individual connections of the spatial weights matrix. To recover the spatial dependence structure, we propose cross-sectional resampling, assuming that the random process is exchangeable. The estimation procedure is based on a two-step approach to circumvent simultaneity issues that typically arise from endogenous spatial autoregressive dependencies. The two-step adaptive lasso approach with cross-sectional resampling is verified using Monte Carlo simulations. Eventually, we apply the procedure to model nitrogen dioxide ($\mathrm{NO_2}$) concentrations and show that estimating the spatial dependence structure contrary to using prespecified weights matrices improves the prediction accuracy considerably.


Episode 2: A Cross Validation Framework

#artificialintelligence

Sign in to report inappropriate content. This is the second episode of my video series on applied machine learning. In this episode, we talk about the need for cross-validation and different types of cross-validation. We also see how one can implement a re-usable cross validation framework. In the end we are left with a cross validation framework that can be applied to almost all kinds of machine learning problem.


Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

arXiv.org Machine Learning

Keyword extraction has received an increasing attention as an important research topic which can lead to have advancements in diverse applications such as document context categorization, text indexing and document classification. In this paper we propose STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance of informal documents in a corpus. A set of nearly four million documents from health-care social media was collected and was trained in order to draw semantic model and to find the word embeddings. Then, the features of semantic space were utilized to rearrange the original TF-IDF scores through an iterative solution so as to improve the moderate performance of this algorithm on informal texts. After testing the proposed method with 200 randomly chosen documents, our method managed to decrease the TF-IDF mean error rate by a factor of 50% and reaching the mean error of 13.7%, as opposed to 27.2% of the original TF-IDF.


Prediction of MRI Hardware Failures based on Image Features using Ensemble Learning

arXiv.org Machine Learning

In order to ensure trouble-free operation, prediction of hardware failures is essential. This applies especially to medical systems. Our goal is to determine hardware which needs to be exchanged before failing. In this work, we focus on predicting failures of 20-channel Head/Neck coils using image-related measurements. Thus, we aim to solve a classification problem with two classes, normal and broken coil. To solve this problem, we use data of two different levels. One level refers to one-dimensional features per individual coil channel on which we found a fully connected neural network to perform best. The other data level uses matrices which represent the overall coil condition and feeds a different neural network. We stack the predictions of those two networks and train a Random Forest classifier as the ensemble learner. Thus, combining insights of both trained models improves the prediction results and allows us to determine the coil's condition with an F-score of 94.14% and an accuracy of 99.09%.


User Profiling Using Hinge-loss Markov Random Fields

arXiv.org Machine Learning

A variety of approaches have been proposed to automatically infer the profiles of users from their digital footprint in social media. Most of the proposed approaches focus on mining a single type of information, while ignoring other sources of available user-generated content (UGC). In this paper, we propose a mechanism to infer a variety of user characteristics, such as, age, gender and personality traits, which can then be compiled into a user profile. To this end, we model social media users by incorporating and reasoning over multiple sources of UGC as well as social relations. Our model is based on a statistical relational learning framework using Hinge-loss Markov Random Fields (HL-MRFs), a class of probabilistic graphical models that can be defined using a set of first-order logical rules. We validate our approach on data from Facebook with more than 5k users and almost 725k relations. We show how HL-MRFs can be used to develop a generic and extensible user profiling framework by leveraging textual, visual, and relational content in the form of status updates, profile pictures and Facebook page likes. Our experimental results demonstrate that our proposed model successfully incorporates multiple sources of information and outperforms competing methods that use only one source of information or an ensemble method across the different sources for modeling of users in social media.


Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification

arXiv.org Artificial Intelligence

Graph and language embedding models are becoming commonplace in large scale analyses given their ability to represent complex sparse data densely in low-dimensional space. Integrating these models' complementary relational and communicative data may be especially helpful if predicting rare events or classifying members of hidden populations - tasks requiring huge and sparse datasets for generalizable analyses. For example, due to social stigma and comorbidities, mental health support groups often form in amorphous online groups. Predicting suicidality among individuals in these settings using standard network analyses is prohibitive due to resource limits (e.g., memory), and adding auxiliary data like text to such models exacerbates complexity- and sparsity-related issues. Here, I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids these limits and extracts unsupervised clustering data without domain expertise or feature engineering. Graph and language distances to a suicide support group have little correlation (\r{ho} < 0.23), implying the two models are not embedding redundant information. When used separately to predict suicidality among individuals, graph and language data generate relatively accurate results (69% and 76%, respectively); however, when integrated, both data produce highly accurate predictions (90%, with 10% false-positives and 12% false-negatives). Visualizing graph embeddings annotated with predictions of potentially suicidal individuals shows the integrated model could classify such individuals even if they are positioned far from the support group. These results extend research on the importance of simultaneously analyzing behavior and language in massive networks and efforts to integrate embedding models for different kinds of data when predicting and classifying, particularly when they involve rare events.


International evaluation of an artificial intelligence system to identify breast cancer in screening mammography

#artificialintelligence

Screening mammography aims to identify breast cancer before symptoms appear, enabling earlier therapy for more treatable disease. Despite the existence of screening programs worldwide, interpretation of these images suffers from suboptimal rates of false positives and false negatives. Here we present an AI system capable of surpassing a single expert reader in breast cancer prediction performance. Using two large data sets representative of clinical practice in the United States (US) and the United Kingdom (UK), we show an absolute reduction of 5.7%/1.2% We show evidence of the system's ability to generalize from the UK sites to the US site.


How Google AI Is Improving Mammograms

#artificialintelligence

While there has been controversy over when and how often women should be screened for breast cancer using mammograms, studies consistently show that screening can lead to earlier detection of the disease, when it's more treatable. So improving how effectively mammograms can detect abnormal growths that could be cancerous is a priority in the field. AI could play a role in accomplishing that--computer-based machine learning might help doctors to read mammograms more accurately. In a study published Jan. 1 in Nature, researchers from Google Health, and from universities in the U.S. and U.K., report on an AI model that reads mammograms with fewer false positives and false negatives than human experts. The algorithm, based on mammograms taken from more than 76,000 women in the U.K. and more than 15,000 in the U.S., reduced false positive rates by nearly 6% in the U.S., where women are screened every one to two years, and by 1.2% in the U.K., where women are screened every three years.