Accuracy
A Survey of Adversarial Learning on Graphs
Chen, Liang, Li, Jintang, Peng, Jiaying, Xie, Tao, Cao, Zengxu, Xu, Kun, He, Xiangnan, Zheng, Zibin
Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction and graph clustering. However, they expose uncertainty and unreliability against the well-designed inputs, i.e., adversarial examples. Accordingly, various studies have emerged for both attack and defense addressed in different graph analysis tasks, leading to the arms race in graph adversarial learning. For instance, the attacker has poisoning and evasion attack, and the defense group correspondingly has preprocessing- and adversarial- based methods. Despite the booming works, there still lacks a unified problem definition and a comprehensive review. To bridge this gap, we investigate and summarize the existing works on graph adversarial learning tasks systemically. Specifically, we survey and unify the existing works w.r.t. attack and defense in graph analysis tasks, and give proper definitions and taxonomies at the same time. Besides, we emphasize the importance of related evaluation metrics, and investigate and summarize them comprehensively. Hopefully, our works can serve as a reference for the relevant researchers, thus providing assistance for their studies. More details of our works are available at https://github.com/gitgiter/Graph-Adversarial-Learning.
Step-By-Step Framework for Imbalanced Classification Projects
Classification predictive modeling problems involve predicting a class label for a given set of inputs. It is a challenging problem in general, especially if little is known about the dataset, as there are tens, if not hundreds, of machine learning algorithms to choose from. The problem is made significantly more difficult if the distribution of examples across the classes is imbalanced. This requires the use of specialized methods to either change the dataset or change the learning algorithm to handle the skewed class distribution. A common way to deal with the overwhelm on a new classification project is to use a favorite machine learning algorithm like Random Forest or SMOTE. Another common approach is to scour the research literature for descriptions of vaguely similar problems and attempt to re-implement the algorithms and configurations that are described. These approaches can be effective, although they are hit-or-miss and time-consuming respectively.
Machine Learning-based Approach for Depression Detection in Twitter Using Content and Activity Features
AlSagri, Hatoon S., Ykhlef, Mourad
Social media channels, such as Facebook, Twitter, and Instagram, have altered our world forever. People are now increasingly connected than ever and reveal a sort of digital persona. Although social media certainly has several remarkable features, the demerits are undeniable as well. Recent studies have indicated a correlation between high usage of social media sites and increased depression. The present study aims to exploit machine learning techniques for detecting a probable depressed Twitter user based on both, his/her network behavior and tweets. For this purpose, we trained and tested classifiers to distinguish whether a user is depressed or not using features extracted from his/ her activities in the network and tweets. The results showed that the more features are used, the higher are the accuracy and F-measure scores in detecting depressed users. This method is a data-driven, predictive approach for early detection of depression or other mental illnesses. This study's main contribution is the exploration part of the features and its impact on detecting the depression level.
Online Tensor-Based Learning for Multi-Way Data
Anaissi, Ali, Suleiman, Basem, Zandavi, Seid Miad
The online analysis of multi-way data stored in a tensor $\mathcal{X} \in \mathbb{R} ^{I_1 \times \dots \times I_N} $ has become an essential tool for capturing the underlying structures and extracting the sensitive features which can be used to learn a predictive model. However, data distributions often evolve with time and a current predictive model may not be sufficiently representative in the future. Therefore, incrementally updating the tensor-based features and model coefficients are required in such situations. A new efficient tensor-based feature extraction, named NeSGD, is proposed for online $CANDECOMP/PARAFAC$ (CP) decomposition. According to the new features obtained from the resultant matrices of NeSGD, a new criteria is triggered for the updated process of the online predictive model. Experimental evaluation in the field of structural health monitoring using laboratory-based and real-life structural datasets show that our methods provide more accurate results compared with existing online tensor analysis and model learning. The results showed that the proposed methods significantly improved the classification error rates, were able to assimilate the changes in the positive data distribution over time, and maintained a high predictive accuracy in all case studies.
DIBS: Diversity inducing Information Bottleneck in Model Ensembles
Sinha, Samarth, Bharadhwaj, Homanga, Goyal, Anirudh, Larochelle, Hugo, Garg, Animesh, Shkurti, Florian
Although deep learning models have achieved state-of-the art performance on a number of vision tasks, generalization over high dimensional multi-modal data, and reliable predictive uncertainty estimation are still active areas of research. Bayesian approaches including Bayesian Neural Nets (BNNs) do not scale well to modern computer vision tasks, as they are difficult to train, and have poor generalization under dataset-shift [27,38]. This motivates the need for effective ensembles which can generalize and give reliable uncertainty estimates. In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning the stochastic latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. We evaluate our method on benchmark datasets: MNIST, CIFAR100, TinyImageNet and MIT Places 2, and compared to the most competitive baselines show significant improvements in classification accuracy, under a shift in the data distribution and in out-of-distribution detection.
MATLAB Benchmark Code for WiDS Datathon 2020
Hello all, I am Neha Goel, Technical Lead for AI/Data Science competitions on the MathWorks Student Competition team. MathWorks is excited to support WiDS Datathon 2020 by providing complimentary MATLAB Licenses, tutorials, and getting started resources to each participant. To request your complimentary license, go to the MathWorks site, click the "Request Software" button, and fill out the software request form. You will get your license within 72 business hours. The WiDS Datathon 2020 focuses on patient health through data from MIT's GOSSIS (Global Open Source Severity of Illness Score) initiative.
Imbalanced Classification with the Adult Income Dataset
Many binary classification tasks do not have an equal number of examples from each class, e.g. the class distribution is skewed or imbalanced. A popular example is the adult income dataset that involves predicting personal income levels as above or below $50,000 per year based on personal details such as relationship and education level. There are many more cases of incomes less than $50K than above $50K, although the skew is not severe. This means that techniques for imbalanced classification can be used whilst model performance can still be reported using classification accuracy, as is used with balanced classification problems. In this tutorial, you will discover how to develop and evaluate a model for the imbalanced adult income classification dataset. Develop an Imbalanced Classification Model to Predict Income Photo by Kirt Edblom, some rights reserved.
Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets
We consider causal discovery from time series using conditional independence (CI) based network learning algorithms such as the PC algorithm. The PC algorithm is divided into a skeleton phase where adjacencies are determined based on efficiently selected CI tests and subsequent phases where links are oriented utilizing the Markov and Faithfulness assumptions. Here we show that autocorrelation makes the PC algorithm much less reliable with very low adjacency and orientation detection rates and inflated false positives. We propose a new algorithm, called PCMCI$^+$ that extends the PCMCI method from [Runge et al., 2019b] to also include discovery of contemporaneous links. It separates the skeleton phase for lagged and contemporaneous conditioning sets and modifies the conditioning sets for the individual CI tests. We show that this algorithm now benefits from increasing autocorrelation and yields much more adjacency detection power and especially more orientation recall for contemporaneous links while controlling false positives and having much shorter runtimes. Numerical experiments indicate that the algorithm can be of considerable use in many application scenarios for dozens of variables and large time delays.
Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks
Cowen, Lenore, Devkota, Kapil, Hu, Xiaozhe, Murphy, James M., Wu, Kaiyi
Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.
Adversarial Machine Learning: Perspectives from Adversarial Risk Analysis
Insua, David Rios, Naveiro, Roi, Gallego, Victor, Poulos, Jason
Adversarial Machine Learning (AML) is emerging as a major field aimed at the protection of automated ML systems against security threats. The majority of work in this area has built upon a game-theoretic framework by modelling a conflict between an attacker and a defender. After reviewing game-theoretic approaches to AML, we discuss the benefits that a Bayesian Adversarial Risk Analysis perspective brings when defending ML based systems. A research agenda is included.