Random CapsNet Forest Model for Imbalanced Malware Type Classification Task

arXiv.org Machine Learning

Management Information Systems Department, T.C. Kadir Has University, Istanbul, Turkey. Abstract: The behavior of malware varies with its type, so knowing the type of a malware sample affects the strategies of system protection software. Many malware type classification models based on machine learning and deep learning predict malware types with high accuracy. Machine learning based models require heavy feature engineering, and that feature engineering dominantly affects model performance. Deep learning based models, on the other hand, require less feature engineering. However, traditional deep learning architectures and components produce very complex and data-sensitive models. This paper proposes an ensemble capsule network model based on the bootstrap aggregating technique. The proposed method is tested on two malware datasets whose state-of-the-art results are well known.
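
The bootstrap aggregating idea named in the abstract can be illustrated with a minimal sketch. The base learner here is a deliberately trivial one-dimensional nearest-centroid classifier standing in for a capsule network; the function names, the toy data, and the majority-vote rule are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_centroid(sample):
    """Stand-in base learner (assumption): per-class mean of a 1-D feature."""
    sums, counts = {}, {}
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))

def bagging_fit(data, n_models, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_centroid(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data: (feature, label) pairs for two well-separated classes.
toy_data = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (5.0, "b"), (5.1, "b"), (5.2, "b")]
ensemble = bagging_fit(toy_data, n_models=15)
```

Calling `bagging_predict(ensemble, 0.05)` returns the majority label among the 15 base learners; replacing `train_centroid` and `predict_centroid` with a real model keeps the rest of the sketch unchanged.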


Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks

arXiv.org Machine Learning

Recent work suggests goal-driven training of neural networks can be used to model neural activity in the brain. While response properties of neurons in artificial neural networks bear similarities to those in the brain, the network architectures are often constrained to be different. Here we ask if a neural network can recover both neural representations and, if the architecture is unconstrained and optimized, the anatomical properties of neural circuits. We demonstrate this in a system where the connectivity and the functional organization have been characterized, namely, the head direction circuits of the rodent and fruit fly. We trained recurrent neural networks (RNNs) to estimate head direction through integration of angular velocity. We found that the two distinct classes of neurons observed in the head direction system, the Ring neurons and the Shifter neurons, emerged naturally in artificial neural networks as a result of training. Furthermore, connectivity analysis and in-silico neurophysiology revealed structural and mechanistic similarities between artificial networks and the head direction system. Overall, our results show that optimization of RNNs in a goal-driven task can recapitulate the structure and function of biological circuits, suggesting that artificial neural networks can be used to study the brain at the level of both neural activity and anatomical organization.
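
The task the networks are trained on, estimating head direction by integrating angular velocity, can be written as a short reference computation. This is only a sketch of the target signal under assumed conventions (radians, a fixed time step `dt`, wrap-around at 2π); it is not the recurrent network itself, and the function name is hypothetical.

```python
import math

def integrate_head_direction(angular_velocities, dt, start=0.0):
    """Accumulate angular velocity into a heading, wrapped to [0, 2*pi)."""
    heading = start % (2.0 * math.pi)
    trajectory = [heading]
    for omega in angular_velocities:
        # Euler step followed by wrap-around, since heading is a circular variable.
        heading = (heading + omega * dt) % (2.0 * math.pi)
        trajectory.append(heading)
    return trajectory

# Four quarter-turn steps bring the heading back to where it started.
path = integrate_head_direction([math.pi / 2] * 4, dt=1.0)
```

A network trained on this task would receive the velocity sequence as input and be scored against such a trajectory, often through its sine and cosine so that the wrap-around does not create a discontinuity.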


Black Box Recursive Translations for Molecular Optimization

arXiv.org Machine Learning

Machine learning algorithms for generating molecular structures offer a promising new approach to drug discovery. We cast molecular optimization as a translation problem, where the goal is to map an input compound to a target compound with improved biochemical properties. Remarkably, we observe that when generated molecules are iteratively fed back into the translator, molecular compound attributes improve with each step. We show that this finding is invariant to the choice of translation model, making this a "black box" algorithm. We call this method Black Box Recursive Translation (BBRT), a new inference method for molecular property optimization. This simple, powerful technique operates strictly on the inputs and outputs of any translation model. We obtain new state-of-the-art results for molecular property optimization tasks using our simple drop-in replacement with well-known sequence and graph-based models. Our method provides a significant boost in performance relative to its non-recursive peers with just a simple "for" loop. Further, BBRT is highly interpretable, allowing users to map the evolution of newly discovered compounds from known starting points.
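
The recursive step described in the abstract, feeding generated molecules back into the translator, can be sketched as the "for" loop below. The abstract does not specify how candidates are sampled or ranked, so this version tracks a single candidate per step; `toy_translate` and `toy_score` are hypothetical stand-ins for a real translation model and property scorer.

```python
def recursive_translate(translate, score, seed, steps):
    """Apply a black-box translator to its own output for `steps` iterations."""
    current = seed
    trajectory = [(current, score(current))]
    for _ in range(steps):
        current = translate(current)  # only the model's inputs and outputs are used
        trajectory.append((current, score(current)))
    return trajectory

def best_of(trajectory):
    """Return the (candidate, score) pair with the highest score."""
    return max(trajectory, key=lambda pair: pair[1])

# Hypothetical stand-ins: a "molecule" is a string and the property is its carbon count.
def toy_translate(molecule):
    return molecule + "C"

def toy_score(molecule):
    return float(molecule.count("C"))

trajectory = recursive_translate(toy_translate, toy_score, "CCO", steps=3)
```

Because the loop keeps every intermediate candidate, the trajectory also records how the final compound evolved from the starting point.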


A Generalizable Method for Automated Quality Control of Functional Neuroimaging Datasets

arXiv.org Machine Learning

Over the last twenty-five years, advances in the collection and analysis of fMRI data have enabled new insights into the brain basis of human health and disease. Individual behavioral variation can now be visualized at a neural level as patterns of connectivity among brain regions. Functional brain imaging is enhancing our understanding of clinical psychiatric disorders by revealing ties between regional and network abnormalities and psychiatric symptoms. Initial success in this arena has recently motivated collection of larger datasets which are needed to leverage fMRI to generate brain-based biomarkers to support development of precision medicines. Despite methodological advances and enhanced computational power, evaluating the quality of fMRI scans remains a critical step in the analytical framework. Before analysis can be performed, expert reviewers visually inspect raw scans and preprocessed derivatives to determine viability of the data. This Quality Control (QC) process is labor-intensive, and the inability to automate at large scale has proven to be a limiting factor in clinical neuroscience fMRI research. We present a novel method for automating the QC of fMRI scans. We train machine learning classifiers using features derived from brain MR images to predict the "quality" of those images, based on the ground truth of an expert's opinion. We emphasize the importance of these classifiers' ability to generalize their predictions across data from different studies. To address this, we propose a novel approach entitled "FMRI preprocessing Log mining for Automated, Generalizable Quality Control" (FLAG-QC), in which features derived from mining runtime logs are used to train the classifier. We show that classifiers trained on FLAG-QC features perform much better (AUC=0.79) than previously proposed feature sets (AUC=0.56) when testing their ability to generalize across studies.


Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

arXiv.org Machine Learning

The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.
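
The dropout-stability check described above, removing part of the neurons and rescaling the rest, can be sketched for a two-layer ReLU network. The 1/N output normalization and the rescaling factor N/keep are assumptions chosen for illustration; the abstract only says the remaining neurons are "suitably" rescaled.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def two_layer(x, weights, coeffs, scale):
    """Compute scale * sum_i a_i * relu(<w_i, x>) for a two-layer network."""
    total = 0.0
    for w, a in zip(weights, coeffs):
        pre_activation = sum(wj * xj for wj, xj in zip(w, x))
        total += a * relu(pre_activation)
    return scale * total

def dropout_gap(x, weights, coeffs, keep):
    """Absolute output change after keeping only the first `keep` neurons.

    The full network averages over all N neurons; the reduced network averages
    over the kept ones, which is the same as rescaling them by N / keep.
    """
    n = len(weights)
    full = two_layer(x, weights, coeffs, 1.0 / n)
    reduced = two_layer(x, weights[:keep], coeffs[:keep], 1.0 / keep)
    return abs(full - reduced)

# Eight identical neurons: dropping half and rescaling leaves the output unchanged.
identical_weights = [[1.0, 2.0]] * 8
identical_coeffs = [0.5] * 8
gap = dropout_gap([1.0, 1.0], identical_weights, identical_coeffs, keep=4)
```

With trained rather than identical neurons the gap is generally nonzero, and the abstract's claim is that it is governed by how many neurons are kept rather than by the total width.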


Chart Auto-Encoders for Manifold Structured Data

arXiv.org Machine Learning

Auto-encoding and generative models have been highly successful in image and signal representation learning and generation. These models, however, generally employ the full Euclidean space or a bounded subset (such as $[0,1]^l$) as the latent space, whose flat geometry is often too simplistic to meaningfully reflect the topological structure of the data. This paper aims at exploring a universal geometric structure of the latent space for better data representation. Inspired by differential geometry, we propose a Chart Auto-Encoder (CAE), which captures the manifold structure of the data with multiple charts and transition functions among them. CAE translates the mathematical definition of a manifold by parameterizing the entire data set as a collection of overlapping charts, creating local latent representations. These representations are an enhancement of the single-charted latent space commonly employed in auto-encoding models, as they reflect the intrinsic structure of the manifold. Therefore, CAE achieves a more accurate approximation of data and generates realistic synthetic examples. We demonstrate the efficacy of CAEs through a series of experiments with synthetic and real-life data, which illustrate that CAEs can outperform variational auto-encoders on reconstruction tasks while using much smaller latent spaces.


EAST: Encoding-Aware Sparse Training for Deep Memory Compression of ConvNets

arXiv.org Machine Learning

The implementation of Deep Convolutional Neural Networks (ConvNets) on tiny end-nodes with limited non-volatile memory space calls for smart compression strategies capable of shrinking the footprint yet preserving predictive accuracy. There exist a number of strategies for this purpose, from those that play with the topology of the model or the arithmetic precision, e.g. pruning and quantization, to those that apply model-agnostic compression, e.g. weight encoding. The tighter the memory constraint, the higher the probability that these techniques alone cannot meet the requirement, hence more awareness and cooperation across different optimizations become mandatory. This work addresses the issue by introducing EAST, Encoding-Aware Sparse Training, a novel memory-constrained training procedure that leads quantized ConvNets towards deep memory compression. EAST implements an adaptive group pruning designed to maximize the compression rate of the weight encoding scheme (the LZ4 algorithm in this work). Compared to existing methods, EAST meets the memory constraint with lower sparsity, hence ensuring higher accuracy. Experiments conducted on a state-of-the-art ConvNet (ResNet-9) deployed on a low-power microcontroller (ARM Cortex-M4) validate the proposal.
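
The interaction between group pruning and weight encoding can be illustrated with a small sketch. Two substitutions are made and should be read as assumptions: `zlib` from the Python standard library stands in for LZ4, and a fixed magnitude-based rule stands in for EAST's adaptive group pruning. The function names are hypothetical.

```python
import random
import zlib

def group_prune(weights, group_size, keep_fraction):
    """Zero out whole groups of weights, keeping the groups with the largest L1 norm."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    by_norm = sorted(range(len(groups)),
                     key=lambda g: sum(abs(w) for w in groups[g]),
                     reverse=True)
    kept = set(by_norm[:int(len(groups) * keep_fraction)])
    pruned = []
    for g, group in enumerate(groups):
        pruned.extend(group if g in kept else [0] * len(group))
    return pruned

def encoded_size(weights):
    """Size in bytes after packing int8 weights and compressing them with zlib."""
    raw = bytes(w & 0xFF for w in weights)  # two's-complement byte per weight
    return len(zlib.compress(raw))

# Toy 8-bit quantized weights; real layers would come from a trained ConvNet.
rng = random.Random(0)
dense = [rng.randint(-127, 127) for _ in range(4096)]
sparse = group_prune(dense, group_size=8, keep_fraction=0.5)
```

Comparing `encoded_size(dense)` with `encoded_size(sparse)` shows why contiguous runs of zeros matter: a dictionary-based encoder can represent each pruned group compactly, so the pruning pattern directly affects the stored footprint.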


Dynamic Prediction of ICU Mortality Risk Using Domain Adaptation

arXiv.org Machine Learning

Early recognition of risky trajectories during an Intensive Care Unit (ICU) stay is one of the key steps towards improving patient survival. Learning trajectories from physiological signals continuously measured during an ICU stay requires learning time-series features that are robust and discriminative across diverse patient populations. Patients within different ICU populations (referred to here as domains) vary by age, conditions and interventions. Thus, mortality prediction models using patient data from a particular ICU population may perform suboptimally in other populations because the features used to train such models have different distributions across the groups. In this paper, we explore domain adaptation strategies in order to learn mortality prediction models that extract and transfer complex temporal features from multivariate time-series ICU data. Features are extracted in such a way that the state of the patient at a given time depends on the previous state. This enables dynamic predictions and creates a mortality risk space that describes the risk of a patient at a particular time. Experiments based on cross-ICU populations reveal that our model outperforms all considered baselines. Gains in terms of AUC range from 4% to 8% for early predictions when compared with a recent state-of-the-art representative method for ICU mortality prediction. In particular, models for the Cardiac ICU population achieve AUC numbers as high as 0.88, showing excellent clinical utility for early mortality prediction. Finally, we present an explanation of factors contributing to the possible ICU outcomes, so that our models can be used to complement clinical reasoning.


Are Transformers universal approximators of sequence-to-sequence functions?

arXiv.org Machine Learning

Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models. Furthermore, using positional encodings, we circumvent the restriction of permutation equivariance, and show that Transformer models can universally approximate arbitrary continuous sequence-to-sequence functions on a compact domain. Interestingly, our proof techniques clearly highlight the different roles of the self-attention and the feed-forward layers in Transformers. In particular, we prove that fixed-width self-attention layers can compute contextual mappings of the input sequences, playing a key role in the universal approximation property of Transformers. Based on this insight from our analysis, we consider other simpler alternatives to self-attention layers and empirically evaluate them.
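
The permutation equivariance mentioned above can be checked numerically for a single self-attention head without positional encodings: permuting the input tokens permutes the output rows in the same way. This is a minimal pure-Python sketch with small hand-picked projection matrices (assumptions for illustration), not a full Transformer block.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows (tokens) of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    k_transposed = [list(col) for col in zip(*k)]
    scale = math.sqrt(len(wq[0]))
    scores = matmul(q, k_transposed)
    attention = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attention, v)

def permute_rows(x, perm):
    return [x[i] for i in perm]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wq = [[0.5, -0.2], [0.1, 0.3]]
wk = [[0.4, 0.1], [-0.3, 0.2]]
wv = [[1.0, 0.5], [0.0, -1.0]]
perm = [2, 0, 1]

attend_then_permute = permute_rows(self_attention(tokens, wq, wk, wv), perm)
permute_then_attend = self_attention(permute_rows(tokens, perm), wq, wk, wv)
```

The two results agree up to floating-point rounding. Adding a position-dependent term to each token before attention breaks this symmetry, which is the role the abstract assigns to positional encodings.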


Recommendations and User Agency: The Reachability of Collaboratively-Filtered Information

arXiv.org Machine Learning

Recommender systems often rely on models which are trained to maximize accuracy in predicting user preferences. When the systems are deployed, these models determine the availability of content and information to different users. The gap between these objectives gives rise to a potential for unintended consequences, contributing to phenomena such as filter bubbles and polarization. In this work, we consider directly the information availability problem through the lens of user recourse. Using ideas of reachability, we propose a computationally efficient audit for top-$N$ linear recommender models. Furthermore, we describe the relationship between model complexity and the effort necessary for users to exert control over their recommendations. We use this insight to provide a novel perspective on the user cold-start problem. Finally, we demonstrate these concepts with an empirical investigation of a state-of-the-art model trained on a widely used movie ratings dataset.