Adaptive Granularity in Tensors: A Quest for Interpretable Structure
Pasricha, Ravdeep, Gujral, Ekta, Papalexakis, Evangelos E.
Data collected at very frequent intervals is usually extremely sparse and has no structure that is exploitable by modern tensor decomposition algorithms. Thus the utility of such tensors is low, in terms of the amount of interpretable and exploitable structure that one can extract from them. In this paper, we introduce the problem of finding a tensor of adaptive aggregated granularity that can be decomposed to reveal meaningful latent concepts (structures) from datasets that, in their original form, are not amenable to tensor analysis. Such datasets fall under the broad category of sparse point processes that evolve over space and/or time. To the best of our knowledge, this is the first work that explores adaptive granularity aggregation in tensors. Furthermore, we formally define the problem, discuss what different definitions of "good structure" can be in practice, and show that the optimal solution is of prohibitive combinatorial complexity. Subsequently, we propose an efficient and effective greedy algorithm which follows a number of intuitive decision criteria that locally maximize the "goodness of structure", resulting in high-quality tensors. We evaluate our method on both semi-synthetic data where ground truth is known and real datasets for which we do not have any ground truth. In both cases, our proposed method constructs tensors that have very high structure quality. Finally, our proposed method is able to discover different natural resolutions of a multi-aspect dataset, which can lead to multi-resolution analysis.
Regularized Estimation of High-Dimensional Vector AutoRegressions with Weakly Dependent Innovations
Masini, Ricardo P., Medeiros, Marcelo C., Mendes, Eduardo F.
There have been considerable advances in understanding the properties of sparse regularization procedures in high-dimensional models. Most of the work is limited to either the independent and identically distributed setting or time series with independent and/or (sub-)Gaussian innovations. We extend the current literature to a broader set of innovation processes by assuming that the error process is non-sub-Gaussian and conditionally heteroscedastic, and that the generating process is not necessarily sparse. This setting covers fat-tailed, conditionally dependent innovations, which are of particular interest for financial risk modeling. It covers several multivariate-GARCH specifications, such as the BEKK model, and other factor stochastic volatility specifications.
Bayesian high-dimensional linear regression with generic spike-and-slab priors
Spike-and-slab priors are popular Bayesian solutions for high-dimensional linear regression problems. Previous works on the theoretical properties of spike-and-slab methods focus on specific prior formulations and use prior-dependent conditions and analyses, and thus cannot be generalized directly. In this paper, we propose a class of generic spike-and-slab priors and develop a unified framework to rigorously assess their theoretical properties. Technically, we provide general conditions under which generic spike-and-slab priors can achieve a nearly-optimal posterior contraction rate and model selection consistency. Our results include those of Castillo et al. (2015) and Narisetty and He (2014) as special cases.
Model Weight Theft With Just Noise Inputs: The Curious Case of the Petulant Attacker
Roberts, Nicholas, Prabhu, Vinay Uday, McAteer, Matthew
This paper explores the scenarios under which an attacker can claim that 'Noise and access to the softmax layer of the model is all you need' to steal the weights of a convolutional neural network whose architecture is already known. We were able to achieve 96% test accuracy using the stolen MNIST model and 82% accuracy using the stolen KMNIST model, both learned using only i.i.d. Bernoulli noise inputs. We posit that this theft-susceptibility of the weights is indicative of the complexity of the dataset and propose a new metric that captures the same. The goal of this dissemination is not just to showcase how far knowing the architecture can take you in terms of model stealing, but also to draw attention to these rather idiosyncratic weight-learnability aspects of CNNs spurred by i.i.d. noise input. We also disseminate some initial results obtained using the Ising probability distribution in lieu of the i.i.d. Bernoulli distribution.
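The idea of learning a copy of a model from noise queries alone can be illustrated with a toy sketch. This is not the paper's CNN setup: it assumes a linear softmax "teacher" standing in for the victim model, and the function name `steal_with_noise`, the query count, the step count, and the learning rate are all hypothetical choices made for illustration. The student only ever sees i.i.d. Bernoulli noise inputs and the teacher's softmax outputs.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p_target, p_pred):
    """Mean cross-entropy between soft targets and predictions."""
    return float(-(p_target * np.log(p_pred + 1e-12)).sum(axis=1).mean())

def steal_with_noise(teacher_w, n_queries=2000, n_steps=300, lr=0.1, seed=0):
    """Fit a student of known architecture using only noise queries.

    teacher_w is a (d, c) weight matrix of a linear softmax model; it is
    used solely to answer queries, never read by the student directly.
    """
    rng = np.random.default_rng(seed)
    d, c = teacher_w.shape
    # i.i.d. Bernoulli(0.5) noise inputs; no real training data is used.
    x = rng.binomial(1, 0.5, size=(n_queries, d)).astype(float)
    # The attacker's only access: softmax outputs on the queries.
    soft_labels = softmax(x @ teacher_w)
    student_w = np.zeros((d, c))
    losses = []
    for _ in range(n_steps):
        pred = softmax(x @ student_w)
        losses.append(cross_entropy(soft_labels, pred))
        # Gradient of the mean cross-entropy with respect to student_w.
        grad = x.T @ (pred - soft_labels) / n_queries
        student_w -= lr * grad
    return student_w, losses
```

Calling `steal_with_noise(np.random.default_rng(1).normal(size=(20, 5)))` returns the student weights and the loss history; how well such a student transfers to real inputs is the empirical question the abstract addresses, and this sketch makes no claim about it.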
Deep Connectomics Networks: Neural Network Architectures Inspired by Neuronal Networks
Roberts, Nicholas, Yap, Dian Ang, Prabhu, Vinay Uday
The interplay between inter-neuronal network topology and cognition has been studied deeply by connectomics researchers and network scientists, and is crucial to understanding the remarkable efficacy of biological neural networks. Curiously, the deep learning revolution that revived neural networks has not paid much attention to topological aspects. The architectures of deep neural networks (DNNs) do not resemble their biological counterparts in the topological sense. We bridge this gap by presenting initial results of Deep Connectomics Networks (DCNs) as DNNs with topologies inspired by real-world neuronal networks. We show high classification accuracy obtained by DCNs whose architecture was inspired by the biological neuronal networks of C. elegans and the mouse visual cortex.
Multilevel Initialization for Layer-Parallel Deep Neural Network Training
Cyr, Eric C., Günther, Stefanie, Schroder, Jacob B.
This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal control, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a refinement strategy across the time domain, which is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate the effectiveness of such multilevel "nested iteration" strategies for network training, showing supporting numerical evidence of reduced run time for equivalent accuracy. In addition, we study whether the initialization strategies provide a regularizing effect on the overall training process and reduce sensitivity to hyperparameters and randomness in initial network parameters.
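Refinement in the layer dimension can be sketched for a residual network read as a forward-Euler discretization of an ODE. The sketch below makes several assumptions that the abstract does not state: a `tanh` layer function, a fixed time horizon `t_final`, and piecewise-constant interpolation (each coarse layer's weights copied to two fine layers). The names `forward` and `refine` are hypothetical.

```python
import numpy as np

def forward(x, weights, t_final=1.0):
    """Forward-Euler network: x <- x + h * tanh(W x), with h = t_final / depth."""
    h = t_final / len(weights)
    for w in weights:
        x = x + h * np.tanh(w @ x)
    return x

def refine(weights):
    """Double the depth by copying each coarse layer to two fine layers.

    Because forward() halves the step size when the depth doubles, the
    refined network approximates the same continuous-time dynamics.
    """
    return [w.copy() for w in weights for _ in range(2)]

# Stand-in for a trained coarse network: 4 layers of 3x3 weights.
coarse = [0.1 * (i + 1) * np.eye(3) for i in range(4)]
fine = refine(coarse)          # 8 layers, initialized from the coarse ones
x0 = np.ones(3)
coarse_out = forward(x0, coarse)
fine_out = forward(x0, fine)
```

Because the refined network starts close to the coarse one in function space, training at the finer level can begin from the coarse solution rather than from random parameters, which is the intuition behind nested iteration.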
On the Metrics and Adaptation Methods for Domain Divergences of sEMG-based Gesture Recognition
Ketykó, István, Kovács, Ferenc
Machine Learning (ML) is widely used for several tasks with time-series and biosensor data such as human activity recognition, electronic health records data-based predictions (Ismail Fawaz et al., 2019), and real-time biosensor-based decisions. Various classification goals are addressed related to electrocardiography (ECG) (Jambukia et al., 2015), electroencephalography (EEG) (Craik et al., 2019; Dose et al., 2018), and electromyography (EMG) (Ketykó et al., 2019; Hu et al., 2018; Patricia et al., 2014; Du et al., 2017). Sensing hand gestures can be done by means of wearables or by means of image or video analysis of hand or finger motion. Wearable-based detection can physically rely on measuring the acceleration and rotations of our body parts (arms, hands or fingers) with Inertial Measurement Unit (IMU) sensors or on measuring the myo-electric signals generated by the various muscles of our arms or fingers with EMG sensors. Surface EMG (sEMG) records muscle activity from the surface of the skin above the muscle being evaluated. The signal is collected via surface electrodes. We are interested in sEMG-sensor placement on the forearm and in performing hand gesture recognition with ML.
Enabling Smartphone-based Estimation of Heart Rate
Homdee, Nutta, Boukhechba, Mehdi, Feng, Yixue W., Kramer, Natalie, Lach, John, Barnes, Laura E.
Continuous, ubiquitous monitoring through wearable sensors has the potential to collect useful information about users' context. Heart rate is an important physiologic measure used in a wide variety of applications, such as fitness tracking and health monitoring. However, wearable sensors that monitor heart rate, such as smartwatches and electrocardiogram (ECG) patches, can have gaps in their data streams because of technical issues (e.g., bad wireless channels, battery depletion, etc.) or user-related reasons (e.g., motion artifacts, user compliance, etc.). The ability to use other available sensor data (e.g., smartphone data) to estimate missing heart rate readings is useful to cope with any such gaps, thus improving data quality and continuity. In this paper, we test the feasibility of estimating raw heart rate using smartphone sensor data. Using data generated by 12 participants in a one-week study period, we were able to build both personalized and generalized models using regression, SVM, and random forest algorithms. All three algorithms outperformed the baseline moving-average interpolation method for both personalized and generalized settings. Moreover, our findings suggest that personalized models outperformed the generalized models, which speaks to the importance of considering personal physiology, behavior, and lifestyle in the estimation of heart rate. The promising results provide preliminary evidence of the feasibility of combining smartphone sensor data with wearable sensor data for continuous heart rate monitoring.
Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning
Yeom, Seul-Ki, Seegerer, Philipp, Lapuschkin, Sebastian, Wiedemann, Simon, Müller, Klaus-Robert, Samek, Wojciech
The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the weights of various layers while at the same time aiming to not sacrifice performance. In this paper, we propose a novel criterion for CNN pruning inspired by neural network interpretability: the most relevant elements, i.e., weights or filters, are automatically found using their relevance score in the sense of explainable AI (XAI). In doing so, we link, for the first time, the two previously disconnected lines of research on interpretability and model compression. We show in particular that our proposed method can efficiently prune transfer-learned CNN models where networks pre-trained on large corpora are adapted to specialized tasks. To this end, the method is evaluated on a broad range of computer vision datasets. Notably, our novel criterion is not only competitive or better compared to state-of-the-art pruning criteria when successive retraining is performed, but clearly outperforms these previous criteria in the common application setting where the data of the task to be transferred to are very scarce and no retraining is possible. Our method can iteratively compress the model while maintaining or even improving accuracy. At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning.
Boltzmann Exploration Expectation-Maximisation
We present a general method for fitting finite mixture models (FMM). Learning in a mixture model consists of finding the most likely cluster assignment for each data-point, as well as finding the parameters of the clusters themselves. In many mixture models, this is difficult with current learning methods, where the most common approach is to employ monotone learning algorithms, e.g., the conventional expectation-maximisation algorithm. While effective, the success of any monotone algorithm is crucially dependent on good parameter initialisation, where a common choice is $K$-means initialisation, commonly employed for Gaussian mixture models. For other types of mixture models, the path to good initialisation parameters is often unclear and may require a problem-specific solution. To this end, we propose a general heuristic learning algorithm that utilises Boltzmann exploration to assign each observation to a specific base distribution within the mixture model, which we call Boltzmann exploration expectation-maximisation (BEEM). With BEEM, hard assignments allow straightforward parameter learning for each base distribution by conditioning only on its assigned observations. Consequently, it can be applied to mixtures of any base distribution where single component parameter learning is tractable. The stochastic learning procedure is able to escape local optima and is thus insensitive to parameter initialisation. We show competitive performance on a number of synthetic benchmark cases as well as on real-world datasets.
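The two ingredients named in the abstract, Boltzmann-sampled hard assignments and per-component fitting on the assigned observations only, can be sketched for a one-dimensional Gaussian mixture. This is an illustrative reading, not the authors' algorithm: the geometric temperature schedule, the random initial labels, the variance floor, and the names `boltzmann_assign` and `fit_mixture_sketch` are all assumptions introduced here.

```python
import numpy as np

def boltzmann_assign(log_lik, temperature, rng):
    """Sample one hard label per row from softmax(log_lik / temperature)."""
    scaled = log_lik / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: count how many cumulative bins lie below u.
    cdf = probs.cumsum(axis=1)
    u = rng.random((log_lik.shape[0], 1))
    return np.minimum((u > cdf).sum(axis=1), log_lik.shape[1] - 1)

def fit_mixture_sketch(x, k, n_iter=50, t_start=5.0, t_end=0.1, seed=0):
    """Alternate per-component fitting with Boltzmann hard assignments."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    labels = rng.integers(k, size=n)          # assumed: random initial labels
    means, stds = np.zeros(k), np.ones(k)
    weights = np.full(k, 1.0 / k)
    for it in range(n_iter):
        # Fit each component using only the observations assigned to it.
        for j in range(k):
            members = x[labels == j]
            if members.size >= 2:
                means[j] = members.mean()
                stds[j] = max(members.std(), 1e-3)   # assumed variance floor
            weights[j] = max(members.size, 1) / n
        weights /= weights.sum()
        # Log joint of each observation under each component (up to a constant).
        log_lik = (np.log(weights) - np.log(stds)
                   - 0.5 * ((x[:, None] - means) / stds) ** 2)
        # Assumed schedule: geometric cooling from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (it / max(n_iter - 1, 1))
        labels = boltzmann_assign(log_lik, temperature, rng)
    return means, stds, weights, labels
```

At high temperature the assignments are close to uniform, which is where the exploration comes from; as the temperature falls, sampling approaches the arg-max assignment. Swapping the Gaussian fit for another single-component estimator is the only change needed to try a different base distribution.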