Country
Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks
Adewumi, Tosin P., Liwicki, Foteini, Liwicki, Marcus
Word2Vec is a prominent tool for Natural Language Processing (NLP) tasks. Similar inspiration is found in distributed embeddings for state-of-the-art (sota) deep neural networks. However, wrong combination of hyper-parameters can produce poor quality vectors. The objective of this work is to show optimal combination of hyper-parameters exists and evaluate various combinations. We compare them with the original model released by Mikolov. Both intrinsic and extrinsic (downstream) evaluations, including Named Entity Recognition (NER) and Sentiment Analysis (SA) were carried out. The downstream tasks reveal that the best model is task-specific, high analogy scores don't necessarily correlate positively with F1 scores and the same applies for more data. Increasing vector dimension size after a point leads to poor quality or performance. If ethical considerations to save time, energy and the environment are made, then reasonably smaller corpora may do just as well or even better in some cases. Besides, using a small corpus, we obtain better human-assigned WordSim scores, corresponding Spearman correlation and better downstream (NER & SA) performance compared to Mikolov's model, trained on 100 billion word corpus.
DeepSIP: A System for Predicting Service Impact of Network Failure by Temporal Multimodal CNN
Matsuo, Yoichi, Kimura, Tatsuaki, Nishimatsu, Ken
When a failure occurs in a network, network operators need to recognize service impact, since service impact is essential information for handling failures. In this paper, we propose Deep learning based Service Impact Prediction (DeepSIP), a system to predict the time to recovery from the failure and the loss of traffic volume due to the failure in a network element using a temporal multimodal convolutional neural network (CNN). Since the time to recovery is useful information for a service level agreement (SLA) and the loss of traffic volume is directly related to the severity of the failures, we regard these as the service impact. The service impact is challenging to predict, since a network element does not explicitly contain any information about the service impact. Thus, we aim to predict the service impact from syslog messages and traffic volume by extracting hidden information about failures. To extract useful features for prediction from syslog messages and traffic volume which are multimodal and strongly correlated, and have temporal dependencies, we use temporal multimodal CNN. We experimentally evaluated DeepSIP and DeepSIP reduced prediction error by approximately 50% in comparison with other NN-based methods with a synthetic dataset.
Spatio-Temporal Graph Convolution for Functional MRI Analysis
Gadgil, Soham, Zhao, Qingyu, Adeli, Ehsan, Pfefferbaum, Adolf, Sullivan, Edith V., Pohl, Kilian M.
The BOLD signal of resting-state fMRI (rs-fMRI) records the functional brain connectivity in a rich dynamic spatio-temporal setting. However, existing methods applied to rs-fMRI often fail to consider both spatial and temporal characteristics of the data. They either neglect the functional dependency between different brain regions in a network or discard the information in the temporal dynamics of brain activity. To overcome those shortcomings, we propose to formulate functional connectivity networks within the context of spatio-temporal graphs. We then train a spatio-temporal graph convolutional network (ST-GCN) on short sub-sequences of the BOLD time series to model the non-stationary nature of functional connectivity. We simultaneously learn the graph edge importance within ST-GCN to enable interpretation of functional connectivities contributing to the prediction model. In analyzing the rs-fMRI of the Human Connectome Project (HCP, N=1,091) and the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=773), ST-GCN is significantly more accurate than common approaches in predicting gender and age based on BOLD signals. The matrix recording edge importance localizes brain regions and functional connections with significant aging and sex effects, which are verified by the neuroscience literature.
Defense Through Diverse Directions
Bender, Christopher M., Li, Yang, Shi, Yifeng, Reiter, Michael K., Oliva, Junier B.
In this work we develop a novel Bayesian neural network methodology to achieve strong adversarial robustness without the need for online adversarial training. Unlike previous efforts in this direction, we do not rely solely on the stochasticity of network weights by minimizing the divergence between the learned parameter distribution and a prior. Instead, we additionally require that the model maintain some expected uncertainty with respect to all input covariates. We demonstrate that by encouraging the network to distribute evenly across inputs, the network becomes less susceptible to localized, brittle features which imparts a natural robustness to targeted perturbations. We show empirical robustness on several benchmark datasets.
Adversarial Perturbations Fool Deepfake Detectors
This work uses adversarial perturbations to enhance deepfake images and fool common deepfake detectors. We created adversarial perturbations using the Fast Gradient Sign Method and the Carlini and Wagner L2 norm attack in both blackbox and whitebox settings. Detectors achieved over 95% accuracy on unperturbed deepfakes, but less than 27% accuracy on perturbed deepfakes. We also explore two improvements to deepfake detectors: (i) Lipschitz regularization, and (ii) Deep Image Prior (DIP). Lipschitz regularization constrains the gradient of the detector with respect to the input in order to increase robustness to input perturbations. The DIP defense removes perturbations using generative convolutional neural networks in an unsupervised manner. Regularization improved the detection of perturbed deepfakes on average, including a 10% accuracy boost in the blackbox case. The DIP defense achieved 95% accuracy on perturbed deepfakes that fooled the original detector, while retaining 98% accuracy in other cases on a 100 image subsample.
Meta Pseudo Labels
Pham, Hieu, Xie, Qizhe, Dai, Zihang, Le, Quoc V.
Many training algorithms of a deep neural network can be interpreted as minimizing the cross entropy loss between the prediction made by the network and a target distribution. In supervised learning, this target distribution is typically the ground-truth one-hot vector. In semi-supervised learning, this target distribution is typically generated by a pre-trained teacher model to train the main network. In this work, instead of using such predefined target distributions, we show that learning to adjust the target distribution based on the learning state of the main network can lead to better performances. In particular, we propose an efficient meta-learning algorithm, which encourages the teacher to adjust the target distributions of training examples in the manner that improves the learning of the main network. The teacher is updated by policy gradients computed by evaluating the main network on a held-out validation set. Our experiments demonstrate substantial improvements over strong baselines and establish state-ofthe-art performance on CIFAR-10, SVHN, and ImageNet. For instance, with ResNets on small datasets, we achieve 96.1% on CIFAR-10 with 4,000 labeled examples and 73.9% top-1 on ImageNet with 10% examples. Meanwhile, with EfficientNet on full datasets plus extra unlabeled data, we attain 98.6% accuracy on CIFAR-10 and 86.9% top-1 accuracy on ImageNet.
Slow and Stale Gradients Can Win the Race
Dutta, Sanghamitra, Wang, Jianyu, Joshi, Gauri
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime(wallclock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on CIFAR10 dataset.
Learning End-to-End Codes for the BPSK-constrained Gaussian Wiretap Channel
Nooraiepour, Alireza, Aghdam, Sina Rezaei
Finite-length codes are learned for the Gaussian wiretap channel in an end-to-end manner assuming that the communication parties are equipped with deep neural networks (DNNs), and communicate through binary phase-shift keying (BPSK) modulation scheme. The goal is to find codes via DNNs which allow a pair of transmitter and receiver to communicate reliably and securely in the presence of an adversary aiming at decoding the secret messages. Following the information-theoretic secrecy principles, the security is evaluated in terms of mutual information utilizing a deep learning tool called MINE (mutual information neural estimation). System performance is evaluated for different DNN architectures, designed based on the existing secure coding schemes, at the transmitter. Numerical results demonstrate that the legitimate parties can indeed establish a secure transmission in this setting as the learned codes achieve points on almost the boundary of the equivocation region. I. INTRODUCTION Physical layer (PHY) security has been put forth as an alternative/aid to the higher-layer security approaches including cryptography in order to relieve the burden placed by them upon the communication systems in various ways. Wiretap channel [1] is a widely-known theoretical model for studying PHY security from an information-theoretic perspective. The importance of achieving physical layer security for this model through finite alphabet signaling like binary phase-shift keying (BPSK) modulation is highlighted in many works [1], [2]. Several works have studied this channel and designed coding schemes to ensure security for that. Specifically, the authors in [3] have proposed an encoding technique called scrambling which could result in BERs very close to 0.5 for the eavesdropper (Eve) through error propagation, while ensuring a A. Nooraiepour is with WINLAB, Department of Electrical and Computer Engineering, Rutgers University, NJ, USA. S. Rezaei Aghdam is with the Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden (emails: alireza.nooraiepour@rutgers.edu, sinar@chalmers.se).
Data-driven models and computational tools for neurolinguistics: a language technology perspective
Artemova, Ekaterina, Bakarov, Amir, Artemov, Aleksey, Burnaev, Evgeny, Sharaev, Maxim
In this paper, our focus is the connection and influence of language technologies on the research in neurolinguistics. We present a review of brain imaging-based neurolinguistic studies with a focus on the natural language representations, such as word embeddings and pre-trained language models. Mutual enrichment of neurolinguistics and language technologies leads to development of brain-aware natural language representations. The importance of this research area is emphasized by medical applications.
The Internet of Things as a Deep Neural Network
Du, Rong, Magnússon, Sindri, Fischione, Carlo
An important task in the Internet of Things (IoT) is field monitoring, where multiple IoT nodes take measurements and communicate them to the base station or the cloud for processing, inference, and analysis. This communication becomes costly when the measurements are high-dimensional (e.g., videos or time-series data). The IoT networks with limited bandwidth and low power devices may not be able to support such frequent transmissions with high data rates. To ensure communication efficiency, this article proposes to model the measurement compression at IoT nodes and the inference at the base station or cloud as a deep neural network (DNN). We propose a new framework where the data to be transmitted from nodes are the intermediate outputs of a layer of the DNN. We show how to learn the model parameters of the DNN and study the tradeoff between the communication rate and the inference accuracy. The experimental results show that we can save approximately 96% transmissions with only a degradation of 2.5% in inference accuracy. Our findings have the potentiality to enable many new IoT data analysis applications generating large amount of measurements. I. INTRODUCTION Distributed detection, monitoring, and classification are important tasks in wireless sensor networks or the Internet of Things. In these tasks, sensor nodes collect measurements and send them to a base station/gateway, and further to an application server in the IP network that can then make a decision or an inference based on the measurements from the nodes.