Country
Semantic Bottleneck Scene Generation
Azadi, Samaneh, Tschannen, Michael, Tzeng, Eric, Gelly, Sylvain, Darrell, Trevor, Lucic, Mario
Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes. W e assume pixel-wise segmentation labels are available during training and use them to learn the scene structure. During inference, our model first synthesizes a realistic segmentation layout from scratch, then synthesizes a realistic scene conditioned on that layout. F or the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts. F or the latter, we use a conditional segmentation-to-image synthesis network that captures the distribution of photo-realistic images conditioned on the semantic layout. When trained end-to-end, the resulting model outperforms state-of-the-art generative models in unsupervised image synthesis on two challenging domains in terms of the Fr echet Inception Distance and user-study evaluations. Moreover, we demonstrate the generated segmentation maps can be used as additional training data to strongly improve recent segmentation-to-image synthesis networks.
High Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework
Chakrabortty, Abhishek, Lu, Jiarui, Cai, T. Tony, Li, Hongzhe
We consider high dimensional $M$-estimation in settings where the response $Y$ is possibly missing at random and the covariates $\mathbf{X} \in \mathbb{R}^p$ can be high dimensional compared to the sample size $n$. The parameter of interest $\boldsymbol{\theta}_0 \in \mathbb{R}^d$ is defined as the minimizer of the risk of a convex loss, under a fully non-parametric model, and $\boldsymbol{\theta}_0$ itself is high dimensional which is a key distinction from existing works. Standard high dimensional regression and series estimation with possibly misspecified models and missing $Y$ are included as special cases, as well as their counterparts in causal inference using 'potential outcomes'. Assuming $\boldsymbol{\theta}_0$ is $s$-sparse ($s \ll n$), we propose an $L_1$-regularized debiased and doubly robust (DDR) estimator of $\boldsymbol{\theta}_0$ based on a high dimensional adaptation of the traditional double robust (DR) estimator's construction. Under mild tail assumptions and arbitrarily chosen (working) models for the propensity score (PS) and the outcome regression (OR) estimators, satisfying only some high-level conditions, we establish finite sample performance bounds for the DDR estimator showing its (optimal) $L_2$ error rate to be $\sqrt{s (\log d)/ n}$ when both models are correct, and its consistency and DR properties when only one of them is correct. Further, when both the models are correct, we propose a desparsified version of our DDR estimator that satisfies an asymptotic linear expansion and facilitates inference on low dimensional components of $\boldsymbol{\theta}_0$. Finally, we discuss various of choices of high dimensional parametric/semi-parametric working models for the PS and OR estimators. All results are validated via detailed simulations.
An Autonomous Spectrum Management Scheme for Unmanned Aerial Vehicle Networks in Disaster Relief Operations
Shamsoshoara, Alireza, Afghah, Fatemeh, Razi, Abolfazl, Mousavi, Sajad, Ashdown, Jonathan, Turk, Kurt
This paper studies the problem of spectrum shortage in an unmanned aerial vehicle (UAV) network during critical missions such as wildfire monitoring, search and rescue, and disaster monitoring. Such applications involve a high demand for high-throughput data transmissions such as real-time video-, image-, and voice- streaming where the assigned spectrum to the UAV network may not be adequate to provide the desired Quality of Service (QoS). In these scenarios, the aerial network can borrow an additional spectrum from the available terrestrial networks in the trade of a relaying service for them. We propose a spectrum sharing model in which the UAVs are grouped into two classes of relaying UAVs that service the spectrum owner and the sensing UAVs that perform the disaster relief mission using the obtained spectrum. The operation of the UAV network is managed by a hierarchical mechanism in which a central controller assigns the tasks of the UAVs based on their resources and determine their operation region based on the level of priority of impacted areas and then the UAVs autonomously fine-tune their position using a model-free reinforcement learning algorithm to maximize the individual throughput and prolong their lifetime. We analyze the performance and the convergence for the proposed method analytically and with extensive simulations in different scenarios.
Bridging Disentanglement with Independence and Conditional Independence via Mutual Information for Representation Learning
Yang, Xiaojiang, Bi, Wendong, Cheng, Yu, Yan, Junchi
Existing works on disentangled representation learning usually lie on a common assumption: all factors in disentangled representations should be independent. This assumption is about the inner property of disentangled representations, while ignoring their relation with external da ta. T o tackle this problem, we propose another assumption to establish an important relation between data and its disentangled representations via mutual information: the mutual information between each factor of disentangled representations and data should be invariant to other factors. W e formulate this assumption into mathematical equations, and theoretically bridge it with independence and conditional independence of factors. Meanwhile, we show that conditional independence is satisfied in encoders of VAEs due to factorized noise in reparameterization. T o highligh t the importance of our proposed assumption, we show in experiments that violating the assumption leads to dramatic decline of disentanglement. Based on this assumption, we further propose to split the deeper layers in encoder to ensure parameters in these layers are not shared for different factors. The proposed encoder, called Split Encoder, can be applied into models that penalize total correlation, and shows significant improvement in unsupervised learning of disentangled representations and reconstructions.
When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks
Guo, Minghao, Yang, Yuzhe, Xu, Rui, Liu, Ziwei
Recent advances in adversarial attacks uncover the intrinsic vulnerability of modern deep neural networks. Since then, extensive efforts have been devoted to enhancing the robustness of deep networks via specialized learning algorithms and loss functions. In this work, we take an architectural perspective and investigate the patterns of network architectures that are resilient to adversarial attacks. To obtain the large number of networks needed for this study, we adopt one-shot neural architecture search, training a large network for once and then finetuning the sub-networks sampled therefrom. The sampled architectures together with the accuracies they achieve provide a rich basis for our study. Our "robust architecture Odyssey" reveals several valuable observations: 1) densely connected patterns result in improved robustness; 2) under computational budget, adding convolution operations to direct connection edge is effective; 3) flow of solution procedure (FSP) matrix is a good indicator of network robustness. Based on these observations, we discover a family of robust architectures (RobNets). On various datasets, including CIFAR, SVHN, and Tiny-ImageNet, RobNets exhibit superior robustness performance to other widely used architectures. Notably, RobNets substantially improve the robust accuracy (~5% absolute gains) under both white-box and black-box attacks, even with fewer parameter numbers.
Transfer Learning in Visual and Relational Reasoning
Jayram, T. S., Marois, Vincent, Kornuta, Tomasz, Albouy, Vincent, Sevgen, Emre, Ozcan, Ahmet S.
Transfer learning is becoming the de facto solution for vision and text encoders in the front-end processing of machine learning solutions. Utilizing vast amounts of knowledge in pre-trained models and subsequent fine-tuning allows achieving better performance in domains where labeled data is limited. In this paper, we analyze the efficiency of transfer learning in visual reasoning by introducing a new model (SAMNet) and testing it on two datasets: COG and CLEVR. Our new model achieves state-of-the-art accuracy on COG and shows significantly better generalization capabilities compared to the baseline. We also formalize a taxonomy of transfer learning for visual reasoning around three axes: feature, temporal, and reasoning transfer. Based on extensive experimentation of transfer learning on each of the two datasets, we show the performance of the new model along each axis.
Evaluating Commonsense in Pre-trained Language Models
Zhou, Xuhui, Zhang, Yue, Cui, Leyang, Huang, Dandan
Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. There have been works showing that syntactic, semantic and word sense knowledge are contained in such representations, which explains why they benefit such tasks. However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability while bidirectional context and larger training set are bonuses. We additionally find that current models do poorly on tasks require more necessary inference steps. Finally, we test the robustness of models by making dual test cases, which are correlated so that the correct prediction of one sample should lead to correct prediction of the other. Interestingly, the models show confusion on these test cases, which suggests that they learn commonsense at the surface rather than the deep level. We release a test set, named CA Ts publicly, for future research. Introduction Contextualized representations trained over large-scale text data have given remarkable improvements to a wide range of NLP tasks, including natural language inference (Bowman et al. 2015), question answering (Rajpurkar, Jia, and Liang 2018) and reading comprehension (Lai et al. 2017). Giving new state-of-the-art results that approach or surpass human performance on several benchmark datasets, it is an interesting question what types of knowledge are learned in pre-trained contextualized representations in order to better understand how they benefit the NLP problems above. Intuitively, such knowledge is at least as useful as semantic and syntactic knowledge in natural language inference, reading comprehension and coreference resolution.
Improving Fictitious Play Reinforcement Learning with Expanding Models
Qin, Rong-Jun, Pang, Jing-Cheng, Yu, Yang
Fictitious play with reinforcement learning is a general and effective framework for zero-sum games. However, using the current deep neural network models, the implementation of fictitious play faces crucial challenges. Neural network model training employs gradient descent approaches to update all connection weights, and thus is easy to forget the old opponents after training to beat the new opponents. Existing approaches often maintain a pool of historical policy models to avoid the forgetting. However, learning to beat a pool in stochastic games, i.e., a wide distribution over policy models, is either sample-consuming or insufficient to exploit all models with limited amount of samples. In this paper, we propose a learning process with neural fictitious play to alleviate the above issues. We train a single model as our policy model, which consists of sub-models and a selector. Everytime facing a new opponent, the model is expanded by adding a new sub-model, where only the new sub-model is updated instead of the whole model. At the same time, the selector is also updated to mix up the new sub-model with the previous ones at the state-level, so that the model is maintained as a behavior strategy instead of a wide distribution over policy models. Experiments on Kuhn poker, a grid-world Treasure Hunting game, and Mini-RTS environments show that the proposed approach alleviates the forgetting problem, and consequently improves the learning efficiency and the robustness of neural fictitious play.
TimeCaps: Capturing Time Series Data with Capsule Networks
Jayasekara, Hirunima, Jayasundara, Vinoj, Rajasegaran, Jathushan, Jayasekara, Sandaru, Senevirathne, Suranga, Rodrigo, Ranga
Electrocardiogram (ECG) signal analysis plays a vital role in medical diagnosis since ECG signal can provide vital information that can help to diagnose various health conditions. For example, ECG beat classification; e.g classifying ECG signal portions in to classes such as normal beats or different arrhythmia types such as atrial fibrillation, premature contraction, or ventricular fibrillation allows to identify different cardiovascular diseases. Similarly, ECG signal compression and reconstruction have a variety of applications such as remote cardiac monitoring in body sensor nodes (Mamaghanian et al., 2011) and achieving low power consumption when sending and processing data through IoT -gateways (Al Disi et al., 2018). ECG signal analysis and classification was predominantly done using signal processing methods such as wavelet transformation or independent component analysis or feature driven classical machine learning methods (Y u and Chou, 2008; Martis et al., 2013; Kim et al., 2009; Li and Zhou, 2016). However such methods have left room for further improvements in terms of accuracy and the manual feature curation is a daunting task. Recently 1D Convolutions have been tried on ECG classification producing some promising results (Li et al., 2017; Acharya et al., 2017), Nonetheless, these methods do not perform well for the classes with less volumes of training data.
Defending Against Adversarial Machine Learning
An Adversarial System to attack and an Authorship Attribution System (AAS) to defend itself against the attacks are analyzed. Defending a system against attacks from an adversarial machine learner can be done by randomly switching between models for the system, by detecting and reacting to changes in the distribution of normal inputs, or by using other methods. Adversarial machine learning is used to identify a system that is being used to map system inputs to outputs. Three types of machine learners are using for the model that is being attacked. The machine learners that are used to model the system being attacked are a Radial Basis Function Support Vector Machine, a Linear Support Vector Machine, and a Feedforward Neural Network. The feature masks are evolved using accuracy as the fitness measure. The system defends itself against adversarial machine learning attacks by identifying inputs that do not match the probability distribution of normal inputs. The system also defends itself against adversarial attacks by randomly switching between the feature masks being used to map system inputs to outputs.