
Collaborating Authors

Wang, Yujing


Deeper Insights into Weight Sharing in Neural Architecture Search

arXiv.org Machine Learning

With the success of deep neural networks, Neural Architecture Search (NAS) as a way of automatic model design has attracted wide attention. As training every child model from scratch is very time-consuming, recent works leverage weight-sharing to speed up the model evaluation procedure. These approaches greatly reduce computation by maintaining a single copy of weights on the super-net and sharing these weights among all child models. However, weight-sharing has no theoretical guarantee and its impact has not been well studied before. In this paper, we conduct comprehensive experiments to reveal the impact of weight-sharing: (1) the best-performing models from different runs, or even from consecutive epochs within the same run, vary significantly; (2) even with this high variance, valuable information can be extracted from training the super-net with shared weights; (3) interference between child models is a major factor that induces high variance; (4) properly reducing the degree of weight sharing can effectively reduce variance and improve performance.
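
To make the weight-sharing setup concrete, the following is a minimal sketch of a one-shot super-net in which each layer keeps a single copy of weights for a few candidate operations and every sampled child model reuses them. The class names, candidate operations, and sampling scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative sketch of weight sharing in a one-shot super-net.
# Hypothetical class/op choices; not the paper's actual code.
import random
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """One super-net layer: several candidate ops, each with a single shared
    copy of weights that every sampled child model reuses."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Linear(dim, dim),                            # candidate op 0
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # candidate op 1
            nn.Identity(),                                  # candidate op 2 (skip)
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)

class SuperNet(nn.Module):
    def __init__(self, dim, n_layers):
        super().__init__()
        self.layers = nn.ModuleList([MixedLayer(dim) for _ in range(n_layers)])

    def forward(self, x, arch):
        # 'arch' is a list of op indices, one per layer -> one child model.
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return x

supernet = SuperNet(dim=32, n_layers=4)
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.01)

# Training loop: sample a child architecture per step; gradients only touch
# the shared weights of the chosen ops, so child models interfere with each other.
for step in range(100):
    arch = [random.randrange(3) for _ in range(4)]
    x = torch.randn(8, 32)
    loss = supernet(x, arch).pow(2).mean()   # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```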


TextNAS: A Neural Architecture Search Space tailored for Text Representation

arXiv.org Machine Learning

Learning text representation is crucial for text classification and other language-related tasks. There is a diverse set of text representation networks in the literature, and finding the optimal one is a non-trivial problem. Recently, emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve this problem. Nevertheless, most existing NAS works focus on the search algorithms and pay little attention to the search space. In this paper, we argue that the search space is also an important human prior for the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets for text classification and natural language inference tasks. Furthermore, some of the design principles found by the automatic search agree well with human intuition.
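
As a toy illustration of what a text-oriented NAS search space can look like, the sketch below defines a searchable layer with a candidate set of common text operators (convolutions, a recurrent layer, self-attention, pooling). The concrete operators and topology of TextNAS are defined in the paper; this particular candidate set is only an assumption.

```python
# Toy illustration of a per-layer candidate-operator set for text NAS.
# The operator choices here are assumptions, not the TextNAS space itself.
import torch
import torch.nn as nn

class SearchableTextLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleDict({
            "conv3":   nn.Conv1d(dim, dim, kernel_size=3, padding=1),
            "conv5":   nn.Conv1d(dim, dim, kernel_size=5, padding=2),
            "gru":     nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True),
            "attn":    nn.MultiheadAttention(dim, num_heads=4, batch_first=True),
            "maxpool": nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
        })

    def forward(self, x, op_name):                 # x: (batch, seq, dim)
        if op_name in ("conv3", "conv5", "maxpool"):
            return self.ops[op_name](x.transpose(1, 2)).transpose(1, 2)
        if op_name == "gru":
            out, _ = self.ops["gru"](x)
            return out
        out, _ = self.ops["attn"](x, x, x)          # self-attention
        return out

layer = SearchableTextLayer(dim=64)
tokens = torch.randn(2, 16, 64)                     # fake embedded sentences
print(layer(tokens, "conv3").shape, layer(tokens, "attn").shape)
```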


An Anatomy of Graph Neural Networks Going Deep via the Lens of Mutual Information: Exponential Decay vs. Full Preservation

arXiv.org Machine Learning

Graph Convolutional Networks (GCN) have attracted intense interest recently. One major limitation of GCN is that it often cannot benefit from a deep architecture, whereas traditional CNNs and an alternative graph neural network architecture, GraphCNN, often achieve better quality with deeper models. How can we explain this phenomenon? In this paper, we take a first step towards answering this question. We first conduct a systematic empirical study of the accuracy of GCN, GraphCNN, and ResNet-18 on 2D images and identify the relative importance of different factors in architectural design. This inspires a novel theoretical analysis of the mutual information between the input and the output after l GCN/GraphCNN layers. We identify regimes in which GCN suffers exponentially fast "information loss" and show that GraphCNN requires a much weaker condition for similar behavior to happen. Extending convolutional neural networks (CNN) over images to graphs has attracted intense interest recently. One early attempt is the GCN model proposed by Kipf & Welling (2016a). However, when applying GCN to many practical applications, one discrepancy lingers: although a traditional CNN usually gets higher accuracy when it goes deeper, GCN, as a natural extension of CNN, does not seem to benefit much from going deeper by stacking multiple layers together. This phenomenon has been the focus of multiple recent papers (Li et al., 2018; 2019; Oono & Suzuki, 2019). On the theoretical side, Li et al. (2018) and Oono & Suzuki (2019) identified the problem as oversmoothing: under certain conditions, when multiple GCN layers are stacked together, the output converges to a region that is independent of weights and inputs. On the empirical side, Li et al. (2019) showed that many techniques designed to train deep CNNs, e.g., the skip connections in ResNet (He et al., 2016a), can also make it easier for GCN to go deeper.
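
The oversmoothing effect referenced above can be illustrated numerically: repeated application of the symmetrically normalized propagation matrix used in a GCN layer drives the node representations toward a rank-one subspace that no longer depends on the input features. The tiny graph and random features below are made up purely for illustration.

```python
# Tiny numerical sketch of oversmoothing in stacked GCN propagation.
# The graph and features are arbitrary; only the trend matters.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                        # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
P = D_inv_sqrt @ A_hat @ D_inv_sqrt          # normalized propagation matrix

H = np.random.randn(4, 3)                    # initial node features
for depth in [1, 2, 4, 8, 16, 32]:
    H_l = np.linalg.matrix_power(P, depth) @ H
    s = np.linalg.svd(H_l, compute_uv=False)
    # sigma2/sigma1 -> 0: representations collapse toward a rank-one pattern
    print(depth, "sigma2/sigma1 =", round(s[1] / s[0], 5))
```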


Time-Series Anomaly Detection Service at Microsoft

arXiv.org Machine Learning

Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers monitor time-series continuously and alerts them to potential incidents in a timely manner. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient, and general. The pipeline consists of three major modules: data ingestion, the experimentation platform, and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from the visual saliency detection domain for time-series anomaly detection. Moreover, we innovatively combine SR and CNN to improve the performance of the SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data.
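
The following is a minimal sketch of the Spectral Residual transform applied to a 1-D series, following the saliency-detection recipe the paper borrows: subtract a local average of the log-amplitude spectrum from itself and invert the FFT to obtain a saliency map. The window size and the simple z-score thresholding here are illustrative choices, not the service's production settings, and the CNN stage on top of the saliency map is omitted.

```python
# Minimal sketch of the Spectral Residual (SR) saliency transform for a
# 1-D series. Window size and scoring rule are illustrative assumptions.
import numpy as np

def spectral_residual_saliency(x, avg_window=3):
    spec = np.fft.fft(x)
    amplitude = np.abs(spec)
    phase = np.angle(spec)
    log_amp = np.log(amplitude + 1e-8)
    # local average of the log-amplitude spectrum (box filter)
    kernel = np.ones(avg_window) / avg_window
    avg_log_amp = np.convolve(log_amp, kernel, mode="same")
    residual = log_amp - avg_log_amp                     # spectral residual
    saliency = np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))
    return saliency

# Toy series with one injected spike.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.standard_normal(200)
series[120] += 3.0
sal = spectral_residual_saliency(series)
score = (sal - sal.mean()) / (sal.std() + 1e-8)          # simple anomaly score
print("most anomalous index:", int(np.argmax(score)))    # the spike near 120 should score highly
```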


Source Free Transfer Learning for Text Classification

AAAI Conferences

Transfer learning uses relevant auxiliary data to help the learning task in a target domain where labeled data is usually insufficient to train an accurate model. Given appropriate auxiliary data, researchers have proposed many transfer learning models. How to find such auxiliary data, however, has received little research attention so far. In this paper, we focus on the problem of auxiliary data retrieval and propose a transfer learning framework that effectively selects helpful auxiliary data from an open knowledge space (e.g., the World Wide Web). Because there is no need to manually select auxiliary data for different target-domain tasks, we call our framework Source Free Transfer Learning (SFTL). For each target-domain task, the SFTL framework iteratively queries for helpful auxiliary data based on the learned model and then updates the model using the retrieved auxiliary data. We highlight the automatic construction of queries and the robustness of the SFTL framework. Our experiments on the 20NewsGroup dataset and a Google search snippets dataset suggest that the framework achieves performance comparable to state-of-the-art methods that rely on dedicated selections of auxiliary data.
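
A high-level sketch of the query-then-update loop described in the abstract is given below: build queries from the current model, retrieve auxiliary documents from an open knowledge source, keep the ones the model judges helpful, and retrain. All helper callables (train_model, build_query, web_search, select_helpful) are hypothetical placeholders standing in for components the paper defines, not the authors' API.

```python
# High-level sketch of the SFTL loop; every helper passed in is a
# hypothetical placeholder, not part of the paper's actual interface.
def source_free_transfer_learning(target_docs, target_labels,
                                  train_model, build_query,
                                  web_search, select_helpful,
                                  n_rounds=5):
    model = train_model(target_docs, target_labels, auxiliary=[])
    for _ in range(n_rounds):
        # 1. automatically construct queries from the current model
        #    (e.g., its most indicative terms per class)
        queries = build_query(model)
        # 2. retrieve candidate auxiliary data from the open knowledge space
        candidates = [doc for q in queries for doc in web_search(q)]
        # 3. keep only candidates the current model judges helpful
        auxiliary = select_helpful(model, candidates)
        # 4. update the model using target data plus the retrieved data
        model = train_model(target_docs, target_labels, auxiliary=auxiliary)
    return model
```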


Ranking Scientific Articles by Exploiting Citations, Authors, Journals, and Time Information

AAAI Conferences

Ranking scientific articles is an important but challenging task, partly due to the dynamic nature of the evolving publication network. In this paper, we mainly focus on two problems: (1) how to rank articles in a heterogeneous network; and (2) how to use time information in the dynamic network to obtain a better ranking result. To tackle these problems, we propose a graph-based ranking method which utilizes citations, authors, journals/conferences, and publication time information collaboratively. Experiments were carried out on two public datasets. The results show that our approach is practical and ranks scientific articles more accurately than state-of-the-art methods.
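
To make the flavor of such a graph-based, time-aware ranking concrete, the sketch below runs a personalized PageRank-style iteration over a citation graph with an exponential recency prior. The decay factor and the restriction to citation edges only are assumptions made here for brevity; the paper's method also exploits author and journal/conference information in a heterogeneous network.

```python
# Illustrative time-aware, PageRank-style ranking over a citation graph.
# Decay factor and citation-only edges are assumptions, not the paper's model.
import numpy as np

def time_aware_rank(citations, years, current_year,
                    damping=0.85, decay=0.1, iters=50):
    """citations[i] = list of papers that paper i cites."""
    n = len(citations)
    # time prior: more recent papers receive a larger teleport probability
    prior = np.exp(-decay * (current_year - np.asarray(years, dtype=float)))
    prior /= prior.sum()
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - damping) * prior
        for i, refs in enumerate(citations):
            if refs:
                share = damping * rank[i] / len(refs)
                for j in refs:
                    new[j] += share            # pass rank along citation edges
            else:
                new += damping * rank[i] * prior   # dangling node
        rank = new
    return rank

# Toy example: paper 2 (older) is cited by both newer papers 0 and 1.
print(time_aware_rank([[2], [2], []], years=[2010, 2009, 2005], current_year=2011))
```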