Deep Learning
Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model
Liu, Pengfei (Fudan University) | Qiu, Xipeng (Fudan University) | Huang, Xuanjing (Fudan University)
Distributed word representations have a rising interest in NLP community. Most of existing models assume only one vector for each individual word, which ignores polysemy and thus degrades their effectiveness for downstream tasks. To address this problem, some recent work adopts multi-prototype models to learn multiple embeddings per word type. In this paper, we distinguish the different senses of each word by their latent topics. We present a general architecture to learn the word and topic embeddings efficiently, which is an extension to the Skip-Gram model and can model the interaction between words and topics simultaneously. The experiments on the word similarity and text classification tasks show our model outperforms state-of-the-art methods.
Incorporating Domain and Sentiment Supervision in Representation Learning for Domain Adaptation
Liu, Biao (Tsinghua University) | Huang, Minlie (Tsinghua University) | Sun, Jiashen (Samsung Research and Development Institute) | Zhu, Xuan (Samsung Research and Development Institute)
Domain adaptation aims at learning robust classifiers across domains using labeled data from a source domain. Representation learning methods, which project the original features to a new feature space, have been proved to be quite effective for this task. However, these unsupervised methods neglect the domain information of the input and are not specialized for the classification task. In this work, we address two key factors to guide the representation learning process for domain adaptation of sentiment classification โ one is domain supervision, enforcing the learned representation to better predict the domain of an input, and the other is sentiment supervision which utilizes the source domain sentiment labels to learn sentiment-favorable representations. Experimental results show that these two factors significantly improve the proposed models as expected.
Reader-Aware Multi-Document Summarization via Sparse Coding
Li, Piji (The Chinese University of Hong Kong) | Bing, Lidong (Carnegie Mellon University) | Lam, Wai (The Chinese University of Hong Kong) | Li, Hang (Huawei Technologies) | Liao, Yi (The Chinese University of Hong Kong)
We propose a new MDS paradigm called reader-aware multi-document summarization (RA-MDS).Specifically, a set of reader comments associated with the news reports are also collected. The generated summaries from the reports for the event should be salient according to not only the reports but also the reader comments. To tackle this RA-MDS problem, we propose a sparse-coding-based method that is able to calculate the salience of the text units by jointly considering news reports and reader comments. Another reader-aware characteristic of our framework is to improve linguistic quality via entity rewriting. The rewriting consideration is jointly assessed together with other summarization requirements under a unified optimization model. To support the generation of compressive summaries via optimization, we explore a finer syntactic unit, namely, noun/verb phrase. In this work, we also generate a data set for conducting RA-MDS. Extensive experiments on this data set and some classical data sets demonstrate the effectiveness of our proposed approach.
Character-Based Parsing with Convolutional Neural Network
Zheng, Xiaoqing (Fudan University) | Peng, Haoyuan (Fudan University) | Chen, Yi (Fudan University) | Zhang, Pengjing (Fudan University) | Zhang, Wenqiang (Fudan University)
We describe a novel convolutional neural network architecture with k-max pooling layer that is able to successfully recover the structure of Chinese sentences. This network can capture active features for unseen segments of a sentence to measure how likely the segments are merged to be the constituents. Given an input sentence, after all the scores of possible segments are computed, an efficient dynamic programming parsing algorithm is used to find the globally optimal parse tree. A similar network is then applied to predict syntactic categories for every node in the parse tree. Our networks archived competitive performance to existing benchmark parsers on the CTB-5 dataset without any task-specific feature engineering.
Scalable Bayesian Optimization Using Deep Neural Networks
Snoek, Jasper, Rippel, Oren, Swersky, Kevin, Kiros, Ryan, Satish, Nadathur, Sundaram, Narayanan, Patwary, Md. Mostofa Ali, Prabhat, null, Adams, Ryan P.
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
Ego-Object Discovery
Lifelogging devices are spreading faster everyday. This growth can represent great benefits to develop methods for extraction of meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given an egocentric video/images sequence acquired by the camera, our algorithm uses both the appearance extracted by means of a convolutional neural network and an object refill methodology that allows to discover objects even in case of small amount of object appearance in the collection of images. An SVM filtering strategy is applied to deal with the great part of the False Positive object candidates found by most of the state of the art object detectors. We validate our method on a new egocentric dataset of 4912 daily images acquired by 4 persons as well as on both PASCAL 2012 and MSRC datasets. We obtain for all of them results that largely outperform the state of the art approach. We make public both the EDUB dataset and the algorithm code.
Dependency Recurrent Neural Language Models for Sentence Completion
Mirowski, Piotr, Vlachos, Andreas
Recent work on language modelling has shifted focus from count-based models to neural models. In these works, the words in each sentence are always considered in a left-to-right order. In this paper we show how we can improve the performance of the recurrent neural network (RNN) language model by incorporating the syntactic dependencies of a sentence, which have the effect of bringing relevant contexts closer to the word being predicted. We evaluate our approach on the Microsoft Research Sentence Completion Challenge and show that the dependency RNN proposed improves over the RNN by about 10 points in accuracy. Furthermore, we achieve results comparable with the state-of-the-art models on this task.
Bootstrapped Thompson Sampling and Deep Exploration
Osband, Ian, Van Roy, Benjamin
This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
Natural Neural Networks
Desjardins, Guillaume, Simonyan, Karen, Pascanu, Razvan, Kavukcuoglu, Koray
We introduce Natural Neural Networks, a novel family of algorithms that speed up convergence by adapting their internal representation during training to improve conditioning of the Fisher matrix. In particular, we show a specific example that employs a simple and efficient reparametrization of the neural network weights by implicitly whitening the representation obtained at each layer, while preserving the feed-forward computation of the network. Such networks can be trained efficiently via the proposed Projected Natural Gradient Descent algorithm (PRONG), which amortizes the cost of these reparametrizations over many parameter updates and is closely related to the Mirror Descent online learning algorithm. We highlight the benefits of our method on both unsupervised and supervised learning tasks, and showcase its scalability by training on the large-scale ImageNet Challenge dataset.
Diffusion Nets
Mishne, Gal, Shaham, Uri, Cloninger, Alexander, Cohen, Israel
Non-linear manifold learning enables high-dimensional data analysis, but requires out-of-sample-extension methods to process new data points. In this paper, we propose a manifold learning algorithm based on deep learning to create an encoder, which maps a high-dimensional dataset and its low-dimensional embedding, and a decoder, which takes the embedded data back to the high-dimensional space. Stacking the encoder and decoder together constructs an autoencoder, which we term a diffusion net, that performs out-of-sample-extension as well as outlier detection. We introduce new neural net constraints for the encoder, which preserves the local geometry of the points, and we prove rates of convergence for the encoder. Also, our approach is efficient in both computational complexity and memory requirements, as opposed to previous methods that require storage of all training points in both the high-dimensional and the low-dimensional spaces to calculate the out-of-sample-extension and the pre-image.