Jin, Hongxia
A New Concept of Deep Reinforcement Learning based Augmented General Sequence Tagging System
Wang, Yu, Patel, Abhishek, Jin, Hongxia
In this paper, a new deep reinforcement learning based augmented general sequence tagging system is proposed. The new system contains two parts: a deep neural network (DNN) based sequence tagging model and a deep reinforcement learning (DRL) based augmented tagger. The augmented tagger helps improve system performance by modeling the data with minority tags. The new system is evaluated on SLU and NLU sequence tagging tasks using ATIS and CoNLL-2003 benchmark datasets, to demonstrate the new system's outstanding performance on general tagging tasks. Evaluated by F1 scores, it shows that the new system outperforms the current state-of-the-art model on ATIS dataset by 1.9 % and that on CoNLL-2003 dataset by 1.4 %.
A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling
Wang, Yu, Shen, Yilin, Jin, Hongxia
Intent detection and slot filling are two main tasks for building a spoken language understanding(SLU) system. Multiple deep learning based models have demonstrated good results on these tasks . The most effective algorithms are based on the structures of sequence to sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models or a joint model. Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent. Most of these approaches use one (joint) NN based model (including encoder-decoder structure) to model two tasks, hence may not fully take advantage of the cross-impact between them. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-the-art result on the benchmark ATIS data, with about 0.5$\%$ intent accuracy improvement and 0.9 $\%$ slot filling improvement.
User Information Augmented Semantic Frame Parsing using Coarse-to-Fine Neural Networks
Shen, Yilin, Zeng, Xiangyu, Wang, Yu, Jin, Hongxia
Semantic frame parsing is a crucial component in spoken language understanding (SLU) to build spoken dialog systems. It has two main tasks: intent detection and slot filling. Although state-of-the-art approaches showed good results, they require large annotated training data and long training time. In this paper, we aim to alleviate these drawbacks for semantic frame parsing by utilizing the ubiquitous user information. We design a novel coarse-to-fine deep neural network model to incorporate prior knowledge of user information intermediately to better and quickly train a semantic frame parser. Due to the lack of benchmark dataset with real user information, we synthesize the simplest type of user information (location and time) on ATIS benchmark data. The results show that our approach leverages such simple user information to outperform state-of-the-art approaches by 0.25% for intent detection and 0.31% for slot filling using standard training data. When using smaller training data, the performance improvement on intent detection and slot filling reaches up to 1.35% and 1.20% respectively. We also show that our approach can achieve similar performance as state-of-the-art approaches by using less than 80% annotated training data. Moreover, the training time to achieve the similar performance is also reduced by over 60%.
Beyond Word Embeddings: Learning Entity and Concept Representations from Large Scale Knowledge Bases
Shalaby, Walid, Zadrozny, Wlodek, Jin, Hongxia
Text representations using neural word embeddings have proven effective in many NLP applications. Recent researches adapt the traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating the knowledge about concepts from two large scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia text and Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve a state-of-the-art performance of 91% on semantic analogies, (2) concept categorization, where we achieve a state-of-the-art performance on two benchmark datasets achieving categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study to evaluate our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to better generalize to out of vocabulary entity mentions compared to the tedious and error prone methods which depend on gazetteers and regular expressions.
Deep Neural Network Approximation using Tensor Sketching
Kasiviswanathan, Shiva Prasad, Narodytska, Nina, Jin, Hongxia
Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: Given a target network architecture can we design a smaller network architecture that approximates the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever increasing storage and memory requirements of these networks pose a problem in resource constrained environments. In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network.
Private Incremental Regression
Kasiviswanathan, Shiva Prasad, Nissim, Kobbi, Jin, Hongxia
Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While from the results of [Bassily et al. 2014] this bound is tight in the worst-case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems.
Privacy-CNH: A Framework to Detect Photo Privacy with Convolutional Neural Network using Hierarchical Features
Tran, Lam (University of Rochester) | Kong, Deguang (Samsung Research America) | Jin, Hongxia (Samsung Research America) | Liu, Ji (University of Rochester)
Photo privacy is a very important problem in the digital age where photos are commonly shared on social networking sites and mobile devices. The main challenge in photo privacy detection is how to generate discriminant features to accurately detect privacy at risk photos. Existing photo privacy detection works, which rely on low-level vision features, are non-informative to the users regarding what privacy information is leaked from their photos. In this paper, we propose a new framework called Privacy-CNH that utilizes hierarchical features which include both object and convolutional features in a deep learning model to detect privacy at risk photos. The generation of object features enables our model to better inform the users about the reason why a photo has privacy risk. The combination of convolutional and object features provide a richer model to understand photo privacy from different aspects, thus improving photo privacy detection accuracy. Experimental results demonstrate that the proposed model outperforms the state-of-the-art work and the standard convolutional neural network (CNN) with low-level features on photo privacy detection tasks.
Content Recommendation for Attention Management in Unified Social Messaging
Jin, Hongxia (IBM Research)
With the growing popularity of social networks and collaboration systems, people are increasingly working with or socially connected with each other. Unified messaging system provides a single interface for users to receive and process information from multiple sources. It is highly desirable to design attention management solution that can help users easily navigate and process dozens of unread messages from a unified message system. Moreover, with the proliferation of mobile devices people are now selectively consuming the most important messages on the go between different activities in their daily life. The information overload problem is especially acute for mobile users with small screen to display. In this paper, we present \PAM, an intelligent end-to-end Personalized Attention Management solution that employs analytical techniques that can learn user interests and organize and prioritize incoming messages based on user interests. For a list of unread messages, \PAM generates a concise attention report that allows users to quickly scan the important new messages from his important social connections as well as messages about his most important tasks that the user is involved with. Our solution can also be applied in other applications such as news filtering and alerts on mobile devices. Our evaluation results demonstrate the effectiveness of \PAM.