Goto

Collaborating Authors

 East China Normal University


A Multi-Task Learning Approach for Improving Product Title Compression with User Search Log Data

AAAI Conferences

It is a challenging and practical research problem to obtain effective compression of lengthy product titles for E-commerce. This is particularly important as more and more users browse mobile E-commerce apps and more merchants make the original product titles redundant and lengthy for Search Engine Optimization. Traditional text summarization approaches often require a large amount of preprocessing costs and do not capture the important issue of conversion rate in E-commerce. This paper proposes a novel multi-task learning approach for improving product title compression with user search log data. In particular, a pointer network-based sequence-to-sequence approach is utilized for title compression with an attentive mechanism as an extractive method and an attentive encoder-decoder approach is utilized for generating user search queries. The encoding parameters (i.e., semantic embedding of original titles) are shared among the two tasks and the attention distributions are jointly optimized. An extensive set of experiments with both human annotated data and online deployment demonstrate the advantage of the proposed research for both compression qualities and online business values.


Personalized Time-Aware Tag Recommendation

AAAI Conferences

Personalized tag recommender systems suggest a list of tags to a user when he or she wants to annotate an item. They utilize users’ preferences and the features of items. Tensorfactorization techniques have been widely used in tag recommendation. Given the user-item pair, although the classic PITF (Pairwise Interaction Tensor Factorization) explicitly models the pairwise interactions among users, items and tags, it overlooks users’ short-term interests and suffers from data sparsity. On the other hand, given the user-item-time triple, time-aware approaches like BLL (Base-Level Learning) utilize the time effect to capture the temporal dynamics and the most popular tags on items to handle cold start situation of new users. However, it works only on individual level and the target resource level, which cannot find users’ potential interests. In this paper, we propose an unified tag recommendation approach by considering both time awareness and personalization aspects, which extends PITF by adding weightsto user-tag interaction and item-tag interaction respectively. Compared to PITF, our proposed model can depict temporal factor by temporal weights and relieve data sparsity problem by referencing the most popular tags on items. Further, our model brings collaborative filtering (CF) to time-aware models, which can mine information from global data and help improving the ability of recommending new tags. Different from the power-form functions used in the existing time aware recommendation models, we use the Hawkes process with the exponential intensity function to improve the model’s efficiency. The experimental results show that our proposed model outperforms the state of the art tag recommendation methods in accuracy and has better ability to recommend new tags.


Inference on Syntactic and Semantic Structures for Machine Comprehension

AAAI Conferences

Hidden variable models are important tools for solving open domain machine comprehension tasks and have achieved remarkable accuracy in many question answering benchmark datasets. Existing models impose strong independence assumptions on hidden variables, which leaves the interaction among them unexplored. Here we introduce linguistic structures to help capturing global evidence in hidden variable modeling. In the proposed algorithms, question-answer pairs are scored based on structured inference results on parse trees and semantic frames, which aims to assign hidden variables in a global optimal way. Experiments on the MCTest dataset demonstrate that the proposed models are highly competitive with state-of-the-art machine comprehension systems.


CA-RNN: Using Context-Aligned Recurrent Neural Networks for Modeling Sentence Similarity

AAAI Conferences

The recurrent neural networks (RNNs) have shown good performance for sentence similarity modeling in recent years. Most RNNs focus on modeling the hidden states based on the current sentence, while the context information from the other sentence is not well investigated during the hidden state generation. In this paper, we propose a context-aligned RNN (CA-RNN) model, which incorporates the contextual information of the aligned words in a sentence pair for the inner hidden state generation. Specifically, we first perform word alignment detection to identify the aligned words in the two sentences. Then, we present a context alignment gating mechanism and embed it into our model to automatically absorb the aligned words' context for the hidden state update. Experiments on three benchmark datasets, namely TREC-QA and WikiQA for answer selection and MSRP for paraphrase identification, show the great advantages of our proposed model. In particular, we achieve the new state-of-the-art performance on TREC-QA and WikiQA. Furthermore, our model is comparable to if not better than the recent neural network based approaches on MSRP.


SNNN: Promoting Word Sentiment and Negation in Neural Sentiment Classification

AAAI Conferences

We mainly investigate word influence in neural sentiment classification, which results in a novel approach to promoting word sentiment and negation as attentions. Particularly, a sentiment and negation neural network (SNNN) is proposed, including a sentiment neural network (SNN) and a negation neural network (NNN). First, we modify the word level by embedding the word sentiment and negation information as the extra layers for the input. Second, we adopt a hierarchical LSTM model to generate the word-level, sentence-level and document-level representations respectively. After that, we enhance word sentiment and negation as attentions over the semantic level. Finally, the experiments conducting on the IMDB and Yelp data sets show that our approach is superior to the state-of-the-art baselines. Furthermore, we draw the interesting conclusions that (1) LSTM performs better than CNN and RNN for neural sentiment classification; (2) word sentiment and negation are a strong alliance with attention, while overfitting occurs when they are simultaneously applied at the embedding layer; and (3) word sentiment/negation can be singly implemented for better performance as both embedding layer and attention at the same time.


Tau-FPL: Tolerance-Constrained Learning in Linear Time

AAAI Conferences

In many real-world applications, learning a classifier with false-positive rate under a specified tolerance is appealing. Existing approaches either introduce prior knowledge dependent label cost or tune parameters based on traditional classifiers, which are of limitation in methodology since they do not directly incorporate the false-positive rate tolerance. In this paper, we propose a novel scoring-thresholding approach, tau-False Positive Learning (tau-FPL) to address this problem. We show that the scoring problem which takes the false-positive rate tolerance into accounts can be efficiently solved in linear time, also an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show superior performance of the proposed tau-FPL over the existing approaches.


Co-Attending Free-Form Regions and Detections With Multi-Modal Multiplicative Feature Embedding for Visual Question Answering

AAAI Conferences

Recently, the Visual Question Answering (VQA) task has gained increasing attention in artificial intelligence. Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering. The free-form region based and the detection-based visual attention mechanisms are mostly investigated, with the former ones attending free-form image regions and the latter ones attending pre-specified detection-box regions. We argue that the two attention mechanisms are able to provide complementary information and should be effectively integrated to better solve the VQA problem. In this paper, we propose a novel deep neural network for VQA that integrates both attention mechanisms. Our proposed framework effectively fuses features from free-form image regions, detection boxes, and question representations via a multi-modal multiplicative feature embedding scheme to jointly attend question-related free-form image regions and detection boxes for more accurate question answering. The proposed method is extensively evaluated on two publicly available datasets, COCO-QA and VQA, and outperforms state-of-the-art approaches. Source code is available at https://github.com/lupantech/dual-mfa-vqa.


Unsupervised Deep Learning for Optical Flow Estimation

AAAI Conferences

Recent work has shown that optical flow estimation can be formulated as a supervised learning problem. Moreover, convolutional networks have been successfully applied to this task. However, supervised flow learning is obfuscated by the shortage of labeled training data. As a consequence, existing methods have to turn to large synthetic datasets for easily computer generated ground truth. In this work, we explore if a deep network for flow estimation can be trained without supervision. Using image warping by the estimated flow, we devise a simple yet effective unsupervised method for learning optical flow, by directly minimizing photometric consistency. We demonstrate that a flow network can be trained from end-to-end using our unsupervised scheme. In some cases, our results come tantalizingly close to the performance of methods trained with full supervision.


Additional Multi-Touch Attribution for Online Advertising

AAAI Conferences

Multi-Touch Attribution studies the effects of various types of online advertisements on purchase conversions. It is a very important problem in computational advertising, as it allows marketers to assign credits for conversions to different advertising channels and optimize advertising campaigns. In this paper, we propose an additional multi-touch attribution model (AMTA) based on two obvious assumptions: (1) the effect of an ad exposure is fading with time and (2) the effects of ad exposures on the browsing path of a user are additive.AMTA borrows the techniques from survival analysis and uses the hazard rate to measure the influence of an ad exposure. In addition, we both take the conversion time and the intrinsic conversion rate of users into consideration.Experimental results on a large real-world advertising dataset illustrate that the our proposed method is superior to state-of-the-art techniques in conversion rate prediction and the credit allocation based on AMTA is reasonable.


Modeling the Intensity Function of Point Process Via Recurrent Neural Networks

AAAI Conferences

Event sequence, asynchronously generated with random timestamp, is ubiquitous among applications. The precise and arbitrary timestamp can carry important clues about the underlying dynamics, and has lent the event data fundamentally different from the time-series whereby series is indexed with fixed and equal time interval. One expressive mathematical tool for modeling event is point process. The intensity functions of many point processes involve two components: the background and the effect by the history. Due to its inherent spontaneousness, the background can be treated as a time series while the other need to handle the history events. In this paper, we model the background by a Recurrent Neural Network (RNN) with its units aligned with time series indexes while the history effect is modeled by another RNN whose units are aligned with asynchronous events to capture the long-range dynamics. The whole model with event type and timestamp prediction output layers can be trained end-to-end. Our approach takes an RNN perspective to point process, and models its background and history effect. For utility, our method allows a black-box treatment for modeling the intensity which is often a pre-defined parametric form in point processes. Meanwhile end-to-end training opens the venue for reusing existing rich techniques in deep network for point process modeling. We apply our model to the predictive maintenance problem using a log dataset by more than 1000 ATMs from a global bank headquartered in North America.