AITopics

With Community Question Answering (CQA) evolving into a quite popular method for information seeking and providing, it also becomes a target for spammers to disseminate promotion campaigns. Although there are a number of quality estimation efforts on the CQA platform, most of these works focus on identifying and reducing low-quality answers, which are mostly generated by impatient or inexperienced answerers. However, a large number of promotion answers appear to provide high-quality information to cheat CQA users in future interactions. Therefore, most existing quality estimation works in CQA may fail to detect these specially designed answers or question-answer pairs. In contrast to these works, we focus on the promotion channels of spammers, which include (shortened) URLs, telephone numbers and social media accounts. Spammers rely on these channels to connect to users to achieve promotion goals so they are irreplaceable for spamming activities. We propose a propagation algorithm to diffuse promotion intents on an "answerer-channel" bipartite graph and detect possible spamming activities. A supervised learning framework is also proposed to identify whether a QA pair is spam based on propagated promotion intents. Experimental results based on more than 6 million entries from a popular Chinese CQA portal show that our approach outperforms a number of existing quality estimation methods for detecting promotion campaigns on both the answer level and QA pair level.

information, promotion campaign, promotion channel, (13 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deep Learning for Event-Driven Stock Prediction

Ding, Xiao (Harbin Institute of Technology) | Zhang, Yue (Singapore University of Technology and Design) | Liu, Ting (Harbin Institute of Technology) | Duan, Junwen (Harbin Institute of Technology)

We propose a deep learning method for eventdriven stock market prediction. First, events are extracted from news text, and represented as dense vectors, trained using a novel neural tensor network. Second, a deep convolutional neural network is used to model both short-term and long-term influences of events on stock price movements. Experimental results show that our model can achieve nearly 6% improvements on S&P 500 index prediction and individual stock prediction, respectively, compared to state-of-the-art baseline methods. In Figure 1: Example news influence of Google Inc. addition, market simulation results show that our system is more capable of making profits than previously reported systems trained on S&P 500 stock of events can be better captured [Ding et al., 2014].

neural network, prediction, prediction model, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.05)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tracking Political Elections on Social Media: Applications and Experience

Contractor, Danish (IBM Research) | Chawda, Bhupesh (IBM Research) | Mehta, Sameep (IBM Research) | Subramaniam, L Venkata (IBM Research) | Faruquie, Tanveer Afzal (IBM Research)

In recent times, social media has become a popular medium for many election campaigns. It not only allows candidates to reach out to a large section of the electorate, it is also a potent medium for people to express their opinion on the proposed policies and promises of candidates. Analyzing social media data is challenging as the text can be noisy, sparse and even multilingual. In addition, the information may not be completely trustworthy, particularly in the presence of propaganda, promotions and rumors. In this paper we describe our work for analyzing election campaigns using social media data. Using data from the 2012 US presidential elections and the 2013 Philippines General elections, we provide detailed experiments on our methods that use granger causality to identify topics that were most “causal” for public opinion and which in turn, give an interpretable insight into “elections topics” that were most important. Our system was deployed by the largest media organization in the Philippines during the 2013 General elections and using our work, the media house able to identify and report news stories much faster than competitors and reported higher TRP ratings during the election.

bigram, sentiment, tweet, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Philippines (0.55)
Asia > India > NCT > New Delhi (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)

Bouneffouf, Djallel (Canada's Michael Smith Genome Sciences Centre) | Birol, Inanc (Canada's Michael Smith Genome Sciences Centre)

Sampling with Minimum Sum of Squared Similarities for Nystrom-Based Large Scale Spectral Clustering

The Nystrom method provides an efficient sampling approach for large scale clustering problems, by generating a low-rank matrix approximation. However, existing sampling methods are limited by accuracy and computing time. This paper proposes an improved Nystrom-based clustering algorithm with a new sampling procedure, Minimum Sum of Squared Similarities (MSSS). Experiments on synthetic and real data sets show that the proposed sampling performs with higher accuracy than existing algorithms, applied to Nystrom-based spectral clustering problems. Furthermore, we provide a theoretical analysis that allows us to define the upper bound of the Frobenius norm error of the MSSS.

algorithm, dataset, matrix, (14 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Orange County > Irvine (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hamming Compatible Quantization for Hashing

Hashing is one of the effective techniques for fast Approximate Nearest Neighbour (ANN) search. Traditional single-bit quantization (SBQ) in most hashing methods incurs lots of quantization error which seriously degrades the search performance. To address the limitation of SBQ, researchers have proposed promising multi-bit quantization (MBQ) methods to quantize each projection dimension with multiple bits. However, some MBQ methods need to adopt specific distance for binary code matching instead of the original Hamming distance, which would significantly decrease the retrieval speed. Two typical MBQ methods Hierarchical Quantization and Double Bit Quantization retain the Hamming distance, but both of them only consider the projection dimensions during quantization, ignoring the neighborhood structure of raw data inherent in Euclidean space. In this paper, we propose a multi-bit quantization method named Hamming Compatible Quantization (HCQ) to preserve the capability of similarity metric between Euclidean space and Hamming space by utilizing the neighborhood structure of raw data. Extensive experiment results have shown our approach significantly improves the performance of various state-of-the-art hashing methods while maintaining fast retrieval speed.

hamming distance, projection dimension, quantization, (13 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.90)

Deep Multimodal Hashing with Orthogonal Regularization

Wang, Daixin (Tsinghua University) | Cui, Peng (Tsinghua University) | Ou, Mingdong (Tsinghua University) | Zhu, Wenwu (Tsinghua University)

Hashing is an important method for performing efficient similarity search. With the explosive growth of multimodal data, how to learn hashing-based compact representations for multimodal data becomes highly non-trivial. Compared with shallow structured models, deep models present superiority in capturing multimodal correlations due to their high nonlinearity. However, in order to make the learned representation more accurate and compact, how to reduce the redundant information lying in the multimodal representations and incorporate different complexities of different modalities in the deep models is still an open problem. In this paper, we propose a novel deep multimodal hashing method, namely Deep Multimodal Hashing with Orthogonal Regularization (DMHOR), which fully exploits intra-modality and inter-modality correlations. In particular, to reduce redundant information, we impose orthogonal regularizer on the weighting matrices of the model, and theoretically prove that the learned representation is guaranteed to be approximately orthogonal. Moreover, we find that a better representation can be attained with different numbers of layers for different modalities, due to their different complexities. Comprehensive experiments on WIKI and NUS-WIDE, demonstrate a substantial gain of DMHOR compared with state-of-the-art methods.

different modality, modality, representation, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Wan, Ji (Institute Of Computing Technology of the Chinese Academy of Sciences) | Wu, Pengcheng (Singapore Management University) | Hoi, Steven C. H. (Singapore Management University) | Zhao, Peilin (Institute for Infocomm Research) | Gao, Xingyu (Institute of Computing Technology of the Chinese Academy of Sciences) | Wang, Dayong (Michigan State University) | Zhang, Yongdong (Institute of Computing Technology of the Chinese Academy of Sciences) | Li, Jintao (Institute of Computing Technology of the Chinese Academy of Sciences)

Online Learning to Rank for Content-Based Image Retrieval

A major challenge in Content-Based Image Retrieval (CBIR) is to bridge the semantic gap between low-level image contents and high-level semantic concepts. Although researchers have investigated a variety of retrieval techniques using different types of features and distance functions, no single best retrieval solution can fully tackle this challenge. In a real-world CBIR task, it is often highly desired to combine multiple types of different feature representations and diverse distance measures in order to close the semantic gap. In this paper, we investigate a new framework of learning to rank for CBIR, which aims to seek the optimal combination of different retrieval schemes by learning from large-scale training data in CBIR. We first formulate the problem formally as a learning to rank task, which can be solved in general by applying the existing batch learning to rank algorithms from text information retrieval (IR). To further address the scalability towards large-scale online CBIR applications, we present a family of online learning to rank algorithms, which are significantly more efficient and scalable than classical batch algorithms for large-scale online CBIR. Finally, we conduct an extensive set of experiments, in which encouraging results show that our technique is effective, scalable and promising for large-scale CBIR.

algorithm, online, rank algorithm, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Asia > China (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (0.34)

Industry: Education > Educational Setting > Online (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.65)

Personalized Sentiment Classification Based on Latent Individuality of Microblog Users

Sentiment expression in microblog posts often reflects user's specific individuality due to different language habit, personal character, opinion bias and so on. Existing sentiment classification algorithms largely ignore such latent personal distinctions among different microblog users. Meanwhile, sentiment data of microblogs are sparse for individual users, making it infeasible to learn effective personalized classifier. In this paper, we propose a novel, extensible personalized sentiment classification method based on a variant of latent factor model to capture personal sentiment variations by mapping users and posts into a low-dimensional factor space. We alleviate the sparsity of personal texts by decomposing the posts into words which are further represented by the weighted sentiment and topic units based on a set of syntactic units of words obtained from dependency parsing results. To strengthen the representation of users, we leverage users following relation to consolidate the individuality of a user fused from other users with similar interests. Results on real-world microblog datasets confirm that our method outperforms state-of-the-art baseline algorithms with large margins.

individuality, proceedings, relation, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
(3 more...)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Short and Sparse Text Topic Modeling via Self-Aggregation

Quan, Xiaojun (Institute for Infocomm Research) | Kit, Chunyu (City University of Hong Kong) | Ge, Yong (University of North Carolina at Charlotte) | Pan, Sinno Jialin (Nanyang Technological University)

The overwhelming amount of short text data on social media and elsewhere has posed great challenges to topic modeling due to the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies to aggregate short texts into pseudo-documents before the application of standard topic modeling. Although such strategies cannot be well generalized to more general genres of short texts, the success has shed light on how to develop a generalized solution. In this paper, we present a novel model towards this goal by integrating topic modeling with short text aggregation during topic inference. The aggregation is founded on general topical affinity of texts rather than particular heuristics, making the model readily applicable to various short texts. Experimental results on real-world datasets validate the effectiveness of this new model, suggesting that it can distill more meaningful topics from short texts.

short text, topic model, topic modeling, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Social Media (0.90)

Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction

Jing, Xiao-Yuan (Wuhan University) | Liu, Qian (Wuhan University and Nanjing University of Posts and Telecommunications) | Wu, Fei (Wuhan University) | Xu, Baowen (Wuhan University) | Zhu, Yangping (Wuhan University) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics)

Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised application, since web pages usually contain two or more types of data, such a text, hyperlinks and images, and unlabeled pages are generally much more than labeled ones. Web page data is commonly high-dimensional. Thus, how to extract useful features from this kind of data in the multi-view semi-supervised scenario is important for web page classification. To our knowledge, only one method is specially presented for this topic. And with respect to a few semi-supervised multi-view feature extraction methods on other applications, there still exists much room for improvement. In this paper, we firstly design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning, which sufficiently utilizes the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove the multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with the constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.

database, unlabeled sample, usi 2, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Feature Extraction (0.83)