

Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation

arXiv.org Machine Learning

Low-rank modeling plays a pivotal role in signal processing and machine learning, with applications ranging from collaborative filtering, video surveillance, and medical imaging to dimensionality reduction and adaptive filtering. Many modern high-dimensional data and interactions thereof can be modeled as lying approximately in a low-dimensional subspace or manifold, possibly with additional structures, and their proper exploitation leads to significant reductions in the costs of sensing, computation, and storage. In recent years, there has been a plethora of progress in understanding how to exploit low-rank structures using computationally efficient procedures in a provable manner, including both convex and nonconvex approaches. On one side, convex relaxations such as nuclear norm minimization often lead to statistically optimal procedures for estimating low-rank matrices, where first-order methods are developed to address the computational challenges; on the other side, there is emerging evidence that properly designed nonconvex procedures, such as projected gradient descent, often provide globally optimal solutions at a much lower computational cost in many problems. This survey article provides a unified overview of these recent advances in low-rank matrix estimation from incomplete measurements. Attention is paid to rigorous characterization of the performance of these algorithms, and to problems where the low-rank matrix has additional structural properties that require new algorithmic designs and theoretical analysis.
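
As a concrete illustration of the nonconvex side of this story, the following is a minimal sketch of projected gradient descent (often called singular value projection) for rank-r matrix completion; the step size, rank, and toy data below are illustrative assumptions, not values prescribed by the survey.

    import numpy as np

    def svp_complete(Y, mask, r, eta=1.0, iters=200):
        """Estimate a rank-r matrix from the observed entries Y[mask]."""
        X = np.zeros_like(Y)
        for _ in range(iters):
            grad = mask * (X - Y)            # gradient of 0.5*||P_Omega(X - Y)||_F^2
            X = X - eta * grad               # gradient step
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            X = (U[:, :r] * s[:r]) @ Vt[:r]  # project back onto rank-r matrices
        return X

    # Toy usage: recover a random rank-2 matrix from half of its entries.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
    mask = (rng.random(M.shape) < 0.5).astype(float)
    M_hat = svp_complete(mask * M, mask, r=2)

Each iteration costs one truncated SVD rather than a full semidefinite-programming solve, which is one reason such nonconvex schemes can be much cheaper than nuclear norm solvers.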


Twitter Summarization Based on Social Network and Sparse Reconstruction

AAAI Conferences

With the rapid growth of microblogging services such as Twitter, a vast number of short and noisy messages are produced by millions of users, making it difficult for people to quickly grasp essential information on topics of interest. In this paper, we study extractive topic-oriented Twitter summarization as a solution to this problem. Traditional summarization methods consider only text information, which is insufficient in the social media setting. Existing Twitter summarization techniques rarely explore relations between tweets explicitly, ignoring that information can spread along the social network. Inspired by the social theories that expression consistency and expression contagion are observed in social networks, we propose a novel approach for Twitter summarization in this short and noisy setting by integrating Social Network and Sparse Reconstruction (SNSR). We explore whether social relations can help Twitter summarization, modeling the relations between tweets as a social regularization term and integrating it into a group sparse optimization framework. The framework conducts a sparse reconstruction process by selecting the tweets that can best reconstruct the original tweets in a specific topic, while accounting for coverage and sparsity. We simultaneously design a diversity regularization term to remove redundancy. In particular, we present a mathematical optimization formulation and develop an efficient algorithm to solve it. Due to the lack of a public corpus, we construct gold-standard Twitter summary datasets for 12 different topics. Experimental results on these datasets show the effectiveness of our framework for handling the large-scale short and noisy messages in social media.
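
To make the reconstruction idea concrete, here is a greatly simplified greedy sketch: pick the tweets whose TF-IDF vectors best reconstruct the whole topic in the least-squares sense. The paper's actual SNSR objective is a joint group sparse optimization with social and diversity regularizers; this greedy variant and its parameters are illustrative assumptions only.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def greedy_reconstruction_summary(tweets, k=3):
        X = TfidfVectorizer().fit_transform(tweets).toarray()  # tweets x terms
        chosen = []
        for _ in range(k):
            best_i, best_err = None, np.inf
            for i in set(range(len(tweets))) - set(chosen):
                B = X[chosen + [i]].T                        # terms x selected
                W, *_ = np.linalg.lstsq(B, X.T, rcond=None)  # reconstruct all tweets
                err = np.linalg.norm(X.T - B @ W)            # reconstruction error
                if err < best_err:
                    best_i, best_err = i, err
            chosen.append(best_i)
        return [tweets[i] for i in chosen]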


Fact Checking in Community Forums

AAAI Conferences

Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information. Unfortunately, this information is not always factual. Thus, here we explore a new dimension in the context of cQA, which has been ignored so far: checking the veracity of answers to particular questions in cQA forums. As this is a new problem, we create a specialized dataset for it. We further propose a novel multi-faceted model, which captures information from the answer content (what is said and how), from the author profile (who says it), from the rest of the community forum (where it is said), and from external authoritative sources of information (external support). Evaluation results show a MAP value of 86.54, which is 21 points absolute above the baseline.
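
A hypothetical sketch of how such a multi-faceted model can be assembled: one feature block per evidence source, concatenated and fed to a single veracity classifier. The feature dimensions and the logistic regression classifier are assumptions for illustration, not the paper's exact model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def answer_features(content_vec, author_vec, forum_vec, external_vec):
        # what is said / who says it / where it is said / external support
        return np.concatenate([content_vec, author_vec, forum_vec, external_vec])

    # Toy training data: one row of concatenated facet features per answer.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 40))
    y = rng.integers(0, 2, 100)          # 1 = factual, 0 = not factual
    clf = LogisticRegression(max_iter=1000).fit(X, y)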


280 Birds With One Stone: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification

AAAI Conferences

We propose a novel, fully automated approach to inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages the interlanguage links of Wikipedia to automatically construct training datasets for the is-a relation in the target language. Character-level classifiers are trained on the constructed datasets and used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of this work, we release what is presumably the largest and most accurate multilingual taxonomic resource, spanning more than 280 languages.
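
A minimal sketch of what a character-level is-a classifier can look like: character n-grams over a joined "child # parent" string feed a linear model. The pair encoding, n-gram range, and toy examples are assumptions, not the paper's exact setup.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    pairs = ["gatto # animale", "parigi # animale"]  # toy (child, parent) pairs
    labels = [1, 0]                                  # 1 = valid is-a edge

    clf = make_pipeline(
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(),
    )
    clf.fit(pairs, labels)
    print(clf.predict(["cane # animale"]))

Because the features are sub-word character patterns rather than word identities, the same recipe transfers across languages without language-specific tooling, which is the point of the character-level design.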


Cognition-Cognizant Sentiment Analysis With Multitask Subjectivity Summarization Based on Annotators' Gaze Behavior

AAAI Conferences

For document-level sentiment analysis (SA), subjectivity extraction, i.e., extracting the relevant subjective portions of the text that cover the overall sentiment expressed in the document, is an important step. Subjectivity extraction, however, is a hard problem for systems, as it demands a great deal of world knowledge and reasoning. Humans, on the other hand, are good at extracting relevant subjective summaries from an opinionated document (say, a movie review) while inferring the sentiment expressed in it. This capability is manifested in their eye-movement behavior while reading: words pertaining to the subjective summary of the text attract far more attention in the form of gaze fixations and/or saccadic patterns. We propose a multi-task deep neural framework for document-level sentiment analysis that learns to predict the overall sentiment expressed in a given input document by simultaneously learning to predict human gaze behavior and auxiliary linguistic properties such as the part-of-speech and syntactic properties of words in the document. For this, we propose a multi-task learning algorithm based on a multi-layer shared LSTM augmented with task-specific classifiers. With this composite multi-task network, we obtain performance competitive with or better than state-of-the-art approaches to SA. Moreover, the availability of gaze predictions as an auxiliary output helps interpret the system better; for instance, gaze predictions reveal that the system indeed performs subjectivity extraction better, which accounts for the improvement in document-level sentiment analysis performance.
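
A minimal PyTorch sketch of the shared-LSTM multitask layout described above: a shared encoder with a document-level sentiment head and token-level heads for gaze and POS prediction. The layer sizes, pooling choice, and head shapes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MultiTaskSA(nn.Module):
        def __init__(self, vocab, emb=100, hid=128, n_pos=17):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb, hid, num_layers=2, batch_first=True)  # shared
            self.sentiment = nn.Linear(hid, 2)  # document-level sentiment head
            self.gaze = nn.Linear(hid, 1)       # per-token fixation duration
            self.pos = nn.Linear(hid, n_pos)    # per-token POS tags

        def forward(self, tokens):
            h, _ = self.lstm(self.embed(tokens))  # (batch, seq, hid)
            return self.sentiment(h.mean(dim=1)), self.gaze(h), self.pos(h)

    model = MultiTaskSA(vocab=5000)
    sent, gaze, pos = model(torch.randint(0, 5000, (4, 20)))

Training would sum a loss per head, so the gradients from the gaze and POS tasks regularize the shared encoder that the sentiment head relies on.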


10 Principles for Winning the Game of Digital Disruption

#artificialintelligence

A version of this article appeared in the Spring 2018 issue of strategy+business. If you haven't noticed, a high-stakes global game of digital disruption is currently under way. It is fueled by the latest wave of technology: advances in artificial intelligence, data analytics, robotics, the Interne...


Deep Learning for Sentiment Analysis: A Survey

arXiv.org Machine Learning

Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with its success in many other application domains, deep learning has also been applied widely to sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.


Marketing Analytics: Methods, Practice, Implementation, and Links to Other Fields

arXiv.org Machine Learning

Marketing analytics is a diverse field, with both academic researchers and practitioners coming from a range of backgrounds including marketing, operations research, statistics, and computer science. This paper provides an integrative review at the boundary of these areas. The topics of visualization, segmentation, and class prediction are featured. Links between the disciplines are emphasized. For each of these topics, a historical overview is given, starting with initial work in the 1960s and carrying through to the present day. Recent innovations for modern large and complex "big data" sets are described. Practical implementation advice is given, along with a directory of open-source R routines for implementing marketing analytics techniques.
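
The paper's directory points to R routines; as a language-neutral illustration of the segmentation topic it covers, here is a minimal k-means customer segmentation sketch in Python. The feature columns and the number of segments are assumptions for the example.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Toy customer table: annual spend, purchase frequency, recency (days).
    X = rng.standard_normal((200, 3)) * [500.0, 4.0, 30.0] + [1000.0, 10.0, 60.0]

    segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
        StandardScaler().fit_transform(X)
    )
    print(np.bincount(segments))  # customers per segment

Standardizing the features first is the usual design choice here, since otherwise the column with the largest raw scale (annual spend) would dominate the distance computation.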


Graph Summarization Methods and Applications: A Survey

arXiv.org Artificial Intelligence

While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind, and the challenges of, graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.


Netvue Releases World's First Artificial Intelligence Doorbell

#artificialintelligence

It offers a leading-edge approach to responding to visitors and the surroundings of the house. Belle will be available on Kickstarter on Jan. 16, 2018, starting at $129, and will be on exhibit at booth #42925, Sands Hall, at CES 2018.