 Communications: Overviews


A Comparison between Microblog Corpus and Balanced Corpus from Linguistic and Sentimental Perspectives

AAAI Conferences

As microblogging has gained popularity on the Internet, analyzing and processing short messages has become a challenging task in natural language processing. This paper analyzes the differences between Internet short messages (or “microtext”) and general articles by comparing the Plurk Corpus and the Sinica Balanced Corpus. Likelihood ratios and the Tongyici Cilin (tóngyìcí cílín) thesaurus are used to analyze the lexical semantics of frequent terms in each corpus. Furthermore, the NTUSD sentiment dictionary is used to compare the sentiment distributions of the two corpora. The results are also applied to sentiment transition analysis.
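The abstract does not say which likelihood-ratio formulation was used. As a rough illustration only, the sketch below uses a simplified two-term log-likelihood statistic that is common in corpus comparison; the function names `log_likelihood_ratio` and `distinctive_terms` and the toy token lists are assumptions, not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood_ratio(k1, n1, k2, n2):
    """Simplified G^2 score for a term seen k1 times in n1 tokens of
    corpus 1 and k2 times in n2 tokens of corpus 2 (assumed formulation)."""
    # Expected counts if both corpora used the term at one shared rate.
    shared_rate = (k1 + k2) / (n1 + n2)
    expected = (n1 * shared_rate, n2 * shared_rate)
    score = 0.0
    for observed, exp in zip((k1, k2), expected):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / exp)
    return 2.0 * score


def distinctive_terms(tokens_a, tokens_b, top=3):
    """Rank terms by how strongly their frequency differs between corpora."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    scores = {
        term: log_likelihood_ratio(counts_a[term], n_a, counts_b[term], n_b)
        for term in set(counts_a) | set(counts_b)
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical toy data standing in for a microblog and a balanced corpus.
micro = ["lol"] * 5 + ["the"] * 5
balanced = ["the"] * 5 + ["thus"] * 5
top_terms = distinctive_terms(micro, balanced, top=2)
```

A term used at the same rate in both corpora scores zero, so the highest-scoring terms are those most characteristic of one corpus or the other.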


Social Recommendation Using Low-Rank Semidefinite Program

AAAI Conferences

The most critical challenge for a recommendation system is to achieve high prediction quality on the large-scale, sparse data contributed by users. In this paper, we present a novel approach to the social recommendation problem that takes advantage of graph Laplacian regularization to capture the underlying social relationships among users. Unlike previous approaches, which are based on conventional gradient descent optimization, we formulate the graph Laplacian regularized social recommendation problem as a low-rank semidefinite program, which can be solved efficiently by a quasi-Newton algorithm. We have conducted an empirical evaluation on a large-scale dataset of high sparsity; the promising experimental results show that our method is very effective and efficient for the social recommendation task.
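To make the graph Laplacian regularization term concrete, the sketch below computes the standard smoothness penalty tr(UᵀLU) with L = D − W and checks it against the equivalent pairwise form. It covers only the regularizer, not the low-rank semidefinite program or the quasi-Newton solver; the names `W` (symmetric user-user trust weights) and `U` (per-user latent factors) and the toy values are assumptions for illustration.

```python
def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness_penalty(W, U):
    """tr(U^T L U): small when socially linked users have similar factors."""
    L = laplacian(W)
    n = len(U)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(U[i], U[j]))
            total += L[i][j] * dot
    return total


def pairwise_penalty(W, U):
    """Equivalent form: 0.5 * sum_ij W_ij * ||u_i - u_j||^2."""
    n = len(U)
    return 0.5 * sum(
        W[i][j] * sum((a - b) ** 2 for a, b in zip(U[i], U[j]))
        for i in range(n)
        for j in range(n)
    )


# Hypothetical 3-user chain: user 0 - user 1 - user 2.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
U = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
penalty = smoothness_penalty(W, U)
```

In this toy example, users 0 and 1 share identical factors and contribute nothing, while the differing factors of linked users 1 and 2 account for the whole penalty.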


Scalable Event-Based Clustering of Social Media Via Record Linkage Techniques

AAAI Conferences

We tackle the problem of grouping content available in social media applications such as Flickr, YouTube, and Panoramio into clusters of documents describing the same event. This task has previously been referred to as event identification. We present a new formalization of the event identification task as a record linkage problem and show that this formulation leads to a principled and highly efficient solution. We present results on two datasets derived from Flickr (last.fm and upcoming), comparing the results in terms of Normalized Mutual Information and F-Measure against several baselines and showing that a record linkage approach outperforms all baselines as well as a state-of-the-art system. We demonstrate that our approach scales to large amounts of data, reducing processing time considerably compared to a state-of-the-art approach. The scalability is achieved by applying an appropriate blocking strategy and relying on a Single Linkage clustering algorithm, which avoids the exhaustive computation of pairwise similarities.
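The combination of blocking and single linkage can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: blocking is approximated by a sliding time window, similarity by Jaccard overlap of tags, and single linkage by merging every candidate pair above a threshold with union-find. The `window` and `threshold` parameters and the document layout are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two tag collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_events(docs, window=3600, threshold=0.5):
    """Cluster docs given as (timestamp_seconds, tags) tuples.

    Only pairs whose timestamps lie within `window` are compared
    (blocking); any such pair with similarity >= threshold is merged,
    so clusters are the connected components (single linkage).
    """
    parent = list(range(len(docs)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    order = sorted(range(len(docs)), key=lambda i: docs[i][0])
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if docs[j][0] - docs[i][0] > window:
                break  # later docs are even further away in time
            if jaccard(docs[i][1], docs[j][1]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical photo metadata: (upload time in seconds, tags).
docs = [
    (0, ["concert", "berlin"]),
    (600, ["concert", "berlin", "live"]),
    (700, ["football"]),
    (100000, ["concert", "berlin"]),
]
clusters = cluster_events(docs)
```

The last document shares its tags with the first but falls outside the time window, so the pair is never compared; that skipped comparison is where blocking saves work.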


Human Computation

Morgan & Claypool Publishers

This book is aimed at achieving four goals: (1) defining human computation as a research area; (2) providing a comprehensive review of existing work; (3) drawing connections to a wide variety of disciplines, including AI, Machine Learning, HCI, Mechanism/Market Design and Psychology, and capturing their unique perspectives on the core research questions in human computation; and (4) suggesting promising research directions for the future. ISBN 9781608455164, 121 pages.


Using Mechanism Design to Prevent False-Name Manipulations

AI Magazine

The basic notion of false-name-proofness allows for useful mechanisms under certain circumstances, but in general there are impossibility results that show that false-name-proof mechanisms have severe limitations. One may react to these impossibility results by saying that, since false-name-proof mechanisms are unsatisfactory, we should not run any important mechanisms in highly anonymous settings—unless, perhaps, we can find some methodology that directly prevents false-name manipulation even in such settings, so that we are back in a more typical mechanism design context. However, it seems unlikely that the phenomenon of false-name manipulation will disappear anytime soon. Because the Internet is so attractive as a platform for running certain types of mechanisms, it seems unlikely that the organizations running these mechanisms will take them offline. Moreover, because a goal of these organizations is often to get as many users to participate as possible, they will be reluctant to use high-overhead solutions that discourage users from participating. As a result, perhaps the most promising approaches at this point are those that combine techniques from mechanism design with other techniques discussed in this article. It appears that this is a rich domain for new, creative approaches that can have significant practical impact.


A Comprehensive Survey of Data Mining-based Fraud Detection Research

arXiv.org Artificial Intelligence

This survey paper categorises, compares, and summarises almost all published technical and review articles on automated fraud detection from the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, that proposes alternative data and solutions from related domains.


Temporal and Social Context Based Burst Detection from Folksonomies

AAAI Conferences

Burst detection is an important topic in temporal stream analysis. Usually, only textual features are used in burst detection. When extracting themes from today's prevalent social media content, it is necessary to consider not only textual features but also the pervasive collaborative context, e.g., resource lifetime and user activity. This paper explores novel approaches to combining multiple sources of such indications for better burst extraction. We systematically investigate the characteristics of collaborative context, i.e., metadata frequency, topic coverage and user attractiveness. First, a robust state-based model is used to detect bursts from individual streams. We then propose a learning method to combine these burst pulses. Experiments on a large real dataset demonstrate remarkable improvements over traditional methods.


Keyword Extraction and Headline Generation Using Novel Word Features

AAAI Conferences

We introduce several novel word features for keyword extraction and headline generation. These new word features are derived from the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. From the resulting set of Wikipedia articles, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications of individual words' importance in the input document. They serve as useful complements to the traditional word features derivable from the explicit information in a document. In addition, we introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation experimentally and have obtained very encouraging results.


A survey of statistical network models

arXiv.org Machine Learning

Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.


AAAI-08 and IAAI-08 Conferences Provide Focal Point for AI

AI Magazine

This summer's AAAI Conference on Artificial Intelligence (AAAI-08) and its sister Conference on Innovative Applications of AI (IAAI-08) continued their long tradition of being a focal point of AI. This year's conferences were held in Chicago at the Hyatt Regency McCormick Place, July 13-17, 2008. The multidimensional conference offerings included nine invited talks, 251 technical papers, 22 innovative applications of AI papers, three competitions (poker, AI video, and general game playing), three special tracks (AI and the web, integrated intelligence, and physically grounded AI), 15 tutorials, 15 workshops, and 11 intelligent system demonstrations, as well as a number of awards, a doctoral consortium, student poster session and programs, and a vendor exhibit. This translated into a plethora of choices for the 921 conference attendees. An additional 175 people exclusively attended the tutorials, workshops, or exhibit.