

Online Social Spammer Detection

AAAI Conferences

The explosive growth of social media has also made it a popular platform for malicious users, known as social spammers, who overwhelm normal users with unwanted content. One effective approach to social spammer detection is to build a classifier based on content and social network information. However, social spammers are sophisticated and adapt to game the system with fast-evolving content and network patterns. First, social spammers continually change their spamming content patterns to avoid detection. Second, reflexive reciprocity makes it easy for social spammers to establish social influence and pretend to be normal users by quickly accumulating a large number of "human" friends. Existing anti-spamming systems based on batch-mode learning struggle to respond quickly to these newly emerging patterns. In this paper, we present a general optimization framework that collectively uses content and network information for social spammer detection, and we provide a solution for efficient online processing. Experimental results on Twitter datasets confirm the effectiveness and efficiency of the proposed framework.
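
The batch-versus-online contrast can be illustrated with a minimal sketch: an ordinary mistake-driven perceptron over made-up content and network features, updated one example at a time so new spam patterns are absorbed without retraining. This is a generic illustration, not the paper's optimization framework; the feature names are invented.

```python
# Online (streaming) classifier sketch: a perceptron over concatenated
# content and network features. Each example updates the model in O(d),
# in contrast with batch retraining over the full dataset.

def online_update(w, x, y, lr=1.0):
    # x: feature vector (content + network), y in {+1, -1}.
    pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
    if pred != y:                       # mistake-driven update
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Stream of ([content_score, follower_ratio], label); +1 = legitimate,
# -1 = spammer. Values are illustrative.
stream = [([1.0, 0.2], -1), ([0.1, 0.9], 1), ([0.9, 0.1], -1),
          ([0.2, 0.8], 1)]
w = [0.0, 0.0]
for x, y in stream:
    w = online_update(w, x, y)
```

After the stream, the learned weights separate the toy spammer and legitimate-user patterns without ever revisiting earlier examples.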


False-Name Bidding and Economic Efficiency in Combinatorial Auctions

AAAI Conferences

Combinatorial auctions are multiple-item auctions in which bidders may place bids on any package (subset) of goods. This additional expressibility produces benefits that have led to combinatorial auctions becoming extremely important both in practice and in theory. In the computer science community, auction design has focused primarily on computational practicality and incentive compatibility. The latter concerns mechanisms that are resistant to bidders misrepresenting themselves via a single false identity; however, with modern forms of bid submission, such as electronic bidding, other types of cheating have become feasible. Prominent amongst them is false-name bidding; that is, bidding under pseudonyms. For example, the ubiquitous Vickrey-Clarke-Groves (VCG) mechanism is incentive compatible and produces optimal allocations, but it is not false-name-proof: bidders can increase their utility by submitting bids under multiple identifiers. Thus, there has recently been much interest in the design and analysis of false-name-proof auction mechanisms. These false-name-proof mechanisms, however, have polynomially small efficiency guarantees: they can produce allocations with very low economic efficiency/social welfare. In contrast, we show that, provided the degree to which different goods are complementary is bounded (as is the case in many important, practical auctions), the VCG mechanism gives a constant efficiency guarantee. Constant efficiency guarantees hold even at equilibria where the agents bid in a manner that is not individually rational. Thus, while an individual bidder may personally benefit greatly from making false-name bids, this will have only a small detrimental effect on the objective of the auctioneer: maximizing economic efficiency. So, from the auctioneer's viewpoint the VCG mechanism remains preferable to false-name-proof mechanisms.
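
To make the VCG mechanism concrete, here is a hedged brute-force sketch for a two-good auction. The goods, valuations, and helper names are illustrative, not taken from the paper; the sketch only shows the standard rule that each winner pays the welfare loss its presence imposes on the other bidders.

```python
from itertools import product

# Toy brute-force VCG for a two-good combinatorial auction. Each bidder
# submits a valuation function over bundles; VCG picks the allocation
# maximizing total welfare and charges each bidder their externality.

GOODS = ("a", "b")

def allocations(n_bidders):
    # Assign each good to a bidder index, or -1 for "unallocated".
    for assign in product(range(-1, n_bidders), repeat=len(GOODS)):
        yield [frozenset(g for g, i in zip(GOODS, assign) if i == b)
               for b in range(n_bidders)]

def welfare(alloc, vals):
    return sum(v(bundle) for v, bundle in zip(vals, alloc))

def vcg(vals):
    n = len(vals)
    best = max(allocations(n), key=lambda a: welfare(a, vals))
    payments = []
    for i in range(n):
        others = vals[:i] + vals[i + 1:]
        # Welfare the others could achieve without bidder i, minus what
        # they actually receive: bidder i's externality.
        w_without = max(welfare(a, others) for a in allocations(n - 1))
        w_with = welfare(best, vals) - vals[i](best[i])
        payments.append(w_without - w_with)
    return best, payments

# Bidder 0 sees the goods as complements (pair worth 10, singletons 0);
# bidder 1 values them additively at 4 each.
v0 = lambda b: 10 if len(b) == 2 else 0
v1 = lambda b: 4 * len(b)
alloc, pay = vcg([v0, v1])
```

Here bidder 0 wins the pair and pays 8, the welfare bidder 1 forgoes; the degree of complementarity in the valuations is exactly the quantity the paper's constant efficiency guarantee bounds.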


Uncovering Hidden Structure through Parallel Problem Decomposition

AAAI Conferences

A key strategy for speeding up computation is to run in parallel on multiple cores. However, on hard combinatorial problems, exploiting parallelism has proven surprisingly challenging. Traditional divide-and-conquer strategies appear not to work well, due to the intricate non-local nature of the interactions between problem variables. In this paper, we introduce a novel way in which parallelism can be used to exploit the hidden structure of hard combinatorial problems. We demonstrate the success of this approach on the minimal set basis problem, which has a wide range of applications in areas such as machine learning and system security. We also show its effectiveness on a related application problem from materials discovery. In our approach, a large number of smaller sub-problems are identified and solved concurrently. We then aggregate the information from those solutions and use it to initialize the search of a global, complete solver. We show that this strategy leads to a significant speed-up over a sequential approach, and that it greatly outperforms state-of-the-art incomplete solvers in terms of solution quality. Our work opens up a novel angle for using parallelism to solve hard combinatorial problems.
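
The decompose-aggregate-seed pattern described above can be sketched on a toy set-cover instance. The instance, the greedy sub-solver, and the voting-based aggregation are illustrative stand-ins, not the paper's actual algorithm for the minimal set basis problem.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

# Toy set-cover instance: candidate sets over a small universe.
SETS = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6},
    "s4": {1, 6},
    "s5": {2, 5},
}

def greedy_cover(universe, sets):
    # Standard greedy: repeatedly pick the set covering most uncovered items.
    uncovered, chosen = set(universe), []
    while uncovered:
        name = max(sets, key=lambda n: len(sets[n] & uncovered))
        if not sets[name] & uncovered:
            break
        chosen.append(name)
        uncovered -= sets[name]
    return chosen

def decompose(universe, k=3):
    # Split the universe into k slices, each a smaller sub-problem.
    items = sorted(universe)
    return [set(items[i::k]) for i in range(k)]

universe = {1, 2, 3, 4, 5, 6}
# Solve sub-problems concurrently (processes in a real implementation).
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda sub: greedy_cover(sub, SETS),
                             decompose(universe)))

# Aggregate: sets that recur across sub-solutions seed the global solver.
votes = Counter(name for sol in partials for name in sol)
seed = [name for name, _ in votes.most_common(2)]
covered = set().union(*(SETS[n] for n in seed))
solution = seed + greedy_cover(universe - covered, SETS)
```

The aggregation step is where hidden structure surfaces: sets chosen independently by many sub-problems are likely part of a good global solution, so the complete solver starts from them instead of from scratch.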


RepRev: Mitigating the Negative Effects of Misreported Ratings

AAAI Conferences

Reputation models depend on the ratings provided by buyers to gauge the reliability of sellers in multi-agent-based e-commerce environments. However, there is no prevention for cases in which a buyer misjudges a seller and provides a negative rating for an originally satisfactory transaction. In this case, how should the seller get his reputation repaired and utility loss recovered? In this work, we propose a mechanism to mitigate the negative effect of misreported ratings. It temporarily inflates the reputation of the victim seller by a certain value for a period of time. This allows the seller to recover the utility lost to missed opportunities caused by the misreported ratings. Experiments demonstrate the necessity and effectiveness of the proposed mechanism.
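
A minimal sketch of the temporary-inflation idea follows, assuming a simple average-based reputation and a fixed compensation value spread over a few periods. Both are assumptions for illustration; the paper derives the compensation from the seller's actual utility loss.

```python
# Sketch of temporarily inflating a victim seller's reputation after a
# misreported negative rating. Class and parameter names are illustrative.

class Reputation:
    def __init__(self):
        self.ratings = []
        self.boost = 0.0       # extra reputation granted as compensation
        self.boost_steps = 0   # remaining periods the boost applies

    def rate(self, r):
        self.ratings.append(r)

    def compensate(self, value, steps):
        # Grant `value` of extra reputation for the next `steps` queries,
        # offsetting opportunities lost to the misreported rating.
        self.boost, self.boost_steps = value, steps

    def score(self):
        base = sum(self.ratings) / len(self.ratings) if self.ratings else 0.5
        if self.boost_steps > 0:
            self.boost_steps -= 1
            return min(1.0, base + self.boost)
        return base

seller = Reputation()
for r in (1, 1, 1, 0):        # the final rating is a misjudged transaction
    seller.rate(r)
seller.compensate(0.25, steps=2)
```

After the boost expires, the score reverts to the rating average, so the mechanism repairs short-term utility without permanently distorting the reputation signal.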


Generalized Higher-Order Tensor Decomposition via Parallel ADMM

AAAI Conferences

Higher-order tensors are becoming prevalent in many scientific areas such as computer vision, social network analysis, data mining and neuroscience. Traditional tensor decomposition approaches face three major challenges: model selection, gross corruptions, and computational efficiency. To address these problems, we first propose a parallel trace norm regularized tensor decomposition method and formulate it as a convex optimization problem. This method does not require the rank of each mode to be specified beforehand, and can automatically determine the number of factors in each mode through our optimization scheme. By considering the low-rank structure of the observed tensor, we analyze the equivalence relationship of the trace norm between a low-rank tensor and its core tensor. We then cast a non-convex tensor decomposition model as a weighted combination of multiple much smaller-scale matrix trace norm minimization problems. Finally, we develop two parallel alternating direction methods of multipliers (ADMM) to solve our problems. Experimental results verify that our regularized formulation is effective and that our methods are robust to noise and outliers.
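
In trace-norm-regularized formulations like this one, the per-mode ADMM subproblem typically reduces to singular value thresholding (SVT), the proximal operator of the matrix trace (nuclear) norm, applied to a mode unfolding. Below is a sketch of that single building block only, assuming NumPy and an illustrative threshold `tau`; it is not the paper's full parallel ADMM.

```python
import numpy as np

def svt(M, tau):
    # prox_{tau * ||.||_*}(M): shrink singular values toward zero, which
    # drives small singular values exactly to zero (low-rank structure).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def unfold(T, mode):
    # Mode-n unfolding: mode-`mode` fibers become the columns.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 5, 6))          # illustrative 3rd-order tensor
low_rank_unfolding = svt(unfold(T, 0), tau=2.0)
```

A parallel ADMM scheme would apply this operator to each mode's unfolding independently (one task per mode), then reconcile the results through the shared tensor variable.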


On Hair Recognition in the Wild by Machine

AAAI Conferences

We present an algorithm for identity verification using only information from the hair. Face recognition in the wild (i.e., unconstrained settings) is highly useful in a variety of applications, but performance suffers due to many factors, e.g., obscured face, lighting variation, extreme pose angle, and expression. It is well known that humans utilize hair for identification under many of these scenarios, owing either to the consistent hair appearance of the same subject or to the obvious hair discrepancy between different subjects, but little work exists to replicate this intelligence artificially. We propose a learned hair matcher using shape, color, and texture features derived from localized patches through an AdaBoost technique with weak classifiers that abstain when features are not present in the given location. The proposed hair matcher achieves 71.53% accuracy on the LFW View 2 dataset. Hair also reduces the error of a Commercial Off-The-Shelf (COTS) face matcher by 5.7% through simple score-level fusion.
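
Boosting with abstaining weak classifiers can be sketched as follows, using toy numeric features rather than the paper's hair shape, color, and texture descriptors. A stump votes 0 when its feature is absent (e.g., an occluded patch), so it neither helps nor hurts the weighted vote, and its example weights pass through the reweighting step unchanged.

```python
import math

# Minimal AdaBoost with abstaining stumps. Toy data: each example has
# "color" and/or "texture" scores; a missing feature means the stump
# for that feature must abstain.
DATA = [
    ({"color": 0.9, "texture": 0.8}, +1),
    ({"color": 0.8},                 +1),
    ({"texture": 0.9},               +1),
    ({"color": 0.2, "texture": 0.1}, -1),
    ({"color": 0.3},                 -1),
    ({"texture": 0.2},               -1),
    ({"color": 0.6, "texture": 0.3}, -1),
    ({"color": 0.4, "texture": 0.6}, +1),
    ({"texture": 0.7},               -1),   # deliberately noisy example
]

def stump(feature, thresh):
    def h(x):
        if feature not in x:
            return 0                        # abstain: feature unavailable
        return +1 if x[feature] >= thresh else -1
    return h

POOL = [stump("color", 0.5), stump("texture", 0.5)]

def train(data, pool, rounds=2):
    w = [1.0 / len(data)] * len(data)
    ensemble = []
    for _ in range(rounds):
        def weighted_err(h):
            # Error measured only over examples where h does not abstain.
            active = [(wi, h(x) != y) for wi, (x, y) in zip(w, data)
                      if h(x) != 0]
            total = sum(wi for wi, _ in active)
            return sum(wi for wi, wrong in active if wrong) / total
        h = min(pool, key=weighted_err)
        err = max(weighted_err(h), 1e-9)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Abstentions leave weights untouched (h(x) == 0 => factor 1).
        w = [wi * math.exp(-alpha * y * h(x)) for wi, (x, y) in zip(w, data)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

model = train(DATA, POOL)
```

The key design point is that an example missing a feature is neither forced into a wrong vote nor dropped from training; it simply sits out that stump's round.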


Delivering Guaranteed Display Ads under Reach and Frequency Requirements

AAAI Conferences

We propose a novel idea in the allocation and serving of online advertising. We show that by using predetermined fixed-length streams of ads (which we call patterns) to serve advertising, we can incorporate a variety of interesting features into the ad allocation optimization problem. In particular, our formulation optimizes for representativeness as well as user-level diversity and pacing of ads, under reach and frequency requirements. We show how the problem can be solved efficiently using a column generation scheme in which only a small set of best patterns are kept in the optimization problem. Our numerical tests suggest that with parallelization of the pattern generation process, the algorithm has a promising run time and memory usage.


The Semantic Interpretation of Trust in Multiagent Interactions

AAAI Conferences

We provide an approach to estimate trust between agents from their interactions. Our approach adopts a probabilistic model of trust founded on commitments. We base trust estimation on commitments because a commitment describes what one agent may expect of another; the satisfaction or violation of a commitment therefore provides a natural basis for determining how much to trust that agent. We evaluate our approach empirically. In one study, 30 subjects read emails extracted from the Enron dataset, augmented with some synthetic emails to capture commitment operations missing from the Enron corpus. The subjects estimated trust between each pair of communicating participants. We trained model parameters for each subject with respect to our automated analysis of the emails, showing that the trained parameters yield a lower prediction error on a subject's trust ratings, given automatically inferred commitments, than fixed parameters.
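
One simple way to turn commitment outcomes into trust estimates, shown here as a hedged sketch, is a Beta-distribution update in which each satisfied commitment counts as positive evidence and each violation as negative evidence. The paper's probabilistic model and its per-subject trained parameters are more elaborate than this; the class and names below are illustrative.

```python
from collections import defaultdict

# Commitment-based trust sketch: track (satisfied, violated) counts per
# debtor and report the Beta posterior mean as the trust estimate.

class TrustModel:
    def __init__(self, prior=(1, 1)):
        # prior = (pseudo-satisfactions, pseudo-violations); (1, 1) is
        # the uniform prior, so an unknown agent starts at trust 0.5.
        self.evidence = defaultdict(lambda: list(prior))

    def observe(self, debtor, satisfied):
        self.evidence[debtor][0 if satisfied else 1] += 1

    def trust(self, debtor):
        s, v = self.evidence[debtor]
        return s / (s + v)      # Beta posterior mean

tm = TrustModel()
for outcome in (True, True, True, False):   # commitments by "alice"
    tm.observe("alice", outcome)
```

With three satisfactions and one violation on a uniform prior, Alice's trust is 4/6; more evidence moves the estimate further from the prior, mirroring how inferred commitment operations accumulate over an email history.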


Event Recommendation in Event-Based Social Networks

AAAI Conferences

With the rapid growth of event-based social networks, the demand for event recommendation has become increasingly important. Unlike classic recommendation problems, event recommendation generally faces heterogeneous online and offline social relationships among users, as well as implicit feedback data. In this paper, we present a Bayesian probability model that can fully unleash the power of heterogeneous social relations and efficiently handle the implicit-feedback characteristics of event recommendation. Experimental results on several real-world datasets demonstrate the utility of our method.


Improving Domain-Independent Cloud-Based Speech Recognition with Domain-Dependent Phonetic Post-Processing

AAAI Conferences

Automatic speech recognition (ASR) technology has matured to the point that off-the-shelf distributed speech recognition services are available free of cost, allowing researchers to integrate speech into their applications with little development effort or expert knowledge, and yielding better results than previously used open-source tools. Often, however, such services do not accept language models or grammars but instead process free speech from any domain. While results are very good given the enormous size of the search space, they frequently contain out-of-domain words or constructs that cannot be understood by subsequent domain-dependent natural language understanding (NLU) components. We present a versatile post-processing technique based on phonetic distance that integrates domain knowledge with open-domain ASR results, leading to improved ASR performance. Notably, our technique can exploit various degrees of domain knowledge, ranging from pure vocabulary restrictions via grammars or N-grams to restrictions on the acceptable utterances. We present results for a variety of corpora (mainly from human-robot interaction) where our combined approach significantly outperforms Google ASR as well as a plain open-source ASR solution.
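
The phonetic post-processing idea can be sketched as follows: fold the open-domain ASR hypothesis and each in-domain phrase into a crude phonetic encoding, then pick the in-domain phrase at minimal edit distance. A real system would use proper grapheme-to-phoneme conversion; the folding rules and domain phrases below are illustrative.

```python
# Map an out-of-domain ASR hypothesis onto the closest phrase the domain
# accepts, via edit distance over a rough phonetic encoding.

def phonetic(word):
    # Very rough Soundex-like folding of acoustically confusable letters;
    # vowels and spaces are dropped.
    groups = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f", "m": "n"}
    out = []
    for ch in word.lower():
        if ch in "aeiou ":
            continue
        out.append(groups.get(ch, ch))
    return "".join(out)

def edit_distance(a, b):
    # Standard single-row dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# Stand-in for the domain restriction (here: a flat list of utterances;
# a grammar or N-gram model would enumerate or score candidates instead).
DOMAIN_PHRASES = ["grasp the red cube", "release the red cube",
                  "move to the table"]

def post_process(asr_hypothesis):
    return min(DOMAIN_PHRASES,
               key=lambda p: edit_distance(phonetic(p),
                                           phonetic(asr_hypothesis)))
```

For example, an open-domain hypothesis like "move to the stable" folds to nearly the same phonetic string as "move to the table" and is snapped back to the in-domain utterance, which a downstream NLU component can then interpret.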