Locate the Hate: Detecting Tweets against Blacks

AAAI Conferences

Although the social medium Twitter grants users freedom of speech, its instantaneous nature and retweeting features also amplify hate speech. Because Twitter has a sizeable black constituency, racist tweets against blacks are especially detrimental in the Twitter community, though this effect may not be obvious against a backdrop of half a billion tweets a day. We apply a supervised machine learning approach, employing inexpensively acquired labeled data from diverse Twitter accounts to learn a binary classifier for the labels “racist” and “nonracist.” The classifier has a 76% average accuracy on individual tweets, suggesting that with further improvements, our work can contribute data on the sources of anti-black hate speech.


From Interest to Function: Location Estimation in Social Media

AAAI Conferences

Recent years have witnessed the tremendous development of social media, which attracts a vast number of Internet users. The high-dimensional content generated by these users provides a unique opportunity to understand their behavior in depth. As one of the most fundamental topics, location estimation is attracting more and more research effort. In contrast to the previous literature, we find that a user's location is strongly related to the user's interests. Based on this, we first build a detection model to mine user interest from short text. We then establish the mapping between location function and user interest before presenting an efficient framework to predict the user's location with convincing fidelity. Thorough evaluations and comparisons on an authentic data set show that our proposed model significantly outperforms state-of-the-art approaches. Moreover, the high efficiency of our model also supports its applicability in real-world scenarios.


Search More, Disclose Less

AAAI Conferences

The proliferation of comparison shopping agents (CSAs) in recent years enables buyers in today's markets to query more than a single CSA while shopping, thus substantially expanding the list of sellers whose prices they obtain. From the individual CSA's point of view, however, multi-CSA querying is unfavorable, since most of today's CSAs derive their benefit from payments they receive from sellers when buyers are transferred to the sellers' websites (and make a purchase). The most straightforward way for a CSA to improve its competitiveness is to spend more resources on obtaining more sellers' prices, potentially resulting in a more attractive "best price". In this paper we suggest a complementary approach that improves the attractiveness of the best price returned to the buyer without having to extend the CSA's price database. This approach, which we term "selective price disclosure", relies on removing some of the prices known to the CSA from the list of results returned to the buyer. The advantage of this approach lies in its ability to affect the buyer's beliefs regarding the probability of obtaining more attractive prices by querying additional CSAs. The paper presents two methods for choosing the subset of prices to be presented to a fully rational buyer, attempting to overcome the computational complexity associated with evaluating all possible subsets. The effectiveness and efficiency of the methods are demonstrated using real data collected from five CSAs for four products. Furthermore, since people are known to have inherently bounded rationality, the two methods are also evaluated with human buyers, demonstrating that selective price disclosure can be highly effective with people; however, the subset of prices to be used should be extracted in a different (and simpler) manner.


Hypothesis Exploration for Malware Detection Using Planning

AAAI Conferences

In this paper we apply AI planning to address the hypothesis exploration problem and provide assistance to network administrators in detecting malware based on unreliable observations derived from network traffic. Building on the already established characterization and use of AI planning for similar problems, we propose a formulation of the hypothesis generation problem for malware detection as an AI planning problem with temporally extended goals and action costs. Furthermore, we propose a notion of hypothesis "plausibility" under unreliable observations, which we model as plan quality. We then show that in the presence of unreliable observations, simply finding one most "plausible" hypothesis, although challenging, is not sufficient for effective malware detection. To that end, we propose a method for applying a state-of-the-art planner within a principled exploration process to generate multiple distinct high-quality plans. We experimentally evaluate this approach by generating random problems of varying hardness with respect to both the number of observations and the degree of unreliability. Based on these experiments, we argue that our approach presents a significant improvement over prior work that focuses on finding a single optimal plan, and that our hypothesis exploration application can motivate the development of new planners capable of generating the top high-quality plans.


The Automated Acquisition of Suggestions from Tweets

AAAI Conferences

This paper aims at automatically detecting and classifying users' suggestions from tweets. The short and informal nature of tweets, along with the imbalanced distribution of suggestion tweets, makes the task extremely challenging. To this end, we develop a classification framework based on Factorization Machines, which are effective and efficient, especially in classification tasks with sparse features. Moreover, we tackle the imbalance problem by introducing cost-sensitive learning techniques into Factorization Machines. Extensive experimental studies on a manually annotated real-life data set show that the proposed approach improves significantly on the baseline approach, yielding a precision of 71.06% and a recall of 67.86%. We also investigate why Factorization Machines perform better. Finally, we introduce the first manually annotated dataset for suggestion classification.
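
As a rough illustration of the kind of model the abstract refers to, the sketch below computes the standard second-order Factorization Machine score for a sparse feature vector and pairs it with a class-weighted logistic loss. The weighted loss is one common cost-sensitive technique and is an assumption here, not necessarily the scheme used in the paper; the names `fm_score`, `weighted_log_loss`, and `pos_weight` are hypothetical.

```python
import math

def fm_score(x, w0, w, V):
    """Second-order Factorization Machine score for a sparse input.

    x  : dict mapping feature index -> non-zero feature value
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of per-feature latent vectors (each a list of k floats)
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #   = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def weighted_log_loss(score, label, pos_weight=1.0):
    """Logistic loss in which errors on the positive (rare) class are
    scaled by pos_weight. Assumed cost-sensitive scheme, for illustration."""
    p = 1.0 / (1.0 + math.exp(-score))
    if label == 1:
        return -pos_weight * math.log(p)
    return -math.log(1.0 - p)
```

Because only the non-zero entries of `x` are visited, the cost per example grows with the number of active features times the latent dimension, which is what makes this model family practical for short, sparse texts such as tweets.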


Multiagent Router Throttling: Decentralized Coordinated Response Against DDoS Attacks

AAAI Conferences

Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. In this paper we introduce Multiagent Router Throttling, a decentralized DDoS response mechanism in which a set of upstream routers independently learn to throttle traffic towards a victim server. We compare our approach against a baseline and a popular throttling technique from the literature, and we show that our proposed approach is more secure, reliable and cost-effective. Furthermore, our approach outperforms the baseline technique and either outperforms or has the same performance as the popular one.


Posted Prices Exchange for Display Advertising Contracts

AAAI Conferences

We propose a new market design for display advertising contracts, based on posted prices. Our model and algorithmic framework address several major challenges: (i) the space of possible impression types is exponential in the number of attributes, which is typically large, so a complete price space cannot be maintained; (ii) advertisers are usually unable or reluctant to provide extensive demand (willingness-to-pay) functions; and (iii) the levels of detail with which supply and demand are specified are often not identical.


An Antimicrobial Prescription Surveillance System that Learns from Experience

AAAI Conferences

Inappropriate prescribing of antimicrobials is a major clinical and health concern, as well as a financial burden, in hospitals worldwide. In this paper, we describe a deployed automated antimicrobial prescription surveillance system that has been assisting hospital pharmacists in identifying and reporting inappropriate antimicrobial prescriptions. One of the key characteristics of this system is its ability to learn new rules for detecting inappropriate prescriptions based on previous false alerts. The supervised learning algorithm combines instance-based learning and rule induction techniques. It exploits temporal abstraction to extract a meaningful time interval representation from raw clinical data, and applies nearest neighbor classification with a distance function on both temporal and non-temporal parameters. The learning capability is valuable both in configuring the system for initial deployment and in improving its long-term use. We give an overview of the application, point to lessons learned so far, and provide insight into the machine learning capability.


Multi-Armed Bandit with Budget Constraint and Variable Costs

AAAI Conferences

We study the multi-armed bandit problem with a budget constraint and variable costs (MAB-BV). In this setting, pulling an arm yields a random reward together with a random cost, and the objective of an algorithm is to pull a sequence of arms that maximizes the expected total reward while the total cost of pulling those arms complies with a budget constraint. This new setting models many Internet applications (e.g., ad exchange, sponsored search, and cloud computing) more accurately than previous settings, in which pulling an arm is either costless or has a fixed cost. We propose two UCB-based algorithms for the new setting. The first algorithm needs prior knowledge of a lower bound on the expected costs when computing the exploration term. The second algorithm eliminates this need by estimating the minimal expected cost from empirical observations, and can therefore be applied to more real-world applications where prior knowledge is not available. We prove that both algorithms learn effectively, with regret bounds of O(ln B). Furthermore, we show that when our proposed algorithms are applied to a previous setting with fixed costs (which can be regarded as a special case of ours), the previously obtained regret bound can be improved. Our simulation results on real-time bidding in ad exchange verify the effectiveness of the algorithms and are consistent with our theoretical analysis.
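
To make the setting concrete, here is a minimal sketch of a budget-constrained bandit loop in which every pull returns both a reward and a cost. The index used (empirical reward-to-cost ratio plus a standard UCB exploration bonus) is a simplified stand-in chosen for illustration; it is not either of the two algorithms proposed in the paper. The function name `budgeted_ucb` and the arm interface are hypothetical, and costs are assumed to be strictly positive and bounded away from zero so that the loop terminates.

```python
import math
import random

def budgeted_ucb(arms, budget, rng):
    """Pull arms until the budget cannot cover the next drawn cost.

    arms   : list of callables; arms[i](rng) returns (reward, cost), cost > 0
    budget : total cost that may be spent
    rng    : random.Random instance passed through to the arms
    Returns (total_reward, pulls), where pulls[i] counts paid pulls of arm i.
    """
    n = len(arms)
    pulls = [0] * n
    reward_sum = [0.0] * n
    cost_sum = [0.0] * n
    total_reward, remaining, t = 0.0, budget, 0

    def index(j):
        # Illustrative index: empirical reward/cost ratio plus UCB bonus.
        ratio = reward_sum[j] / cost_sum[j]
        return ratio + math.sqrt(2.0 * math.log(t) / pulls[j])

    while True:
        t += 1
        # Try each arm once first, then follow the index.
        i = t - 1 if t <= n else max(range(n), key=index)
        reward, cost = arms[i](rng)
        if cost > remaining:
            break  # budget exhausted; the unaffordable pull is discarded
        remaining -= cost
        total_reward += reward
        pulls[i] += 1
        reward_sum[i] += reward
        cost_sum[i] += cost
    return total_reward, pulls

def make_arm(p, low, high):
    """Hypothetical arm: Bernoulli(p) reward, cost uniform in [low, high]."""
    return lambda rng: (1.0 if rng.random() < p else 0.0, rng.uniform(low, high))
```

A call such as `budgeted_ucb([make_arm(0.3, 0.2, 1.0), make_arm(0.6, 0.2, 1.0)], 100.0, random.Random(0))` runs one simulated episode; in the variable-cost setting the number of pulls is itself random, since it depends on the costs drawn.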