What Edited Retweets Reveal about Online Political Discourse

AAAI Conferences

How widespread is the practice of commenting on or editing a tweet when retweeting among members of political communities on Twitter? What is the nature of these comments (agree/disagree) or edits (change audience, change meaning, curate content)? Answering these questions will provide knowledge that helps answer other questions, such as: What are the topics, events, and people that attract more discussion (in the form of commenting) or controversy (agree/disagree)? Who are the users who engage in the process of curating content by inserting hashtags or adding links? Which political community shows more enthusiasm for an issue, and how broad is the base of engaged users? How can detection of agreement/disagreement in conversations inform sentiment analysis, the technique used to make predictions (who will win an election) or support insightful analytics (which policy issue resonates more with constituents)? We argue that it is necessary to go beyond the much-adopted aggregate text analysis of the volume of tweets in order to discover and understand phenomena at the level of single tweets. This becomes important in light of the increase in the number of human-mimicking bots on Twitter. Genuine interaction and engagement can be better measured by analyzing tweets that display signs of human intervention. Editing the text of an original tweet before it is retweeted could reveal mindful user engagement with the content and would therefore allow us to perform sampling among real human users. This paper presents work in progress that deals with the challenges of discovering retweets that contain comments or edits, and outlines a machine-learning-based strategy for classifying the nature of such comments.
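As a rough illustration of what discovering commented or edited retweets might involve, the sketch below compares a retweet against its original text. It is not the authors' method: the manual "RT @user:" convention, the regular expression, the function name `classify_retweet`, and the three category labels are all assumptions made for this example.

```python
import re

# Assumed convention: manual retweets look like "<comment> RT @user: <body>".
RT_PATTERN = re.compile(
    r"^(?P<prefix>.*?)\bRT\s+@(?P<user>\w+):?\s*(?P<body>.*)$", re.DOTALL
)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def classify_retweet(retweet, original):
    """Return 'verbatim', 'commented', 'edited', or 'not_a_retweet' (hypothetical labels)."""
    match = RT_PATTERN.match(retweet)
    if match is None:
        return "not_a_retweet"
    prefix = match.group("prefix").strip()
    body = normalize(match.group("body"))
    if body != normalize(original):
        # The retweeted body no longer matches the original text.
        return "edited"
    # Body is unchanged; any text before "RT" counts as a comment.
    return "commented" if prefix else "verbatim"


if __name__ == "__main__":
    original = "Vote today!"
    for rt in ("RT @alice: Vote today!",
               "So true! RT @alice: Vote today!",
               "RT @alice: Vote today! #election"):
        print(classify_retweet(rt, original), "<-", rt)
```

A real pipeline would also need to find the original tweet for each retweet and to tolerate truncation; classifying whether a comment agrees or disagrees is the separate machine-learning step the abstract mentions.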


Learning Ontologies from the Web for Microtext Processing

AAAI Conferences

We build a mechanism to form an ontology of entities that improves the relevance of matching and searching microtext. Ontology construction starts from seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (syntactic generalization) is applied to find commonalities between various search results for existing entities on the web. Ontology and syntactic generalization are applied to relevance improvement in search and text similarity assessment in a commercial setting; evaluation results show a substantial contribution of both sources to microtext processing.


Analysis of C2 and “C2-Lite” Micro-Message Communications

AAAI Conferences

Microtext media (Ellen, 2011), such as SMS, IM, Twitter, and text chat, have in common that they use short strings for immediate communication or broadcast. Microtext can be construed as one form of micro-messaging (e.g., Milstein, et al., 2008) which we extend here to include any of a number of other modalities (e.g., telephone calls, face-to-face interaction) used for short, immediate and (potentially) persistent message passing among coordinating agents. In this paper, we describe several recent attempts to study micro-messaging in military and related organizational contexts. Rather, the goal is to gather relevant messages, organize them, and extract some other kind of useful information from them, such as how well a team is performing or what people are talking about and when. However, micro-messages do not exist in a vacuum; they are contextually oriented and may be part of a larger network of communications which includes email, telephone and other media, including "macro-text." Given this, we have found that natural language processing of the microtext must be paired with temporal or network analysis of the context. To demonstrate this process, we ...


The Role and Identification of Dialog Acts in Online Chat

AAAI Conferences

In recent years, online chat has become a dominant mode of communication. This text-based medium has the potential of improving information awareness within an organization, but only if the critical information within messages can be identified and directed to where it is most needed. Such a goal has many challenges that traditional Information Extraction (IE) approaches have rarely addressed: the text is “dirty” (containing typos, misspellings, sparse punctuation, etc.), messages are fragmented and refer implicitly to previous messages and shared knowledge, messages from multiple topics are interleaved, etc. Past work in conversation analysis has included in-depth discussions of dialog acts, i.e., the individual utterances that comprise conversations. This paper describes how dialog acts within online chat differ from those within two-person voice conversations. It then presents methods for identifying dialog acts and the role that dialog acts play in identifying individual conversations within a chat stream. Identifying conversations is a necessary step for extracting actionable information, such as identifying individuals with specific expertise, recognizing reports of offline activities, and alerting decision makers to critical developments. Finally, we describe Chat-IE, a prototype software system that performs live dialog identification on chat streams.


Personal Activity Logger with Hierarchical Activity Representation

AAAI Conferences

Activity recognition is a key function for many context-aware applications in a smart environment. However, data collection and annotation for activity recognition are both time-consuming and costly. This paper proposes a hierarchical activity representation to enhance data reusability and introduces Personal Activity Logger (PAL), a computer-aided tool built on that representation, to reduce annotation effort. We experimented with PAL in annotating activities within a personal space from power meters and a webcam in the office. Preliminary results show that PAL is effective in reducing annotation effort with only a slight loss in quality. In addition, data analysis indicates that it may be possible to identify users from the distribution of events in their activities.


A Rich Context Model for Knowledge-Works

AAAI Conferences

Lack of context in information is a serious problem for knowledge-workers. Effective utilization of computational aids for supporting knowledge-workers requires a rich understanding of the nature of the context of information and related knowledge-works. It also needs specifications of how such understanding can be leveraged in computer-based systems. In this paper we propose a holistic model of the context of knowledge-works and of the information created in the course of their performance. We also demonstrate with an example how such a model can be used as a basis for developing a formal, machine-deployable specification of activity context.


Pruning Techniques in Search and Planning

AAAI Conferences

Search algorithms often suffer from exploring areas that ultimately are not part of the shortest path from the start to a goal. Usually it is the purpose of the heuristic function to guide the search algorithm so that it ignores as many of these areas as possible. We consider other, non-heuristic methods that can be used to prune the search space and make search even faster. We present two algorithms: one for search in graphs that fit in memory and on which many searches must be performed, and another that improves the search time of planning problems that contain symmetries.


Learning a Kernel for Multi-Task Clustering

AAAI Conferences

Multi-task learning has received increasing attention in the past decade. Many supervised multi-task learning methods have been proposed, while unsupervised multi-task learning is still a rarely studied problem. In this paper, we propose to learn a kernel for multi-task clustering. Our goal is to learn a Reproducing Kernel Hilbert Space, in which the geometric structure of the data in each task is preserved, while the data distributions of any two tasks are as close as possible. This is formulated as a unified kernel learning framework, under which we study two types of kernel learning: nonparametric kernel learning and spectral kernel design. Both types of kernel learning can be solved by linear programming. Experiments on several cross-domain text data sets demonstrate that kernel k-means on the learned kernel can achieve better clustering results than traditional single-task clustering methods. It also outperforms the newly proposed multi-task clustering method.
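The clustering step named in the abstract, kernel k-means on a kernel matrix, can be sketched as follows. This is a minimal NumPy illustration of generic kernel k-means on any precomputed kernel; it does not learn a kernel and is not the authors' implementation. The function name `kernel_kmeans` and its parameters (`init_labels`, `n_iter`, `seed`) are assumptions for this example.

```python
import numpy as np


def kernel_kmeans(K, n_clusters, init_labels=None, n_iter=50, seed=0):
    """Cluster n points given only their (n, n) kernel matrix K."""
    n = K.shape[0]
    if init_labels is None:
        # Random initial assignment; results may depend on the seed.
        labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)  # empty clusters stay at inf
        for c in range(n_clusters):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


if __name__ == "__main__":
    x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
    K = np.outer(x, x)  # linear kernel, used here only as a stand-in
    print(kernel_kmeans(K, 2, init_labels=[0, 1, 0, 1, 0, 1]))
```

With a linear kernel the procedure reduces to ordinary k-means; in the setting the abstract describes, `K` would instead be the kernel learned across tasks.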


Accelerating the Discovery of Data Quality Rules: A Case Study

AAAI Conferences

Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business, ranging from operational efficiency to revenue protection. In this paper, we present an application -- Data Quality Rules Accelerator (DQRA) -- that accelerates Data Quality (DQ) efforts (e.g., data profiling and cleansing) by automatically discovering DQ rules for detecting inconsistencies in data. We then present two evaluations. The first evaluation compares DQRA to existing solutions and shows that DQRA either outperformed or achieved performance comparable with these solutions on metrics such as precision, recall, and runtime. The second evaluation is a case study where DQRA was piloted at a large utilities company to improve data quality as part of a legacy migration effort. DQRA was able to discover rules that detected data inconsistencies directly impacting revenue and operational efficiency. Moreover, DQRA was able to significantly reduce the amount of effort required to develop these rules compared to the state of the practice. Finally, we describe ongoing efforts to deploy DQRA.


Abductive Inference for Combat: Using SCARE-S2 to Find High-Value Targets in Afghanistan

AAAI Conferences

Recently, geospatial abduction was introduced by the authors in [Shakarian et al. 2010] as a way to infer unobserved geographic phenomena from a set of known observations and constraints between the two. In this paper, we introduce the SCARE-S2 software tool, which applies geospatial abduction to the environment of Afghanistan. Unlike previous work, where we looked for small weapon caches supporting local attacks, here we look for insurgent high-value targets (HVTs) supporting insurgent operations in two provinces. These HVTs include the locations of insurgent leaders and major supply depots. Applying this method of inference to Afghanistan introduces several practical issues not addressed in previous work. Namely, we are conducting inference in a much larger area (24,940 sq km as compared to 675 sq km in previous work), on more varied terrain, and must consider the influence of many local tribes. We address all of these problems and evaluate our software on 6 months of real-world counter-insurgency data. We show that we are able to abduce regions of a relatively small area (on average, under 100 sq km and each containing, on average, 4.8 villages) that are more dense with HVTs (35 times more than the overall area considered).