Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

arXiv.org Machine Learning

A vast amount of textual web streams is influenced by events or phenomena emerging in the real world. The social web forms an excellent modern paradigm, where unstructured user generated content is published on a regular basis and in most occasions is freely distributed. The present Ph.D. Thesis deals with the problem of inferring information - or patterns in general - about events emerging in real life based on the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatic analysis of the content published in Social Media, and in particular Twitter, using Statistical Machine Learning methods. An important intermediate task regards the formation and identification of features which characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches achieving a significantly good performance in terms of the applied loss function. By examining further this rich data set, we also propose methods for extracting various types of mood signals revealing how affective norms - at least within the social web's population - evolve during the day and how significant events emerging in the real world are influencing them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information as well as the potential of using it to tackle tasks such as the prediction of voting intentions.


Novel Exploration Techniques (NETs) for Malaria Policy Interventions

AAAI Conferences

The task of decision-making under uncertainty is daunting, especially for problems which have significant complexity. Healthcare policy makers across the globe are facing problems under challenging constraints, with limited tools to help them make data driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent based strategies to explore the policy space. We apply a Gaussian Process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference with human expert decisions for the same simulated population. Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture future more nuanced policy spaces.


JK Rowling vs Donald Trump: 'Harry Potter' Author Uses Twitter To Bash President, Again

International Business Times

J.K. Rowling, author of the acclaimed "Harry Potter" series, has once again expressed her dislike of President Donald Trump via Twitter. Rowling has previously used Twitter to share her negative views towards Trump's leadership and his proposed policies. The latest installment of tweets comes as a reply to Trump's comments about London Mayor Sadiq Khan. After the Saturday terror attack in England, in which three men ran over pedestrians on the London Bridge and stabbed several of them, Khan issued a statement: "Londoners will see an increased police presence today and over the course of the next few days. There's no reason to be alarmed."


Order-Planning Neural Text Generation From Structured Data

AAAI Conferences

Generating texts from structured data (e.g., a table) is important for various natural language processing tasks such as question answering and dialog systems. In recent studies, researchers use neural language models and encoder-decoder frameworks for table-to-text generation. However, these neural network-based approaches typically do not model the order of content during text generation. When a human writes a summary based on a given table, he or she would probably consider the content order before wording. In this paper, we propose an order-planning text generation model, where order information is explicitly captured by link-based attention. Then a self-adaptive gate combines the link-based attention with traditional content-based attention. We conducted experiments on the WikiBio dataset and achieve higher performance than previous methods in terms of BLEU, ROUGE, and NIST scores; we also performed ablation tests to analyze each component of our model.


The 3rd International Planning Competition: Results and Analysis

AAAI Conferences

This paper reports the outcome of the third in the series of biennial international planning competitions, held in association with the International Conference on AI Planning and Scheduling (AIPS) in 2002. In addition to describing the domains, the planners and the objectives of the competition, the paper includes analysis of the results. The results are analysed from several perspectives, in order to address the questions of comparative performance between planners, comparative difficulty of domains, the degree of agreement between planners about the relative difficulty of individual problem instances and the question of how well planners scale relative to one another over increasingly difficult problems. The paper addresses these questions through statistical analysis of the raw results of the competition, in order to determine which results can be considered to be adequately supported by the data. The paper concludes with a discussion of some challenges for the future of the competition series.