Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

arXiv.org Machine Learning

Many textual web streams are influenced by events or phenomena emerging in the real world. The social web is an excellent modern example, where unstructured user-generated content is published regularly and, on most occasions, freely distributed. This Ph.D. thesis addresses the problem of inferring information, or patterns in general, about events emerging in real life from the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatically analysing the content published in Social Media, and in particular Twitter, using Statistical Machine Learning methods. An important intermediate task is the formation and identification of features which characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches, achieving good performance as measured by the applied loss function. By further examining this rich data set, we also propose methods for extracting various types of mood signals, revealing how affective norms - at least within the social web's population - evolve during the day and how significant events emerging in the real world influence them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information, as well as its potential for tackling tasks such as the prediction of voting intentions.


Novel Exploration Techniques (NETs) for Malaria Policy Interventions

AAAI Conferences

The task of decision-making under uncertainty is daunting, especially for problems of significant complexity. Healthcare policy makers across the globe face problems under challenging constraints, with limited tools to help them make data-driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent-based strategies to explore the policy space. We apply Gaussian process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference to human expert decisions for the same simulated population. Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture more nuanced policy spaces in future.
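To make the bandit framing concrete, here is a minimal sketch of one possible exploration agent. It is not one of the three strategies from the paper, which the abstract does not name: epsilon-greedy is swapped in purely for illustration, `simulate_reward` is a hypothetical stand-in for a malaria simulator, and the candidate policies are made-up scalar coverage levels. The Gaussian process regression step is omitted.

```python
import random


def simulate_reward(policy, rng):
    """Hypothetical noisy simulator (an assumption, not the paper's model).

    The expected reward peaks at policy = 0.6; Gaussian noise mimics the
    stochastic outcome of a single simulation run.
    """
    return 1.0 - (policy - 0.6) ** 2 + rng.gauss(0.0, 0.1)


def epsilon_greedy(policies, n_rounds, epsilon, seed=0):
    """Explore a discrete policy space with an epsilon-greedy agent."""
    rng = random.Random(seed)
    counts = [0] * len(policies)   # number of pulls per arm
    means = [0.0] * len(policies)  # running mean reward per arm
    for _ in range(n_rounds):
        # Pick a random arm with probability epsilon, or whenever some
        # arm has never been tried; otherwise pick the best arm so far.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(len(policies))
        else:
            arm = max(range(len(policies)), key=lambda i: means[i])
        reward = simulate_reward(policies[arm], rng)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


policies = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # illustrative policy arms
counts, means = epsilon_greedy(policies, n_rounds=2000, epsilon=0.1)
best = policies[max(range(len(policies)), key=lambda i: means[i])]
print("estimated best policy:", best)
```

Because each simulated reward is noisy, the per-arm means are only estimates; smoothing them across neighbouring policies is the kind of role the abstract assigns to Gaussian process regression.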


JK Rowling vs Donald Trump: 'Harry Potter' Author Uses Twitter To Bash President, Again

International Business Times

J.K. Rowling, author of the acclaimed "Harry Potter" series, has once again expressed her dislike of President Donald Trump via Twitter. Rowling has previously used Twitter to share her negative views of Trump's leadership and his proposed policies. The latest series of tweets comes as a reply to Trump's comments about London Mayor Sadiq Khan. After the Saturday terror attack in England, in which three men ran over pedestrians on London Bridge and stabbed several of them, Khan issued a statement: "Londoners will see an increased police presence today and over the course of the next few days. There's no reason to be alarmed."


Accelerating cancer research with deep learning

#artificialintelligence

Despite steady progress in detection and treatment in recent decades, cancer remains the second leading cause of death in the United States, cutting short the lives of approximately 500,000 people each year. To better understand and combat this disease, medical researchers rely on cancer registry programs--a national network of organizations that systematically collect demographic and clinical information related to the diagnosis, treatment, and history of cancer incidence in the United States. The surveillance effort, coordinated by the National Cancer Institute (NCI) and the Centers for Disease Control and Prevention, enables researchers and clinicians to monitor cancer cases at the national, state, and local levels. Much of this data is drawn from electronic, text-based clinical reports that must be manually curated--a time-intensive process--before it can be used in research. For example, cancer pathology reports, text documents that describe cancerous tissue in detail, must be individually read and annotated by experts before becoming part of a cancer registry.


Order-Planning Neural Text Generation From Structured Data

AAAI Conferences

Generating text from structured data (e.g., a table) is important for various natural language processing tasks such as question answering and dialog systems. Recent studies use neural language models and encoder-decoder frameworks for table-to-text generation. However, these neural network-based approaches typically do not model the order of content during text generation. When a human writes a summary based on a given table, he or she would probably consider the content order before wording. In this paper, we propose an order-planning text generation model, where order information is explicitly captured by link-based attention. A self-adaptive gate then combines the link-based attention with traditional content-based attention. We conducted experiments on the WikiBio dataset and achieved higher performance than previous methods in terms of BLEU, ROUGE, and NIST scores; we also performed ablation tests to analyze each component of our model.
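As a rough illustration of how a gate can combine two attention distributions, the sketch below mixes content-based and link-based weights as a convex combination. The abstract does not give the paper's exact formulation, so the scalar sigmoid gate, the function names, and the example scores are all assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(content_scores, link_scores, gate_logit):
    """Mix two attention distributions with a scalar gate (assumed form).

    z = sigmoid(gate_logit) lies in (0, 1); the result is
    z * content + (1 - z) * link, which is again a distribution.
    """
    content = softmax(content_scores)
    link = softmax(link_scores)
    z = 1.0 / (1.0 + math.exp(-gate_logit))
    return [z * c + (1.0 - z) * l for c, l in zip(content, link)]


# Illustrative scores over three table fields (made-up values).
weights = gated_attention([2.0, 0.5, 0.1], [0.1, 0.2, 3.0], gate_logit=0.0)
print(weights)
```

In a trained model the gate logit would be computed from the decoder state rather than passed in by hand; a fixed value is used here only to keep the example self-contained.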