Developing an NLP-based PR platform for the Canadian elections


Elections are a vital part of democracy, allowing people to vote for the candidate they think can best lead the country. A candidate's campaign aims to show the public why that candidate is the best choice. However, in this age of constant media coverage and digital communication, candidates are scrutinized at every step. A single misquote or negative news story can be the difference between winning and losing the election. It is therefore crucial to have a public relations (PR) manager who can guide and direct the campaign by prioritizing specific campaign activities. One critical aspect of the PR manager's work is to understand how the public perceives the candidate and to improve public sentiment about them.

Location-Based Twitter Sentiment Analysis for Predicting the U.S. 2016 Presidential Election

AAAI Conferences

We seek to determine the effectiveness of using location-based social media to predict the outcome of the 2016 presidential election. To this end, we create a dataset of approximately 3 million tweets, posted between September 22 and November 8, related to either Donald Trump or Hillary Clinton. Twenty-one states are chosen: eleven categorized as swing states, five as Clinton-favored, and five as Trump-favored. We use two metrics to gauge voter opinion on election outcomes: tweet volume and positive sentiment. Our data is labeled by a convolutional neural network trained on the Sentiment140 dataset. To determine whether Twitter is an indicator of election outcome, we compare our results to the election outcome per state and across the nation. We use two approaches for determining state victories: winner-take-all and shared elector count. Our results show that tweet sentiment mirrors the close races in the swing states; however, the differences in the distribution of positive sentiment and volume between Clinton and Trump are not significant using our approach. Thus, we conclude that neither sentiment nor volume is an accurate predictor of election results using our data collection and labeling process.
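The abstract names two ways of turning per-state sentiment into electoral votes but does not define them. The Python sketch below makes two assumptions. First, winner-take-all gives every elector in a state to the candidate with more positive tweets. Second, "shared elector count" splits a state's electors in proportion to each candidate's share of positive tweets, rounded with the largest-remainder rule. The function names, the tie rules, and the state data are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input: state -> (electors, positive tweets for A, positive tweets for B)
StateData = Dict[str, Tuple[int, int, int]]


def winner_take_all(states: StateData) -> Tuple[int, int]:
    """Give all of a state's electors to the candidate with more positive tweets."""
    a_total = b_total = 0
    for electors, pos_a, pos_b in states.values():
        if pos_a > pos_b:
            a_total += electors
        elif pos_b > pos_a:
            b_total += electors
        # Exact ties award no electors in this sketch.
    return a_total, b_total


def shared_electors(states: StateData) -> Tuple[int, int]:
    """Split each state's electors in proportion to positive-tweet share.

    Uses exact integer arithmetic and the largest-remainder rule, so the two
    candidates' electors always add up to the state's total.
    """
    a_total = b_total = 0
    for electors, pos_a, pos_b in states.values():
        total = pos_a + pos_b
        if total == 0:
            continue  # no positive tweets: leave the state unallocated
        seats_a, rem_a = divmod(electors * pos_a, total)
        seats_b, rem_b = divmod(electors * pos_b, total)
        # With two candidates, at most one elector is left over after flooring.
        # A tie in the remainders goes to candidate A (an arbitrary choice).
        if seats_a + seats_b < electors:
            if rem_a >= rem_b:
                seats_a += 1
            else:
                seats_b += 1
        a_total += seats_a
        b_total += seats_b
    return a_total, b_total


states: StateData = {
    "State X": (10, 6_000, 4_000),
    "State Y": (20, 4_900, 5_100),
    "State Z": (5, 1_000, 3_000),
}
print(winner_take_all(states))  # (10, 25)
print(shared_electors(states))  # (17, 18)
```

In this toy data, State Y is a near tie. Winner-take-all hands all 20 of its electors to B, while proportional sharing splits them evenly. That gap is one reason close swing-state races can look very different under the two rules.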

A provable SVD-based algorithm for learning topics in dominant admixture corpus

Machine Learning

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures is NP-hard. Assuming separability, a strong assumption, [4] gave the first provable algorithm for inference. For the LDA model, [6] gave a provable algorithm using tensor methods. But [4,6] do not learn topic vectors with bounded $l_1$ error (a natural measure for probability vectors). Our aim is to develop a model that makes intuitive and empirically supported assumptions, and to design an algorithm with natural, simple components such as SVD that provably solves the inference problem for the model with bounded $l_1$ error. A topic in LDA and other models is essentially characterized by a group of co-occurring words. Motivated by this, we introduce topic-specific Catchwords: groups of words in which each word individually occurs with strictly greater frequency in its topic than in any other topic, and which are required to have high frequency together rather than individually. A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding can provably recover the topics from a collection of documents drawn from Dominant admixtures. Dominant admixtures are convex combinations of distributions in which one distribution has a significantly higher contribution than the others. Apart from the simplicity of the algorithm, the sample complexity has near-optimal dependence on $w_0$, the lowest probability that a topic is dominant, and is better than that of [4]. Empirical evidence shows that on several real-world corpora, both the Catchwords and Dominant admixture assumptions hold, and the proposed algorithm substantially outperforms the state of the art [5].
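The abstract describes the algorithm only at a high level. The Python/NumPy sketch below is a rough illustration of the general "threshold, then SVD" idea under stated assumptions, not the paper's algorithm:

- It zeroes out each word's small frequencies.
- It projects documents onto the top-$k$ left singular vectors of the thresholded matrix.
- It groups the projections with k-means.
- It averages each group's documents to estimate a topic.

The median-based threshold, the k-means step, the cluster averaging, the function names, and the toy corpus are all hypothetical. The full algorithm in the paper is more involved.

```python
import numpy as np


def threshold_counts(freqs: np.ndarray) -> np.ndarray:
    """Zero out small entries of a words-by-documents frequency matrix.

    Hypothetical rule: each word's threshold is the median of its nonzero
    frequencies; entries at or above it are kept and the rest become 0.
    """
    out = np.zeros_like(freqs, dtype=float)
    for i, row in enumerate(freqs):
        nonzero = row[row > 0]
        if nonzero.size:
            out[i] = np.where(row >= np.median(nonzero), row, 0.0)
    return out


def kmeans(points: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def thresholded_svd_topics(freqs: np.ndarray, k: int) -> np.ndarray:
    """Threshold, project onto the top-k singular subspace, cluster, average.

    Returns a words-by-k matrix whose columns are estimated topics. Averaging
    each cluster is a simplification made for this sketch.
    """
    b = threshold_counts(freqs)
    u, _, _ = np.linalg.svd(b, full_matrices=False)
    projected = (u[:, :k].T @ b).T  # one k-dimensional point per document
    labels = kmeans(projected, k)
    topics = np.stack([freqs[:, labels == j].mean(axis=1) for j in range(k)], axis=1)
    return topics / topics.sum(axis=0, keepdims=True)


# Hypothetical toy corpus: 6 words, 2 topics with disjoint supports.
true_topics = np.array([
    [0.5, 0.0], [0.3, 0.0], [0.2, 0.0],
    [0.0, 0.4], [0.0, 0.4], [0.0, 0.2],
])
docs = []
for dominant in range(2):
    for w in np.linspace(0.8, 0.95, 10):  # weight of the dominant topic
        mix = np.full(2, 1.0 - w)
        mix[dominant] = w
        docs.append(true_topics @ mix)  # expected word frequencies, no sampling
freqs = np.stack(docs, axis=1)  # shape (6 words, 20 documents)
est = thresholded_svd_topics(freqs, k=2)
print(np.round(est, 3))
```

On this toy corpus, each cluster average still carries about 12.5% of its mass on the other topic's words, so the estimates are biased. Removing that kind of residual error is what a method with provably bounded $l_1$ error has to do.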

US Air Force funds Explainable-AI for UAV tech


Z Advanced Computing, Inc. (ZAC) of Potomac, MD, announced on August 27 that it has been funded by the US Air Force to apply ZAC's detailed three-dimensional (3D) image recognition technology, based on Explainable-AI, to aerial image and object recognition on drones (unmanned aerial vehicles, or UAVs). ZAC says it is the first to demonstrate Explainable-AI in which various attributes and details of 3D objects can be recognized from any view or angle. "With our superior approach, complex 3D objects can be recognized from any direction, using only a small number of training samples," said Dr. Saied Tadayon, CTO of ZAC. "For complex tasks, such as drone vision, you need ZAC's superior technology to handle detailed 3D image recognition." "You cannot do this with the other techniques, such as Deep Convolutional Neural Networks, even with an extremely large number of training samples. That's basically hitting the limits of the CNNs," continued Dr. Bijan Tadayon, CEO of ZAC.

Analyzing NIH Funding Patterns over Time with Statistical Text Analysis

AAAI Conferences

In the past few years, various government funding organizations such as the U.S. National Institutes of Health and the U.S. National Science Foundation have provided access to large, publicly available online databases documenting the grants they have funded over the past few decades. These databases provide an excellent opportunity to apply statistical text analysis techniques to infer useful quantitative information about how funding patterns have changed over time. In this paper, we analyze data from the National Cancer Institute (part of the National Institutes of Health) and show how text classification techniques provide a useful starting point for analyzing how funding for cancer research in the United States has evolved over the past 20 years.
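The abstract does not say which text classifier was used. Purely as an illustration, the Python sketch below takes the following steps:

- It trains a multinomial naive Bayes classifier with add-one smoothing on a handful of labeled grant abstracts.
- It assigns each new grant to a research category.
- It sums the dollar amounts per (year, category) pair, which is one way to trace how funding shifts over time.

The choice of naive Bayes, the categories, the abstracts, the amounts, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.class_totals = {c: sum(n.values()) for c, n in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.class_totals[label] + len(self.vocab)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def funding_by_year(
    grants: List[Tuple[int, int, str]], model: NaiveBayes
) -> Dict[Tuple[int, str], int]:
    """Sum grant amounts per (year, predicted category)."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for year, amount, abstract in grants:
        totals[(year, model.predict(abstract))] += amount
    return dict(totals)


# Hypothetical labeled abstracts and grants.
model = NaiveBayes().fit(
    [
        "tumor immunotherapy t cell response",
        "checkpoint inhibitor immune therapy trial",
        "breast cancer screening mammography cohort",
        "population screening early detection study",
    ],
    ["immunotherapy", "immunotherapy", "screening", "screening"],
)
grants = [
    (1995, 250_000, "early detection screening of colon cancer"),
    (1995, 400_000, "t cell immunotherapy for melanoma"),
    (2010, 900_000, "immune checkpoint inhibitor trial"),
    (2010, 300_000, "mammography screening cohort"),
]
print(funding_by_year(grants, model))
```

Summing dollar amounts rather than counting grants weights each category by how much money it received. Counting grants instead only requires changing `+= amount` to `+= 1`.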