Information Extraction
Automatic Extraction of the Passing Strategies of Soccer Teams
Gyarmati, Laszlo, Anguera, Xavier
Technology offers new ways to measure the locations of the players and of the ball in sports. This translates to the trajectories the ball takes on the field as a result of the tactics the team applies. The challenge professionals in soccer are facing is to take the reverse path: given the trajectories of the ball, is it possible to infer the underlying strategy or tactic of a team? We propose a method based on Dynamic Time Warping to reveal the tactics of a team through the analysis of repeating series of events. Based on the analysis of an entire season, we derive insights such as passing strategies for maintaining ball possession or for counter attacks, and passing styles with a focus on the team or on the capabilities of the individual players.
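As an illustration of the Dynamic Time Warping idea mentioned in the abstract above, here is a minimal sketch of the classic DTW distance between two sequences of field positions. It is not the authors' implementation; the function name `dtw_distance`, the use of Euclidean point distance, and the sample coordinates are all assumptions made for the example.

```python
import math

def dtw_distance(a, b):
    """Classic DTW distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (assumed metric)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical pass sequences: positions of successive passes on the field.
seq_a = [(0.0, 0.0), (10.0, 5.0), (20.0, 5.0)]
seq_b = [(0.0, 0.0), (10.0, 5.0), (10.0, 5.0), (20.0, 5.0)]  # same shape, one repeated step
seq_c = [(0.0, 0.0), (0.0, 30.0), (0.0, 60.0)]               # a different route

print(dtw_distance(seq_a, seq_b))  # the repeated step is absorbed by warping
print(dtw_distance(seq_a, seq_c))
```

Because warping lets one point align with several, sequences that follow the same route at a different pace end up close together, which is what makes DTW a plausible fit for comparing repeating series of passes.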
A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data
Zhou, Guangyou (Central China Normal University) | He, Tingting (Central China Normal University) | Zhao, Jun (National Laboratory of Pattern Recognition, CASIA) | Wu, Wensheng (University of Southern California)
Cross-lingual sentiment classification aims to automatically predict the sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature space of the source language data and that of the target language data. To address this challenge, previous work in the literature mainly relies on large amounts of bilingual parallel corpora to bridge the language gap. In many real applications, however, we often have only some partial parallel data, and acquiring a large amount of parallel data in different languages is expensive and time-consuming. In this paper, we propose a novel subspace learning framework that leverages the partial parallel data for cross-lingual sentiment classification. The proposed approach jointly learns from the document-aligned review data and the unaligned data from the source language and the target language via a non-negative matrix factorization framework. We conduct a set of experiments with cross-lingual sentiment classification tasks on multilingual Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.
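For readers unfamiliar with non-negative matrix factorization, the building block named in the abstract above, the following is a minimal sketch of plain NMF with the standard multiplicative updates for the squared Frobenius error. It does not reproduce the paper's joint objective over aligned and unaligned documents; the function name `nmf`, its parameters, and the toy matrix are assumptions for illustration only.

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "document x term" matrix with an exact non-negative rank-2 structure.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

In a cross-lingual setting, the rows of `W` would play the role of low-dimensional document representations; sharing such representations across the two languages for aligned documents is the kind of coupling the abstract describes, though the details are in the paper itself.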
Automatic Extraction of References to Future Events from News Articles Using Semantic and Morphological Information
Nakajima, Yoko (Kitami Institute of Technology)
In my doctoral dissertation I investigate patterns appearing in sentences referring to the future. Such patterns are useful in predicting future events. I base the study on multiple newspaper corpora. I first perform a preliminary study, which shows that the patterns appearing in future-reference sentences often consist of disjointed elements within a sentence. Such patterns are also usually semantically and grammatically consistent, although lexically variant. Therefore, I propose a method for automatic extraction of such patterns, applying both grammatical (morphological) and semantic information to represent sentences in a morphosemantic structure, and then extracting frequent patterns, including those with disjointed elements. Next, I perform a series of experiments, in which I first train fourteen classifier versions and compare them to choose the best one. I then compare my method to the state of the art and verify the final performance of the method on a new dataset. I conclude that the proposed method is capable of automatically classifying future-reference sentences, significantly outperforming the state of the art and reaching an F-score of 76%.
RoTuEl: A Semi-Automated Method for Labeling Political Tweets
Filho, Wilton de Paula (Federal Institute of Education) | Garcia, Ana Cristina Bicharra (Federal Fluminense University)
In the latest research on predicting the outcome of elections using Twitter data, the area of labeling election tweets has hardly been explored. Therefore, the authors of this paper propose to develop a semi-automated model for labeling political tweets. The expected result of this study is to enhance the quality of the choice of messages used in the labeling process by reducing the time needed to select messages and improving the efficiency of classifying them and, thus, to increase the accuracy of the models using this approach. The proposed method could label 2200 messages from the analysis of only 60 messages by 20 users. The first results obtained by the method were better than those of the process carried out manually by humans.
Information Extraction of Texts in the Biomedical Domain
Cotik, Viviana (Universidad de Buenos Aires)
Automatic detection of relevant terms in medical reports is useful for educational purposes and for clinical research. Natural language processing techniques can be applied to identify them. The main goal of this research is to develop a method to identify whether medical reports of imaging studies (usually called radiology reports) written in Spanish are important (in the sense that they have non-negated pathological findings) or not. We also try to identify which finding is present and, if possible, its relationship with anatomical entities.
Feature Ensemble Plus Sample Selection: Domain Adaptation for Sentiment Classification (Extended Abstract)
Xia, Rui (Nanjing University of Science and Technology) | Zong, Chengqing (Chinese Academy of Sciences) | Hu, Xuelei (Nanjing University of Science and Technology) | Cambria, Erik (Nanyang Technological University)
The domain adaptation problem arises often in the field of sentiment classification. There are two distinct needs in domain adaptation, namely labeling adaptation and instance adaptation. Most current research focuses on the former while neglecting the latter. In this work, we propose a joint approach, named feature ensemble plus sample selection (SS-FE), which takes both types of adaptation into account. A feature ensemble (FE) model is first proposed to learn a new labeling function in a feature re-weighting manner. Furthermore, a PCA-based sample selection (PCA-SS) method is proposed as an aid to FE for instance adaptation. Experimental results show that the proposed SS-FE approach gains significant improvements over individual FE and PCA-SS, due to its comprehensive consideration of both labeling adaptation and instance adaptation.
Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT (Extended Abstract)
Bosco, Cristina (Dipartimento di Informatica, Università di Torino) | Patti, Viviana (Dipartimento di Informatica, Università di Torino) | Bolioli, Andrea (CELI srl)
This paper focusses on the main issues related to the development of a corpus for opinion and sentiment analysis, with special attention to irony, and presents as a case study Senti-TUT, a project for Italian aimed at investigating sentiment and irony in social media. We present the Senti-TUT corpus, a collection of texts from Twitter annotated with sentiment polarity. We describe the dataset, the annotation, the methodologies applied and our investigations on two important features of irony: polarity reversing and emotion expressions.
Looking at Mondrian's Victory Boogie-Woogie: What Do I Feel?
Sartori, Andreza (University of Trento and Telecom Italia) | Yan, Yan (University of Trento and UIUC, Singapore) | Özbal, Gözde (Fondazione Bruno Kessler) | Salah, Alkim Almila Akdag (Royal Netherlands Academy of Arts and Sciences) | Salah, Albert Ali (Boğaziçi University) | Sebe, Nicu (University of Trento)
Abstract artists use non-figurative elements (i.e., colours, lines, shapes, and textures) to convey emotions and often rely on the titles of their compositions to generate (or enhance) an emotional reaction in the audience. Several psychological studies have observed that the metadata (i.e., titles, descriptions and/or artist statements) associated with paintings increase the understanding and the aesthetic appreciation of artworks. In this paper we explore whether the same metadata could facilitate the computational analysis of artworks and reveal what kind of emotional responses they evoke. To this end, we employ computer vision and sentiment analysis to learn statistical patterns associated with positive and negative emotions on abstract paintings. We propose a multimodal approach which combines both visual and metadata features in order to improve the machine performance. In particular, we propose a novel joint flexible Schatten p-norm model which can exploit the sharing patterns between visual and textual information for abstract painting emotion analysis. Moreover, we conduct a qualitative analysis of the cases in which metadata help improve the machine performance.
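The abstract above refers to a Schatten p-norm model without defining the norm. As a reminder, the Schatten p-norm of a matrix is the p-norm of its vector of singular values. The sketch below computes it with NumPy; it illustrates only the norm itself, not the authors' joint model, and the function name `schatten_norm` and the sample matrix are assumptions.

```python
import numpy as np

def schatten_norm(M, p):
    """Schatten p-norm: (sum_i sigma_i ** p) ** (1 / p) over singular values."""
    sigma = np.linalg.svd(M, compute_uv=False)  # singular values only
    return float((sigma ** p).sum() ** (1.0 / p))

M = np.array([[3.0, 0.0],
              [0.0, 4.0]])  # singular values are 4 and 3

print(schatten_norm(M, 1))  # p = 1 gives the nuclear (trace) norm
print(schatten_norm(M, 2))  # p = 2 gives the Frobenius norm
```

Smaller values of p penalize the number of non-negligible singular values more strongly, which is why Schatten norms are commonly used as low-rank regularizers.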
Unsupervised Sentiment Analysis for Social Media Images
Wang, Yilin (Arizona State University) | Wang, Suhang (Arizona State University) | Tang, Jiliang (Arizona State University) | Liu, Huan (Arizona State University) | Li, Baoxin (Arizona State University)
Recently text-based sentiment prediction has been extensively studied, while image-centric sentiment analysis receives much less attention. In this paper, we study the problem of understanding human sentiments from large-scale social media images, considering both visual content and contextual information, such as comments on the images, captions, etc. The challenge of this problem lies in the "semantic gap" between low-level visual features and higher-level image sentiments. Moreover, ...
Current methods of sentiment analysis for social media images include low-level visual feature based approaches [Jia et al., 2012; Yang et al., 2014], mid-level visual feature based approaches [Borth et al., 2013; Yuan et al., 2013] and deep learning based approaches [You et al., 2015]. The vast majority of existing methods are supervised, relying on labeled images to train sentiment classifiers. Unfortunately, sentiment labels are in general unavailable for social media images, and it is too labor- and time-intensive to obtain labeled sets large enough for robust training. In order to utilize the vast amount of unlabeled social media images, an unsupervised approach would be much more desirable.
Tracking Political Elections on Social Media: Applications and Experience
Contractor, Danish (IBM Research) | Chawda, Bhupesh (IBM Research) | Mehta, Sameep (IBM Research) | Subramaniam, L Venkata (IBM Research) | Faruquie, Tanveer Afzal (IBM Research)
In recent times, social media has become a popular medium for many election campaigns. It not only allows candidates to reach out to a large section of the electorate, it is also a potent medium for people to express their opinion on the proposed policies and promises of candidates. Analyzing social media data is challenging as the text can be noisy, sparse and even multilingual. In addition, the information may not be completely trustworthy, particularly in the presence of propaganda, promotions and rumors. In this paper we describe our work on analyzing election campaigns using social media data. Using data from the 2012 US presidential elections and the 2013 Philippines General elections, we provide detailed experiments on our methods that use Granger causality to identify topics that were most “causal” for public opinion and which, in turn, give an interpretable insight into the “elections topics” that were most important. Our system was deployed by the largest media organization in the Philippines during the 2013 General elections; using our work, the media house was able to identify and report news stories much faster than competitors and reported higher TRP ratings during the election.
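To make the Granger causality idea in the abstract above concrete, the following is a minimal sketch of the standard F-test: a series x is said to Granger-cause y if adding lagged values of x to an autoregression of y significantly reduces the residual error. This is a textbook formulation, not the deployed system; the function name `granger_f_stat` and the synthetic "topic volume" and "opinion" series are assumptions for illustration.

```python
import numpy as np

def granger_f_stat(x, y, lag=1):
    """F-statistic for the hypothesis 'x Granger-causes y' using `lag` lags."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = len(y) - lag
    target = y[lag:]
    # Column k holds the series shifted back by k steps (k = 1..lag).
    y_lags = np.column_stack([y[lag - k:len(y) - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k:len(x) - k] for k in range(1, lag + 1)])
    ones = np.ones((T, 1))
    restricted = np.hstack([ones, y_lags])            # y explained by its own past
    unrestricted = np.hstack([ones, y_lags, x_lags])  # plus the past of x

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = T - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic data: "opinion" follows "topic volume" with a one-step delay.
rng = np.random.default_rng(0)
topic = rng.normal(size=200)
opinion = np.empty(200)
opinion[0] = 0.0
opinion[1:] = 0.8 * topic[:-1] + 0.1 * rng.normal(size=199)

f_topic_to_opinion = granger_f_stat(topic, opinion)
f_opinion_to_topic = granger_f_stat(opinion, topic)
print(f_topic_to_opinion, f_opinion_to_topic)
```

A large F-statistic in one direction and a small one in the other is the pattern one would look for when ranking topics by how "causal" they are for an opinion series; in practice the statistic would be compared against an F distribution to obtain a p-value.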