Media
Predicting Emotion Perception Across Domains: A Study of Singing and Speaking
Zhang, Biqiao (University of Michigan) | Provost, Emily Mower (University of Michigan) | Swedberg, Robert (University of Michigan) | Essl, Georg (University of Michigan)
Emotion affects our understanding of the opinions and sentiments of others. Research has demonstrated that humans are able to recognize emotions in various domains, including speech and music, and that there are potential shared features that shape the emotion in both domains. In this paper, we investigate acoustic and visual features that are relevant to emotion perception in the domains of singing and speaking. We train regression models using two paradigms: (1) within-domain, in which models are trained and tested on the same domain and (2) cross-domain, in which models are trained on one domain and tested on the other domain. This strategy allows us to analyze the similarities and differences underlying the relationship between audio-visual feature expression and emotion perception and how this relationship is affected by domain of expression. We use kernel density estimation to model emotion as a probability distribution over the perception associated with multiple evaluators on the valence-activation space. This allows us to model the variation inherent in the reported perception. Results suggest that activation can be modeled more accurately across domains, compared to valence. Furthermore, visual features capture cross-domain emotion more accurately than acoustic features. The results provide additional evidence for a shared mechanism underlying spoken and sung emotion perception.
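The kernel density estimation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes an isotropic Gaussian kernel with a hand-picked bandwidth, and the ratings below are made-up values; the abstract does not specify the kernel, the bandwidth selection, or the rating scale, so all names and numbers here are hypothetical.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Build a density over (valence, activation) from evaluator ratings.

    Assumption: isotropic Gaussian kernel with a fixed bandwidth.
    """
    if not points:
        raise ValueError("need at least one rating")
    # Normalizer of a 2-D isotropic Gaussian, averaged over all ratings.
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(valence, activation):
        total = 0.0
        for v, a in points:
            sq_dist = (valence - v) ** 2 + (activation - a) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical ratings of one clip by five evaluators: (valence, activation).
ratings = [(0.2, 0.8), (0.3, 0.7), (0.25, 0.9), (-0.1, 0.6), (0.4, 0.75)]
density = gaussian_kde_2d(ratings, bandwidth=0.2)
```

Evaluating `density` on a grid yields a distribution that keeps the disagreement between evaluators, rather than collapsing the ratings to a single mean point.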
Personalized Tag Recommendation through Nonlinear Tensor Factorization Using Gaussian Kernel
Fang, Xiaomin (Sun Yat-sen University) | Pan, Rong (Sun Yat-sen University) | Cao, Guoxiang (Huawei Technologies Co. Ltd) | He, Xiuqiang (Huawei Technologies Co. Ltd) | Dai, Wenyuan (Huawei Technologies Co. Ltd)
Personalized tag recommendation systems recommend a list of tags to a user who is about to annotate an item. They exploit individual user preferences and the characteristics of the items. Tensor factorization techniques have been applied to many applications, including tag recommendation. Models based on Tucker Decomposition can achieve good performance but require a lot of computation power. On the other hand, models based on Canonical Decomposition can run in linear time and are more feasible for online recommendation. In this paper, we propose a novel method for personalized tag recommendation, which can be considered a nonlinear extension of Canonical Decomposition. Unlike linear tensor factorization, we exploit the Gaussian radial basis function to increase the model’s capacity. The experimental results show that our proposed method outperforms the state-of-the-art methods for tag recommendation on real datasets and performs well even with a small number of features, which verifies that our models can make better use of features.
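The abstract does not state the scoring function, so the following is only a hypothetical illustration of the contrast it draws. `cd_score` is the standard linear Canonical Decomposition score over latent factors; `rbf_score` is an assumed nonlinear variant that replaces products with Gaussian radial basis function similarities between user/tag and item/tag factors. The function names, the pairwise form, and the `sigma` parameter are assumptions, not the authors' model.

```python
import math


def cd_score(user, item, tag):
    """Linear Canonical Decomposition score: sum over shared latent factors."""
    return sum(u * i * t for u, i, t in zip(user, item, tag))


def rbf(x, y, sigma):
    """Gaussian radial basis function between two latent vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_score(user, item, tag, sigma=1.0):
    """Assumed nonlinear score: user-tag plus item-tag kernel similarity."""
    return rbf(user, tag, sigma) + rbf(item, tag, sigma)


def recommend(user, item, tag_vectors, k, sigma=1.0):
    """Return the k tag names with the highest assumed nonlinear score."""
    ranked = sorted(
        tag_vectors,
        key=lambda name: rbf_score(user, item, tag_vectors[name], sigma),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-factor embeddings.
tags = {"jazz": [1.0, 0.0], "rock": [0.0, 1.0], "blues": [0.9, 0.1]}
top = recommend([1.0, 0.0], [1.0, 0.0], tags, k=2)
```

Both scores cost time linear in the number of latent features per (user, item, tag) triple, which is the property the abstract highlights for online recommendation.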
Relating Romanized Comments to News Articles by Inferring Multi-Glyphic Topical Correspondence
Tholpadi, Goutham (Indian Institute of Science, Bangalore) | Das, Mrinal Kanti (Indian Institute of Science, Bangalore) | Bansal, Trapit (Indian Institute of Science, Bangalore) | Bhattacharyya, Chiranjib (Indian Institute of Science, Bangalore)
Commenting is a popular facility provided by news sites. Analyzing such user-generated content has recently attracted research interest. However, in multilingual societies such as India, analyzing such user-generated content is hard for several reasons: (1) There are more than 20 official languages, but linguistic resources are available mainly for Hindi. People frequently use romanized text because it is easy and quick to type on an English keyboard, resulting in multi-glyphic comments, where the texts are in the same language but in different scripts. Such romanized texts are almost unexplored in machine learning so far. (2) In many cases, comments are made on a specific part of the article rather than the topic of the entire article. Off-the-shelf methods such as correspondence LDA are insufficient to model such relationships between articles and comments. In this paper, we extend the notion of correspondence to model multi-lingual, multi-script, and inter-lingual topics in a unified probabilistic model called the Multi-glyphic Correspondence Topic Model (MCTM). Using several metrics, we verify our approach and show that it improves over the state-of-the-art.
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
Kim, Been | Rudin, Cynthia | Shah, Julie
We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a dataset, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.
Recognition of In-Field Frog Chorusing Using Bayesian Nonparametric Microphone Array Processing
Bando, Yoshiaki (Kyoto University) | Otsuka, Takuma (NTT Communication Science Laboratories) | Aihara, Ikkyu (Doshisha University) | Awano, Hiromitsu (Kyoto University) | Itoyama, Katsutoshi (Kyoto University) | Yoshii, Kazuyoshi (Kyoto University) | Okuno, Hiroshi Gitchang (Waseda University)
In this paper, we exploit Bayesian nonparametric microphone array processing (BNP-MAP) to analyze the spatio-temporal patterns of the frog chorus. Such analysis in real environments is made difficult by unpredictable sound sources, including calls of various species of animals. Applying conventional signal processing algorithms has been difficult because these algorithms usually require the number of sound sources in advance. BNP-MAP was developed to cope with auditory uncertainties such as reverberation or an unknown number of sounds by using a unified model based on Bayesian nonparametrics. We apply BNP-MAP to 20 minutes of sound data captured by a 7-channel microphone array in a paddy rice field on Oki Island, Japan, and reveal that two individuals of Schlegel's green tree frog (Rhacophorus schlegelii) called alternately in anti-phase. This result is compared with video data captured by a video camera with 18 units of a sound-imaging device called Firefly deployed along the bank of the rice field. The auditory result provides more detailed patterns of the frog chorus at a higher temporal resolution. This higher resolution makes it possible to analyze fine temporal structures of the frog calls. For example, BNP-MAP reveals the trill-like calling pattern of R. schlegelii.
Teaching AI Ethics Using Science Fiction
Burton, Emanuelle (Center College) | Goldsmith, Judy (University of Kentucky) | Mattei, Nicholas (NICTA and University of New South Wales)
The cultural and political implications of modern AI research are not some far-off concern; they affect the world here and now. From advanced control systems with advanced visualizations and image processing techniques that drive the machines of the modern military to the slow creep of a mechanized workforce, ethical questions surround us. Part of dealing with these ethical questions is not just speculating on what could be but teaching our students how to engage with them. We explore the use of science fiction as an appropriate tool to enable AI researchers to help engage students and the public on the current state and potential impacts of AI.
What Predicts Media Coverage of Health Science Articles?
Wallace, Byron C. (University of Texas at Austin) | Paul, Michael J. (Johns Hopkins University) | Elhadad, Noémie (Columbia University)
An important aspect of health science is communicating research findings to the public. The media is a critical instrument in disseminating research. Yet the process by which a scientific article becomes “newsworthy” is not well understood. In this study, we use large-scale text analysis to characterize the content features of articles that are predictive of newsworthiness. We experiment with two novel corpora: (i) 28,910 articles from a diverse range of biomedical and health journals, of which 1,343 were covered by the news agency Reuters, and (ii) 10,760 articles from the JAMA journals, of which 846 were given press releases by the journal editors. We show that media coverage can be predicted reasonably well: logistic regression achieves mean AUCs of 0.783 and 0.882 on the Reuters and JAMA datasets, respectively. We present and discuss interesting findings concerning the most predictive content features.
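To make the reported AUC figures concrete, here is a minimal sketch of how AUC can be computed from classifier scores: it is the probability that a randomly chosen covered article receives a higher score than a randomly chosen uncovered one. The labels and scores are made-up values, not data from the study, and `roc_auc` is an illustrative helper rather than the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = covered by the media, 0 = not covered.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 5 of 6 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is a useful measure when positives are rare, as they are in both corpora described above.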
Towards Detecting Rumours in Social Media
Zubiaga, Arkaitz (University of Warwick) | Liakata, Maria (University of Warwick) | Procter, Rob (University of Warwick) | Bontcheva, Kalina (University of Sheffield) | Tolmie, Peter (University of Warwick)
This is especially the case in emergency situations, where the spread of a false rumour can have dangerous consequences. For instance, in a situation where a hurricane is hitting a region, or a terrorist attack occurs in a city, access to accurate information is crucial for finding out how to stay safe and for maximising citizens' wellbeing. This is even more important in cases where users tend to pass on false information more often than real facts, as occurred with Hurricane Sandy in 2012 (Zubiaga and Ji 2014). Hence, identifying rumours within a social media stream can be of great help for the development of tools that prevent the spread of inaccurate information. [...] media as an event unfolds. This methodology consists of three main steps: (i) collection of (source) tweets posted during an emergency situation, sampling in such a way that it is manageable for human assessment, while generating a good number of rumourous tweets from multiple stories, (ii) collection of conversations associated with each of the source tweets, which includes a set of replies discussing the source tweet, and (iii) collection of human annotations on the tweets sampled. We provide a definition of a rumour which informs the annotation process. Our definition draws on definitions from different sources, including dictionaries [...]
Plagiarism Detection in Polyphonic Music using Monaural Signal Separation
De, Soham | Roy, Indradyumna | Prabhakar, Tarunima | Suneja, Kriti | Chaudhuri, Sourish | Singh, Rita | Raj, Bhiksha
Most current approaches to plagiarism detection are based on musical similarity measures, which typically ignore the issue of polyphony in music. We present a novel feature space for audio derived from compositional modelling techniques, commonly used in signal separation, that provides a mechanism to account for polyphony without incurring an inordinate amount of computational overhead. We employ this feature representation in conjunction with traditional audio feature representations in a classification framework which uses an ensemble of distance features to characterize pairs of songs as being plagiarized or not. Our experiments on a database of about 3000 musical track pairs show that the new feature space characterization produces significant improvements over standard baselines.
Using NLP to measure democracy
This paper uses natural language processing to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). The ADS are based on 42 million news articles from 6,043 different sources and cover all independent countries in the 1993-2012 period. Unlike the democracy indices we have today, the ADS are replicable and have standard errors small enough to actually distinguish between cases. The ADS are produced with supervised learning. Three approaches are tried: a) a combination of Latent Semantic Analysis and tree-based regression methods; b) a combination of Latent Dirichlet Allocation and tree-based regression methods; and c) the Wordscores algorithm. The Wordscores algorithm outperforms the alternatives, so it is the one on which the ADS are based. There is a web application where anyone can change the training set and see how the results change: democracy-scores.org
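The core of the Wordscores algorithm can be sketched briefly. Each word receives a score equal to the average of the reference-text scores, weighted by how strongly the word is associated with each reference text; a new text is then scored as the frequency-weighted mean of its word scores. The sketch below omits the rescaling of new-text scores that usually follows, and the toy corpus, labels, and scores are invented for illustration; it is not the paper's pipeline.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word within one text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_scores):
    """Score each word from labeled reference texts.

    reference_texts: dict mapping a text name to its list of tokens.
    reference_scores: dict mapping the same names to known scores.
    """
    freqs = {name: relative_freqs(toks) for name, toks in reference_texts.items()}
    vocab = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for word in vocab:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        # P(reference | word) weights each reference text's known score.
        scores[word] = sum(
            f.get(word, 0.0) / total * reference_scores[name]
            for name, f in freqs.items()
        )
    return scores


def score_text(tokens, scores):
    """Score a new text using only words seen in the reference texts."""
    known = [t for t in tokens if t in scores]
    if not known:
        raise ValueError("no scored words in text")
    freqs = relative_freqs(known)
    return sum(freqs[word] * scores[word] for word in freqs)


# Hypothetical toy reference texts with assumed scores (+1 and -1).
refs = {
    "free": ["election", "press", "election", "vote"],
    "unfree": ["coup", "censorship", "election", "arrest"],
}
ws = word_scores(refs, {"free": 1.0, "unfree": -1.0})
new_score = score_text(["election", "vote", "protest"], ws)
```

Words absent from every reference text, such as "protest" here, are simply ignored when scoring, so the choice of training set directly shapes the resulting scores.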