Qatar Computing Research Institute
Reports of the Workshops Held at the 2018 International AAAI Conference on Web and Social Media
Editor, Managing (AAAI) | An, Jisun (Qatar Computing Research Institute) | Chunara, Rumi (New York University) | Crandall, David J. (Indiana University) | Frajberg, Darian (Politecnico di Milano) | French, Megan (Stanford University) | Jansen, Bernard J. (Qatar Computing Research Institute) | Kulshrestha, Juhi (GESIS - Leibniz Institute for the Social Sciences) | Mejova, Yelena (Qatar Computing Research Institute) | Romero, Daniel M. (University of Michigan) | Salminen, Joni (Qatar Computing Research Institute) | Sharma, Amit (Microsoft Research India) | Sheth, Amit (Wright State University) | Tan, Chenhao (University of Colorado Boulder) | Taylor, Samuel Hardman (Cornell University) | Wijeratne, Sanjaya (Wright State University)
The Workshop Program of the Association for the Advancement of Artificial Intelligence's 12th International Conference on Web and Social Media (AAAI-18) was held at Stanford University, Stanford, California, USA, on Monday, June 25, 2018. There were fourteen workshops in the program: Algorithmic Personalization and News: Risks and Opportunities; Beyond Online Data: Tackling Challenging Social Science Questions; Bridging the Gaps: Social Media, Use and Well-Being; Chatbot; Data-Driven Personas and Human-Driven Analytics: Automating Customer Insights in the Era of Social Media; Designed Data for Bridging the Lab and the Field: Tools, Methods, and Challenges in Social Media Experiments; Emoji Understanding and Applications in Social Media; Event Analytics Using Social Media Data; Exploring Ethical Trade-Offs in Social Media Research; Making Sense of Online Data for Population Research; News and Public Opinion; Social Media and Health: A Focus on Methods for Linking Online and Offline Data; Social Web for Environmental and Ecological Monitoring; and The ICWSM Science Slam. Workshops were held on the first day of the conference. Workshop participants met and discussed issues with a selected focus, providing an informal setting for active exchange among researchers, developers, and users on topics of current interest. Organizers from nine of the workshops submitted reports, which are reproduced in this report. Brief summaries of the other five workshops have been reproduced from their website descriptions.
Graph Based Semi-Supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets
Alam, Firoj (Qatar Computing Research Institute) | Joty, Shafiq (Nanyang Technological University) | Imran, Muhammad (Qatar Computing Research Institute)
During time-critical situations such as natural disasters, rapid classification of data posted on social networks by affected people is useful for humanitarian organizations to gain situational awareness and to plan response efforts. However, the scarcity of labeled data in the early hours of a crisis hinders machine learning tasks and thus delays crisis response. In this work, we propose to use an inductive semi-supervised technique to utilize unlabeled data, which is often abundant at the onset of a crisis event, along with a small amount of labeled data. Specifically, we adopt a graph-based deep learning framework to learn an inductive semi-supervised model. We use two real-world crisis datasets from Twitter to evaluate the proposed approach. Our results show significant improvements using unlabeled data as compared to only using labeled data.
Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race
Jung, Soon-gyo (Qatar Computing Research Institute) | An, Jisun (Qatar Computing Research Institute) | Kwak, Haewoon (Qatar Computing Research Institute) | Salminen, Joni (Qatar Computing Research Institute) | Jansen, Bernard Jim (Qatar Computing Research Institute)
In this research, we evaluate four widely used face detection tools, which are Face++, IBM Bluemix Visual Recognition, AWS Rekognition, and Microsoft Azure Face API, using multiple datasets to determine their accuracy in inferring user attributes, including gender, race, and age. Results show that the tools are generally proficient at determining gender, with accuracy rates greater than 90%, except for IBM Bluemix. Concerning race, only one of the four tools provides this capability, Face++, with an accuracy rate of greater than 90%, although the evaluation was performed on a high-quality dataset. Inferring age appears to be a challenging problem, as all four tools performed poorly. The findings of our quantitative evaluation are helpful for future computational social science research using these tools, as their accuracy needs to be taken into account when applied to classifying individuals on social media and other contexts. Triangulation and manual verification are suggested for researchers employing these tools.
SAGA: A Submodular Greedy Algorithm for Group Recommendation
Parambath, Shameem A. Puthiya (Qatar Computing Research Institute) | Vijayakumar, Nishant (Apptopia Inc.) | Chawla, Sanjay (Qatar Computing Research Institute)
In this paper, we propose a unified framework and an algorithm for the problem of group recommendation where a fixed number of items or alternatives can be recommended to a group of users. The problem of group recommendation arises naturally in many real-world contexts, and is closely related to the budgeted social choice problem studied in economics. We frame the group recommendation problem as choosing a subgraph with the largest group consensus score in a completely connected graph defined over the item affinity matrix. We propose a fast greedy algorithm with strong theoretical guarantees, and show that the proposed algorithm compares favorably to the state-of-the-art group recommendation algorithms according to commonly used relevance and coverage performance measures on benchmark data.
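The greedy idea in this abstract can be illustrated with a minimal sketch. It is not the SAGA algorithm itself: the abstract does not define the group consensus score, so the objective below (each user contributes the best score among the chosen items, in the spirit of budgeted social choice) is an assumption, and the names `Ratings`, `consensus_score`, and `greedy_select` are hypothetical. This assumed objective is monotone and submodular, a setting in which greedy selection under a cardinality constraint is known to achieve a (1 - 1/e) approximation; whether that matches the paper's own guarantee is not stated here.

```python
from typing import Dict, List, Set

# Hypothetical input format: user -> item -> non-negative score.
Ratings = Dict[str, Dict[str, float]]


def consensus_score(ratings: Ratings, chosen: Set[str]) -> float:
    """Assumed objective: sum over users of their best score among chosen items."""
    return sum(
        max((scores.get(item, 0.0) for item in chosen), default=0.0)
        for scores in ratings.values()
    )


def greedy_select(ratings: Ratings, k: int) -> List[str]:
    """Pick up to k items, each round adding the item with the largest marginal gain."""
    # Sort the item universe so ties are broken deterministically.
    items = sorted({item for scores in ratings.values() for item in scores})
    chosen: List[str] = []
    current = 0.0
    for _ in range(min(k, len(items))):
        best_item, best_gain = None, 0.0
        for item in items:
            if item in chosen:
                continue
            gain = consensus_score(ratings, set(chosen) | {item}) - current
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            # No remaining item improves the score; stop early.
            break
        chosen.append(best_item)
        current += best_gain
    return chosen


if __name__ == "__main__":
    toy = {
        "u1": {"a": 5.0, "b": 1.0},
        "u2": {"a": 4.0, "b": 2.0},
        "u3": {"c": 5.0},
    }
    print(greedy_select(toy, 2))
```

In the toy data, the second pick goes to the item that serves the otherwise uncovered user rather than to the item with the next-highest total score, which is the coverage behavior a submodular objective rewards.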
Robust Classification of Crisis-Related Data on Social Networks Using Convolutional Neural Networks
Nguyen, Dat Tien (Qatar Computing Research Institute) | Mannai, Kamela Ali Al (Qatar Computing Research Institute) | Joty, Shafiq (Qatar Computing Research Institute) | Sajjad, Hassan (Qatar Computing Research Institute) | Imran, Muhammad (Qatar Computing Research Institute) | Mitra, Prasenjit (Pennsylvania State University)
The role of social media, in particular microblogging platforms such as Twitter, as a conduit for actionable and tactical information during disasters is increasingly acknowledged. However, time-critical analysis of big crisis data on social media streams brings challenges to machine learning techniques, especially the ones that use supervised learning. The scarcity of labeled data, particularly in the early hours of a crisis, delays the learning process. Existing classification methods require a significant amount of labeled data specific to a particular event for training plus a lot of feature engineering to achieve best results. In this work, we introduce neural network based classification methods for identifying useful tweets during a crisis situation. At the onset of a disaster when no labeled data is available, our proposed method makes the best use of the out-of-event data and achieves good results.
You Are What Apps You Use: Demographic Prediction Based on User's Apps
Malmi, Eric (Verto Analytics and Aalto University) | Weber, Ingmar (Qatar Computing Research Institute)
Understanding the demographics of app users is crucial, for example, for app developers who wish to target their advertisements more effectively. Our work addresses this need by studying the predictability of user demographics based on the list of a user's apps, which is readily available to many app developers. We extend previous work on the problem on three fronts: (1) We predict new demographics (age, race, and income) and analyze the most informative apps for the four demographic attributes included in our analysis. The most predictable attribute is gender (82.3% accuracy), whereas the hardest to predict is income (60.3% accuracy). (2) We compare several dimensionality reduction methods for high-dimensional app data, finding that an unsupervised method yields superior results compared to aggregating the apps at the app category level, but the best results are obtained simply by using the raw list of apps. (3) We look into the effect of the training set size and the number of apps on the predictability and show that both of these factors have a large impact on the prediction accuracy. The predictability increases, or in other words, a user's privacy decreases, the more apps the user has used, but somewhat surprisingly, after 100 apps, the prediction accuracy starts to decrease.
Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos through GDELT and Deep Learning-based Vision APIs
Kwak, Haewoon (Qatar Computing Research Institute) | An, Jisun (Qatar Computing Research Institute)
In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To the best of our knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.